Application domain: Adaptive system and network management
Source: Syllogic for system management, Cabletron for network management, Torino for additional system management and access control data
Dataset size: network management (Cabletron): 5269209 bytes; system management (Leuven): 71844 bytes; system management (Syllogic+Torino): 1102111 bytes; intrusion detection (Torino): about 2 Mbytes
Systems used: C4.5, Regal, ReliC
Last year's experiments started by using a set of 1000 examples provided to us by Luc Dehaspe in his experiments with Claudien, and gathered by using the scripts provided by Syllogic. This year, new data acquired at Syllogic were available.
Each fact is made of 14 parameters:
number of users logged on to the system,
number of non-root users logged on,
number of processes on the system,
number of non-root processes,
number of defunct processes,
number of full file systems,
number of full i-node tables,
quantity of free space in the tmp file system (Kilobytes),
percentage of paging space used,
percentage of cpu time used by the users,
percentage of cpu time used by the system,
percentage of cpu time the system was idle,
percentage of cpu time used in I/O operations.
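Such a fact can be sketched as a simple record; the field names below are our own illustration, not those used by the Syllogic scripts:

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """One monitoring fact; field names are illustrative, not the original ones."""
    users: int               # users logged on to the system
    non_root_users: int      # non-root users logged on
    processes: int           # processes on the system
    non_root_processes: int  # non-root processes
    defunct_processes: int   # defunct (zombie) processes
    full_filesystems: int    # full file systems
    full_inode_tables: int   # full i-node tables
    tmp_free_kb: int         # free space in the tmp file system (KB)
    paging_pct: float        # % of paging space used
    cpu_user_pct: float      # % cpu time used by the users
    cpu_system_pct: float    # % cpu time used by the system
    cpu_idle_pct: float      # % cpu time the system was idle
    cpu_io_pct: float        # % cpu time used in I/O operations
```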
REGAL was requested to find a set of disjuncts covering the 100 facts stating that the sysload is high (sysload 3, see below). To find a disjunct covering a fact at time X, REGAL was allowed to use the facts at times X-20, X-40, X-60 and X-80 (numbers are seconds). Thus, each disjunct involves a prediction gap of 20 seconds from NOW. We used a set of binary predicates, one for each parameter in each fact. Each parameter was discretized by defining an allowed interval and step.
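The discretization into binary predicates can be sketched as follows; the function names, interval bounds and predicate naming scheme are our own illustrative assumptions, not the original encoding:

```python
def interval_index(value, low, step, n_intervals):
    """Clamp a parameter value into one of n_intervals buckets of width
    `step` starting at `low` (bounds and step are illustrative)."""
    i = int((value - low) // step)
    return max(0, min(i, n_intervals - 1))

def binary_predicates(name, value, low, step, n_intervals):
    """One binary predicate per interval; exactly one is true per fact."""
    i = interval_index(value, low, step, n_intervals)
    return {f"{name}_in_{k}": (k == i) for k in range(n_intervals)}
```

For example, with ten intervals of step 0.5 starting at 0, a value of 1.1 falls into interval 2 and makes only the corresponding predicate true.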
Moreover, this year we added to the above
environment a binary predicate, sysgrow(Y,Z), which is true iff:
(Z - W)/3 + 2(Y - Z)/3 + (K - Y) > 0
and false otherwise, where W is the SYSTEM_LOAD value at time X - 80 seconds, Z is SYSTEM_LOAD at time X - 60 seconds, Y is SYSTEM_LOAD at time X - 40 seconds and K is SYSTEM_LOAD at time X - 20 seconds.
The intuitive meaning of this predicate is to
take care of the possible fluctuations of the system load
within the four tuples of each example, giving more
importance to the fluctuations closer to NOW. With this
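A minimal sketch of the sysgrow test, assuming the comparison lost in the formula above is "> 0" (consistent with "true iff ... and false otherwise"):

```python
def sysgrow(w, z, y, k):
    """True iff the weighted sum of load deltas is positive.
    w, z, y, k: SYSTEM_LOAD at times X-80, X-60, X-40, X-20 seconds.
    Deltas closer to NOW get larger weights (1/3, 2/3, 1)."""
    return (z - w) / 3 + 2 * (y - z) / 3 + (k - y) > 0
```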
new predicate we got the following
average outcomes over the three runs with last year's data:
percentage of covered examples of class ``low'' in the test set: 1.5%
percentage of covered examples of class ``average'' in the test set: 15.93%
percentage of covered examples of class ``high'' in the test set: 79.56%
accuracy = 89.66%
That is, the new predicate improves the predictive power of the learned rules by more than 10%. Since the obtained results were promising, we went on with the experiments using examples gathered during two days on three different computers at Syllogic. We report here on only one of them, an IBM RS/6000 running AIX 3.2.5.
Snapshots were taken every 60 seconds. The system load was classified as ``low'' for SYSTEM_LOAD < 0.86, ``average'' for 0.86 <= SYSTEM_LOAD < 1.33 and ``high'' for SYSTEM_LOAD >= 1.33. About 25% of the roughly 750 examples belong to class ``high''. Since the parameter values recorded at each snapshot were quite different from those of the previous experiments, discretization values and intervals had to be redefined accordingly. In particular, we see that the gap between a ``low'' and a ``high'' SYSTEM_LOAD is less than 0.5. As a consequence, the discretization of SYSTEM_LOAD used above (ten intervals of 0.5 each) would be useless. Hence, we used a discretization into 50 intervals with a step of 0.2.
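The three-way classification by these thresholds can be sketched as follows (which class the boundary values 0.86 and 1.33 belong to is our assumption, since the comparison operators were lost in the original text):

```python
def classify_load(system_load):
    """Assign a SYSTEM_LOAD value to one of the three classes;
    boundary membership (0.86, 1.33) is an assumption."""
    if system_load < 0.86:
        return "low"
    if system_load < 1.33:
        return "average"
    return "high"
```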
The average outcomes over the three runs, computed as in the previous experiments, are as follows:
percentage of covered examples of class ``low'' in the test set: 4.4%
percentage of covered examples of class ``average'' in the test set: 13.19%
percentage of covered examples of class ``high'' in the test set: 78.26%
accuracy = 86.8%
We see that predictive power and accuracy remain quite good, even though the rules must now foresee the system load evolution after 60 seconds (that is, three times later than in the previous experiments) and the gaps among the three classes (low, average and high) are smaller.
The experiments were conducted using a very large dataset given to us by L. Lewis, a researcher at Cabletron Corporation. The data were collected, as a result of a collaboration with Syllogic, by monitoring the Cabletron network over a period of 18 weeks in 10-minute increments. During the 18-week period, the administrators noted that when the load on subnet 5 was greater than 35%, the users working on that network began to complain because it was too slow. The administrator set up a process that warned him whenever the load on subnet 5 approached 35%, in order to prevent that threshold from being reached. The data mining task is to discover the factor causing the overloading of subnet 5, so as to improve the performance of the entire network; another goal is to find more complex relationships between subnetworks, for preventing overloading without simply looking at the value of the load.
The data were formatted into a 57-column table with 16849 rows. Each row represents a snapshot of the state of the network system. The information provided about this state consists of the following fields:
The dataset has been distributed with this problem to solve: find a predictor for the value of the load on subnet 5 exceeding 35%. We used three different machine learning systems: C4.5, Regal and ReliC. In the experiments, Max_Load, Max_Load_pt, Max_Pkts, Max_Pkts_ps, Max_Coll and Max_Coll_pt were not used; in fact, it is very easy to learn a rule such as ``if Max_Load_pt is ... and Max_Load ... then Load() ...''. Rules like this are obvious, useless and, sometimes, can bias the search in the hypothesis space. The following is a description of the experiments done:
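Removing the unused Max_* summary columns before learning can be sketched as follows; the helper name and the non-Max_* column names in the example are hypothetical, only the six dropped column names come from the text:

```python
# The six summary columns excluded from learning, as listed in the text.
DROPPED = {"Max_Load", "Max_Load_pt", "Max_Pkts", "Max_Pkts_ps",
           "Max_Coll", "Max_Coll_pt"}

def drop_trivial_columns(header, rows):
    """Remove the Max_* columns that would let the learner produce
    trivial rules; returns the filtered header and rows."""
    keep = [i for i, name in enumerate(header) if name not in DROPPED]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows
```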
The results of each experiment can be seen in the table below. In the table, the averages over the three runs for accuracy, errors over positive examples and errors over negative examples are reported for each method. Reported data are about the classification of testing cases. In the case of ReliC and C4.5, only pruned-tree results are given. Moreover, observe that the number of positive examples is very small (200) with respect to the number of negative ones. This does not affect learning for Regal, but could change things for algorithms based on decision trees; we solved this problem by replicating the positive examples by the ratio number_of_negatives/number_of_positives for C4.5, and by adjusting the weights for ReliC, giving the ratio above as the weight for class `+' and 1 for class `-'.
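The rebalancing used for C4.5 can be sketched as follows; `oversample_positives` is a hypothetical helper illustrating the replication by the negatives-to-positives ratio, not part of C4.5 itself:

```python
def oversample_positives(examples, labels, pos="+"):
    """Replicate each positive example round(n_neg / n_pos) times so
    the two classes are roughly balanced, as described in the text."""
    n_pos = sum(1 for label in labels if label == pos)
    n_neg = len(labels) - n_pos
    factor = max(1, round(n_neg / n_pos))  # negatives-to-positives ratio
    out_x, out_y = [], []
    for x, label in zip(examples, labels):
        reps = factor if label == pos else 1
        out_x.extend([x] * reps)
        out_y.extend([label] * reps)
    return out_x, out_y
```

For ReliC the same ratio is instead supplied as the weight of class `+', with weight 1 for class `-'.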