Each fact consists of 14 parameters: the number of users logged into the system, the number of non-root users logged in, the number of processes on the system, the number of non-root processes, the number of defunct processes, the number of full file systems, the number of full i-node tables, the amount of free space in the tmp file system, the amount of paging space used, the system load, and the percentages of CPU time used by users, used by the system, spent idle, and spent in I/O operations.
We used a set of binary predicates, one for each parameter in each fact. The predicates are the same as those used in the experiments described by Luc Dehaspe, apart from the different discretization given below. Each parameter was discretized by defining an allowed interval and a step. The discretizations used here are finer than those in Claudien: every parameter is discretized into four, five, or more intervals. For example, the four parameters describing the percentage of CPU time spent in user, system, idle, and I/O are classified into four intervals: 0-25%, 25-50%, 50-75%, 75-100%. The sysload value is discretized into 10 intervals (step = 0.5) from 0.0 to 5.0. As in the Claudien experiments, sysload is considered high if greater than 3.0. Hence, the sysload predicted by each clause is still expressed as one of three symbolic values: low (0 <= sysload <= 1.5), average (1.5 < sysload <= 3), and high (3 < sysload).
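The discretization just described can be sketched as follows. This is not the original REGAL code; the interval bounds are the ones given in the text, while the function and variable names are our own.

```python
# Sketch of the discretization described above (hypothetical names,
# not taken from the REGAL implementation).

def discretize(value, low, high, step):
    """Map a raw parameter value to the index of its interval in [low, high)."""
    if value < low or value >= high:
        return None  # outside the allowed interval
    return int((value - low) // step)

def sysload_class(sysload):
    """The three symbolic values used for the prediction target."""
    if sysload <= 1.5:
        return "low"
    if sysload <= 3.0:
        return "average"
    return "high"
```

For instance, a sysload of 2.3 falls into the fifth of the ten 0.5-wide intervals between 0.0 and 5.0, and is symbolically "average".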
The experiment was run on a SPARCstation 5 (though REGAL can also be run on a network of hosts) and required about 24 hours of CPU time. REGAL discovered 21 disjuncts covering all of the 100 facts stating that the sysload is high. As an example, consider the disjunct covering the largest number of facts (it covers 32 facts stating that the sysload is high).
It says that sysload will be high in 20 seconds from NOW if:
  in the last 40 seconds cpu_id has been in the interval 0-75%, and
  in the last 20 seconds cpu_us has been in the interval 25-50%, and
  the number of non_root_processes has been in the interval 50-130, and
  sysload has not been in the interval 2.0-2.5, and
  cpu_us is NOW in the interval 25-100%, and
  sysload is NOW not in the intervals 1.5-2.0 or 3.0-3.5.
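Read operationally, this disjunct is a simple conjunctive test over the recent history of the monitored parameters. A sketch in Python (field names are hypothetical, not taken from the REGAL output):

```python
# Sketch: the largest disjunct as a conjunctive test. The sample
# dictionaries and their field names are our own illustration.

def predicts_high(now, last_20s, last_40s):
    """True if the disjunct fires, i.e. sysload is predicted high in 20 s.

    now      -- dict of current parameter values
    last_20s -- list of parameter samples from the last 20 seconds
    last_40s -- list of parameter samples from the last 40 seconds
    """
    return (
        all(0 <= s["cpu_id"] <= 75 for s in last_40s)
        and all(25 <= s["cpu_us"] <= 50 for s in last_20s)
        and all(50 <= s["non_root_processes"] <= 130 for s in last_20s)
        and all(not (2.0 <= s["sysload"] <= 2.5) for s in last_20s)
        and 25 <= now["cpu_us"] <= 100
        and not (1.5 <= now["sysload"] <= 2.0)
        and not (3.0 <= now["sysload"] <= 3.5)
    )
```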
As a second experiment, the input set was split into a training set and a test set. The training set contains about 2/3 of the examples, and the test set the remaining 1/3. Examples of the three classes are distributed in the same proportions in the two sets (i.e., about 2/3 of the examples of class ``low'' are in the training set and 1/3 in the test set, and similarly for the other two classes). REGAL was run as in the first experiment but only on the training set, and the learned rules were evaluated on the test set. This was repeated three times, with different examples in the test set each time (i.e., the three test sets do not overlap). The results obtained in the three runs are:
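The stratified 2/3-1/3 split described above can be sketched as follows. This is an assumed procedure illustrating the class-proportional split, not the original experimental code.

```python
# Sketch (assumed procedure): a stratified 2/3-1/3 split that keeps the
# per-class proportions described in the text.
import random

def stratified_split(examples, seed=0):
    """examples: list of (fact, label) pairs; returns (train, test)."""
    rng = random.Random(seed)
    by_class = {}
    for ex in examples:
        by_class.setdefault(ex[1], []).append(ex)
    train, test = [], []
    for label, exs in by_class.items():
        rng.shuffle(exs)
        cut = (2 * len(exs)) // 3   # about 2/3 of each class to training
        train.extend(exs[:cut])
        test.extend(exs[cut:])
    return train, test
```

Repeating the split with non-overlapping test sets, as in the text, would additionally require partitioning each class into three folds.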
Percentage of covered examples of each class in the test set, and overall accuracy:

             ``low''   ``average''   ``high''   accuracy
  1st run      0.6%        17%          81%       86.8%
  2nd run      1.6%        14%          58%       80.1%
  3rd run      4.4%        11%          59%       83.2%
  average      2.2%        14%          66%       83.2%
The highest error is, of course, found in covering the ``average'' facts, which are the closest to the ``high'' ones. The learned rules correctly predict that the system load is going to be high in the next 20 seconds in about 2/3 of the test cases. The resulting average accuracy means that in 83.2% of the considered facts a situation of high load is correctly classified as ``high'', and a situation of low or average load is not classified as ``high''.
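The accuracy figure reads as the binary accuracy of the ``high'' prediction: a fact counts as correct if it is ``high'' and covered by a rule, or not ``high'' and not covered. A sketch (names are our own):

```python
# Sketch: accuracy as binary classification accuracy of the "high"
# prediction, per the reading given in the text.

def accuracy(facts):
    """facts: list of (true_label, covered_by_a_high_rule) pairs."""
    correct = sum(
        1 for label, covered in facts
        if (label == "high") == covered
    )
    return correct / len(facts)
```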
We believe that better results could be achieved by 1) increasing the computational power of the system on which REGAL is run, which would allow the discovery of a more general set of disjuncts covering the examples, and hence one with higher predictive power; 2) further refining the discretization, in order to obtain a more precise characterization of a given fact; and 3) introducing relational predicates that compare the relative values of some parameters. For example, if we know that, say, 20 seconds ago the sysload was 5.3 and NOW it is 3.1, by comparing the two values it is possible to guess that the sysload will not be ``high'' after 20 seconds, because it is decreasing.
Hence, in a second set of experiments we added to the above environment a binary predicate, sysgrow(Y,Z), which is true if the sysload at time X-20 (variable Z) is greater than the sysload at time X-80, X-60, or X-40 (variable Y). The obvious purpose of this predicate is to capture situations where the sysload is growing.
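Under our reading of the predicate's definition, it can be sketched as a comparison between the most recent sysload sample and any earlier one (the helper names are hypothetical):

```python
# Sketch of the sysgrow predicate as we read it from the text:
# true when sysload at time X-20 exceeds some earlier sample.

def sysgrow(earlier, latest):
    """earlier: sysload at X-80, X-60 or X-40 (Y); latest: at X-20 (Z)."""
    return latest > earlier

def is_growing(history):
    """history: sysload samples at times [X-80, X-60, X-40, X-20]."""
    latest = history[-1]
    return any(sysgrow(y, latest) for y in history[:-1])
```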
Below are the results for the three runs and their average (percentage of covered examples of each class in the test set, and overall accuracy):

             ``low''   ``average''   ``high''   accuracy
  1st run     1.96%      12.5%        75%        90.93%
  2nd run     2.5%       17.96%       62.06%     86.64%
  3rd run     3.73%      17.5%        70.2%      86.7%
  average     2.73%      15.98%       69.08%     88.09%
Learning times are roughly the same. We can see that the predictive power improves, owing to the presence of a relational predicate.
We also observed that the set of learned rules was smaller and the rules were simpler (again, roughly, this is due to the presence of a truly first-order predicate). REGAL thus showed the ability to face some of the problems related to complex hypotheses, namely deep and multi-clause ones.