Soft Drinks Market Research (UCY)

Application domain: market research
Source: AMER inc.
Dataset size: 31Kb Prolog file, 100 examples
Data format: Prolog facts, ACL input format
Systems Used: ACL, C4.5, FOIL

The data (UCY)

The data is the result of market research on a new soft drink to be sold in the Middle East. The research was conducted by interviewing potential customers: they were asked to taste the drink and fill in a questionnaire. The key question was "would you buy the product?", and this is the attribute to predict. The other questions fall into two groups. The first group contains questions about attributes of the drink ("how good is the flavour?", "refreshing?", "level of aroma?", "level of sweetness?", "level of flavour strength?", "colour?", "mouthfeel?"). The second group concerns personal tastes ("do you like delicious drinks?", "like natural?", "like apparent fruit?", "do you dislike strong sour?", "dislike nothing?"). The buy question and the questions in the first group were answered with a grade from 1 to 5.

The data set has been represented in Prolog: each interviewed person is assigned a distinct numeric constant. For each attribute of the drink, the range from 1 to 5 was represented using three predicates: one for answers 1 and 2, one for answer 3, and one for answers 4 and 5. Each question about the customer's personal tastes was represented using a single predicate.
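A minimal Python sketch of this three-band encoding, assuming each answer arrives as an integer grade (or None when missing). The function `band_fact` and the middle-band predicate `averageflavouroverall` are hypothetical; goodflavouroverall and poorflavouroverall are names used in this description and are assumed here to be the high and low bands.

```python
from typing import Optional


def band_fact(person: int, grade: Optional[int], names: tuple[str, str, str]) -> Optional[str]:
    """Encode a 1-5 answer about a drink attribute as a single Prolog fact.

    names = (low, mid, high): low covers grades 1-2, mid grade 3, high grades 4-5.
    A missing answer (None) yields no fact, so it stays incomplete in the background.
    """
    if grade is None:
        return None
    if not 1 <= grade <= 5:
        raise ValueError(f"grade must be between 1 and 5, got {grade}")
    low, mid, high = names
    pred = low if grade <= 2 else mid if grade == 3 else high
    return f"{pred}({person})."


# Hypothetical band names: averageflavouroverall is invented for this example.
FLAVOUR = ("poorflavouroverall", "averageflavouroverall", "goodflavouroverall")

print(band_fact(1, 5, FLAVOUR))  # goodflavouroverall(1).
print(band_fact(2, 2, FLAVOUR))  # poorflavouroverall(2).
print(band_fact(3, 3, FLAVOUR))  # averageflavouroverall(3).
```

Returning no fact for a missing answer mirrors the way unanswered questions are left out of the background rather than encoded as a fourth value.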

Overall, 100 people were interviewed: 52 answered 4 or 5 to the question "would you buy the product?" (e.g. postbuy(16)), 32 answered 1 or 2 (e.g. postnotbuy(24)) and 16 did not know (e.g. postdkbuy(31)). For some attributes the information is incomplete: flavour (37 incomplete cases), refreshing (40), like delicious (75), like natural (68), like apparent fruit (89), dislike strong sour (88) and dislike nothing (68).
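Since exactly 100 people were interviewed, each count is also a percentage. The short Python check below is plain arithmetic on the figures just given. It confirms that the three classes account for all interviewees and that the degree of incompleteness ranges from 37% to 89%.

```python
N_PEOPLE = 100

# Answers to "would you buy the product?", as reported above.
CLASS_COUNTS = {"postbuy": 52, "postnotbuy": 32, "postdkbuy": 16}

# Incomplete (unanswered or don't care) cases per attribute, as reported above.
INCOMPLETE = {
    "flavour": 37,
    "refreshing": 40,
    "like delicious": 75,
    "like natural": 68,
    "like apparent fruit": 89,
    "dislike strong sour": 88,
    "dislike nothing": 68,
}


def degree_of_incompleteness(missing: int, total: int = N_PEOPLE) -> float:
    """Percentage of interviewees with no usable answer for an attribute."""
    return 100.0 * missing / total


# Every interviewee falls into exactly one of the three classes.
assert sum(CLASS_COUNTS.values()) == N_PEOPLE

degrees = {attr: degree_of_incompleteness(m) for attr, m in INCOMPLETE.items()}
print(min(degrees.values()), max(degrees.values()))  # 37.0 89.0
```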

Some questions were left unanswered or were given don't care answers; these have been treated as incomplete information in the background knowledge. Out of 24 background predicates, 8 are incomplete, with a degree of incompleteness ranging from 37% (i.e. 37 people out of 100 did not answer or answered don't care) up to 89%. The incomplete background predicates have been treated as abducibles, and integrity constraints have been introduced to prevent the abduction of two different answers to the same question. For example, for the "overall flavour" question we have the following constraint on the abducible predicates that record answers to this question:

   <-- goodflavouroverall(X), poorflavouroverall(X).
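In a minimal Python sketch, such a denial can be checked against a candidate set of abduced facts: the assumptions are admissible only if no constraint body holds for any person. Representing facts as (predicate, person) pairs and restricting constraints to two literals are simplifications of this sketch. The function `violations` is hypothetical and not part of ACL.

```python
Fact = tuple[str, int]  # (predicate name, person id)

# Denials "<-- p(X), q(X)": p and q must never hold for the same person.
CONSTRAINTS: list[tuple[str, str]] = [
    ("goodflavouroverall", "poorflavouroverall"),
]


def violations(known: set[Fact], abduced: set[Fact]) -> list[tuple[str, str, int]]:
    """Return (p, q, person) for every constraint violated by known + abduced facts."""
    facts = known | abduced
    people = sorted({person for _, person in facts})
    return [
        (p, q, x)
        for p, q in CONSTRAINTS
        for x in people
        if (p, x) in facts and (q, x) in facts
    ]


known = {("poorflavouroverall", 2)}  # toy data: person 2 rated the flavour poorly
print(violations(known, {("goodflavouroverall", 1)}))  # []
print(violations(known, {("goodflavouroverall", 2)}))  # [('goodflavouroverall', 'poorflavouroverall', 2)]
```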

Experiments with ACL (UCY)

The experiments were performed using only the first phase of ACL, called Intermediate-ACL. In this phase, ACL learns an abductive theory containing only new rules, not new integrity constraints. The condition that the learned theory $T'=\langle P', A, I\rangle$ must satisfy can be rewritten as $T' \models_{A} E^{+}, not\_E^{-}$, where $E^+, not\_E^{-}$ stands for the conjunction of all positive examples and of the negations of all negative examples [1].

First, we tried to learn the concept postbuy, using the postnotbuy examples as negative examples. The first experiment used the information on all available attributes and produced the following rules:

   postbuy(X) <-- goodflavouroverall(X), rightsweetness(X)        47 (10)   0 (14)
   postbuy(X) <-- goodflavouroverall(X), rightmouthfeel(X)         2
   postbuy(X) <-- goodflavouroverall(X), palecolour(X)             1
   postbuy(X) <-- darkcolour(X), rightflavourstrength(X)           1
   postbuy(X) <-- darkcolour(X), rightaroma(X), highsweetness(X)   1

Each rule is followed by up to four numbers in the form $Npt(Npa)$ $Nm(Nma)$. $Npt$ is the number of positive examples covered by the rule, with or without abduction. $Npa$ is the number of positive examples covered by the rule through abduction (0 if absent). $Nm$ is the number of negative examples covered by the rule, i.e. the $e^-$ for which $\leftarrow not\ e^-$ failed (0 if absent, i.e. the rule is consistent). $Nma$ is the number of negative examples that abduction keeps uncovered, i.e. for which $\leftarrow not\ e^-$ succeeded with a non-empty explanation (0 if absent).
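The four counts can be illustrated with a much simplified Python model of abduction. This model is an assumption of the sketch, not ACL's proof procedure:

- A literal on a complete question is evaluated under the closed-world assumption.
- A literal on an incomplete (abducible) question is true if it is recorded, and false if the person gave a different answer to that question.
- Otherwise it is unknown. An unknown literal can be assumed true to cover a positive example, or blocked by abducing a different answer so that a negative example stays uncovered.

The predicate-to-question table and the toy facts are hypothetical.

```python
# Hypothetical fragment of the background: which question each predicate answers.
QUESTION = {
    "goodflavouroverall": "flavour",
    "poorflavouroverall": "flavour",
    "rightsweetness": "sweetness",
    "highsweetness": "sweetness",
}
ABDUCIBLE = {"flavour"}  # incomplete questions, whose answers may be abduced

Facts = set[tuple[str, int]]


def literal_status(pred: str, person: int, facts: Facts) -> str:
    """'true', 'false' or 'unknown' (may be assumed) for pred(person)."""
    if (pred, person) in facts:
        return "true"
    question = QUESTION[pred]
    if question not in ABDUCIBLE:
        return "false"  # complete predicate: closed-world assumption
    answered = any((p, person) in facts for p, q in QUESTION.items() if q == question)
    return "false" if answered else "unknown"  # a different answer is already recorded


def body_status(body: list[str], person: int, facts: Facts) -> str:
    """'holds' without abduction, 'abduce' if assumptions are needed, else 'fails'."""
    statuses = [literal_status(p, person, facts) for p in body]
    if "false" in statuses:
        return "fails"
    return "holds" if all(s == "true" for s in statuses) else "abduce"


def coverage(body: list[str], positives: list[int], negatives: list[int], facts: Facts):
    """Return (Npt, Npa, Nm, Nma) for a single rule with the given body."""
    pos = [body_status(body, x, facts) for x in positives]
    neg = [body_status(body, x, facts) for x in negatives]
    npt = sum(s != "fails" for s in pos)  # covered, with or without abduction
    npa = pos.count("abduce")             # covered only through abduction
    nm = neg.count("holds")               # cannot be blocked: rule is inconsistent
    nma = neg.count("abduce")             # blocked by abducing a different answer
    return npt, npa, nm, nma


# Toy data: persons 1-2 are positive examples, 3-4 negative ones.
facts = {
    ("goodflavouroverall", 1), ("rightsweetness", 1),
    ("rightsweetness", 2),                             # flavour not answered
    ("poorflavouroverall", 3), ("rightsweetness", 3),
    ("rightsweetness", 4),                             # flavour not answered
}
body = ["goodflavouroverall", "rightsweetness"]
print(coverage(body, [1, 2], [3, 4], facts))  # (2, 1, 0, 1)
```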

It is interesting to investigate how the definition learned for postbuy behaves on the 16 don't know examples, i.e. the examples of postdkbuy. The definition covers 10 of them, 9 of which are covered with abduction. This means that, according to the learned definition, in 10 cases out of 16 (62.5%) the indecisive customer would buy the product.
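The following Python sketch shows how a whole definition can be applied to an unlabelled example. It assumes that each rule has already been classified as holding outright, holding only under abduction, or failing. The input is synthetic, built only to reproduce the 10-out-of-16 figure, and is not the actual dataset.

```python
def definition_status(rule_statuses: list[str]) -> str:
    """Combine per-rule results ('holds', 'abduce' or 'fails') for one example.

    A definition (all rules for postbuy) covers an example if any rule does;
    the cover needs abduction only if no rule covers it outright.
    """
    if "holds" in rule_statuses:
        return "holds"
    if "abduce" in rule_statuses:
        return "abduce"
    return "fails"


def summarize(per_example: list[list[str]]) -> tuple[int, int, float]:
    """Return (covered, covered with abduction, fraction covered)."""
    results = [definition_status(s) for s in per_example]
    covered = sum(r != "fails" for r in results)
    return covered, results.count("abduce"), covered / len(results)


# Synthetic per-rule results for 16 examples, chosen only to match the reported counts.
dont_know = [["holds", "fails"]] + [["fails", "abduce"]] * 9 + [["fails", "fails"]] * 6
print(summarize(dont_know))  # (10, 9, 0.625)
```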

We also learned the concepts postnotbuy and postdkbuy. In addition, we ran several further experiments on learning postbuy: without abduction, with negated literals allowed in rule bodies, without the flavour attribute, without the sweetness attribute, and without both flavour and sweetness.

Experiments with FOIL (UCY)

We report the results obtained with the Inductive Logic Programming system FOIL version 6:

   postbuy(A) <-- goodflavouroverall(A), rightmouthfeel(A)               37
   postbuy(A) <-- likedelicious(A), veryrefreshing(A)                     5
   postbuy(A) <-- dislikenothing(A)                                       3
   postbuy(A) <-- palecolour(A), goodflavouroverall(A)                    2
   postbuy(A) <-- darkcolour(A), rightaroma(A), syrupymouthfeel(A)        2
   postbuy(A) <-- rightsweetness(A), waterymouthfeel(A), likenatural(A)   1

The number following each rule is the number of positive examples it covers. Note that this definition leaves 2 of the 52 positive examples for postbuy uncovered.
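A quick Python tally compares the coverage of the two postbuy definitions. It assumes that each rule's count refers only to examples not already covered by earlier rules, which the source does not state. Under that assumption, the ACL definition accounts for all 52 positive examples and the FOIL one leaves 2 uncovered.

```python
N_POSTBUY = 52  # positive examples of postbuy

# Positives covered by each rule, in the order listed. Summing them assumes the
# counts do not overlap (each rule is credited only with newly covered examples).
ACL_COVER = [47, 2, 1, 1, 1]
FOIL_COVER = [37, 5, 3, 2, 2, 1]


def uncovered(per_rule: list[int], total: int = N_POSTBUY) -> int:
    """Positive examples left uncovered by a definition."""
    return total - sum(per_rule)


print(uncovered(ACL_COVER), uncovered(FOIL_COVER))  # 0 2
```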

Summary of some recent experiments (UCY)

ACL1, mFOIL and C4.5 (and FOIL) were run on this data. The performance of ACL1, mFOIL and C4.5 was compared by means of 5-fold cross-validation. The average accuracies and run times are shown in Table 1. On average, ACL1 found more accurate theories than C4.5 and mFOIL, with run times higher than those of C4.5 but lower than those of mFOIL. In general, the experts judged the dominant rules found by ACL1 (and the other systems) to be meaningful.
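The evaluation protocol can be sketched in Python. How the examples were split is not reported, so this sketch assumes a plain (unstratified) random split of 100 examples into 5 folds. The callback `train_and_score` is hypothetical and stands in for running one of the systems.

```python
import random
from typing import Callable


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> list[list[int]]:
    """Shuffle example indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n: int, train_and_score: Callable[[list[int], list[int]], float],
                   k: int = 5, seed: int = 0) -> float:
    """Average the test score over k train/test splits.

    train_and_score(train_idx, test_idx) is a hypothetical callback that would
    train one system (e.g. ACL1, C4.5 or mFOIL) and return its test accuracy.
    """
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / k


folds = k_fold_indices(100)
print([len(f) for f in folds])  # [20, 20, 20, 20, 20]
```

Using the same seed for every system gives all of them identical folds, which keeps the comparison fair.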

The second phase of ACL was also run on this data to find constraints that support the abductive rules and assumptions of ACL1. For example, one of the constraints found was

   <-- goodflavouroverall(X), higharoma(X)

which (partially) complements the available knowledge on goodflavouroverall(X). On the whole, the experts again judged the constraints found to be significant.
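A small Python check shows how a learned constraint complements incomplete knowledge: it decides whether an assumption is still admissible. The two-literal form of denials and the toy facts are assumptions of this sketch. In the example, flavour is unanswered for both people, but person 1 reported high aroma, so goodflavouroverall can no longer be abduced for that person.

```python
Fact = tuple[str, int]  # (predicate name, person id)

# Learned denials "<-- a(X), b(X)" as (a, b) pairs; this one is reported above.
DENIALS: list[tuple[str, str]] = [("goodflavouroverall", "higharoma")]


def assumable(candidate: str, person: int, known: set[Fact]) -> bool:
    """True if candidate(person) can be abduced without violating a learned denial."""
    for a, b in DENIALS:
        if candidate == a and (b, person) in known:
            return False
        if candidate == b and (a, person) in known:
            return False
    return True


# Toy data: flavour is unanswered for both people; only person 1 reported high aroma.
known = {("higharoma", 1), ("rightaroma", 2)}
print(assumable("goodflavouroverall", 1, known))  # False
print(assumable("goodflavouroverall", 2, known))  # True
```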

Table 1: Performance on the drinks questionnaire data
   System   Accuracy   Run time (seconds)
   ACL1     0.8525      6.77
   C4.5     0.812       1.92
   mFOIL    0.78       11.13


[1] It can be shown that this condition is equivalent to the one given in the description of ACL.

