Application domain: | Discovery of neuropeptides |
Source: | SWISS-PROT http://www.expasy.ch/sprot/sprot-top.html |
Dataset size: | 44 positives and 3910 randoms |
Data format: | Progol |
Systems used: | Progol |
Pointers: | bryant@cs.york.ac.uk (data restricted) |
The first real-world application of the positive-only learning framework of the ILP system Progol (Muggleton et al.) aims to identify a particular group of proteins, neuropeptides, in the absence of sequence homology. The approach is to induce a grammar from amino acid sequences using Inductive Logic Programming (ILP). This is also the first attempt to acquire a grammar for a biological domain using ILP. The study also examined whether discovering potentially pertinent non-terminals prior to induction can improve performance (Bryant et al.).
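To make the idea of recognising proteins with a grammar rather than by sequence homology more concrete, the following is a minimal Python sketch of a grammar-style recogniser over one-letter amino acid codes. It is not the grammar induced by Progol, and it is not written in Progol's own format. The rule it encodes (a segment of at least one residue flanked by two pairs of basic residues, K or R) is a toy assumption chosen only for illustration, and every name in it (`basic`, `dibasic`, `candidate`, `accepts`) is hypothetical.

```python
from typing import Callable, Iterator

# A parser takes (sequence, start index) and yields every index where it can stop.
Parser = Callable[[str, int], Iterator[int]]

BASIC = set("KR")  # lysine and arginine in one-letter amino acid codes


def basic(seq: str, i: int) -> Iterator[int]:
    """Terminal: consume one basic residue."""
    if i < len(seq) and seq[i] in BASIC:
        yield i + 1


def any_residue(seq: str, i: int) -> Iterator[int]:
    """Terminal: consume any single residue."""
    if i < len(seq):
        yield i + 1


def then(*parts: Parser) -> Parser:
    """Sequence: match each part in order."""
    def parse(seq: str, i: int) -> Iterator[int]:
        if not parts:
            yield i
            return
        rest = then(*parts[1:])
        for j in parts[0](seq, i):
            yield from rest(seq, j)
    return parse


def star(p: Parser) -> Parser:
    """Repetition: match p zero or more times, yielding each end index once."""
    def parse(seq: str, i: int) -> Iterator[int]:
        seen, frontier = {i}, [i]
        while frontier:
            k = frontier.pop()
            yield k
            for j in p(seq, k):
                if j not in seen:
                    seen.add(j)
                    frontier.append(j)
    return parse


def plus(p: Parser) -> Parser:
    """Repetition: match p one or more times."""
    return then(p, star(p))


# Hypothetical non-terminals (illustrative only, not learned from data).
dibasic = then(basic, basic)
candidate = then(star(any_residue), dibasic, plus(any_residue),
                 dibasic, star(any_residue))


def accepts(seq: str) -> bool:
    """True if the whole sequence can be derived from `candidate`."""
    return any(end == len(seq) for end in candidate(seq, 0))


print(accepts("AAKRGGFMKRAA"))  # two dibasic sites around a segment
print(accepts("AAKRGGFM"))      # only one dibasic site
```

In an ILP setting the rules would be induced from the positive examples rather than written by hand. A non-terminal such as `dibasic` is the kind of building block that could, under the assumptions above, be supplied before induction.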