Discovery of neuropeptides (York)

Application domain: Discovery of neuropeptides
Dataset size: 44 positive examples and 3910 random examples
Data format: Progol
Systems used: Progol
Pointers: (data restricted)

This is the first real-world application of the positive-only learning framework of the ILP system Progol (Muggleton et al.). The aim is to identify a particular group of proteins in the absence of sequence homology. The approach is to use Inductive Logic Programming to induce a grammar from sequences of amino acids. It is also the first attempt to acquire a grammar for a biological domain using ILP. A follow-up study (Bryant et al.) examines whether discovering potentially pertinent non-terminals before induction improves performance.
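To make the grammar-based approach described above more concrete, here is a minimal sketch in Python of how a grammar over amino acid sequences can act as a recognizer, and how such a grammar might be scored against positive and random examples. Everything in it is an assumption made for illustration: the toy grammar, the non-terminal names (candidate, basic_pair, basic, any), the helper functions, and the example sequences are all hypothetical. It is not the grammar learned in the study, and it does not reproduce Progol's search or its data format.

```python
from typing import Dict, List, Set

# Hypothetical toy grammar, NOT the grammar learned in the study.
# Upper-case single letters are terminals (one-letter amino acid codes);
# lower-case names are non-terminals. "any" is a built-in gap symbol.
GRAMMAR: Dict[str, List[List[str]]] = {
    "candidate": [["any", "basic_pair", "any"]],
    "basic_pair": [["basic", "basic"]],
    "basic": [["K"], ["R"]],
}


def ends(symbol: str, seq: str, start: int) -> Set[int]:
    """Return every position at which `symbol` can end when it starts at `start`."""
    if symbol == "any":
        # Built-in: matches any run of residues, including the empty run.
        return set(range(start, len(seq) + 1))
    if symbol not in GRAMMAR:
        # Terminal: must match exactly one residue.
        return {start + 1} if seq[start:start + 1] == symbol else set()
    result: Set[int] = set()
    for production in GRAMMAR[symbol]:
        positions = {start}
        for part in production:
            positions = {e for p in positions for e in ends(part, seq, p)}
        result |= positions
    return result


def accepts(seq: str) -> bool:
    """True if the whole sequence can be derived from `candidate`."""
    return len(seq) in ends("candidate", seq, 0)


def coverage(seqs: List[str]) -> float:
    """Fraction of sequences accepted by the grammar (0.0 for an empty list)."""
    if not seqs:
        return 0.0
    return sum(accepts(s) for s in seqs) / len(seqs)


# Made-up sequences, used only to exercise the functions.
positives = ["MAKRSL", "GGRRAA"]
randoms = ["MAAASL", "GKAGRA", "AAKKAA", "LLLLLL"]
print(coverage(positives), coverage(randoms))
```

In a positive-only setting, a candidate grammar is generally more attractive when it covers most of the positive examples while accepting only a small fraction of the random ones, since the random examples give an estimate of how general the grammar is. The two coverage values printed above are one simple way to inspect that trade-off.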


  1. C.H. Bryant, S.H. Muggleton, and C.J. Rawlings. Learning biological grammars: Does the discovery of potentially pertinent non-terminals prior to induction improve performance? Unpublished.
  2. S.H. Muggleton, C.H. Bryant, A. Srinivasan, I.S. Gloger, M. Lawrence, A. Whittaker, S. Topp, and C.J. Rawlings. Are grammatical representations useful for learning from biological sequence data? - a case study. In manuscript.
