| Application domain: | Natural Language Processing |
| Source: | Standard PP-attachment dataset + original annotation extracted from WordNet |
| Dataset size: | 20,000 examples; 23,000 background clauses |
| Data format: | Prolog |
| Systems used: | P-Progol |
| Pointers: | kazakov,jc,suresh@cs.york.ac.uk |
P-Progol is applied to the natural language processing task of learning
rules for prepositional phrase (PP) attachment disambiguation (Kazakov et al.).
The dataset consists of 20,000 examples of 2 "almost" disjunctive predicates,
4 intensionally defined background predicates, and 23,000 clauses of 6 other
background predicates.
The target predicates have the format
n(Verb,Noun,Preposition,Noun), v(Noun,Verb,Preposition,Noun)
and describe one of the following two syntactic structures:
(VP (Verb NP(Noun PP(Prep Noun)))) or
(VP (Verb NP(Noun) PP(Prep Noun))).
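
As a rough illustration of the two target predicates, the sketch below writes one example of each as a Prolog fact. The words are invented and not taken from the dataset. The argument order is copied from the formats stated above. The helper attachment_site/2 is hypothetical, and it assumes that n marks attachment of the PP inside the noun phrase and v marks attachment to the verb phrase, matching the order in which the two structures are listed.

```prolog
% Invented examples for illustration only; not drawn from the dataset.

% n(Verb, Noun, Preposition, Noun): assumed to mean the PP sits inside the NP.
n(eat, pizza, with, anchovies).

% v(Noun, Verb, Preposition, Noun): assumed to mean the PP attaches to the VP.
% Argument order follows the format given in the text.
v(pizza, eat, with, fork).

% attachment_site(+Example, -Site): hypothetical helper that names the
% structure an example term is assumed to describe.
attachment_site(n(_, _, _, _), noun_phrase).
attachment_site(v(_, _, _, _), verb_phrase).
```

With these clauses loaded, the query attachment_site(n(eat, pizza, with, anchovies), S) binds S to noun_phrase.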
The background predicates map word-forms into lexical entries and
semantic classes, e.g.
begins (Verb) -> (to) begin (Verb) -> {begin, get, start out, commence}.
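
The two-step mapping in the example above could be represented along the following lines. This is only a sketch: the predicate names lexical_entry/3, semantic_class/3 and word_form_class/3 are assumptions made for illustration, and the dataset's actual background predicates may be named and structured differently.

```prolog
% Hypothetical predicate names; only the begins/begin example comes from the text.

% lexical_entry(WordForm, PartOfSpeech, Lemma)
lexical_entry(begins, verb, begin).

% semantic_class(Lemma, PartOfSpeech, ClassMembers)
semantic_class(begin, verb, [begin, get, start_out, commence]).

% word_form_class(+WordForm, ?PartOfSpeech, ?ClassMembers):
% chain the two mappings, word-form -> lexical entry -> semantic class.
word_form_class(WordForm, PoS, Class) :-
    lexical_entry(WordForm, PoS, Lemma),
    semantic_class(Lemma, PoS, Class).
```

The query word_form_class(begins, verb, C) then binds C to [begin, get, start_out, commence].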
Progol rules covering each of the classes are learned and then applied to associate semantic classes with a test example of a given class, thereby reducing the semantic ambiguity in the phrase.
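
A minimal sketch of how applying such a rule can narrow down word senses is given below. Everything in it is hand-written for illustration: the words, the class names, the predicates sem_class/2, rule_n/6 and disambiguate/3, and the rule itself are assumptions, not rules or data produced by P-Progol.

```prolog
% Illustrative, hand-written data; not taken from the dataset.
% A word-form may belong to several semantic classes.
sem_class(bank, institution).
sem_class(bank, slope).
sem_class(river, stream).

% A hypothetical rule of the kind that might be learned for examples of class n.
% rule_n(Verb, Noun1, Prep, Noun2, Class1, Class2)
rule_n(_Verb, Noun1, of, Noun2, slope, stream) :-
    sem_class(Noun1, slope),
    sem_class(Noun2, stream).

% disambiguate(+Example, -Class1, -Class2):
% the semantic classes that a covering rule assigns to the two nouns.
disambiguate(n(Verb, Noun1, Prep, Noun2), Class1, Class2) :-
    rule_n(Verb, Noun1, Prep, Noun2, Class1, Class2).
```

For the test example n(saw, bank, of, river), the query disambiguate(n(saw, bank, of, river), C1, C2) yields C1 = slope and C2 = stream, so the institution reading of bank is discarded.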