Learning Rules for PP-Attachment Disambiguation (York)

Application domain: Natural Language Processing

Source: Standard PP-attachment dataset + original annotation extracted from WordNet

Dataset size: ~20,000 positive examples of 2 predicates, 4 intensionally defined background predicates, and ~23,000 clauses of 6 other background predicates

Data format: Prolog

Systems used: P-Progol

Pointers: kazakov,jc,suresh@cs.york.ac.uk

P-Progol is applied to the natural language processing task of learning rules for PP-attachment disambiguation (Kazakov et al.). The dataset consists of ~20,000 examples of 2 "almost" disjunctive predicates, 4 intensionally defined background predicates, and ~23,000 clauses of 6 other background predicates.

The target predicates have the format n(Verb,Noun,Preposition,Noun) for noun attachment and v(Noun,Verb,Preposition,Noun) for verb attachment. Each describes one of the following two syntactic structures:
(VP (Verb NP(Noun PP(Prep Noun)))), where the PP attaches to the noun, or (VP (Verb NP(Noun) PP(Prep Noun))), where the PP attaches to the verb. The background predicates map word-forms to lexical entries, and lexical entries to semantic classes, e.g. begins (Verb) -> (to) begin (Verb) -> {begin, get, start out, commence}.
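
To make the data layout concrete, here is a minimal Prolog sketch of how such examples and background knowledge might be written. Only the n/4 and v/4 formats and the "begins" mapping come from the description above; the predicate names lemma/3, sem_class/3 and word_class/3, the example words, and the list encoding of a semantic class are assumptions made for illustration, since the actual background predicates are not named here.

```prolog
% Hypothetical target examples; argument order as given in the text.
% n(Verb, Noun, Preposition, Noun): the PP attaches to the noun.
n(bought, shares, of, company).
% v(Noun, Verb, Preposition, Noun): the PP attaches to the verb.
v(pizza, ate, with, fork).

% lemma(WordForm, POS, Lemma): assumed name for the mapping from a
% word-form to its lexical entry.
lemma(begins, verb, begin).
lemma(bought, verb, buy).

% sem_class(Lemma, POS, Class): assumed name for the mapping from a
% lexical entry to a semantic class, encoded here as a list of words.
sem_class(begin, verb, [begin, get, start_out, commence]).

% word_class(WordForm, POS, Class): chains the two mappings above.
word_class(Word, POS, Class) :-
    lemma(Word, POS, Lemma),
    sem_class(Lemma, POS, Class).
```

With these facts loaded, the query `?- word_class(begins, verb, C).` binds C to `[begin, get, start_out, commence]`, mirroring the begins -> begin -> class chain described above.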

Progol rules covering each of the two classes are learned and then applied to a test example of a known class in order to assign semantic classes to its words, thereby reducing semantic ambiguity in the phrase.
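
As a rough illustration of this idea, the sketch below shows how a rule of this kind could narrow down the semantic classes of the words in a test example. It is not output from P-Progol and not a rule from the actual experiments: the predicates sense/2, n_rule/6 and disambiguate/2, the class names, and the example words are all hypothetical.

```prolog
% sense(Word, Class): hypothetical background facts; each noun here
% is ambiguous between two semantic classes.
sense(account, financial_record).
sense(account, narrative).
sense(bank, financial_institution).
sense(bank, river_side).

% A hypothetical rule for the noun-attachment class. It covers an
% example only when the two nouns are read in specific classes.
n_rule(_Verb, Noun1, at, Noun2, financial_record, financial_institution) :-
    sense(Noun1, financial_record),
    sense(Noun2, financial_institution).

% disambiguate(+Example, -Classes): for a test example known to be a
% noun attachment, return the classes under which some rule covers it.
disambiguate(n(Verb, Noun1, Prep, Noun2), classes(Class1, Class2)) :-
    n_rule(Verb, Noun1, Prep, Noun2, Class1, Class2).
```

The query `?- disambiguate(n(opened, account, at, bank), Cs).` has the single solution `Cs = classes(financial_record, financial_institution)`: of the four possible class combinations for the two nouns, only the one consistent with the rule survives.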

Bibliography

Dimitar Kazakov, James Cussens, and Suresh Manandhar. On the duality of semantics and syntax: The PP attachment case. Unpublished.
