Tagging words with their correct part-of-speech (singular
proper noun, predeterminer, etc.) is an important precursor to further automatic
natural language processing. The task is non-trivial since many (English)
words have a number of possible part-of-speech (POS) tags. For example,
"Best" can be a superlative adjective, a superlative adverb or a proper
noun as in "Mr. Best". The training data used here comprised two thirds of the
Wall Street Journal corpus (cf. previous section), in which each word of each
sentence is tagged with its correct part-of-speech for that sentence.
The corpus contains an unknown amount of noise. From the corpus, each word
was mapped to its set of possible POS tags, together with the relative
frequency of each POS tag. Progol was given positive and negative examples of
when to eliminate a POS tag, each including the entire unambiguous left and
right context (in terms of tags) of the focus word. The left context
was reversed for efficiency. Background knowledge comprised predicates
for each of the 45 tags, as well as grammatical constructs such as verb
phrase and noun phrase.
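
The data preparation described above can be illustrated with a minimal Python
sketch. It is not the code used in this work: the function names, the toy
corpus and the Penn Treebank-style tag names are all assumptions made for
illustration. It also assumes one plausible reading of the examples, namely
that every wrong candidate tag of a focus word yields a positive "eliminate"
example and the correct tag yields a negative one. The left context is stored
reversed, so the tag nearest the focus word comes first.

```python
from collections import Counter, defaultdict


def build_lexicon(tagged_sentences):
    """Map each word to {tag: relative frequency} over the tagged corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    lexicon = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        lexicon[word] = {tag: n / total for tag, n in tag_counts.items()}
    return lexicon


def elimination_examples(sentence, index, lexicon):
    """Build (label, tag, reversed_left_tags, right_tags) tuples for one word.

    Assumption: 'pos' means the tag should be eliminated in this context,
    'neg' means it should not (it is the correct tag).
    """
    word, gold = sentence[index]
    # Reverse the left context so the nearest tag is at position 0.
    left = [tag for _, tag in sentence[:index]][::-1]
    right = [tag for _, tag in sentence[index + 1:]]
    return [
        ("neg" if tag == gold else "pos", tag, left, right)
        for tag in lexicon[word]
    ]


# Toy corpus, invented for illustration only.
toy_corpus = [
    [("Mr.", "NNP"), ("Best", "NNP"), ("left", "VBD")],
    [("the", "DT"), ("Best", "JJS"), ("offer", "NN")],
    [("Best", "JJS"), ("results", "NNS")],
    [("he", "PRP"), ("left", "VBD")],
]

lexicon = build_lexicon(toy_corpus)
best_examples = elimination_examples(toy_corpus[0], 1, lexicon)
offer_examples = elimination_examples(toy_corpus[1], 2, lexicon)
```

In this toy corpus "Best" receives two candidate tags with relative
frequencies 2/3 (JJS) and 1/3 (NNP), and the example for "offer" has the
reversed left context JJS, DT.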
The experiments
Work is still in progress; however, using the statistical
information for each word and a preliminary set of 798 tag elimination
clauses induced using Progol, a per-word test set accuracy of 95.7% was
achieved.
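
How the statistical information and the elimination clauses are combined is
not spelled out here. The following Python sketch shows one possible scheme,
purely as an assumption: tags ruled out by any elimination rule are discarded,
and the most frequent surviving tag is chosen, falling back to the most
frequent tag overall if every candidate is eliminated. The rule shown is
hypothetical and merely stands in for an induced Progol clause.

```python
def choose_tag(word_freqs, left, right, rules):
    """Pick a tag for a word given its tag frequencies and its context.

    word_freqs: {tag: relative frequency} for the focus word
    left: tags to the left of the focus word, nearest first (reversed)
    right: tags to the right of the focus word
    rules: callables (tag, left, right) -> True if the tag is eliminated
    """
    survivors = {
        tag: freq
        for tag, freq in word_freqs.items()
        if not any(rule(tag, left, right) for rule in rules)
    }
    # Assumed fallback: if every tag is eliminated, ignore the rules.
    candidates = survivors or word_freqs
    return max(candidates, key=candidates.get)


def no_superlative_after_proper_noun(tag, left, right):
    """Hypothetical rule: eliminate JJS directly after a proper noun."""
    return tag == "JJS" and left[:1] == ["NNP"]


rules = [no_superlative_after_proper_noun]
best_freqs = {"JJS": 2 / 3, "NNP": 1 / 3}  # invented frequencies

after_proper_noun = choose_tag(best_freqs, ["NNP"], ["VBD"], rules)
after_determiner = choose_tag(best_freqs, ["DT"], ["NN"], rules)
```

Under these assumptions "Best" is tagged NNP after a proper noun, because the
more frequent JJS tag is eliminated there, and JJS after a determiner, where
no rule fires and the frequencies decide.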
Reference
Cussens, J., Part-of-speech disambiguation using ILP. Technical
report, Oxford University Computing Laboratory, 1996.