Part-of-speech tagging using ILP (KUL, YORK, UIA/KUB)

Application domain: Natural Language Processing
Source: UIA
Dataset size: 6.8 Mbytes
Data format: Text file
Systems Used: Progol, MACCENT, WARMR
Pointers: Luc Dehaspe (

The data (KUL, YORK, UIA/KUB)

Raw data from the Wall Street Journal corpus (Penn Treebank Project, Release 2) was pre-processed by UIA/KUB to add blank lines at the end of each sentence. UIA/KUB also induced a context-free grammar from the pre-parsed parts of the Wall Street Journal corpus. This grammar had 17,302 production rules.
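
The text does not say how UIA/KUB induced the grammar. A minimal sketch of one common approach, reading production rules directly off bracketed parse trees and counting them, is given below. The tree strings are simplified toy examples rather than actual treebank entries, lexical rules (tag to word) are deliberately skipped, and the names `parse_tree` and `productions` are hypothetical.

```python
from collections import Counter

def parse_tree(text):
    """Parse one bracketed tree, e.g. "(S (NP (DT the) (NN cat)) (VP (VBD sat)))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return node()

def productions(tree, counts):
    """Count one rule per internal node whose children include subtrees."""
    label, children = tree
    subtrees = [c for c in children if isinstance(c, tuple)]
    if subtrees:
        counts[(label, tuple(c[0] for c in subtrees))] += 1
        for c in subtrees:
            productions(c, counts)
    return counts

# Toy input (assumption): two hand-written trees in simplified bracket notation.
trees = [
    "(S (NP (DT the) (NN cat)) (VP (VBD sat)))",
    "(S (NP (DT a) (NN dog)) (VP (VBD ran)))",
]
rule_counts = Counter()
for t in trees:
    productions(parse_tree(t), rule_counts)

for (lhs, rhs), n in sorted(rule_counts.items()):
    print(lhs, "->", " ".join(rhs), n)
```

Run over a full parsed corpus, the number of distinct keys in `rule_counts` would correspond to the size of the induced grammar.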

PoS tagging experiments with WARMR and MACCENT (KUL)

WARMR (Dehaspe and De Raedt 1997) was used to exhaustively search the WSJ corpus for first-order association rules that describe frequent properties of sentences, such as:

in 81% of the sentences in which a determiner occurs followed by an adjective, there is also a sequence determiner, adjective, noun

and frequent properties of words, such as:

74% of the words that have a verb somewhere to the left within distance 4 and a preposition or subordinating conjunction to the right within distance 4 have a noun somewhere to the left within distance 4
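
To make the meaning of such a percentage concrete, the following sketch computes the confidence of the sentence-level rule above on a handful of invented tag sequences. It only evaluates one fixed rule and does not reproduce WARMR's search over first-order queries. The function names and the toy data are assumptions, with Penn Treebank tags DT, JJ and NN standing for determiner, adjective and noun.

```python
def has_sequence(tags, pattern):
    """True if `pattern` occurs as a contiguous run somewhere in `tags`."""
    n = len(pattern)
    return any(tuple(tags[i:i + n]) == pattern
               for i in range(len(tags) - n + 1))

def rule_confidence(sentences, body, head):
    """Fraction of sentences matching `body` that also match `head`."""
    covered = [s for s in sentences if has_sequence(s, body)]
    if not covered:
        return None  # the rule body never applies
    return sum(has_sequence(s, head) for s in covered) / len(covered)

# Toy tag sequences (assumption): one list of tags per sentence.
sentences = [
    ["DT", "JJ", "NN", "VBD"],
    ["DT", "JJ", "JJ", "NN"],
    ["DT", "NN", "VBZ"],
    ["PRP", "VBD", "DT", "JJ", "NN"],
]

confidence = rule_confidence(sentences, ("DT", "JJ"), ("DT", "JJ", "NN"))
print(confidence)
```

Three of the four toy sentences contain a determiner followed by an adjective, and two of those three also contain the full determiner, adjective, noun sequence.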

In an initial small experiment with MACCENT (Dehaspe 1997), a probability distribution $p_{context}(POS\vert w)$ based on the context of the word was induced for words that are ambiguous between exactly the three classes noun, adjective, and verb. This initial experiment demonstrates that the (poor) output of MACCENT can successfully be combined with other stochastic information sources such as the lexicon.
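
The report does not state how the two information sources were combined. One simple possibility, shown here purely as an assumption, is to multiply the context-based distribution with a lexicon-based distribution and renormalise. The probabilities are invented for illustration and `combine` is a hypothetical helper.

```python
def combine(p_context, p_lexicon):
    """Normalised product of two distributions over the same tag set.

    This combination scheme is an assumption; the source does not specify one.
    """
    tags = p_context.keys() & p_lexicon.keys()
    scores = {t: p_context[t] * p_lexicon[t] for t in tags}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("the two distributions share no probability mass")
    return {t: s / total for t, s in scores.items()}

# Invented numbers for one ambiguous word (not results from the experiment).
p_context = {"noun": 0.5, "adjective": 0.3, "verb": 0.2}
p_lexicon = {"noun": 0.2, "adjective": 0.2, "verb": 0.6}

combined = combine(p_context, p_lexicon)
best_tag = max(combined, key=combined.get)
print(best_tag, combined)
```

In this toy case the lexicon overrides a weak context preference for noun, which is the kind of effect a combination with a stronger source is meant to achieve.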

In an ongoing, more systematic experiment, WARMR is used as a feature generator for MACCENT. In a first stage, WARMR generates frequent features, which are then added collectively to MACCENT. If restricted to a baseline set of non-relational features, MACCENT in this mode emulates the state-of-the-art system of Ratnaparkhi (1996), with an excellent performance of around 96.8% accuracy on test data with unknown words. The aim of the experiment is to study the effects of additional relational features.
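
The two-stage setup can be pictured roughly as follows: candidate features are first filtered by frequency, and the survivors are then used together to encode each word context as a binary vector for a maximum entropy learner. This is only a sketch under assumptions. The candidate features, the threshold and all function names are hypothetical, and neither WARMR's query search nor MACCENT's training is reproduced.

```python
def frequent_features(candidates, contexts, min_frequency):
    """Keep the candidate features that hold in enough contexts."""
    kept = {}
    for name, test in candidates.items():
        frequency = sum(1 for c in contexts if test(c)) / len(contexts)
        if frequency >= min_frequency:
            kept[name] = test
    return kept

def encode(context, features):
    """Binary feature vector for one context, in feature insertion order."""
    return [int(test(context)) for test in features.values()]

# A context is a pair (tags of the sentence, position of the word).
def prev_is_determiner(context):
    tags, i = context
    return i > 0 and tags[i - 1] == "DT"

def verb_left_within_4(context):
    tags, i = context
    return any(t.startswith("VB") for t in tags[max(0, i - 4):i])

def next_is_number(context):
    tags, i = context
    return i + 1 < len(tags) and tags[i + 1] == "CD"

# Toy data (assumption): a single tagged sentence.
tags = ["DT", "JJ", "NN", "VBD", "IN", "DT", "NN"]
contexts = [(tags, i) for i in range(len(tags))]
candidates = {
    "prev_is_determiner": prev_is_determiner,
    "verb_left_within_4": verb_left_within_4,
    "next_is_number": next_is_number,
}

kept = frequent_features(candidates, contexts, min_frequency=0.25)
vector = encode((tags, 6), kept)
print(list(kept), vector)
```

Here `next_is_number` never holds and is dropped, while the two remaining features are passed on as a set.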

Experiments with Progol (YORK)

YORK used a small subset of the original grammar comprising only 40 production rules. Using this small grammar and a chart parser provided by KUL, YORK produced background knowledge in the form of charts for each sentence in the training data. These charts were represented as ground atoms: 'VP'(1,7,11). states that there is a VP between edges 7 and 11 in sentence 1. Non-ground clauses were also added to the background knowledge to allow simple morphological analysis of words. Examples were represented in the same way: 'CC'(8,2,3,nor). states that the word between edges 2 and 3 in sentence 8 is ``nor'' and that it is tagged as ``CC''.

With this knowledge representation it is possible to combine lexical, morphological and syntactic information to produce tagging rules. However, making use of the large amount of data requires further work on (i) sampling and (ii) learning from datasets (and possibly background knowledge) residing on disk. Work on the latter is ongoing at YORK. YORK intends to construct features from tagging clauses induced by Progol and to add these as features to KUL's MACCENT system.
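
As an illustration of the ground-atom representation described above, the sketch below renders chart entries and tagged words as Prolog facts. It is not YORK's conversion code. The helper names are hypothetical, the toy sentence is invented, and numbering edges from 0 is an assumption, since the source does not say where edge numbering starts.

```python
import re

def quote_atom(text, force=False):
    """Render text as a Prolog atom, quoting it unless it is a plain lowercase atom."""
    if not force and re.fullmatch(r"[a-z][A-Za-z0-9_]*", text):
        return text
    return "'" + text.replace("'", "''") + "'"

def fact(functor, *args):
    """Build a ground fact such as 'VP'(1,7,11). with a quoted functor."""
    rendered = [str(a) if isinstance(a, int) else quote_atom(a) for a in args]
    return "%s(%s)." % (quote_atom(functor, force=True), ",".join(rendered))

def example_atoms(sentence_id, tagged_words, first_edge=0):
    """One fact per word: tag(sentence, left edge, right edge, word)."""
    return [
        fact(tag, sentence_id, first_edge + i, first_edge + i + 1, word)
        for i, (word, tag) in enumerate(tagged_words)
    ]

# Toy tagged sentence (assumption), stored as sentence 8.
tagged = [("Neither", "CC"), ("he", "PRP"), ("nor", "CC"), ("she", "PRP")]

for atom in example_atoms(8, tagged):
    print(atom)
print(fact("VP", 1, 7, 11))
```

Capitalised words are quoted so that Prolog does not read them as variables, which is why the first word is written as 'Neither' while nor stays unquoted.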


  1. L. Dehaspe and L. De Raedt. Mining association rules in multiple relations. In Proceedings of the 7th International Workshop on Inductive Logic Programming, volume 1297 of Lecture Notes in Artificial Intelligence, pages 125-132. Springer-Verlag, 1997.
  2. L. Dehaspe. Maximum entropy modeling with clausal constraints. In Proceedings of the 7th International Workshop on Inductive Logic Programming, volume 1297 of Lecture Notes in Artificial Intelligence, pages 109-124. Springer-Verlag, 1997.
  3. A. Ratnaparkhi. A maximum entropy part-of-speech tagger. In Proceedings of the Empirical Methods in Natural Language Processing Conference. University of Pennsylvania, 1996.
