Learning MULTEXT languages nominal paradigms (LAI+York)

Application domain Morphology and tagging

Data source JSI, Tomaz Erjavec

Dataset size

Data format Prolog

Systems used FFOIL, CLOG, PROGOL

References (See list below)

Pointers http://nl.ijs.si/lll/

The paper (Manandhar et al. 1998) presents the decision list learning system CLOG and reports the results of using it to learn nominal inflections of English, Romanian, Czech, Slovene, and Estonian. Rules for the synthesis and analysis of the inflectional paradigms of nouns and adjectives in these languages are induced from the MULTEXT-East multilingual tagged corpus. The ILP system FOIDL is applied to the same dataset, and the paper compares the induction methodology and results of the two systems. The experiments show that the two systems achieve comparable accuracy when given the same training set. However, whereas efficiency constraints severely limit the size of the training set that FOIDL can handle, CLOG has no such limitation. With the larger training sets that CLOG makes possible, it significantly outperforms FOIDL and learns highly accurate morphological rules.
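For readers unfamiliar with decision lists, the following is a minimal sketch of how a first-match decision list can drive the synthesis of an inflected form. It is not CLOG's code or output: the rule list `PLURAL_RULES`, the function `synthesize`, and the choice of English noun plurals are all hypothetical, chosen only to illustrate the idea that rules are tried in order, exceptions first and the default last.

```python
# Hypothetical illustration of a first-match decision list for inflection.
# Each rule is (suffix_to_strip, suffix_to_add); these rules are hand-written
# for the example and are NOT rules learned by CLOG or FOIDL.
PLURAL_RULES = [
    ("man", "men"),  # exception: woman -> women
    ("y", "ies"),    # city -> cities (would overgeneralise to e.g. "boy")
    ("s", "ses"),    # bus -> buses
    ("x", "xes"),    # box -> boxes
    ("", "s"),       # default rule, always matches: cat -> cats
]


def synthesize(lemma, rules=PLURAL_RULES):
    """Return the inflected form produced by the first matching rule."""
    for old_suffix, new_suffix in rules:
        if lemma.endswith(old_suffix):
            stem = lemma[: len(lemma) - len(old_suffix)]
            return stem + new_suffix
    return lemma  # only reached if the list has no default rule


if __name__ == "__main__":
    for noun in ["woman", "city", "box", "cat"]:
        print(noun, "->", synthesize(noun))
```

Because only the first matching rule fires, more specific rules must precede more general ones. A learner of such lists has to discover both the rules and an ordering that keeps exceptions ahead of the default.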

References

S. Dzeroski and T. Erjavec. Inductive learning of multilingual morphology. Electrotechnical Review 65 (5): 296-302, 1998.
S. Manandhar, S. Dzeroski, and T. Erjavec. Learning multilingual morphology with CLOG. In Proc. Eighth International Conference on Inductive Logic Programming, pages 135-144. Springer, Berlin, 1998.
