Text categorisation by content and genre (GMD)

Application domain Text categorisation by content and genre

Source LIMAS corpus of contemporary German

Dataset size 500 documents of 2,000 words each, up to 75,000 facts per run

Data format RIBL (sets of ground facts in Prolog notation)

Systems used RIBL, LVQ (OLVQ1), IBL, IEL-IG

References [1]

Pointers mathias.kirsten@gmd.de

References

[1] Wolters, Maria and Kirsten, Mathias (1999): Exploring the Use of Linguistic Features in Domain and Genre Classification. In: Proceedings of the Meeting of the European Chapter of the Association for Computational Linguistics, Bergen, Norway. Available online at http://www.ikp.uni-bonn.de/~mwo/publik.html

