LLL (FraCaS) Dataset and Inductive Chart Parsing (York)

Application domain: Natural Language Processing

Source: The LLL (FraCaS) dataset

Dataset size: 500 pairs (sentence,QLF)

Data format: Prolog

Systems used: P-Progol

Pointers: jc@cs.york.ac.uk

LLL (FraCaS) Dataset

The FraCaS grammar is an attribute-value grammar that produces semantic interpretations of the sentences it covers in the form of quasi-logical forms (QLFs) (Cooper et al. 1996). Kazakov et al. specify a setting for learning grammars of this kind from a partial grammar and a number of positive examples of the form <sentence,qlf>, and introduce a method for evaluating the results. A dataset consisting of 500 such pairs, together with a subset of the FraCaS grammar rules used to generate them, has been made available to the ILP-2 consortium. This dataset is also known as the LLL dataset. The distribution also includes a chart parser and a simple bottom-up parser written as a pure logic program, i.e. one that uses neither negation nor the cut (!) operator.

Inductive Chart Parsing

We use Inductive Logic Programming (ILP) within a chart-parsing framework for grammar learning (Cussens and Pulman 1999). Given an existing grammar G, together with some sentences which G cannot parse, we use ILP to find the "missing" grammar rules or lexical items. Our aim is to exploit the inductive capabilities of chart parsing, i.e. its ability to determine efficiently what is needed for a parse. For each unparsable sentence, we find the actual edges in the chart and the needed edges, i.e. those that would have to be added to allow a parse. The former are used as background knowledge for the ILP algorithm (P-Progol) and the latter as its examples. We demonstrate our approach with a number of experiments using a context-free grammar and a feature grammar.

In these experiments we used the following datasets, all of which were provided by the second author:

CFG with 5 grammar rules, 102 lexical items, 500 training sentences
CFG with 10 grammar rules, 122 lexical items, 500 training sentences
the LLL dataset (Kazakov et al.).
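
The distinction between actual and needed edges described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it does not use the Prolog chart parser from the distribution. The toy grammar, the lexicon, and the function names actual_edges and needed_edges are all hypothetical, and the notion of a needed edge is deliberately simplified: a missing daughter is treated as needed when its sister edge is present under a mother that is itself the goal or a needed edge.

```python
# Toy binary grammar and lexicon, purely illustrative (not from the LLL dataset).
RULES = [("s", ("np", "vp")), ("np", ("det", "n")), ("vp", ("v", "np"))]
LEXICON = {"the": {"det"}, "dog": {"n"}, "sees": {"v"}}  # "cat" is deliberately missing


def actual_edges(words):
    """Bottom-up chart: every edge (category, start, end) derivable from the grammar."""
    edges = {(cat, i, i + 1)
             for i, word in enumerate(words)
             for cat in LEXICON.get(word, ())}
    changed = True
    while changed:  # repeat until no rule can add a new edge
        changed = False
        for lhs, (left_cat, right_cat) in RULES:
            for cat1, i, k in list(edges):
                if cat1 != left_cat:
                    continue
                for cat2, k2, j in list(edges):
                    if cat2 == right_cat and k2 == k and (lhs, i, j) not in edges:
                        edges.add((lhs, i, j))
                        changed = True
    return edges


def needed_edges(words, actual, goal="s"):
    """Top-down pass: absent edges whose sister is present under a needed mother."""
    needed = set()
    agenda = [(goal, 0, len(words))]
    seen = set(agenda)
    while agenda:
        cat, i, j = agenda.pop()
        for lhs, (left_cat, right_cat) in RULES:
            if lhs != cat:
                continue
            for k in range(i + 1, j):  # every split point of the span
                left, right = (left_cat, i, k), (right_cat, k, j)
                for have, want in ((left, right), (right, left)):
                    if have in actual and want not in actual:
                        needed.add(want)
                        if want not in seen:
                            seen.add(want)
                            agenda.append(want)
    return needed


if __name__ == "__main__":
    sentence = "the dog sees the cat".split()
    chart = actual_edges(sentence)
    print(sorted(chart))
    print(sorted(needed_edges(sentence, chart)))
```

For the sentence "the dog sees the cat", the toy lexicon has no entry for "cat", so the sentence cannot be parsed. The needed edges found by the sketch include ("n", 4, 5), which corresponds to the missing lexical item. In the setting described above, edges like those returned by actual_edges would play the role of background knowledge and those returned by needed_edges the role of examples.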

Bibliography

Robin Cooper, Dick Crouch, Jan van Eijck, Chris Fox, Josef van Genabith, Jan Jaspars, Hans Kamp, David Milward, Manfred Pinkal, Massimo Poesio, and Steve Pulman. A strategy for building a framework for computational semantics (the way forward). Technical report, The FraCaS Consortium, 1996.

James Cussens and Steve Pulman. Experiments in inductive chart parsing. Submitted to the Learning Language in Logic (LLL) Workshop, Bled, Slovenia, 1999.

Dimitar Kazakov, Stephen Pulman, and Stephen Muggleton. The FraCaS dataset and the LLL challenge. Unpublished.
