LLL (FraCaS) Dataset and Inductive Chart Parsing (York)

Application domain: Natural Language Processing
Source: The LLL (FraCaS) dataset
Dataset size: 500 pairs (sentence, QLF)
Data format: Prolog
Systems used: P-Progol
Pointers: jc@cs.york.ac.uk

The FraCaS grammar is an attribute-value grammar that produces semantic interpretations of the sentences it covers in the form of quasi logical forms (QLFs) (Cooper et al. 1996). The paper (Kazakov et al.) specifies a setting for learning this kind of grammar from a partial grammar and a number of positive examples <sentence, QLF>, and introduces a method for evaluating the results. A dataset consisting of 500 such pairs, together with a subset of the FraCaS grammar rules used to generate them, has been made available to the ILP-2 consortium. This dataset is also known as the LLL dataset. The distribution also includes a chart parser and a simple bottom-up parser written as a pure logic program, i.e. one that uses neither negation nor the cut (!) operator.

We use Inductive Logic Programming (ILP) within a chart-parsing framework for grammar learning (Cussens and Pulman 1999). Given an existing grammar G, together with some sentences which G cannot parse, we use ILP to find the "missing" grammar rules or lexical items. Our aim is to exploit the inductive capabilities of chart parsing, i.e. the ability to efficiently determine what is needed for a parse. For each unparsable sentence, we find two kinds of edges: actual edges, which the parser was able to build, and needed edges, which would have to be present to allow a parse. The former are used as background knowledge for the ILP algorithm (P-Progol) and the latter as its examples. We demonstrate our approach with a number of experiments using a context-free grammar and a feature grammar.
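
The distinction between actual and needed edges can be illustrated with a minimal sketch. It is not the parser shipped with the dataset and not the method of Cussens and Pulman: it assumes a toy context-free grammar with binary rules only, and a simplified notion of "needed" in which an edge is needed if it is the single missing daughter of an edge that is itself needed. The grammar, the lexicon, and the function names actual_edges and needed_edges are all hypothetical.

```python
# Toy grammar and lexicon (hypothetical, for illustration only).
RULES = [("S", ("NP", "VP")), ("VP", ("V", "NP")), ("NP", ("Det", "N"))]
LEXICON = {"the": {"Det"}, "dog": {"N"}, "sees": {"V"}}


def actual_edges(words, rules, lexicon):
    """Build, bottom-up, every complete edge (category, start, end) the grammar licenses."""
    n = len(words)
    edges = set()
    # Lexical edges: one per known category of each word.
    for i, word in enumerate(words):
        for cat in lexicon.get(word, ()):
            edges.add((cat, i, i + 1))
    # Combine adjacent edges, shortest spans first (CYK-style).
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, (left, right) in rules:
                    if (left, i, k) in edges and (right, k, j) in edges:
                        edges.add((parent, i, j))
    return edges


def needed_edges(words, rules, actual, goal="S"):
    """Work top-down from the missing goal edge to the edges that would complete a parse.

    Simplifying assumption: an edge is proposed only when exactly one
    daughter of a needed edge is missing from the chart.
    """
    top = (goal, 0, len(words))
    if top in actual:
        return set()  # the sentence already parses; nothing is needed
    needed = {top}
    agenda = [top]
    while agenda:
        cat, i, j = agenda.pop()
        for parent, (left, right) in rules:
            if parent != cat:
                continue
            for k in range(i + 1, j):
                l_edge, r_edge = (left, i, k), (right, k, j)
                if l_edge in actual and r_edge not in actual:
                    new = r_edge
                elif r_edge in actual and l_edge not in actual:
                    new = l_edge
                else:
                    continue
                if new not in needed:
                    needed.add(new)
                    agenda.append(new)
    return needed


# "cat" is missing from the lexicon, so the sentence cannot be parsed.
words = "the dog sees the cat".split()
actual = actual_edges(words, RULES, LEXICON)
needed = needed_edges(words, RULES, actual)
print(sorted(actual))
print(sorted(needed))
```

In this toy run the needed edges include a lexical edge ("N", 4, 5) for the unknown word "cat". In the setting described above, such needed edges would serve as examples for the ILP system and the actual edges as background knowledge; how they are encoded for P-Progol is not shown here.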

