LLL (FraCaS) Dataset and Inductive Chart Parsing (York)
| Application domain:
||Natural Language Processing
||The LLL (FraCaS) dataset
| Dataset size:
||500 pairs (sentence, QLF)
| Data format:
| Systems used:
LLL (FraCaS) Dataset
The FraCaS grammar is an attribute-value grammar that produces
semantic interpretations of the sentences it covers in the form of
quasi-logical forms (QLFs) (Cooper et al. 1996). Kazakov et al.
specify a setting for learning this kind of grammar from a partial
grammar and a number of positive examples <sentence, QLF>, and
introduce a method for evaluating the results. A dataset of 500 such
pairs, together with the subset of the FraCaS grammar rules used to
generate them, has been made available to the ILP-2 consortium. This
dataset is also known as the LLL dataset. The distribution also
included a chart parser and a simple bottom-up parser written as a
pure logic program, i.e. one that uses no negation or cut (!)
operators.
Inductive Chart Parsing
We use Inductive Logic Programming (ILP) within a chart-parsing
framework for grammar learning (Cussens and Pulman 1999). Given an
existing grammar G, together with some sentences which G cannot
parse, we use ILP to find the "missing" grammar rules or lexical
items. Our aim is to exploit the inductive capabilities of chart
parsing, i.e. its ability to determine efficiently what is needed
for a parse. For each unparsable sentence, we find actual edges
(the edges the chart parser can build using G) and needed edges:
those which would allow a parse if they were present. The actual
edges are used as background knowledge for the ILP algorithm
(P-Progol) and the needed edges as its examples. We demonstrate
our approach with a number of experiments using a context-free
grammar and a feature grammar.
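The split into actual and needed edges can be illustrated with a small Python sketch. This is a simplified illustration, not the Cussens and Pulman implementation: it assumes a hypothetical toy context-free grammar with only binary rules plus a lexicon (not the LLL data), represents edges as triples (category, start, end), and treats an edge as needed when it is the goal edge, or when its mother is needed and the other daughter of the rule is already an actual edge. The functions `actual_edges` and `needed_edges` are hypothetical names, and the ILP step (P-Progol) is not shown.

```python
# Hypothetical toy grammar (not from the LLL dataset).
# Binary rules are (mother, left daughter, right daughter).
BINARY_RULES = [
    ("s", "np", "vp"),
    ("np", "det", "n"),
    ("vp", "v", "np"),
]
LEXICON = {
    "the": {"det"},
    "dog": {"n"},
    "cat": {"n"},
    "chased": {"v"},
    # "saw" is deliberately missing, so sentences using it cannot be parsed.
}


def actual_edges(words):
    """Bottom-up closure: every complete edge (cat, i, j) the grammar can build."""
    edges = set()
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, ()):
            edges.add((cat, i, i + 1))
    changed = True
    while changed:  # repeat until no rule adds a new edge (fixpoint)
        changed = False
        for mother, left, right in BINARY_RULES:
            for c1, i, j in list(edges):
                if c1 != left:
                    continue
                for c2, j2, k in list(edges):
                    if c2 == right and j2 == j and (mother, i, k) not in edges:
                        edges.add((mother, i, k))
                        changed = True
    return edges


def needed_edges(words, goal="s"):
    """Top-down from the goal edge: edges that, if present, would allow a parse.

    Simplification: a daughter is needed only when its sister is an actual
    edge, i.e. the sketch looks for one missing edge per rule application.
    """
    actual = actual_edges(words)
    agenda = [(goal, 0, len(words))]
    needed = set()
    while agenda:
        edge = agenda.pop()
        if edge in needed or edge in actual:
            continue  # already recorded, or the parser can already build it
        needed.add(edge)
        mother, i, k = edge
        for m, left, right in BINARY_RULES:
            if m != mother:
                continue
            for j in range(i + 1, k):
                if (left, i, j) in actual:
                    agenda.append((right, j, k))  # right daughter is missing
                if (right, j, k) in actual:
                    agenda.append((left, i, j))  # left daughter is missing
    return needed


if __name__ == "__main__":
    print(sorted(needed_edges("the dog saw the cat".split())))
    # [('s', 0, 5), ('v', 2, 3), ('vp', 2, 5)]
```

Because the toy lexicon has no entry for "saw", the most specific needed edge is `('v', 2, 3)`. For a sentence the grammar already parses, such as "the dog chased the cat", the goal edge is actual and the needed set is empty. In the setting described above, needed edges like these would be the ILP examples and the actual edges the background knowledge.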
In these experiments we used the following datasets, all of which were
provided by the second author:
- CFG with 5 grammar rules, 102 lexical items, 500 training
- CFG with 10 grammar rules, 122 lexical items, 500 training
- the LLL dataset (Kazakov et al.).
R. Cooper, Dick Crouch, Jan van Eijck, Chris Fox, Josef van Genabith, Jan
Jaspars, Hans Kamp, David Milward, Manfred Pinkal, Massimo Poesio, and Steve
A strategy for building a framework for computational semantics (the
Technical report, The FraCaS Consortium, 1996.
James Cussens and Steve Pulman.
Experiments in inductive chart parsing.
Bled, Slovenia, 1999.
Submitted to the Learning Language in Logic (LLL) Workshop.
Dimitar Kazakov, Stephen Pulman, and Stephen Muggleton.
The FraCaS dataset and the LLL challenge.