LLL (FraCaS) Dataset and Inductive Chart Parsing (York)
Application domain: Natural Language Processing
Source: The LLL (FraCaS) dataset
Dataset size: 500 pairs (sentence,QLF)
Data format: Prolog
Systems used: P-Progol
Pointers: jc@cs.york.ac.uk
LLL (FraCaS) Dataset
The FraCaS grammar is an attribute-value grammar able to produce the
semantic interpretations of the covered sentences in the form of quasi
logical forms (QLFs) (Cooper et al. 1996). Kazakov et al. specify a
setting for learning this kind of grammar from a partial grammar and a
number of positive examples <sentence,qlf>, and introduce a method for
evaluating the results. A dataset consisting of 500 such pairs, together
with a subset of the FraCaS grammar rules used in generating these
pairs, has been made available to the ILP-2 consortium. This dataset
is also known as the LLL dataset. The distribution also included a
chart parser and a simple bottom-up parser written as a pure logic
program, i.e. one that uses no negation or cut (!) operators.
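To illustrate what a simple bottom-up parser does, here is a minimal Python sketch of bottom-up recognition by repeated reduction. It is not the Prolog parser shipped with the dataset: the toy grammar, the lexicon and the names reduces_to and recognise are all assumptions made for this example.

```python
# Toy grammar and lexicon (hypothetical; NOT taken from the FraCaS grammar).
RULES = [
    ("s", ("np", "vp")),
    ("np", ("det", "n")),
    ("vp", ("v", "np")),
]
LEXICON = {"the": "det", "a": "det", "dog": "n", "cat": "n", "sees": "v"}


def reduces_to(symbols, goal="s", seen=None):
    """Return True if the symbol sequence can be reduced bottom-up to (goal,)."""
    if seen is None:
        seen = set()
    symbols = tuple(symbols)
    if symbols == (goal,):
        return True
    if symbols in seen:  # this sequence has already been explored
        return False
    seen.add(symbols)
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(symbols) - n + 1):
            # Replace an occurrence of the rule's right-hand side by its left-hand side.
            if symbols[i:i + n] == rhs:
                reduced = symbols[:i] + (lhs,) + symbols[i + n:]
                if reduces_to(reduced, goal, seen):
                    return True
    return False


def recognise(sentence):
    """Look up each word's category, then try to reduce the sequence to 's'."""
    words = sentence.split()
    if any(w not in LEXICON for w in words):
        return False  # unknown word: no lexical category to start from
    return reduces_to([LEXICON[w] for w in words])
```

With this toy grammar, recognise("the dog sees a cat") succeeds, while a sentence containing a word outside the lexicon fails at the lookup step. Like the pure logic program described above, the sketch relies only on search over rule applications, with no special control constructs.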
Inductive Chart Parsing
We use Inductive Logic Programming (ILP) within a chart-parsing
framework for grammar learning (Cussens and Pulman 1999). Given an
existing grammar G, together with some sentences which G cannot
parse, we use ILP to find the "missing" grammar rules or lexical
items. Our aim is to exploit the inductive capabilities of chart
items. Our aim is to exploit the inductive capabilities of chart
parsing, i.e. the ability to efficiently determine what is needed
for a parse. For each unparsable sentence, we find actual edges, those
already built in the chart, and needed edges, those which would have to
be added to allow a parse. The former are used as background knowledge
for the ILP algorithm (P-Progol) and the latter as examples. We
demonstrate our approach with a number of experiments
using a context-free grammar and a feature grammar.
In these experiments we used the following datasets, all of which were
provided by the second author:
- CFG with 5 grammar rules, 102 lexical items, 500 training
sentences
- CFG with 10 grammar rules, 122 lexical items, 500 training
sentences
- the LLL dataset (Kazakov et al.).
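The distinction between actual and needed edges described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure of Cussens and Pulman: the toy binary CFG, the lexicon, and the functions actual_edges and needed_edges are hypothetical, and a missing edge counts as needed here only if adding that single edge would complete a needed parent.

```python
from itertools import product

# Hypothetical toy grammar in binary form (lhs, left, right); not the FraCaS rules.
RULES = [("s", "np", "vp"), ("np", "det", "n"), ("vp", "v", "np")]
# The word "cat" is deliberately missing from the lexicon.
LEXICON = {"the": {"det"}, "dog": {"n"}, "sees": {"v"}}


def actual_edges(words):
    """Bottom-up pass: all complete edges (category, start, end) the grammar licenses."""
    edges = {(cat, i, i + 1)
             for i, w in enumerate(words)
             for cat in LEXICON.get(w, ())}
    changed = True
    while changed:  # repeat until no rule application adds a new edge
        changed = False
        for lhs, b, c in RULES:
            for (x, i, k), (y, k2, j) in product(list(edges), repeat=2):
                if x == b and y == c and k == k2 and (lhs, i, j) not in edges:
                    edges.add((lhs, i, j))
                    changed = True
    return edges


def needed_edges(words, actual, goal="s"):
    """Top-down pass: missing edges which, if added, would allow a parse."""
    needed, agenda = set(), [(goal, 0, len(words))]
    while agenda:
        edge = agenda.pop()
        if edge in actual or edge in needed:
            continue
        needed.add(edge)
        cat, i, j = edge
        for lhs, b, c in RULES:
            if lhs != cat:
                continue
            for k in range(i + 1, j):
                left, right = (b, i, k), (c, k, j)
                # One daughter is present: the other one is what is missing.
                if left in actual and right not in actual:
                    agenda.append(right)
                elif right in actual and left not in actual:
                    agenda.append(left)
    return needed


words = "the dog sees the cat".split()
actual = actual_edges(words)          # would play the role of background knowledge
needed = needed_edges(words, actual)  # would play the role of examples
```

For this sentence the needed edges are (s,0,5), (vp,2,5), (np,3,5) and (n,4,5). The last one corresponds to the missing lexical item for "cat", which is the kind of gap the ILP step is meant to fill.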
Bibliography
- R. Cooper, Dick Crouch, Jan van Eijck, Chris Fox, Josef van Genabith, Jan
  Jaspars, Hans Kamp, David Milward, Manfred Pinkal, Massimo Poesio, and Steve
  Pulman.
  A strategy for building a framework for computational semantics (the
  way forward).
  Technical report, The FraCaS Consortium, 1996.
- James Cussens and Steve Pulman.
  Experiments in inductive chart parsing.
  Bled, Slovenia, 1999.
  Submitted to the Learning Language in Logic (LLL) Workshop.
- Dimitar Kazakov, Stephen Pulman, and Stephen Muggleton.
  The FraCaS dataset and the LLL challenge.
  Unpublished.