Thursday, 1 September 2011

Inductive Dependency Parsing in natural language processing

1.1 Inductive Dependency Parsing
In the framework of inductive dependency parsing, the syntactic analysis of a
sentence amounts to the derivation of a dependency structure, using inductive
machine learning to guide the parser at nondeterministic choice points. This
methodology combines a number of themes that are prominent in the recent
natural language processing literature, although the particular combination
of ideas embodied in the resulting framework appears to be original. More
precisely, inductive dependency parsing can be regarded as the simultaneous
instantiation of two notions that have played a more or less central role in
natural language parsing in recent years:
• Dependency-based parsing
• Data-driven parsing
The fundamental idea in dependency-based parsing is that parsing crucially
involves establishing binary relations between words in a sentence. This is
illustrated in figure 1.1, which depicts the analysis of a short sentence taken
from the Wall Street Journal section of the Penn Treebank (Marcus et al.,
1993, 1994). In this example, the syntactic structure is built up by recognizing
a subject relation (sbj) from the finite verb had to the noun news, a nominal
modifier relation (nmod) from news to the adjective Economic, an object
relation (obj) from had to the noun effect, and so on.
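As a concrete illustration, the sketch below encodes the relations named above as labeled head-dependent arcs. The triple format (head, label, dependent) is only one possible encoding, not the book's own data format, and only the arcs explicitly mentioned in the text are included; the rest of the analysis is left implicit here as it is in the text.

```python
# A minimal sketch of the dependency relations named in the example above.
# The sentence is assumed to be the one depicted in figure 1.1; the triple
# format (head, label, dependent) is an illustrative choice.

sentence = ["Economic", "news", "had", "little", "effect",
            "on", "financial", "markets", "."]

# Each arc points from a head word to one of its dependents.
arcs = [
    ("had", "sbj", "news"),        # subject relation
    ("news", "nmod", "Economic"),  # nominal modifier relation
    ("had", "obj", "effect"),      # object relation
    # ... further arcs of the full analysis omitted, as in the text
]

for head, label, dep in arcs:
    print(f"{head} --{label}--> {dep}")
```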
Dependency-based methods appear in many guises in the current literature
on natural language parsing. On the one hand, we have what we may
call dependency parsing in a narrow sense, where the goal of the parsing
process is to build a dependency structure, i.e., a graph built from binary
dependency relations as in figure 1.1, and where the analysis more or less
closely adheres to the theoretical tradition of dependency grammar. Cases in
point are Hellwig (1980), Maruyama (1990), Harper and Helzerman (1995),
Tapanainen and Järvinen (1997), Menzel and Schröder (1998), and Duchier
(1999). On the other hand, we have approaches that can be characterized as
dependency-based parsing in a broader sense, where the syntactic analysis
may not take the form of a dependency structure, but where the construction
of the analysis nevertheless depends on finding syntactic relations between
lexical heads. In this category, we find the widely used link grammar parser
for English (Sleator and Temperley, 1993), as well as the influential probabilistic
parsers of Collins (1997, 1999) and Charniak (2000), but also a variety
of other lexicalized parsing models that can be subsumed under the general
notion of bilexical grammars (Eisner, 2000). The use of bilexical relations for
disambiguation has been a significant theme in research on natural language
parsing during the last decade, although the results are not entirely conclusive
(Collins, 1999; Gildea, 2001; Klein and Manning, 2003; Bikel, 2004).
The framework we develop in this book falls under the more narrow definition
of dependency parsing, at least in the sense that it assumes dependency
structures as the only form of syntactic representation. At the same time,
we will focus more on formal methods for constructing dependency structures
than on details of linguistic analysis, which means that the discussion
will remain rather agnostic with respect to different theoretical traditions of
dependency grammar. More precisely, we will define the task of dependency
parsing relative to a formal framework of dependency graphs, where only minimal
assumptions are made concerning the linguistic analysis, but where the
notions of robustness, disambiguation, accuracy and efficiency can be given
precise definitions.
Although our formal characterization of dependency parsing is compatible
with different parsing strategies, we will limit our attention in this study to
deterministic methods, which means that we derive a single analysis for each
input sentence in a monotonic fashion with no redundancy or backtracking.
Historically, deterministic parsing of natural language has been investigated
with a view to modeling human sentence processing in a psychologically plausible
way, as in the work by Marcus (1980) and Shieber (1983), but it has also
been explored as a way of improving the robustness and efficiency of natural
language parsing, especially in various approaches to partial parsing using
finite-state methods (Ejerhed, 1983; Koskenniemi, 1990; Abney, 1991, 1996).
In the present framework, we want to apply deterministic methods to full
parsing, insofar as we want to derive a complete dependency structure for
each input sentence. In this way, we hope to combine the gains in efficiency
with a deeper analysis of syntactic structure. The parsing algorithm that we
use was first presented in Nivre (2003), with a partial analysis of its complexity
and robustness properties. It has also been shown that the algorithm
favors incremental processing, something that may be desirable both for certain
applications, such as language modeling for speech recognition, and for
the kind of psycholinguistic modeling that inspired early research on deterministic
parsing (Nivre, 2004a). In this book, we will for the first time provide
a comprehensive analysis of the parsing algorithm with respect to robustness,
disambiguation and complexity.
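To make the notion of deterministic parsing more concrete, the following sketch shows a generic stack-based, shift-reduce-style parsing loop in which a guide function resolves every choice point to exactly one action, so that each sentence receives a single analysis with no backtracking. This is only an illustration of the general strategy, not a specification of the particular algorithm of Nivre (2003); the transition inventory and the trivial guide are placeholder assumptions.

```python
# A generic deterministic (shift-reduce-style) dependency parsing loop.
# The transition inventory and the naive guide below are illustrative
# assumptions, not the specific algorithm of Nivre (2003).

def parse(words, guide):
    """Derive a single set of arcs for `words`, with no backtracking."""
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = guide(stack, buffer, arcs)   # one action per choice point
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) >= 2:
            dep = stack.pop(-2)               # second-topmost becomes dependent
            arcs.append((stack[-1], dep))     # (head, dependent) word indices
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dep = stack.pop()                 # topmost becomes dependent
            arcs.append((stack[-1], dep))
        else:                                 # guard against a stuck derivation
            break
    return arcs

# Placeholder guide: shift everything, then attach leftwards.
def naive_guide(stack, buffer, arcs):
    return "SHIFT" if buffer else "LEFT-ARC"

print(parse(["Economic", "news", "had"], naive_guide))
```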
The second essential component of our methodology is a commitment to
data-driven parsing, understood in a broad sense to include all approaches
that make essential use of empirical data, in particular treebanks or parsed
corpora (Abeillé, 2003b; Nivre, forthcoming), in the development of parsing
systems for natural language. Research during the last ten to fifteen years has
shown rather conclusively that an empirical approach is necessary in order to
achieve accurate disambiguation as well as robustness in parsing unrestricted
text, regardless of whether the parser uses a traditional grammar or a more
radically data-driven model. In the former case, exemplified by broad-coverage
deep parsers such as those of Riezler et al. (2002) and Toutanova et al. (2002), treebank
data are used to tune and optimize the parser, in particular by constructing a
statistical model for parse selection. In the latter case, represented by probabilistic
parsers such as Collins (1997, 1999) and Charniak (2000), the grammar
is replaced by a statistical model, the parameters of which are derived from
treebank data using machine learning techniques.
An even more radical approach is to replace the statistical model by the
treebank itself, and to reuse fragments of previously encountered syntactic
structures to construct new ones during parsing, as in the framework of Data-
Oriented Parsing (DOP) (Bod, 1995, 1998; Bod, Scha and Sima’an, 2003).
The DOP model can be seen as an instantiation of the paradigm of memory-based
learning, or lazy learning, which is based on the idea that learning is the
simple storage of experiences in memory and that new problems are solved by
reusing solutions from similar old problems (Daelemans, 1999; Daelemans and
Van den Bosch, 2005). Memory-based learning has been successfully applied to
a wide variety of problems in natural language processing, such as grapheme-to-phoneme
conversion, part-of-speech tagging, prepositional phrase attachment,
and chunking (Daelemans et al., 2002). Memory-based approaches to
syntactic parsing, in addition to the DOP framework, include Veenstra and
Daelemans (2000), Buchholz (2002), De Pauw (2003) and Kübler (2004).
In this book, we will explore a memory-based approach to dependency
parsing, using classifiers that predict the next parsing action based on the
current state of the parser and a database of previously encountered parser
states. Since the state of the parser results from a sequence of previous actions,
this can also be seen as a form of history-based parsing (Black et al., 1992;
Jelinek et al., 1994; Magerman, 1995), although we prefer the term inductive
dependency parsing for the general idea of using inductive machine learning
to predict the actions of a dependency parser.
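A minimal sketch of what such a memory-based guide might look like is given below: parser states are reduced to feature vectors, (features, action) pairs derived from treebank data are simply stored, and the action for a new state is taken from the most similar stored state, here by a plain overlap-based nearest-neighbour lookup. The feature template and the similarity measure are illustrative assumptions and far cruder than what a full memory-based learning package provides.

```python
# A toy memory-based guide for parsing actions: store previously seen
# (features, action) pairs and predict by nearest neighbour. The feature
# template and the overlap similarity are illustrative assumptions.

def extract_features(stack, buffer, words):
    """Reduce a parser state to a small, fixed feature tuple (assumed template)."""
    top = words[stack[-1]] if stack else "<empty>"
    nxt = words[buffer[0]] if buffer else "<empty>"
    return (top, nxt)

class MemoryBasedGuide:
    def __init__(self):
        self.memory = []                         # stored (features, action) pairs

    def learn(self, features, action):
        self.memory.append((features, action))   # lazy learning: just store

    def predict(self, features):
        # 1-nearest neighbour under a simple feature-overlap similarity.
        def overlap(stored):
            return sum(a == b for a, b in zip(stored[0], features))
        return max(self.memory, key=overlap)[1]

guide = MemoryBasedGuide()
guide.learn(("<empty>", "Economic"), "SHIFT")
guide.learn(("news", "had"), "LEFT-ARC")
print(guide.predict(("news", "had")))            # -> LEFT-ARC
```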
An early version of this idea, with a simple probabilistic classifier, was
reported in Nivre (2004b). The memory-based version was first presented in
Nivre et al. (2004), with an evaluation on Swedish treebank data, and later in
Nivre and Scholz (2004), with results from the Wall Street Journal section of
the Penn Treebank. This book provides a comprehensive analysis of inductive
dependency parsing, including a general characterization of the history-based
model and a formal framework for the specification of model parameters.
For the deterministic memory-based instantiation of this framework, we also
present a detailed discussion of feature selection, and a thorough empirical
evaluation of different models using treebank data from both Swedish and
English that goes far beyond previously published results.
The framework developed in this book is implemented in a system called
MaltParser, which is used in all the experiments reported below. MaltParser
can be described as a language-independent parser-generator. When applied
to a dependency-based treebank, the system generates a dependency parser
for the language represented in the treebank. The memory-based version of
this system uses the TiMBL software package (Daelemans et al., 2004) and
supports a variety of options with respect to linguistic features as well as
learning parameters. A version of MaltParser is freely available for research
and educational purposes.
