General introduction to machine translation
The mechanization of translation has been one of humanity’s oldest dreams. In
the twentieth century it has become a reality, in the form of computer programs
capable of translating a wide variety of texts from one natural language into
another. But, as ever, reality is not perfect. There are no ‘translating machines’
which, at the touch of a few buttons, can take any text in any language and
produce a perfect translation in any other language without human intervention
or assistance. That is an ideal for the distant future, if it is even achievable in
principle, which many doubt.
What has been achieved is the development of programs which can produce
‘raw’ translations of texts in relatively well-defined subject domains, which can
be revised to give good-quality translated texts at an economically viable rate or
which in their unedited state can be read and understood by specialists in the
subject for information purposes. In some cases, with appropriate controls on the
language of the input texts, translations can be produced automatically that are of
higher quality, needing little or no revision.
These are solid achievements by what is now traditionally called Machine
Translation (henceforth in this book, MT), but they have often been obscured
and misunderstood. The public perception of MT is distorted by two extreme
positions. On the one hand, there are those who are unconvinced that there is
anything difficult about analysing language, since even young children are able to
learn languages so easily; and who are convinced that anyone who knows a foreign
language must be able to translate with ease. Hence, they are unable to appreciate
the difficulties of the task or how much has been achieved. On the other hand,
there are those who believe that because automatic translation of Shakespeare,
Goethe, Tolstoy and lesser literary authors is not feasible, there is no role for any
kind of computer-based translation. They are unable to evaluate the contribution
which less-than-perfect translation could make either in their own work or in the
general improvement of international communication.
Most translation in the world is not of texts which have high literary and
cultural status. The great majority of professional translators are employed to
satisfy the huge and growing demand for translations of scientific and technical
documents, commercial and business transactions, administrative memoranda, legal
documentation, instruction manuals, agricultural and medical text books, industrial
patents, publicity leaflets, newspaper reports, etc. Some of this work is challenging
and difficult. But much of it is tedious and repetitive, while at the same time
requiring accuracy and consistency. The demand for such translations is increasing
at a rate far beyond the capacity of the translation profession. The assistance of
a computer has clear and immediate attractions.
The practical usefulness of an
MT system is determined ultimately by the quality of its output. But what counts
as a ‘good’ translation, whether produced by human or machine, is an extremely
difficult concept to define precisely. Much depends on the particular circumstances
in which it is made and the particular recipient for whom it is intended. Fidelity,
accuracy, intelligibility, appropriate style and register are all criteria which can be
applied, but they remain subjective judgements. What matters in practice, as far as
MT is concerned, is how much has to be changed in order to bring output up to a
standard acceptable to a human translator or reader. With such a slippery concept
as translation, researchers and developers of MT systems can ultimately aspire
only to producing translations which are ‘useful’ in particular situations, which
obliges them to define clear research objectives; alternatively, they seek
suitable applications for the ‘translations’ which they are in fact able to produce.
Nevertheless, there remains the higher ideal of equalling the best human
translation.
MT is part of a wider sphere of ‘pure research’ in computer-based
natural language processing in Computational Linguistics and Artificial
Intelligence, which explore the basic mechanisms of language and mind by
modelling and simulation in computer programs. Research on MT is closely
related to these efforts, adopting and applying both theoretical perspectives and
operational techniques to translation processes, and in turn offering insights and
solutions from its particular problems. In addition, MT can provide a ‘test-bed’ on
a larger scale for theories and techniques developed by small-scale experiments in
computational linguistics and artificial intelligence.
The major obstacles to translating by computer are, as they have always been,
not computational but linguistic. They are the problems of lexical ambiguity, of
syntactic complexity, of vocabulary differences between languages, of elliptical and
‘ungrammatical’ constructions, of, in brief, extracting the ‘meaning’ of sentences
and texts from analysis of written signs and producing sentences and texts in
another set of linguistic symbols with an equivalent meaning. Consequently, MT
should expect to rely heavily on advances in linguistic research, particularly those
branches exhibiting high degrees of formalization, and indeed it has done so and will
continue to do so. But MT cannot apply linguistic theories directly: linguists
are concerned with explanations of the underlying ‘mechanisms’ of language
production and comprehension; they concentrate on crucial features and do not
attempt to describe or explain everything. MT systems, by contrast, must deal
with actual texts. They must confront the full range of linguistic phenomena, the
complexities of terminology, misspellings, neologisms, aspects of ‘performance’
which are not always the concern of abstract theoretical linguistics.
In brief, MT is not in itself an independent field of ‘pure’ research. It takes
any ideas, methods and techniques from linguistics, computer science, artificial
intelligence and translation theory which may serve the development of improved
systems. It is essentially ‘applied’ research, but a field which nevertheless has
built up a substantial body of techniques and concepts which can, in turn, be
applied in other areas of computer-based language processing.