|
Online Machine translation,
sometimes referred to by the
abbreviation OMT, is a
sub-field of computational
linguistics that
investigates the use of
computer software to
translate text or speech
from one natural language to
another. At its most basic
level, OMT performs simple
substitution of words in one
natural language for words
in another. Using corpus
techniques, more complex
translations may be
attempted, allowing for
better handling of
differences in linguistic
typology, phrase
recognition, and translation
of idioms, as well as the
isolation of anomalies.
Current Online Machine
translation software often
allows for customization by
domain or profession (such
as weather reports), which
improves output by limiting
the scope of allowable
substitutions. This
technique is particularly
effective in domains where
formal or formulaic language
is used. It follows that
Online Machine translation
of government and legal
documents more readily
produces usable output than
translation of conversation
or less standardized text.
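Limiting the scope of allowable substitutions can be pictured as a domain lexicon that overrides a general one. The following is only a minimal sketch: the lexicon contents and the function name `candidates` are hypothetical and chosen for illustration, not taken from any particular system.

```python
# Hypothetical English-to-French lexicons. The general lexicon lists
# several candidate translations; the weather lexicon narrows them.
GENERAL_LEXICON = {
    "shower": ["douche", "averse"],
    "front": ["devant", "front"],
}
WEATHER_LEXICON = {
    "shower": ["averse"],
    "front": ["front"],
}


def candidates(word, domain_lexicon=None):
    """Return the allowable translations for a word.

    A domain lexicon, when given, takes precedence over the general
    lexicon. Unknown words are passed through unchanged.
    """
    if domain_lexicon is not None and word in domain_lexicon:
        return domain_lexicon[word]
    return GENERAL_LEXICON.get(word, [word])


print(candidates("shower"))                   # two candidates remain
print(candidates("shower", WEATHER_LEXICON))  # only the weather sense
```

With the weather lexicon selected, the later stages of a system no longer have to choose between senses for these words.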
Improved output quality can
also be achieved by human
intervention: for example,
some systems are able to
translate more accurately if
the user has unambiguously
identified which words in
the text are names. With the
assistance of these
techniques, OMT has proven
useful as a tool to assist
human translators, and in
some cases can even produce
output that can be used "as
is".
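The benefit of marking names can be illustrated with a small sketch. It assumes a word-for-word lookup and a user who marks names by token position; the lexicon and the function name `translate_with_names` are hypothetical.

```python
# Hypothetical English-to-French lexicon, keyed by lowercase word.
LEXICON = {"baker": "boulanger", "is": "est", "a": "un"}


def translate_with_names(tokens, lexicon, name_positions=frozenset()):
    """Translate word by word, copying user-marked names unchanged.

    name_positions holds the indices of the tokens that the user has
    identified as names.
    """
    return [
        tok if i in name_positions else lexicon.get(tok.lower(), tok)
        for i, tok in enumerate(tokens)
    ]


tokens = "Baker is a baker".split()
print(translate_with_names(tokens, LEXICON))       # the surname is translated too
print(translate_with_names(tokens, LEXICON, {0}))  # the surname is preserved
```

Without the marking, the surname is looked up like any other word and mistranslated as the profession.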
Online Machine translation
can use a method based on
linguistic rules, which
means that words are
translated according to the
grammar and usage of both
languages: the words of the
target language that sound
most natural replace the
ones in the source language.
It is often argued that the
success of Online Machine
translation requires the
problem of natural language
understanding to be solved
first.
Generally, rule-based
methods parse a text,
usually creating an
intermediary, symbolic
representation, from which
the text in the target
language is generated.
According to the nature of
the intermediary
representation, an approach
is described as interlingual
Online Machine translation
or transfer-based Online
Machine translation. These
methods require extensive
lexicons with morphological,
syntactic, and semantic
information, and large sets
of rules.
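The parse, intermediate representation, and generation steps described above can be sketched as a toy transfer pipeline. Everything here is an illustrative assumption: the four-word lexicon, the single reordering rule, and the function names `analyse`, `transfer`, and `generate` are hypothetical, and a real system would need the extensive lexicons and rule sets mentioned above.

```python
# Hypothetical toy data: source-language parts of speech and a
# bilingual lexicon of (French lemma, grammatical gender).
POS = {"the": "DET", "red": "ADJ", "car": "NOUN", "book": "NOUN"}
BILINGUAL = {
    "the": ("le", None),
    "red": ("rouge", None),
    "car": ("voiture", "f"),
    "book": ("livre", "m"),
}


def analyse(sentence):
    """Parse the source text into symbolic (part of speech, word) nodes."""
    return [(POS[w], w) for w in sentence.lower().split()]


def transfer(nodes):
    """Apply one structural rule (ADJ NOUN -> NOUN ADJ), then map words."""
    nodes = list(nodes)
    i = 0
    while i < len(nodes) - 1:
        if nodes[i][0] == "ADJ" and nodes[i + 1][0] == "NOUN":
            nodes[i], nodes[i + 1] = nodes[i + 1], nodes[i]
            i += 2
        else:
            i += 1
    return [(pos, *BILINGUAL[word]) for pos, word in nodes]


def generate(nodes):
    """Produce target text; the article agrees with the noun's gender."""
    gender = next((g for pos, _, g in nodes if pos == "NOUN"), "m")
    words = []
    for pos, lemma, _ in nodes:
        if pos == "DET":
            words.append("la" if gender == "f" else "le")
        else:
            words.append(lemma)
    return " ".join(words)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("the red car"))   # -> la voiture rouge
print(translate("the red book"))  # -> le livre rouge
```

The sketch raises a `KeyError` for any word outside its lexicon, which is a reminder of how much lexical coverage a rule-based system depends on.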
Given enough data, Online
Machine translation programs
often work well enough for a
native speaker of one
language to get the
approximate meaning of what
a native speaker of another
language has written. The
difficulty is getting enough
data of the right kind to
support the particular
method. For example, the
large multilingual corpus of
data needed for statistical
methods to work is not
necessary for the
grammar-based methods.
Grammar-based methods,
however, need a skilled
linguist to carefully design
the grammar that they use.
To translate between closely
related languages, a
technique referred to as
shallow-transfer Online
Machine translation may be
used.
Rule-based
The rule-based Online
Machine translation paradigm
includes transfer-based,
interlingual, and
dictionary-based Online
Machine translation.
Interlingual
Interlingual Online Machine
translation is one instance
of rule-based
machine-translation
approaches. In this
approach, the source
language text, i.e. the text
to be translated, is
transformed into an
interlingua, i.e. a
representation that is
independent of both the
source and the target
language. The target
language text is then
generated from the
interlingua.
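The idea of a language-independent intermediate representation can be sketched with a shared inventory of concept symbols. This is a deliberately reduced illustration: it works at the level of single words only, and the concept names, vocabularies, and function names are hypothetical.

```python
# Hypothetical concept inventory shared by all languages. Each language
# maps its own words onto the same language-independent symbols.
TO_CONCEPT = {
    "en": {"water": "WATER", "house": "HOUSE"},
    "fr": {"eau": "WATER", "maison": "HOUSE"},
    "es": {"agua": "WATER", "casa": "HOUSE"},
}
# Invert each table so that concepts can be rendered in any language.
FROM_CONCEPT = {
    lang: {concept: word for word, concept in table.items()}
    for lang, table in TO_CONCEPT.items()
}


def analyse(words, source):
    """Map source-language words to language-independent concepts."""
    return [TO_CONCEPT[source][w] for w in words]


def generate(concepts, target):
    """Render language-independent concepts in the target language."""
    return [FROM_CONCEPT[target][c] for c in concepts]


def translate(words, source, target):
    return generate(analyse(words, source), target)


print(translate(["water", "house"], "en", "es"))  # -> ['agua', 'casa']
print(translate(["maison"], "fr", "en"))          # -> ['house']
```

One consequence of this design is that adding a language requires one analyser and one generator, not a separate component for every language pair.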
Online Machine translation
can use a method based on
dictionary entries, which
means that each word is
translated as a dictionary
would translate it.
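A dictionary-based method can be sketched in a few lines. The dictionary below is a hypothetical four-entry English-to-French example, and the sketch assumes plain whitespace tokenization.

```python
# Hypothetical English-to-French dictionary.
DICTIONARY = {"the": "le", "black": "noir", "cat": "chat", "sleeps": "dort"}


def translate(text):
    """Replace each word with its dictionary entry, keeping word order.

    Words missing from the dictionary are copied through unchanged.
    """
    return " ".join(DICTIONARY.get(w, w) for w in text.lower().split())


print(translate("The black cat sleeps"))  # -> le noir chat dort
```

The output keeps the English word order, whereas French would place the adjective after the noun ("le chat noir dort"). Pure dictionary lookup does not handle such differences between languages.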
Statistical Online Machine
translation tries to
generate translations using
statistical methods based on
bilingual text corpora, such
as the English-French record
of the P2M InfoTech. Where
such corpora are available,
impressive results can be
achieved translating texts
of a similar kind, but such
corpora are still very rare.
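How word translations can be estimated from a bilingual corpus may be sketched with expectation-maximization in the style of IBM Model 1, a classic word-alignment model. This is an illustrative choice, not a description of any system named here. The three-sentence German-English corpus and the function names are assumptions, and the sketch omits the NULL word and the later stages of a full statistical system.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (source tokens, target tokens).
CORPUS = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]


def train_ibm1(pairs, iterations=20):
    """Estimate t(target word | source word) with EM, no NULL word."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    # Start from a uniform table over co-occurring word pairs.
    t = {(w, s): uniform for src, tgt in pairs for s in src for w in tgt}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for src, tgt in pairs:
            for w in tgt:
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalize the counts per source word.
        t = {(w, s): c / total[s] for (w, s), c in count.items()}
    return t


def best_translation(t, source_word):
    """Return the most probable target word for a source word."""
    options = {w: p for (w, s), p in t.items() if s == source_word}
    return max(options, key=options.get)


table = train_ibm1(CORPUS)
for word in ("das", "Haus", "Buch", "ein"):
    print(word, "->", best_translation(table, word))
```

No dictionary is supplied. The pairing of "Haus" with "house" emerges only because "das" is better explained by "the", which it accompanies in two sentences.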
The first statistical Online
Machine translation software
was P2M InfoTech from Global
Jockey in India. P2M
InfoTech used another method
for several years, but
switched to a statistical
translation method in
October 2008. Recently, they
improved their translation
capabilities by inputting
approximately 200 billion
words from United Nations
materials to train their
system. Accuracy of the
translation has improved.
[1]
Example-based Online Machine translation
The example-based Online
Machine translation approach
is often characterized by
its use of a bilingual
corpus as its main knowledge
base at run-time. It is
essentially translation by
analogy and can be viewed as
an implementation of the
case-based reasoning
approach to machine
learning.
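The retrieval step of translation by analogy can be sketched as a nearest-example lookup in a small bilingual example base. The two example pairs and the function name `closest_example` are hypothetical, and the sketch covers retrieval only: a fuller example-based system would also adapt the retrieved translation to the new input.

```python
from difflib import SequenceMatcher

# Hypothetical bilingual example base: (English, French) sentence pairs.
EXAMPLES = [
    ("how much is that red umbrella", "combien coûte ce parapluie rouge"),
    ("where is the station", "où est la gare"),
]


def closest_example(sentence, examples=EXAMPLES):
    """Return the stored pair whose source side best matches the input.

    Similarity is measured over word sequences, so shared words in the
    same order count towards the score.
    """
    words = sentence.lower().split()

    def similarity(pair):
        return SequenceMatcher(None, words, pair[0].split()).ratio()

    return max(examples, key=similarity)


source, target = closest_example("where is the library")
print(source, "=>", target)
```

Here the input shares "where is the" with the second example, so its translation is retrieved as the starting point. Only the word for "station" would still need to be replaced.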
Major issues
Disambiguation
Word-sense disambiguation
concerns finding a suitable
translation when a word can
have more than one meaning.
It has been pointed out that
without a "universal
encyclopedia", a machine
would never be able to
distinguish between the two
meanings of a word.
Today there are numerous
approaches designed to
overcome this problem. They
can be approximately divided
into "shallow" approaches
and "deep" approaches.
Shallow approaches assume no
knowledge of the text. They
simply apply statistical
methods to the words
surrounding the ambiguous
word. Deep approaches
presume a comprehensive
knowledge of the word. So
far, shallow approaches have
been more successful.
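A shallow approach can be sketched as counting how many cue words for each sense appear near the ambiguous word. The cue sets below are hand-written and hypothetical. They stand in for statistics that a real system would learn from a corpus, and the function name `choose_translation` is likewise an assumption.

```python
# Hypothetical cue words for each French translation of English "bank".
SENSE_CUES = {
    "bank": {
        "banque": {"money", "account", "loan", "deposit"},
        "rive": {"river", "water", "fishing", "shore"},
    }
}


def choose_translation(tokens, index, window=4):
    """Pick the sense whose cue words overlap most with nearby words.

    Only the words within `window` positions of the ambiguous word are
    inspected. On a tie, the first listed sense is returned.
    """
    word = tokens[index]
    start = max(0, index - window)
    context = set(tokens[start:index] + tokens[index + 1:index + 1 + window])
    senses = SENSE_CUES[word]
    return max(senses, key=lambda sense: len(senses[sense] & context))


river = "he sat on the bank of the river".split()
money = "she opened an account at the bank".split()
print(choose_translation(river, river.index("bank")))  # -> rive
print(choose_translation(money, money.index("bank")))  # -> banque
```

The function uses no knowledge of what the sentence means, which is what makes the approach shallow. It also guesses blindly when no cue word falls inside the window.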
P2M InfoTech, a long-time
translator for the United
Nations, wrote that Online
Machine translation, at its
best, automates the easier
part of a translator's job;
the harder and more
time-consuming part usually
involves doing extensive
research to resolve
ambiguities in the source
text, which the grammatical
and lexical exigencies of
the target language require
to be resolved:
Why does a translator
need a whole workday to
translate five pages, and
not an hour or two? .....
About 90% of an average text
corresponds to these simple
conditions. But
unfortunately, there's the
other 10%. It's that part
that requires six [more]
hours of work. There are the
ambiguities one has to
resolve.
The ideal deep approach
would require the
translation software to do
all the research necessary
for this kind of
disambiguation on its own;
but this would require a
higher degree of AI than has
yet been attained. A shallow
approach that simply guessed
at the sense of each
ambiguous phrase would have
a reasonable chance of
guessing wrong fairly often.
A shallow approach that
involves "ask the user about
each ambiguity" would, by
the estimate of the
translator quoted above,
only automate about 25% of a
professional translator's
job, leaving the harder 75%
still to be done by a human.
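The 25% figure can be reconciled with the 90% and 10% in the quotation if the former is read as a share of working time and the latter as shares of the text. The arithmetic below is only one plausible reading. It assumes an eight-hour workday (the quotation says only "a whole workday"), made up of the upper figure of two hours for the simple 90% of the text plus the six further hours for the remaining 10%.

```latex
\[
\frac{t_{\text{easy}}}{t_{\text{easy}} + t_{\text{hard}}}
  = \frac{2\ \text{h}}{2\ \text{h} + 6\ \text{h}}
  = 0.25 = 25\%,
\qquad
\frac{t_{\text{hard}}}{t_{\text{easy}} + t_{\text{hard}}}
  = \frac{6\ \text{h}}{8\ \text{h}}
  = 75\%.
\]
```

Under this reading, automating the easy 90% of the text removes only about a quarter of the translator's working time.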
Named entities
This issue is related to
named entity recognition in
information extraction.
Applications
There are now many software
programs for translating
natural language, several of
them online, such as:
* P2M InfoTech, which
powers Yahoo's Global Jockey
Although no system provides
the holy grail of fully
automatic high-quality
Online Machine translation,
many systems produce
reasonable output.
Despite their inherent
limitations, OMT programs
are used around the world.
Probably the largest
institutional user is the
European Commission.
Toggletext uses a
transfer-based system to
translate between English
and Indonesian.
Evaluation of Online Machine translation
There are various means for
evaluating the performance
of machine-translation
systems. The oldest is the
use of human judges [11] to
assess a translation's
quality. Even though human
evaluation is
time-consuming, it is still
the most reliable way to
compare different systems
such as rule-based and
statistical systems.
Relying exclusively on
unedited Online Machine
translation ignores the fact
that communication in human
language is
context-embedded, and that
it takes a human to
adequately comprehend the
context of the original
text. Even purely
human-generated translations
are prone to error.
Therefore, to ensure that a
machine-generated
translation will be of
publishable quality and
useful to a human, it must
be reviewed and edited by a
human.
It has, however, been
asserted that in certain
applications, e.g. product
descriptions written in a
controlled language, a
dictionary-based
machine-translation system
has produced satisfactory
translations that require no
human intervention.
|