It is January 7, 1954. A select group assembles at the offices of the International Business Machines Corp to witness the first public demonstration of machine translation. The IBM 701, the company’s first commercial scientific computer, filled a whole room.
With a memory capacity of about 20 kilobytes and the ability to carry out 2,000 multiplications a second, the 701 was roughly a millionth as powerful as a present-day personal computer. Nevertheless, the event, later known as the Georgetown-IBM experiment in a nod to its academic co-sponsor, made history.
The mind behind the event was Léon Dostert, cofounder of Georgetown University’s Institute of Languages and Linguistics and one of the most significant linguists of his day. Dostert had been a personal interpreter for US General (and later President) Dwight D. Eisenhower during World War II, and he organized the system of simultaneous interpretation used at the Nuremberg Tribunal. Dostert was also a long-time friend of IBM’s founder Thomas J. Watson.
The approach to machine translation (MT), pioneered by Dostert and others in the 1950s, is referred to as “rule-based”: Linguists draw up sets of rules governing the functioning of the languages concerned – rules that can be expressed in mathematical form.
When suitably programmed, the computer should be able to analyze an input text and transform it, using appropriate bilingual dictionaries and transformation rules, into a grammatically correct phrase in the target language. The output text should express the same meaning as the input text and do so, as far as possible, in the fluent, natural style of the target language.
A comprehensive solution would clearly require a very large number of rules and other data, and computational capacities far beyond any existing at the time. To cut down on the requirements, Dostert chose sentences that each had a rather simple structure, a minimum of ambiguity and a relatively restricted vocabulary. Most dealt with chemistry and were written in “scientific Russian.”
After a great deal of effort, Dostert and colleagues succeeded in formulating six basic rules of “operational syntax” and selecting a vocabulary of 250 Russian words, sufficient for the demonstration. The rules were written into the computer program, together with the equivalent of a 250-word Russian-English dictionary.
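To make the rule-based idea concrete, here is a toy sketch in Python. The four-word romanized lexicon, its grammatical tags and the three rewrite rules are hypothetical, invented for illustration; they are not Dostert’s six rules or his 250-word dictionary. The point is only that every step is a mechanical look-up followed by a fixed manipulation.

```python
# Hypothetical toy lexicon: romanized Russian word -> (English gloss, grammatical tag).
# Invented for illustration; not the Georgetown-IBM dictionary.
LEXICON = {
    "kachestvo": ("quality", "NOUN"),             # nominative noun
    "uglya": ("coal", "NOUN_GEN"),                # genitive case: "of coal"
    "opredelyaetsya": ("is determined", "VERB"),  # passive verb
    "temperaturoy": ("temperature", "NOUN_INSTR"),  # instrumental case: "by temperature"
}


def translate(sentence: str) -> str:
    """Word-by-word dictionary look-up plus a few fixed, mechanical rewrite rules."""
    words = []
    for i, token in enumerate(sentence.lower().split()):
        gloss, tag = LEXICON[token]      # rule 0: look the word up (KeyError if unknown)
        if tag == "NOUN_GEN":
            gloss = "of " + gloss        # rule 1: genitive noun -> "of X"
        elif tag == "NOUN_INSTR":
            gloss = "by " + gloss        # rule 2: instrumental noun -> "by X"
        elif tag == "NOUN" and i == 0:
            gloss = "the " + gloss       # rule 3: insert an article before a sentence-initial noun
        words.append(gloss)
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


print(translate("kachestvo uglya opredelyaetsya temperaturoy"))
# prints: The quality of coal is determined by temperature.
```

Any word outside the lexicon raises a KeyError, a crude reminder of why the demonstration depended on a restricted vocabulary and a favorable choice of sentences.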
Prior to the computer run, Dostert carried out a test with humans. He described it thus in his 1954 report, “An Experiment in Mechanical Translation: Aspects of the General Problem”:
This involved giving to individuals who did not know the source language, Russian, sentences in that language written in Romanized script. They were directed in writing to go through a look-up, not only of lexical items but of the syntactic manipulations as well. The look-up was based on instructions reduced to strictly mechanical terms rather than ‘thinking’ operations… The subjects were able to take a sentence presented to them in Romanized Russian and to come up, by going through instructions a machine could follow, with a correct English rendition of the Russian sentences…. The significant fact is that, without knowing the Russian language, and, therefore, without contributing anything except their ability to look up, which is what the computer is capable of doing, they came out with the correct English version.
The Russian sentences, transliterated phonetically into the Roman alphabet, were coded onto punch cards and run through the machine. Before the eyes of the assembled witnesses, the computer translated 60 sentences from Russian into English, at a rate of one every six to seven seconds. This sensational accomplishment was widely covered in the press.
The Georgetown-IBM experiment gave enormous impetus to the early development of artificial intelligence, much of which was originally concentrated in the field of machine translation and funded by US defense institutions for that purpose. Dostert went on to become one of the leading advocates of machine translation.
In one respect the experiment was too successful, however.
Like Joseph Weizenbaum’s 1966 demonstration of ELIZA – a program that could carry out written dialog in English – Dostert’s feat gave rise to greatly exaggerated expectations. In particular, the favorable choice of the Russian test sentences created illusions about the real power of the system. Disillusionment followed, culminating a decade later in the devastating ALPAC Report, which led to a drastic cut in funding for MT and AI in general.
Indeed, rule-based machine translation proved to be much more difficult than originally expected. Efforts to overcome the difficulties led in the 1950s and ’60s to important developments in linguistic theory, but not to viable machine translation systems.
The prediction by Y. Bar-Hillel in 1960 that fully automatic high-quality translation would prove to be impossible seemed to be confirmed by developments over the subsequent 30 years.
In the meantime, however, the orders-of-magnitude increases in the speed, memory and data processing capabilities of modern computers have given new life to rule-based MT. An incomparably larger number of rules and other data on the source and target languages can be programmed in.
That includes general semantic relationships. For example, a terrier is a dog; a dog is an animal; an animal is a living organism. Add to that more complex relationships that overlap with the domain of facts about the real world, such as the relationships among “eat,” “food,” “hunger,” “taste,” “cooking,” “stomach,” “teeth,” “farming,” “restaurant,” and so on.
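Relationships of the first kind can be stored as simple links between concepts. The following Python sketch uses a hypothetical three-link taxonomy, not any real knowledge base, to show how a program can conclude that a terrier is a living organism purely by following is-a links.

```python
# Hypothetical toy taxonomy: each concept points to its more general "parent" concept.
IS_A = {
    "terrier": "dog",
    "dog": "animal",
    "animal": "living organism",
}


def is_a(concept: str, category: str) -> bool:
    """Follow is-a links upward; True if `category` is reached in one or more steps."""
    while concept in IS_A:
        concept = IS_A[concept]
        if concept == category:
            return True
    return False


print(is_a("terrier", "living organism"))  # True
print(is_a("dog", "terrier"))              # False: the links only point upward
```

The web of relations around “eat,” “food” and “restaurant” does not fit a single chain like this; it would need many more kinds of links, each entered by hand in a rule-based system.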
Elementary semantic relationships from everyday life are among the things AI pioneer John McCarthy had in mind when he urged, starting in the late 1950s, that AI systems must acquire a broad basis of “common sense knowledge,” as a precondition for attaining more human-like performance.
As I mentioned in Part 3 of this series, the heroic effort by Douglas Bruce Lenat to integrate common-sense knowledge into his AI system, Cyc, required programming more than 2 million facts and 24 million common-sense rules and assertions into the system. Many of these had to be identified and written individually by members of Lenat’s group.
Fortunately, rule-based machine translation can work well without requiring the full range and breadth of “common sense.” It has proven its usefulness in a number of fields, but remains a relatively labor-intensive approach.
Once again I have the impression that AI has yet to produce more intelligence than has been invested in it. (Needless to say, this does not prevent AI from being indispensable and highly productive in other respects!)
On a fundamental level the rule-based approach to MT, however useful in applications, has hardly anything to do with how human beings actually acquire language and knowledge about the world.
Babies are not born with the grammar of their parents’ language programmed into their heads. Languages are learned. So is most of what we call common sense. Also, children become fluent in their first languages without having studied any grammar at all, and usually before they even know what grammar is. One might also doubt whether what we call common sense is really just a bag of facts and rules.
To the extent AI aspires to attain true human-like intelligence, the rule-based approach has an unnatural, ad hoc character.
In this context, it is interesting to note that the beginnings of machine translation overlap strongly with World War II cryptography: the successful effort by future AI pioneer Alan Turing to break the secret codes used by German forces, as well as the work of Claude Shannon, also a WWII cryptographer.
Warren Weaver, whose 1949 “Memorandum on Translation” effectively launched the MT effort, stated famously, “One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”
But do human languages, in their actual use, really function like codes?
Enter machine learning
It is no surprise that the spectacular breakthroughs in MT in the last two decades have come from AI systems capable of acquiring capabilities through a kind of “learning.” These are systems that train themselves to translate, using large databases (corpora) of paired original and human-translated phrases.
There is no need to input any information about the grammatical structure or semantics of the given languages, nor even a bilingual dictionary. These systems generate all the “knowledge” they require to become adept translators by working through the databases.
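A deliberately crude Python sketch shows how word correspondences can emerge from paired sentences with no dictionary supplied. It assumes a hypothetical four-sentence German-English corpus and pairs each German word with the English word it co-occurs with most consistently, measured by the Dice coefficient. This is a simple co-occurrence heuristic, not the method actually used by SMT or NMT systems.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus: (source sentence, human translation).
PAIRS = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
    ("ein haus", "a house"),
]


def induce_lexicon(pairs):
    """For each source word, pick the target word with the highest Dice score."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)               # sentences containing each source word
        tgt_count.update(t_words)               # sentences containing each target word
        joint.update(product(s_words, t_words))  # sentence pairs containing both

    def dice(s, t):
        # Dice coefficient: 2 * co-occurrences / (occurrences of s + occurrences of t)
        return 2 * joint[(s, t)] / (src_count[s] + tgt_count[t])

    return {s: max(tgt_count, key=lambda t: dice(s, t)) for s in src_count}


print(induce_lexicon(PAIRS)["haus"])  # house
```

Nothing in the code knows German or English; the pairing of “haus” with “house” falls out of counting alone.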
The results seem nothing short of miraculous. By certain measures, the newest systems appear to be approaching the performance of human translators – and even to exceed it, as far as the speed of translation is concerned.
The evolution of these systems required an enormous amount of human effort and ingenuity. It has gone through two main stages: first so-called statistical machine translation (SMT), followed in recent years by neural machine translation (NMT) utilizing artificial neural networks and “deep learning.” Most recently there is a growing trend toward hybrid MT systems that combine NMT with a certain amount of “rule-based” processing.
Rather than attempt to describe SMT and NMT here, I shall just make one observation relevant to my thesis about the stupidity of artificial intelligence.
There is nothing surprising per se about the fact that computers with sufficient computational resources could acquire the capability to translate routine texts on the basis of a very large database of human translations.
In essence, SMT and NMT – as well as machine learning in general – are nothing more than extremely sophisticated forms of curve-fitting, applied to specific types of problems. In more mathematical language: interpolation and extrapolation from a set of data points (the text pairs, suitably coded in digital form), by methods of statistical optimization.
“Learning” mostly amounts to an iterative procedure for determining the values of the parameters for the input-output function, which the computer uses to generate its translations. The goal is to obtain an input-output function that approximates the output of an ideal human translator when “fed” with an input text. I shall discuss this point more in the following installment.
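In its simplest form, such curve-fitting looks like the following Python sketch: the two parameters a and b of a straight line y = a*x + b are adjusted step by step (gradient descent on the mean squared error) until the line matches four data points. Real translation systems fit millions of parameters or more to coded text pairs; the data points, the learning rate and the number of steps here are arbitrary assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Iteratively adjust the parameters (a, b) of y = a*x + b to minimize squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to a and b
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # these points lie exactly on y = 2x + 1
a, b = fit_line(xs, ys)
print(round(a, 3), round(b, 3))  # prints: 2.0 1.0
```

Once fitted, the line can be evaluated at x = 4, outside the range of the data, giving a value close to 9: extrapolation in the sense used above.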
The famous linguist Noam Chomsky referred to current MT methodology as a “brute force” approach.
I would ask: Is the sort of learning that is performed by present-day AI systems really human-like? Do these systems understand – in any meaningful sense – the texts they are translating?
The answer to the last question is, clearly, “No.” Ironically, that is exactly the fact that made Léon Dostert so optimistic about the prospects for machine translation: translation without understanding!
Jonathan Tennenbaum received his PhD in mathematics from the University of California in 1973 at age 22. Also a physicist, linguist and pianist, he’s a former editor of FUSION magazine. He lives in Berlin and travels frequently to Asia and elsewhere, consulting on economics, science and technology.