Thursday
Jul 01, 2010

Machine translation - Will it Blend?

Anyone who has seen the eye-watering iPad destruction on the geek site Will it Blend? (link at the end of this article) will know how destructive blending something can be.

The concept of mixing or 'blending' machine and human translation, ostensibly to save on costs, is not a new one, but it hasn’t exactly taken off either. While the concept seems to make sense, the practice, for the most part, does not.

So why can't you just 'repair' a machine translation?

We set out to discover if editing or 'brushing up' a machine translation could save any time at all (or money) and to find out if the results would be any good.

We chose a short, straightforward English source text of a general nature that included some technical terms: an article on sports for youngsters with heart disease.

We pre-translated the text into German using Google. The results at first glance seemed promising. Individual medical terms, for instance, were rendered quite well, but any conscientious translator would double-check these terms anyway, so this is not such a great time-saver in reality.

Going through the Google-translated sentences, however, it quickly became clear that virtually none of the sentences made sense. In fact, in many cases they were quite unintelligible because the software is incapable of making logical connections between the individual words.

To cite just one example, the heading 'Sport matters' was rendered as 'Sport affairs' (in German) because Google hadn't recognised that 'matters' was a verb. While it was possible to glean some of the meaning from the machine translation, there were no three words in a row that could be used in a professional context. The main reason is that machines consistently fail to identify word types (verbs, nouns, adjectives etc.) correctly.
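The word-type problem can be illustrated with a toy sketch. This is not how Google works internally; the mini-lexicon, the German glosses and both function names below are our own assumptions, chosen only to show how ignoring word class turns 'Sport matters' into something like 'Sport affairs'.

```python
# Hypothetical mini-lexicon: each English word maps to German glosses by word class.
# The glosses are illustrative assumptions, not output taken from any real system.
LEXICON = {
    "sport": {"noun": "Sport"},
    "matters": {"noun": "Angelegenheiten", "verb": "zählt"},
}


def translate_naive(words):
    """Pick the first listed gloss for every word, ignoring its role in the sentence."""
    return " ".join(next(iter(LEXICON[w.lower()].values())) for w in words)


def translate_with_classes(tagged_words):
    """Pick the gloss that matches a word class supplied by a human (or a tagger)."""
    return " ".join(LEXICON[w.lower()][cls] for w, cls in tagged_words)


# The naive lookup treats 'matters' as a noun ("affairs").
print(translate_naive(["Sport", "matters"]))
# Knowing that 'matters' is a verb selects a different gloss.
print(translate_with_classes([("Sport", "noun"), ("matters", "verb")]))
```

The second call only succeeds because someone has already decided that 'matters' is a verb, which is exactly the step the machine got wrong in our test.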

Our translator found that the strange 'German' results were actually harder to work with. The frequent contextual errors caused confusion, and when less incorrect parts did appear, the temptation was to work around them instead of revising the structure as a whole.

This tendency to work around the 'less bad' machine results had the inevitable effect of producing a translation that was mediocre at best. If our translator had worked from the English source text instead of a machine translation, the translation produced would have been more natural sounding and a lot easier to read.

Most worryingly, the amount of time and effort put into creating this mediocre blended translation was equal to if not greater than that of producing a translation in the normal way.

In short, we believe that attempting to blend machine translation with human translation to save money is a false economy. A lot of work can go into producing mediocre results. If less effort is put in, then the copy is basically unpublishable nonsense.

So what you save in translation costs, you will spend in endless editing and proofreading. 

Machine translation, will it blend? No, but it will definitely shred...

Those who want to know if an iPad will blend, click here


 

Thursday
Jul 01, 2010

Yahoo vs Google

Two main technologies sit behind the best-known free machine translation services: Systran, which powers Yahoo's Babel Fish, and the engine behind Google Translate.

Systran is rule-based whereas Google uses a statistical model.

Rule-based - attempts to create a translation using a set of rules that define how words in one language should be substituted with words in another.

Statistical - compares large volumes of previously translated documents held in a multilingual database, and tries to map sentences and words based on the matches it finds there.
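The difference between the two approaches can be sketched in a few lines of Python. This is a deliberately tiny illustration under our own assumptions, not the method used by Systran or Google: the dictionary, the single reordering rule, the three-line 'corpus' and all function names are hypothetical, and real statistical systems match fragments and estimate probabilities rather than looking up whole sentences.

```python
from collections import Counter, defaultdict

# --- Rule-based: a bilingual dictionary plus a hand-written reordering rule ---
DICTIONARY = {"la": "the", "rue": "street", "principale": "main"}
NOUNS = {"rue"}
ADJECTIVES = {"principale"}


def rule_based(sentence):
    """Reorder noun + adjective pairs, then substitute word for word."""
    words = sentence.lower().split()
    i = 0
    while i < len(words) - 1:
        # Rule: French noun + adjective becomes English adjective + noun.
        if words[i] in NOUNS and words[i + 1] in ADJECTIVES:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    # Words missing from the dictionary are passed through untranslated.
    return " ".join(DICTIONARY.get(w, w) for w in words)


# --- Statistical: count how often each source text was translated each way ---
PARALLEL_CORPUS = [
    ("la rue principale", "the main street"),
    ("la rue principale", "the high street"),
    ("la rue principale", "the main street"),
]


def build_phrase_table(corpus):
    """Map each source text to a counter of the translations seen for it."""
    table = defaultdict(Counter)
    for source, target in corpus:
        table[source][target] += 1
    return table


def statistical(sentence, table):
    """Return the most frequently seen translation, or None if nothing matches."""
    candidates = table.get(sentence.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


table = build_phrase_table(PARALLEL_CORPUS)
print(rule_based("la rue principale"))
print(statistical("la rue principale", table))
print(statistical("la rue de Paris", table))
```

Neither function has any notion of what a street is. The first applies rules blindly, the second repeats whatever it has seen most often, and the last call returns nothing at all because no matching text exists in the toy corpus.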

Both systems are fundamentally limited by the fact that machines cannot yet understand the copy they are translating. With context and meaning out of the window, all that can be achieved is a skeletal meaning, stripped of sense and coherence, that is often more amusing than useful.

Below are some examples:

Original French:

Dès qu'il fut dehors, Pierre se dirigea vers la rue de Paris, la principale rue du Havre, éclairée, animée, bruyante. L'air un peu frais des bords de mer lui caressait la figure, et il marchait lentement, la canne sous le bras, les mains derrière le dos. Il se sentait mal à l'aise, alourdi, mécontent comme lorsqu'on a reçu quelque fâcheuse nouvelle. Aucune pensée précise ne l'affligeait et il n'aurait su dire tout d'abord d'où lui venaient cette pesanteur de l'âme et cet engourdissement du corps. Il avait mal quelque part, sans savoir où ; il portait en lui un petit point douloureux, une de ces presque insensibles meurtrissures dont on ne trouve pas la place, mais qui gênent, fatiguent, attristent, irritent, une souffrance inconnue et légère, quelque chose comme une graine de chagrin.

English (human translation):

As soon as he got out, Pierre made his way to the Rue de Paris, the high street of Havre, brightly lit, bustling and noisy. The sharp coastal air braced his figure, he walked slowly, his stick under his arm and his hands behind his back. He was ill at ease, with heavy heart, as one is after hearing unpleasant tidings. He was not distressed by any definite thought and he would have been puzzled to account, at any given moment, for this dejection of spirit and heaviness of limb. He was hurt somewhere, unable to pin-point his malaise. Lingering within him there was a twist of pain, one of those almost imperceptible wounds which we cannot lay a finger on, it interferes, irritates, drains us, depresses us, this small seed of sadness.

And here are the machine-translated versions (from French):

Yahoo's Babel Fish:

As soon as it was outside, Pierre moved towards the street of Paris, the principal street of Le Havre, lit, animated, noisy. The a little fresh air of the edges of sea cherished the figure to him, and it went slowly, the cane under the arm, the hands behind the back. It was smelled badly at ease, weighed down, dissatisfied like when one received some annoying news. No thought specifies did not afflict it and it would not have known to say first of all from where this gravity came to him from the heart and this numbness from the body. It had some share badly, without knowing where; it carried in him a small painful point, one of these almost insensitive bruises which one does not find the place, but which obstructs, tire, sadden, irritate, an unknown and light suffering, something like a seed of sorrow.

Google Translate:

When he was outside, Pierre went to the streets of Paris, the main street of Le Havre, lighted, busy, busy. The air of a little fresh seashores figure caressed him, and he walked slowly, cane under his arm, hands behind their backs. He felt uncomfortable weight, as unhappy when received some disturbing news. No precise thought it sad and knew he would say first of all that this was due gravity of the soul and the numbness of the body. He had badly somewhere without knowing where; He wore it a little painful point, one of these almost insensitive bruises which are not found instead, but the way, tired, sadden, irritate, suffering and unknown slight, some something like a seed of sorrow.

Another example, this time from German:

In diese Verlegenheit gerät sie ohne ihre Schuld. Sie fängt von Grundsätzen an, deren Gebrauch im Laufe der Erfahrung unvermeidlich und zugleich durch diese hinreichend bewährt ist. Mit diesem steigt sie (wie es auch ihre Natur mit sich bringt) immer höher, zu entfernteren Bedingungen. Da sie aber gewahr wird, daß auf diese Art ihr Geschäft jederzeit unvollendet bleiben müsse, weil die Fragen niemals aufhören, so sieht sie sich genötigt, zu Grundsätzen ihre Zuflucht zu nehmen, die allen möglichen Erfahrungsgebrauch überschreiten und gleichwohl so unverdächtig scheinen, daß auch die gemeine Menschenvernunft damit im Einverständnisse steht. Dadurch aber stürzt sie sich in Dunkelheit und Widersprüche, aus welchen sie zwar abnehmen kann, daß irgendwo verborgene Irrtümer zum Grunde liegen müssen, die sie aber nicht entdecken kann, weil die Grundsätze, deren die sich bedient, da sie über die Grenze aller Erfahrung hinausgehen, keinen Probierstein der Erfahrung mehr anerkennen. Der Kampfplatz dieser endlosen Streitigkeiten heißt nun Metaphysik.

English (human translation):

It falls into this difficulty without any fault of its own. It begins with principles, which cannot be dispensed with, in the field of experience, and the truth and sufficiency of which are, at the same time, insured by experience. With these principles it rises, in obedience to the laws of its own nature, to ever higher and more remote conditions. But it quickly discovers that, in this way, its labours must remain ever incomplete, because new questions never cease to present themselves; and thus it finds itself compelled to have recourse to principles which transcend the region of experience, while they are regarded by common sense without distrust. It therefore falls into confusion and contradictions, from which it conjectures the presence of latent errors, which, however, it is unable to discover, because the principles it employs, transcending the limits of experience, cannot be tested by that criterion. The arena of these endless contests is called Metaphysics.

Yahoo's Babel Fish:

Into this embarrassment it comes without its debt. It begins from principles, whose use is sufficiently proven in the course of the experience inevitably and at the same time by these. With this it (like it also its nature with itself brings) rises ever more highly, on more distant conditions. Since it becomes however more aware of the fact that in this kind its business must remain at any time unfinished, because the questions never stop, then them see themselves forced to take to principles their refuge which exceeds all possible experience use and however seems so unsuspected that also the common people reason thereby stands in agreements. Thus however falls it itself in darkness and contradictions, from which it to decrease can the fact that somewhere hidden mistakes to reason to lie must, which it however not to discover can, because the principles, whose avails itself, since they go beyond the border of all experience, no probierstein of the experience more recognizes. The combat place of these endless disputes is called now Metaphysik.

Google Translate:

In this embarrassment desire to be without their debt. It starts from basic principles to their use during the inevitable and experience at the same time by this sufficiently proven. With this rising (as it is their nature brings with it) are always higher, too distant conditions. Because they will be aware that in this way their business must always remain unfinished because the questions never stop, so she has to turn to their principles to take refuge for all kinds of experience and use exceed nevertheless so unsuspicious seem that the common sense so that people in consents. This plunges but they are in darkness and contradictions from which they can lose weight while that hidden somewhere on the underlying mistakes, but they can not discover, because the principles, which serves its because they have the experience beyond all limits no touchstone of the experience more recognition. The court martial of these endless disputes is now metaphysics.

It is safe to say that machine translation cannot compete with human translation. Both Google and Yahoo get a lot wrong and produce some strange output ("It was smelled badly at ease"?), yet each gets some things surprisingly close. Neither would be suitable for anything more than gaining a quick, cursory understanding of something written in an unfamiliar language.

Translation is without doubt best left to professional translators who are native speakers and who can apply their experience, wit and intelligence to carry the true meaning of the copy from one language to another.

Thursday
Jul 01, 2010

Google-de-gook

 

Google have been offering free web-based machine translation for a number of years. Essentially, the service lets people translate small pieces of text so that they can quickly get a basic understanding of what is being said in another language. Free machine translation is nothing new. Before Google got in on the act, Babel Fish was one of the most popular services.

However, there is a fundamental difference between the technology behind Google Translate and Babel Fish.

While Babel Fish uses a rule-based system, i.e. formulas that dictate how words in one language should be replaced by words from another language, Google uses a statistical system that harvests many millions of words from existing translated material and produces translation through a system of statistical analysis.

This is an interesting approach, even though it raises a few questions in terms of the intellectual property rights of the original translators and / or their clients, but then again the 'public domain' has a lot to answer for in this regard.

Those issues aside, is it any good? Should translators and agencies alike be shaking in their boots and reconsidering their career options?

We don't think so. The idea of machines (or software) producing smooth, well-written translation is still pure fantasy. Human translation is not under any serious threat until artificial intelligence becomes a part of our everyday lives. Only when machines become self-aware and can actually comprehend what it is they are being asked to translate (let's face it, if a machine is self-aware, it is only polite to ask) will there be even an outside chance of some usable computer-generated translation.

That assumes, of course, that an intelligent self-aware computer wouldn't decide it had better things to do than translate stuff. It may opt instead for taking over the world, or making all the telephones on the planet ring at the same time, or something geeky like that.

How does Google's statistics-based system compare with Yahoo's rule-based Babel Fish? Find out here.



Thursday
May 01, 2008

Is translation going down the pan?

We went undercover, posing as a European travel company, to test a few of these low-cost translation providers. We can show, by example, that low-quality translation is not worth the peanuts that were paid for it.


Monday
Apr 21, 2008

So is Nerdic really the new lingua franca?

‘Nerdic’ is apparently the fastest growing language in Europe . . .
