Contemporary tendencies in development of machine translation from English into Ukrainian
Category | Foreign languages and linguistics |
Type | article |
Language | English |
Date added | 03.12.2022 |
Posted at http://www.allbest.ru/
Oles Honchar Dnipro National University
O. Novikova, Candidate of Philological Sciences, Associate Professor of the Department of Translation and Linguistic Training of Foreigners
I. Suima, Candidate of Philological Sciences, Associate Professor, Associate Professor of the Department of Translation and Linguistic Training of Foreigners
K. Shevchyk, Candidate of Philological Sciences, Associate Professor, Associate Professor of the Department of Translation and Linguistic Training of Foreigners
Annotation
Problem statement. Today, machine translation is one of the factors of everyday human activity. Machine translation can significantly improve global communication by speeding up the translation process, despite the imperfect quality of the output text. Most often, the results of online tools require post-editing and can be used effectively only by those who already speak the target language to some extent. The need for high-quality translation grows every year. Today, the search for an algorithm that ensures such translation quality is one of the most important issues in computer science, cybernetics and linguistics, which demonstrates the scientific novelty of this work.
Purpose of the article: to analyze different approaches to designing machine translation systems, their characteristics, efficiency and output quality, using Google Translate, Microsoft Translator and Yandex as examples. To achieve this aim, the following tasks were set: to identify the most capable MT algorithms in use today; to compare the results of translations produced by online translators; to analyze the typical stylistic, lexical and grammatical errors that appear in the translations; to identify the advantages and disadvantages of online translators; to provide recommendations for improving machine translation.
Research methods. To accomplish these tasks, the following methods were applied: descriptive, comparative, analysis, experiment, and linguistic interpretation of the results obtained.
Main research results. Machine translation of literary texts was handled exceptionally well by Yandex and was quite acceptable (apart from numerous grammatical errors) on Google's platform. Microsoft Translator performed worst: its incorrect translation of realia and the errors mentioned make its output less comprehensible than that of the other programs. The main problems we see in such translations stem from the fact that the systems depend on a large amount of high-quality data sets (i.e., corpora of texts for specific language pairs). The quality of these sets directly affects the quality of the output, in our case the quality of the target-language text. This can be seen by comparing the average translation quality of the Google and Microsoft systems. The former makes fewer errors on average and has fewer problems determining the contextual meaning of a polysemantic lexeme.
Conclusions and prospects. We believe that the problem analyzed can, to a certain extent, be solved in one of two ways: by using the knowledge of professional translators and linguists to compile parallel corpora, or by enabling every person to contribute to this process, even on a small scale. The first approach takes a lot of time and labor but ultimately yields a higher-quality data set, which may lead to further improvement of translation quality. The second approach has already been implemented by all three major neural machine translation systems, but it may slow progress because of the lack of quality control. For us, another potential prospect of this research lies in expanding the subject area of the selected texts to reflect the variety of writing styles currently used on the Internet. Including texts in religious (confessional), business and other styles may make it possible to reveal more lacunae in neural network models and to suggest further ways of improvement.
Keywords: machine translation, target language, source language, improvement, contextual meaning, communication.
Abstract
Background. Today, machine translation is part of everyday human activity. Machine translation can greatly facilitate global communication by accelerating the translation process, despite the imperfect quality of its output. Most often, the results of online tools require post-editing and can be used effectively only by those who already speak the target language to some extent. The need for high-quality translation grows every year. Today, the search for an algorithm capable of delivering such quality is one of the most important questions in computer science and linguistics, which underlines the scientific relevance of this work.
The purpose of this paper is to analyze different approaches to designing machine translation systems, their characteristics, efficacy and output quality, using examples from Google Translate, Microsoft Translator and Yandex. To achieve this aim, the following tasks were set: to identify the most capable machine translation (MT) algorithms in use today; to compare the results of translations produced by online translators; to analyze typical stylistic, lexical and grammatical errors that appear in the translations; to identify the advantages and disadvantages of online translators; to provide recommendations for improving machine translation.
Methods. To accomplish these tasks, we used the following methods: descriptive, comparative, analytical and experimental methods, as well as linguistic interpretation of the results obtained.
Results. Machine translation of literary (belletristic) texts was handled exceptionally well by Yandex and was quite acceptable (barring numerous grammatical errors) on Google's platform. The outlier is Microsoft Translator, whose mistranslation of realia and the same grammatical errors make its output much less readable than that of its competitors. The main problems we see in such translations stem from the fact that the systems depend on large amounts of high-quality data (i.e., corpora of texts for specific language pairs). The quality of these data sets directly influences the quality of the output, which in our case is the quality of the target-language text. This can be seen by comparing the average translation quality of Google's and Microsoft's systems: the former makes fewer mistakes on average and has fewer issues with identifying the contextual meaning of a polysemantic lexeme.
We believe that this issue can be addressed, to a certain extent, in one of two ways: by hiring professional translators and linguists to compile parallel corpora, or by enabling every person to contribute to this process, even on a small scale. The first approach would be very time- and labor-consuming but would ultimately provide a higher-quality data set, which may lead to further improvements in MT. The second is already being deployed by all three major neural machine translation (NMT) systems, but it may slow progress because of a lack of quality control and oversight. For us, another potential prospect of this research lies in widening the range of texts chosen, to reflect the variety of writing styles currently used on the Internet. Including texts in religious (confessional), business and other styles may allow us to reveal more lacunae in neural network models and to suggest further means of improvement.
Keywords: machine translation, target language, source language, improvement, contextual meaning, communication.
Introduction
As a separate scientific discipline, translation studies emerged in the second half of the 20th century, predominantly due to globalization and the intensification of inter-ethnic relations. By the end of the millennium, a new discipline, machine translation, was developing at an ever-increasing rate. It quickly became not just a theoretical discipline but a cornerstone of scientific cooperation at the crossroads of computer science, engineering and linguistics (Darwish, 2001; Forcada, 2017: 291; Gordin, 2015: 147; Hearne, 2011: 207; Jurafski, 2009: 276). Translating texts from one language into another can be categorized as routine work, but only partially: on the one hand, any translator's work contains quite a few formal elements; on the other hand, no translation can be carried out fully formally, without creative decision-making.
Today, machine translation is part of everyday human activity. It can greatly facilitate global communication by accelerating the translation process, despite the imperfect quality of its output.
Most often, the results of online tools require post-editing and can be used effectively only by those who already speak the target language to some extent. The need for high-quality translation grows every year. Today, the search for an algorithm capable of delivering such quality is one of the most important questions in computer science and linguistics, which underlines the scientific relevance of this work.
The aim of this paper is to analyze different approaches to designing machine translation systems, their characteristics, efficacy and output quality, using examples from Google Translate, Microsoft Translator and Yandex.
To achieve this aim, the following tasks were set:
1) to identify the most capable MT algorithms in use today;
2) to compare the results of translations made by online translators;
3) to analyze typical stylistic, lexical and grammatical errors that appear in the translation;
4) to identify the advantages and disadvantages of online translators;
5) to provide recommendations for improving machine translation.
To accomplish these tasks, we used the following methods: descriptive, comparative, analytical and experimental methods, as well as linguistic interpretation of the results obtained.
Methods and methodology of investigation. At the present stage of research, there are two main incentives for the development of machine translation. The first is purely scientific and stems from the complexity and intricacy of machine translation models. As a type of linguistic activity, translation involves all levels of language: from grapheme recognition to conveying the content of individual sentences and of the text as a whole. At the same time, there is a need to speed up translation and increase its volume, which raises the requirements for translation as an industrially applicable product. The second incentive is social: it is driven predominantly by the growing role of translation in the modern world as a prerequisite for interlingual communication, the volume of which increases every year.
The first generation of machine translation systems was based on sequential translation algorithms that could only translate word by word or phrase by phrase. The capabilities of such systems were determined by the size of the available vocabulary, which depended directly on the amount of addressable computer memory. Texts were translated sentence by sentence, without taking into account the semantic connections between sentences. Such systems are called direct translation systems. They were later replaced by systems that performed translation at the level of syntactic structures. Their translation algorithms applied a set of logical operations in the following steps (Hutchins, 2005):
1) analyzing the source sentence;
2) constructing its syntactic structure according to the grammatical rules of the source language;
3) transforming it into the syntactic structure of the target sentence according to the grammar of the target language;
4) synthesizing the target sentence by substituting the appropriate words from the dictionary.
Such systems are called T-systems (from the word «transfer»).
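The contrast between a direct system and the four transfer steps listed above can be illustrated with a toy Python sketch. It assumes a hypothetical four-word dictionary, hand-written genitive forms and a parser that only understands English possessives of the form «X's Y»; it is an illustration of the idea, not a description of any real T-system. Word-by-word substitution keeps the English order and leaves the possessor uninflected, while the transfer step rebuilds the phrase according to Ukrainian grammar (head noun first, possessor in the genitive).

```python
# Hypothetical bilingual dictionary: English word -> Ukrainian nominative form.
LEXICON = {"dwayne": "Двейн", "trout": "Траут", "car": "машина", "book": "книга"}

# Hypothetical genitive forms; a real system would use a morphological generator.
GENITIVE = {"Двейн": "Двейна", "Траут": "Траута"}


def direct_translate(sentence):
    """Direct system: drop the possessive 's and substitute words one by one."""
    words = sentence.lower().replace("'s", "").split()
    return " ".join(LEXICON.get(word, word) for word in words)


def analyze(sentence):
    """Steps 1-2: parse an English "X's Y" phrase into a source-language structure."""
    possessor, head = sentence.lower().split()
    if not possessor.endswith("'s"):
        raise ValueError("this toy parser only handles phrases of the form X's Y")
    return {"head": head, "possessor": possessor[:-2]}


def transfer(source_tree):
    """Step 3: build the Ukrainian structure: head noun first, possessor in the genitive."""
    return [(source_tree["head"], "nom"), (source_tree["possessor"], "gen")]


def generate(target_tree):
    """Step 4: synthesize the phrase, taking word forms from the dictionary."""
    forms = []
    for word, case in target_tree:
        lemma = LEXICON[word]
        forms.append(GENITIVE[lemma] if case == "gen" else lemma)
    return " ".join(forms)


print(direct_translate("Dwayne's car"))             # Двейн машина
print(generate(transfer(analyze("Dwayne's car"))))  # машина Двейна
```

The direct output shows the kind of uninflected, word-for-word rendering discussed in the results section, whereas the transfer pipeline yields the grammatical «машина Двейна».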
The ultimate aim of machine translation is considered to be systems that obtain a meaning representation of the input sentence through semantic analysis and then synthesize the sentence in the target language from that representation. Such systems are called I-systems (from the word «interlingua»). It is generally believed that the next generations of machine translation systems will belong to the class of I-systems (Hutchins, 1986: 135).
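An I-system can be sketched in the same toy fashion. In this hypothetical Python example, analysis maps an English sentence onto a language-neutral frame (who gave what to whom), and separate generators synthesize Ukrainian and English from that one frame. The concept IDs, lexicons and one-pattern parser are illustrative assumptions; real interlingua systems need far richer semantic representations.

```python
# Hypothetical concept lexicons: the frame stores language-neutral concept IDs.
EN_WORDS = {"DWAYNE": "Dwayne", "TROUT": "Trout", "PATTY": "Patty", "BOOK": "book", "GIVE": "gave"}
EN_TO_CONCEPT = {word.lower(): concept for concept, word in EN_WORDS.items()}

UK_NOUNS = {  # hypothetical case forms and grammatical gender
    "DWAYNE": {"gender": "m", "nom": "Двейн", "dat": "Двейну", "acc": "Двейна"},
    "TROUT": {"gender": "m", "nom": "Траут", "dat": "Трауту", "acc": "Траута"},
    "PATTY": {"gender": "f", "nom": "Петті", "dat": "Петті", "acc": "Петті"},
    "BOOK": {"gender": "f", "nom": "книга", "dat": "книзі", "acc": "книгу"},
}
UK_VERBS = {"GIVE": {"m": "дав", "f": "дала"}}  # past tense agrees with the agent's gender


def analyze_en(sentence):
    """Toy semantic analysis: map "X gave Y a Z." onto a language-neutral frame."""
    agent, verb, recipient, _article, theme = sentence.lower().rstrip(".").split()
    return {
        "event": EN_TO_CONCEPT[verb],
        "agent": EN_TO_CONCEPT[agent],
        "recipient": EN_TO_CONCEPT[recipient],
        "theme": EN_TO_CONCEPT[theme],
    }


def generate_uk(frame):
    """Ukrainian synthesis: agent in the nominative, recipient in the dative, theme in the accusative."""
    agent = UK_NOUNS[frame["agent"]]
    verb = UK_VERBS[frame["event"]][agent["gender"]]
    recipient = UK_NOUNS[frame["recipient"]]["dat"]
    theme = UK_NOUNS[frame["theme"]]["acc"]
    return f"{agent['nom']} {verb} {recipient} {theme}."


def generate_en(frame):
    """English synthesis from the same frame."""
    w = EN_WORDS
    return f"{w[frame['agent']]} {w[frame['event']]} {w[frame['recipient']]} a {w[frame['theme']]}."


frame = analyze_en("Patty gave Dwayne a book.")
print(frame)
print(generate_uk(frame))  # Петті дала Двейну книгу.
print(generate_en(frame))  # Patty gave Dwayne a book.
```

Because the Ukrainian generator works from the frame rather than from English word order, it can choose the verb form by the agent's gender («дала») and the recipient's case («Двейну»), two categories the results section shows the online systems frequently getting wrong.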
The first experiments in creating machine translation programs showed that the problems had to be solved one by one. There were too many difficulties and inaccuracies in formalizing rules and creating algorithms for text analysis, and there was no agreement on the contents of dictionaries or on the linguistic patterns to be used in machine translation. It turned out that traditional linguistics had neither the factual material nor the ideas and concepts needed to build machine translation systems capable of reconstructing a text from its meaning.
At the end of the twentieth century, the statistical approach to machine translation began to develop. As its name suggests, such translation is based not on rules but on statistics. Its main method is to train the system on a sufficiently large number of parallel texts (hundreds of thousands of samples), that is, texts containing the same information in different languages. Let us consider the methods of statistical translation using the example of the model employed by Yandex in the Yandex.Translate system, as analyzed by M. Vozniuk. It consists of three components: a translation model, a language model and a decoder (Vozniuk, 2011: 143). The translation model for a pair of languages is a table of all the words and phrases of the source language known to the system, together with their translations into the target language and the probability of each such correspondence.
Statistics-based machine translation (SBMT or SMT) takes into account not only individual words but also multi-word phrases. The second component is the language model for the target language: a list of all the words and phrases found in the training texts, together with their frequencies. The translation itself is performed by the decoder. For each sentence of the source text, it goes through all viable translation options, combining phrases from the translation model, and ranks them in descending order of probability. Thus, based on statistical data, the language model tells the decoder which version of the translation best suits the given phrase.
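The three components described above can be illustrated with a deliberately tiny Python sketch. The phrase table, its probabilities and the training corpus are invented for illustration; the language model is an add-one smoothed bigram model, and the decoder simply enumerates every segmentation in monotone order and sums the log-probabilities from both models. This is an assumption-laden toy, not Yandex's implementation: real decoders use beam search, reordering models and tuned feature weights.

```python
import itertools
import math
from collections import Counter

# Hypothetical phrase table: source phrase -> {target phrase: P(target | source)}.
PHRASE_TABLE = {
    ("body",): {"тіло": 0.8, "тіла": 0.2},
    ("bag",): {"мішок": 0.6, "сумка": 0.4},
    ("body", "bag"): {"мішок для трупів": 0.7, "сумка для тіла": 0.3},
}

# Hypothetical target-language corpus used to estimate the language model.
TARGET_CORPUS = [
    "мішок для трупів",
    "мішок для трупів лежав на підлозі",
    "сумка лежала на столі",
]


def train_bigram_lm(sentences):
    """Count unigrams and bigrams; <s> marks the start of each sentence."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def lm_logprob(text, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a target-language string."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + text.split()
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def segmentations(words):
    """Yield every split of `words` into contiguous phrases known to the phrase table."""
    if not words:
        yield []
        return
    for end in range(1, len(words) + 1):
        phrase = tuple(words[:end])
        if phrase in PHRASE_TABLE:
            for rest in segmentations(words[end:]):
                yield [phrase] + rest


def decode(sentence, unigrams, bigrams):
    """Score every candidate (monotone order, no reordering) and sort best-first."""
    best = {}
    for segmentation in segmentations(sentence.split()):
        options = [PHRASE_TABLE[phrase].items() for phrase in segmentation]
        for choice in itertools.product(*options):
            target = " ".join(translation for translation, _ in choice)
            tm_score = sum(math.log(prob) for _, prob in choice)
            score = tm_score + lm_logprob(target, unigrams, bigrams)
            best[target] = max(score, best.get(target, -math.inf))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


unigrams, bigrams = train_bigram_lm(TARGET_CORPUS)
for translation, score in decode("body bag", unigrams, bigrams):
    print(f"{score:8.3f}  {translation}")
```

With these toy numbers the phrase-level entry «мішок для трупів» outranks word-by-word candidates such as «тіло мішок», because the phrase table favors it and the language model has seen it in the target-language corpus.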
The main advantage of statistical systems is that their quality keeps pace with the development and mobility of the language: if changes occur in the language, the system, once trained on texts that reflect them, learns them independently, without manually written rules (Chan, 2015: 385). Statistical systems are also highly fluent, that is, the output text resembles speech produced by a person. However, building such a system requires serious technical resources and high-quality parallel texts. Another significant drawback is insensitivity to the fine structure of the text; as a consequence, the output may contain a large number of grammatical errors.
Nowadays, machine translation is a rapidly developing area, especially given the innovations brought by neural networks. Since 2014, average scores on the BLEU metric, which measures how closely a machine-translated text matches a human reference translation, have increased significantly, to 0.4, which indicates that MT systems are gradually closing the gap between automated and manual translation.
It should be borne in mind, however, that while quality is the key criterion for evaluating target-language texts, machine and human translation may serve different purposes.
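For reference, a sentence-level BLEU score is the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. The Python sketch below assumes whitespace tokenization (punctuation stays attached, case matters), a single reference and no smoothing. Published figures such as the 0.4 cited above are computed over whole test corpora, usually with standardized tools, so single-sentence values are only illustrative. As the reference, it uses the authors' suggested translation of the Veterans' Day example from the results section.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with one reference, uniform weights and no smoothing."""
    cand, ref = candidate.split(), reference.split()
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngram_counts(cand, n), ngram_counts(ref, n)
        total = sum(cand_ngrams.values())
        # Modified precision: a candidate n-gram counts at most as often as in the reference.
        clipped = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity_penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum)


reference = "Після того бідолашний Гаррі жахливо провів День ветеранів. Але Дуейн провів його ще гірше"
outputs = {
    "Google Translate": "Так бідний Гаррі після цього провів жалюгідний день ветеранів. Але Дуейн провів гірше.",
    "Yandex.Translate": "Так що після цього бідолаха Гаррі провів жалюгідні вихідні в День ветеранів. Але Дуейн провів ще гіршу ніч.",
    "Microsoft Translator": "Так бідні Гаррі провів жалюгідний ветеранів день у вихідні після цього. Але Дуейн провів гірше.",
}
for system, output in outputs.items():
    print(f"{system}: {bleu(output, reference):.3f}")
```

Unsmoothed sentence-level BLEU drops to zero whenever a candidate shares no 4-gram with the reference, which is one reason corpus-level scores are preferred for comparing systems.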
Results and discussion
In this article, we analyze the stylistic, lexical and grammatical errors made by modern neural machine translation (NMT) systems when translating a literary (belletristic) text. For our research, we chose Kurt Vonnegut's 1973 novel «Breakfast of Champions, or Goodbye Blue Monday!». The novel was chosen because Vonnegut's style is much more laconic and literal than that of a typical novelist. This feature of the author's individual style, together with his generous use of simple stylistic devices such as inversion and deadpan, allows us to test NMT systems on a genuine literary work without overtaxing them with the need for extensive contextual knowledge. In total, 50 samples from the novel were chosen for the analysis.
Original text | This is a tale of a meeting of two lonesome, skinny, fairly old white men on a planet which was dying fast.
Google Translate | Це казка про зустріч двох самотніх, худих, досить старих білих чоловіків на планеті, яка швидко вмирала.
Yandex.Translate | Це розповідь про зустріч двох самотніх, худих, досить старих білих людей на планеті, яка швидко вмирала.
Microsoft Translator | Це розповідь про зустріч двох самотньо, худі, досить старих білих людей на планеті, яка вмирав швидко.
Even the first example already shows what quickly becomes a trend: Microsoft's translation utility is not properly equipped to deal with inflections and grammatical categories that differ from English (such as case and grammatical gender).
Original text | Most other countries didn't have doodley-squat.
Google Translate | У більшості інших країн не було дудлі-присідання.
Yandex.Translate | У більшості інших країн не було дудлі-сквот
Microsoft Translator | У більшості інших країн не було додулі-присадкуватий.
This example clearly shows that, faced with a colloquialism, all three NMT systems default to literal translation (squat: «присідання», «сквот», «присадкуватий»). «Doodley-squat» (or simply «squat») is a colloquialism meaning «nothing at all», so the sentence says that most other countries had nothing at all; however, as the neural network was not aware of the general context of the phrase, it fell back on word-for-word translation.
Original text | Dwayne Hoover's and Kilgore Trout's country, where there was still plenty of everything, was opposed to Communism.
Google Translate | Країна Дуейна Гувера та Кілгора Траут, де ще було багато всього, була проти комунізму.
Yandex.Translate | Країна Дуейна Гувера і Кілгора Траута, де все ще було в достатку, була налаштована проти комунізму.
Microsoft Translator | В країні, де було ще багато всього, виступав проти комунізму.
This is the first example in which our results clearly diverge. Whereas Google's system leaned towards a more literal translation («багато всього»), Yandex used paraphrase, which gave its translation a more natural feel («все ще було в достатку»). The most peculiar, however, is Microsoft's translation: it simply omitted the protagonists' names at the beginning of the sentence, as if unsure how to translate them. The translation of names becomes even more telling in our next example:
Original text | Trout and Hoover were citizens of the United States of America, a country which was called America for short.
Google Translate | Форель та Гувер були громадянами Сполучених Штатів Америки, країни, яку коротко називали Америкою.
Yandex.Translate | Траут і Гувер були громадянами Сполучених Штатів Америки, країни, яка для стислості Називалася Америкою.
Microsoft Translator | Форель і Гувер були громадянами Сполучених Штатів Америки, країна, яка була названа Америка на короткий.
Here the issue with translating names is even clearer. Both Google and Microsoft translated «Trout» as «форель», which, while lexically correct, is completely wrong in context, as it is the last name of one of the novel's protagonists.
Moreover, Microsoft failed to identify the correct contextual meaning of the phrase «for short» and rendered it literally («на короткий»), which produced a rather incongruous translation.
Original text | Here was another piece of evil nonsense which children were taught: that the sea pirates eventually created a government which became a beacon of freedom to human beings everywhere else.
Google Translate | Ось ще один шматок злих дурниць, яких навчали дітей: про те, що морські пірати врешті-решт створили уряд, який став маяком свободи для людей.
Yandex.Translate | Ось ще одна зловісна нісенітниця, яку навчали дітей: що морські пірати врешті-решт створили уряд, який став маяком свободи для людей у всьому світі.
Microsoft Translator | Ось ще один шматок зла дурниці, які навчали дітей: що морські Пірати в кінцевому підсумку створили уряд, який став маяком свободи людських істот всюди.
This example mainly falls victim to literal translation: Google's and Microsoft's systems translated «piece of evil nonsense» word for word as «шматок злих дурниць» and «шматок зла дурниці», respectively. Yandex successfully avoided this issue, but all three systems translated the word combination «beacon of freedom» directly («маяк свободи»), whereas, in our opinion, a generalization (i.e. «символ свободи») would have worked much better.
Original text | Actually, millions of human beings were already living full and imaginative lives on the continent in 1492.
Google Translate | Насправді мільйони людей вже повною мірою живуть на континенті у 1492 році.
Yandex.Translate | Насправді в 1492 році мільйони людей вже жили повним і творчим життям на континенті.
Microsoft Translator | Насправді, мільйони людських істот вже живуть повним і творчим життям на континенті в 1492.
This sample is rather peculiar, as the underlying statistical data may explain the issue. Two out of three algorithms translated «imaginative» as «творчий», while Google omitted the word altogether. Since this word is often used as a synonym for «creative», the systems simply substituted the closest equivalent they had for that vector. We would suggest «насичений» as a contextual synonym here, as its range of application is slightly wider.
Original text | Here was the core of the bad ideas which Trout gave to Dwayne: Everybody on Earth was a robot, with one exception - Dwayne Hoover.
Google Translate | Тут було серцевина поганих ідей, які Траут дав Дуейн: Усі на Землі були роботом, за одним винятком - Дуейн Гувер.
Yandex.Translate | Ось у чому була суть поганих ідей, які Траут вселяв Двейну: всі на землі були роботами, за одним винятком - Двейн Гувер.
Microsoft Translator | Тут був основний з поганих ідей, які форель дала Двейн: все на землі був робот, з одним винятком - Двейн Гувер.
Here Google clearly misjudged the meaning of the word «core», rendering it as «серцевина», which is appropriate when talking about wood and trees, where the core is the innermost part of the trunk. Even Microsoft Translator managed to convey roughly the correct idea, although its pervasive problems with inflections and syntax make this text mostly unreadable. All three systems are constrained by an inability to restructure the sentence completely without altering its meaning or intent, as can be seen from the fact that all of them keep the dash to show whom the narrator is talking about. A full transformation could make the sentence sound more natural here: «Ось у чому була суть поганих ідей Траута, якими він годував Двейна: усі на Землі були роботами, за винятком самого Двейна Гувера...».
Original text | His high school was named after a slave owner who was also one of the world's greatest theoreticians on the subject of human liberty.
Google Translate | Його середня школа була названа в честь рабовласника, який також був одним з найбільших теоретиків світу на тему свободи людини.
Yandex.Translate | Його Вища школа була названа на честь рабовласника, який також був одним з найбільших теоретиків у світі з питання людської свободи.
Microsoft Translator | Його середню школу було названо на честь рабської власниці, яка також була одним з найбільших теоретиків світу з тематики людської свободи.
This example mostly shows that, even with major syntactic issues, Microsoft Translator can still produce reasonably accurate results. «High school» used to be understood and translated as «середня школа», as in Google's and Microsoft's translations; nowadays, however, it is mostly rendered as «старша школа». Also of note is the final part of the sentence: both Yandex («одним з найбільших теоретиків у світі з питання людської свободи») and Microsoft («одним з найбільших теоретиків світу з тематики людської свободи») handle the prepositional structures very well.
Original text | Dwayne's bad chemicals made him take a loaded thirty-eight caliber revolver from under his pillow and stick it in his mouth. This was a tool whose only purpose was to make holes in human beings.
Google Translate | Погані хімічні речовини Дуейна змусили його взяти з-під подушки завантажений револьвер тридцяти восьми калібрів і засунути його в рот. Це був інструмент, єдиною метою якого було зробити діри в людях.
Yandex.Translate | Погана реакція Дуейна змусила його витягнути з-під подушки заряджений револьвер тридцять восьмого калібру і сунути його в рот. Це був інструмент, єдиною метою якого було робити дірки в людських істотах.
Microsoft Translator | Погані хімічні Дуейн зробив йому взяти завантажений револьвер 38 калібру під подушку і дотримуватися його в рот. Це був інструмент, чия єдина мета полягала в тому, щоб зробити отвори в людських істот.
This paragraph shows the downside of NMT systems. Yandex's rendering of «bad chemicals» as «погана реакція» is most likely due to the rather frequent occurrence of the phrase «bad chemical reaction» in the data set used to train the system: it translated the familiar collocation and omitted the word «chemical» as unnecessary. Microsoft Translator translated the verb «to stick» as «дотримуватися» («to stick to, to adhere to»), which is understandable given its orientation towards business users and international communication, where this meaning of the word is more common than the one the author intended. This suggests that Microsoft still leans heavily on its statistical model while it is in the process of implementing the full capabilities of neural networks.
Original text | I do not know who invented the body bag. I do know who invented Kilgore Trout. I did.
Google Translate | Я не знаю, хто винайшов сумку для тіла. Я знаю, хто винайшов Килгор Траут. Я зробив.
Yandex.Translate | Я не знаю, хто винайшов мішок для трупів. Я знаю, хто винайшов Кілгора Траута. Я зробив.
Microsoft Translator | Я не знаю, хто винайшов тіло мішок. Я знаю, хто винайшов Кілгор форелі. Я робив.
Right from the outset, we can see that only one of the three NMT systems (Yandex) translated the expression «body bag» correctly. The other two systems failed both to recognize it and to inflect the words correctly for case. All three systems failed to translate the last sentence of this example: «I did» refers back to the previous sentence, so a more appropriate and simpler solution would have been «Я» or «Це зробив я». This failure is all the more surprising because recurrent neural networks have been able to retain contextual information beyond the bounds of a single sentence since early 2017. We believe this mistranslation may be due to an incorrectly trained target-language neural network.
Original text | The clothes were conservative and neat, in Harry's opinion.
Google Translate | Одяг був консервативним і акуратним, на думку Гаррі
Yandex.Translate | Одяг був консервативним та акуратним, на думку Гаррі.
Microsoft Translator | Одяг був консервативним і акуратним, на думку Гаррі
This is the only example in our testing in which all three systems produced essentially the same output (Yandex differs only in using the conjunction «та» instead of «і»). From this, one may infer that NMT can now handle simple sentences quite well.
Original text | So poor Harry spent a wretched Veterans' Day weekend after that. But Dwayne spent a worse one.
Google Translate | Так бідний Гаррі після цього провів жалюгідний день ветеранів. Але Дуейн провів гірше.
Yandex.Translate | Так що після цього бідолаха Гаррі провів жалюгідні вихідні в День ветеранів. Але Дуейн провів ще гіршу ніч.
Microsoft Translator | Так бідні Гаррі провів жалюгідний ветеранів день у вихідні після цього. Але Дуейн провів гірше.
In this example, none of the systems managed to transform the original sentence so that it would conform to the grammatical rules of the target language. A more correct translation would be «Після того бідолашний Гаррі жахливо провів День ветеранів. Але Дуейн провів його ще гірше». The problem once again lies in the lack of context transfer between the first and second sentences. Only Yandex tried to add context to the second sentence, but it chose the wrong one, as the original text does not mention a night.
Across our samples, we again observe that translation systems designed by primarily English-speaking multinational companies, such as Microsoft and Google, took a much more English-centric approach to training their NMT systems. Both of these systems exhibit typical issues: the wrong grammatical gender; the wrong tense (English «would» used for reminiscence should be rendered with the past tense in Ukrainian); and minor stylistic issues, for example «Дай мені всі ваші гроші», where the first word implies familiar address to a single person, whereas the second implies either several addressees or a respectful form of address, which is hardly typical of a robbery.
Original text | While Kilgore Trout was inadvertently poisoning the collective mind of New York City, Dwayne Hoover, the demented Pontiac dealer, was coming down from the roof of his own Holiday Inn in the Middle West.
Google Translate | У той час як Килгор Траут був ненавмисно отруює колективний розум Нью-Йорка, Дуейн Гувер, в божевільному дилера Pontiac, спускався з даху свого власного Holiday Inn на Середньому Заході.
Yandex.Translate | У той час як Кілгор Траут ненавмисно отруював колективний розум Нью-Йорка, Двейн Гувер, божевільний дилер «Понтіака», спускався з даху свого власного готелю «Холідей Інн» на Середньому Заході.
Microsoft Translator | Хоча Кілгор форель ненавмисно отруєння колективним розумом Нью-Йорка, Дуейн Гувер, божевільний дилер Pontiac, спускається з даху свого готелю Holiday Inn на Близькому заході.
In this case, Google's algorithm could not properly handle the tense of the sentence («був отруює») or case inflections («божевільному дилера»), whereas Yandex's hybrid approach and its predilection for Slavic languages yielded an almost perfect translation. Yandex automatically picked up on the context that Holiday Inn is, in fact, a hotel, and made this explicit for the Ukrainian reader. Peculiarly, Yandex even transliterated the word «Pontiac» and gave it the genitive inflection, which may suggest a more complete understanding of the concept and its context, while the other two systems kept the brand name in English. Microsoft's algorithm once again mistranslated the protagonist's last name, probably because the service was never really designed to translate literary works.
Original text | Patty Keene flunked English during the semester when she had to read and appreciate Ivanhoe, which was about men in iron suits and the women who loved them.
Google Translate | Петті Кіні завалив англійську мову протягом семестру, коли вона повинна була прочитати і оцінити Айвенго, який був про чоловіків в залізних костюмах і жінок, які люблять їх.
Yandex.Translate | Петті Кін завалила англійську протягом семестру, коли їй довелося читати і цінувати «Айвенго», який був про чоловіків в залізних костюмах і жінок, які їх любили.
Microsoft Translator | Патті кейен завалив англійською мовою в семестр, коли вона повинна була читати і цінувати іваники, яка була про чоловіків у залізних костюмах і жінки, які любили їх.
From the outset, we can see a difference in how the character's name is rendered. Google Translate opted for transliteration, whereas Yandex chose transcription. While both approaches may be valid, we believe the latter is the more accurate of the two. Microsoft Translator could not handle this task and settled somewhere between the other two variants. More interestingly, the first two NMT systems are much more aware of cultural realia and famous works of literature, as they translate the title «Ivanhoe» («Айвенго») without issues, whereas Microsoft renders it as «іваники». Microsoft's business-oriented MT system apparently lacks this awareness, which noticeably impairs the quality of its output.
Original text | And Chaos announced that it was about to give birth to a new me by putting these words in the mouth of Rabo Karabekian: «What kind of a man would turn his daughter into an outboard motor»? |
Google Translate | І Хаос оголосив, що збирається народити нового мене, вклавши в уста Рабо Карабекяна такі слова: «Який чоловік перетворить свою дочку на Позашляховий мотор»? |
Yandex.Translate | І хаос оголосив, що ось-ось народить Нового мене, вклавши в уста Рабо Карабекяна такі слова: «яка людина перетворить свою дочку в підвісний мотор»? |
Microsoft Translator | І хаос оголосив, що збирався народити новий мене, поставивши ці слова в уста Рабо Карабекяна: «яка людина Оберне дочку в Човновий Мотор»? |
This example was particularly interesting to analyze. All three systems translated the character's name correctly. However, only Google's algorithm preserved the capitalization of the word «Chaos», which was correct, as the author deliberately imbued it with significance, as if making it a deity from Greek myth. Even more interesting is that the phrase «outboard motor» was translated in three different ways. Yandex correctly specified the type of motor («підвісний мотор»), whereas Microsoft's system opted for generalization, hence «човновий мотор» ("boat motor"). Google was the only system that failed to translate this phrase, rendering it as «Позашляховий мотор» ("off-road motor").
Let us take a look at one final example:
Original text | «It's all like an ocean!» cried Dostoevski. I say it's all like cellophane. |
Google Translate | «Це все як океан!» -- закричав Достоєвський. Я кажу, це все як целофан. |
Yandex.Translate | «Все це схоже на океан!» -- вигукнув Достоєвський. Я кажу, що все це схоже на целофан. |
Microsoft Translator | «Це все як океан!» заволав Достоєвські. Я кажу, що все це як целофан. |
Only Microsoft Translator had a minor issue with Dostoyevsky's name («Достоєвські» instead of «Достоєвський»). Apart from that, all three translations are acceptable.
Conclusions
All in all, Yandex handled the machine translation of belletristic texts exceptionally well, and Google's platform produced quite acceptable results (apart from numerous grammatical errors). The outlier is Microsoft Translator, whose mistranslation of realia, together with the same grammatical errors, makes its output much less readable than that of its competitors.
Although the quality of machine translation has improved considerably, it is still not suitable for anything beyond facilitating global communication. Since this area of research is closely related to computer speech recognition, it could be argued that MT will find future use as a tool for on-the-fly interpreting for people who travel abroad on business without adequate knowledge of the local language. As the services of a highly trained, qualified interpreter are costly, it would make sense for large corporations to invest in machine translation up front to save costs later. This is exactly Google's current direction: its system already offers this feature in beta.
According to our research, the mistakes made by modern NMT systems mostly fall into the following categories: incorrect or inappropriate punctuation for the target language, where the translation may be coherent and adequate, but misplaced commas make it harder to parse; and inflection errors, which usually occur in adjectival phrases where the adjective does not agree with its noun in gender or number (погана пес, зелений ліси, etc.). The same is true of verb conjugation.
In our view, the main problems with such translations stem from the fact that these systems depend on large, high-quality data sets (i.e., parallel corpora for specific language pairs). The quality of these data sets directly determines the quality of the output, which in our case is the target-language text. This can be seen by comparing the average translation quality of Google's and Microsoft's systems: the former makes fewer mistakes on average and has fewer problems identifying the contextual meaning of a polysemous lexeme.
We believe that this issue can be mitigated to a certain extent in one of two ways: by hiring professional translators and linguists to compile these parallel corpora, or by enabling anyone to contribute to this process, even on a small scale. The first approach would be very time- and labor-consuming but would ultimately yield a higher-quality data set, which may lead to further improvements in MT. The second is already used by all three major NMT systems but may lead to slower progress due to a lack of quality control and oversight.
Further analysis of these systems in the coming years may yield useful information about their evolution and changes in behavior as they mature.
Moreover, a comparative study of the same inputs on the same systems in three to five years would reveal qualitative differences and give better insight into the rate of progress in the field.
Another prospect for this research lies in broadening the range of texts to reflect the variety of writing styles currently in use on the Internet. Including texts in the confessional, business, and other styles may allow us to reveal more lacunae in the neural network models and to suggest further improvements.