




Does it understand us? How computers learn human language

People are increasingly trying to talk to computers in a language the machine can understand. We talked with experts in computational linguistics, including a PhD in linguistics and associate professor at the School of Languages of the Higher School of Economics, about how computers learn human languages and which technologies in this area are most popular.

In the early days of machine translation, scientists promised that within five years the need for human translators would disappear completely, because computers would replace them. Yet even the most popular online translators still cannot compete with real experts in translation quality. They have surpassed them in speed and reach, however: according to statistics published for the service's tenth anniversary, one billion people have used it. In addition, machine translation is free and supports a large number of the world's languages. Interestingly, over the past ten years translators and linguists have had almost nothing to do with machine translation.
Since the 1950s, linguists had been compiling dictionaries, describing grammars, and designing translation algorithms. However, significant results came only with neural networks. They need neither dictionaries nor grammars, only real data: large amounts of text. The neural network processes a data set and captures the patterns in it. For example, it can analyze letter sequences and, based on the model it has built, translate words that never appeared in the training data, that is, new words. Sometimes this leads to curious results.

In one case, when translating from English to French, the phrase [...] became [...]. The error was discovered and quickly fixed. But you can still see another one for yourself: select translation from Mongolian to Russian in Translate and enter something like the following: [...]. The results will surprise you. And when certain combinations of characters were translated from Somali into English, the service returned quotations from the Bible. The company was asked for clarification, and a representative responded.
Machine translation now has two key problems: so-called dirty data and a lack of material. Dirty data appears when the neural network aligns texts that have the same content but non-obvious semantic differences. For example, in the product description in one language all prices are given in rubles, while in the other language they are given in dollars. The neural network can memorize this parallel and translate one currency as the other. The solution is additional training of the network and corrections suggested by users.

The lack of material is a problem for rare languages and for languages with few written sources. Translation requires parallel text corpora in the two languages; if there is not enough of them, the neural network cannot learn. In this case, comparison with a closely related language helps. If you understand how a language is built and correctly identify the components needed to construct it, you can even teach a system to translate a fictional language: the author may have described only a few words, but it is still possible to predict what new words would look like. Following the same logic, one translation service was taught to handle such a language.
Search engines no longer just match the literal query: they also analyze the context and return a relevant document, which does not always contain the words in the query. This kind of search became possible with the development of vector (or distributional) semantics. Text is represented in the form of vectors (for example, a vector can be assigned to each word), and the computer operates on them mathematically.
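As a rough illustration of representing text as vectors and comparing them mathematically, here is a minimal Python sketch. It is an assumption-laden toy, not how any particular search engine works: texts become simple word-count vectors, and similarity is measured with the cosine of the angle between them. The function names and example headlines are hypothetical. Real systems use dense vectors learned from large corpora, which is what lets them place words close together even when the texts share no words at all; plain word counts cannot do that.

```python
import math
from collections import Counter


def text_to_vector(text):
    """Toy representation: a sparse vector of word counts.

    A stand-in for learned embeddings, used here only for illustration.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical headlines: the first two cover the same story in different words.
headline_1 = text_to_vector("central bank raises interest rate")
headline_2 = text_to_vector("interest rate raised by central bank")
headline_3 = text_to_vector("local team wins football cup")

same_story = cosine_similarity(headline_1, headline_2)
different_story = cosine_similarity(headline_1, headline_3)
print(same_story, different_story)
```

The first pair scores much higher than the second, so a program could group the first two headlines under one topic and keep the third apart.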
By comparing vectors, the computer can find semantically similar words, which are not always synonyms. For example, "cold" and "hot" are semantically close because they occur in the same contexts (the same things can be cold or hot), so for a query in such a context a search engine can return pages containing either of the two words.

Vectors make it possible not only to search but also to cluster texts. This is how news aggregators group articles into common topics: the headlines may be worded differently, but the machine recognizes that their content is similar. The cosine distance between them is so small that they are effectively the same.

Another technique that makes search engines so convenient is keyword extraction. Thanks to it, the computer can decide how to rank the results, what exactly to show in the snippet, and what the text is about.

Chatbots and voice assistants also work with text. A voice assistant recognizes and produces sound through speech modules, but what the neural network learns from and processes is essentially text. In fact, these networks are similar to those used for machine translation; the difference is that the training corpus is in one language rather than two. In a sense, the network still performs a translation: the question serves as the source text, and the answer is the result.
The main problem with today's voice assistants is that neural networks cannot remember context. For example, during a conversation the assistant does not remember what it answered three turns ago. A person constantly economizes in speech and uses pronouns to point the listener back to what has already been said: instead of repeating a full name or the subject every time, the speaker simply uses a pronoun. A living interlocutor remembers the context, but the neural network does not. You cannot say to it: "..." It does not remember. Therefore, a true real-time voice assistant requires not only a neural network but also some other modules. This problem has not yet been solved.

One early chatbot was written completely by hand, dialogue included. Its creator gave it the ability to imitate a psychotherapist's active-listening technique, picking out the key point of each problem. If it could not find a suitable answer in its database, it fell back on a stock reply. Chatbots with limited functionality are still in use today, for example as virtual consultants on websites.
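A hand-written chatbot of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patterns, the reply templates, and the fallback phrase below are all hypothetical and are not taken from any real program. The sketch picks out the key part of the user's statement, reflects it back as a question, and falls back on a stock reply when no rule matches.

```python
import re

# Hypothetical hand-written rules: (pattern, reply template).
RULES = [
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father|family)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]

# Hypothetical stock reply used when nothing in the rule list matches.
FALLBACK = "Please go on."


def reply(message):
    """Return the reply of the first matching rule, or the fallback."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            # Strip trailing punctuation from the captured fragment.
            fragment = match.group(1).rstrip(".!?")
            return template.format(fragment)
    return FALLBACK


print(reply("I feel tired."))
print(reply("Nice weather today"))
```

Note that `reply` looks only at the current message. Nothing is carried over from earlier turns, which is the same limitation with context that the text describes.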
While teaching computers to use natural language, computational linguists and programmers also run experiments. For example, machines are asked to create works of art on their own. In one experiment, eight volumes of Rowling's Harry Potter were loaded into a computer. After processing the text, the program wrote a story of its own and gave it a title. A person enters the first word, and then a chain-based algorithm takes over, much like the way smartphones suggest the next word of a message or autocorrect it. The algorithm retains Rowling's style and uses only words found in the books. A fragment of the result: "...when she took the wrong stairs to visit herself."

Another project uses an algorithm that automatically determines the poetic meter of a given phrase. It rhymes users' queries automatically and assembles them into poems of different meters and forms.
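The chain-based generation mentioned above can be illustrated with a simple first-order Markov chain. This is a minimal sketch under assumptions: the source does not say which algorithm the experiment actually used, and the tiny corpus and function names here are hypothetical. The program records which words follow which in the source text, then builds new text by repeatedly picking one of the recorded followers of the last word.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map every word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, first_word, length=10, seed=0):
    """Start from a word supplied by a person and extend it step by step."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = [first_word]
    for _ in range(length - 1):
        candidates = chain.get(output[-1])
        if not candidates:
            break  # the last word never had a follower in the corpus
        output.append(rng.choice(candidates))
    return " ".join(output)


# Hypothetical toy corpus standing in for the loaded books.
corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
generated = generate(chain, "the", length=8)
print(generated)
```

Because every step reuses a word pair that occurs in the corpus, the output contains only the source's vocabulary and local phrasing, which is why such text keeps the flavor of the original even when it makes no sense as a whole.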
Many such projects rely on the same idea: a neural network processes a fixed body of data and tries to create something new from it. For example, one project generates its own recipes, though we are not sure anyone would be willing to try them: "Use salt water to whiten the pilaf and serve."

Using a vector model, another algorithm processed Russian classics and replaced each word with a semantically close one. The result: "By chance, in the spring, at noon on an unprecedentedly hot sunrise, two compatriots appeared in Kazan, a metropolitan stream."

The original reads: "One spring, at an unprecedentedly hot sunset, two citizens appeared at Patriarch's Ponds in Moscow."

If we compare the quality of machine translation, the work of search engines, or the capabilities of voice assistants ten years ago and now, the improvement is visible to the naked eye, even without any analysis. Computational linguistics and the teaching of language to computers are now going through a period of peak activity and optimism. Some tools can still be improved, and some areas and problems remain open. By the way, to judge how well a voice assistant speaks the language and understands you, you can try talking or playing with Alice.