About 7,000 languages are spoken around the world, and we often need to understand a language other than our own, not only for everyday conversation but also in settings such as press conferences and scientific collaborations. Translators were invented to make communication across languages possible, and over the past few years, as the internet has connected the whole world, their use has grown greatly. Among these translators, Google Translate is the most advanced and most widely used, and it has continually developed new approaches to translation. This time, Google is running a similar experiment in translation, one directly tied to Google's AI research.
Usually, the work of a speech translator is divided into three main parts: automatic speech recognition (ASR) to transcribe the source speech as text, machine translation (MT) to translate the transcribed text into the target language, and text-to-speech synthesis (TTS) to generate speech in the target language from the translated text. Although this usual process works, Google saw the need for an upgrade: a new system based on a single attentive sequence-to-sequence model. As Google puts it, “In “Direct speech-to-speech translation with a sequence-to-sequence model”, we propose an experimental new system that is based on a single attentive sequence-to-sequence model for direct speech-to-speech translation without relying on intermediate text representation”.
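The three-stage cascade described above can be sketched as a simple pipeline. This is a minimal illustration only: tiny dictionary lookups stand in for real ASR, MT, and TTS models, and all function names here are hypothetical, not part of any Google API.

```python
# Minimal sketch of the classic cascaded speech-translation pipeline.
# Placeholder dictionaries stand in for trained models; every name
# below is hypothetical and for illustration only.

def automatic_speech_recognition(source_audio: str) -> str:
    """Stage 1 (ASR): transcribe source-language speech as text."""
    transcripts = {"<spanish-audio>": "hola mundo"}
    return transcripts[source_audio]

def machine_translation(source_text: str) -> str:
    """Stage 2 (MT): translate the transcribed text into the target language."""
    lexicon = {"hola": "hello", "mundo": "world"}
    return " ".join(lexicon[word] for word in source_text.split())

def text_to_speech(target_text: str) -> str:
    """Stage 3 (TTS): synthesize target-language speech from text."""
    return f"<audio:{target_text}>"

def cascaded_translate(source_audio: str) -> str:
    # Each stage consumes the previous stage's output, so any
    # recognition error is passed on and compounded downstream.
    text = automatic_speech_recognition(source_audio)
    translated = machine_translation(text)
    return text_to_speech(translated)

print(cascaded_translate("<spanish-audio>"))  # <audio:hello world>
```

Note how the stages are strictly chained: this chaining is exactly what makes cascaded systems prone to compounding errors, which the direct approach described next tries to avoid.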
This new Google system can translate a speaker's words into another language without changing the original speaker's voice and tone. Many people wonder how that is possible; it works because the model translates audio input directly to audio output, without any intermediate text representation. As mentioned above, conventional translators transcribe the audio into text, translate the text, and then resynthesize the audio in a new voice. The new system, dubbed Translatotron, avoids dividing the task into separate stages. Compared with the old cascaded approach, it offers faster inference, naturally avoids compounding errors between recognition and translation, makes it straightforward to retain the original speaker's voice after translation, and better handles words that do not need to be translated (e.g., names and proper nouns).
The new system has two main components. The first is an attention-based sequence-to-sequence network that maps the audio spectrogram in the source language (the input) directly to an audio spectrogram in the target language; a neural vocoder then converts that output spectrogram into a playable audio waveform. The second component is a speaker encoder, which captures the characteristics of the original speaker's voice so that the synthesized output maintains that voice and tone.
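The data flow through these components can be sketched as follows. Real spectrograms are 2-D arrays and the components are deep neural networks; here, flat lists of floats and trivial transforms stand in for them, and all names are hypothetical sketches rather than the actual Translatotron implementation.

```python
# Toy sketch of Translatotron's data flow: speaker encoder ->
# attentive seq2seq spectrogram mapping -> neural vocoder.
# Lists of floats stand in for spectrograms; trivial functions stand
# in for trained networks. All names are hypothetical.

from typing import List

Spectrogram = List[float]

def speaker_encoder(source_audio: Spectrogram) -> List[float]:
    """Summarize the source speaker's voice as a fixed-size embedding
    used to condition synthesis, so the output keeps that voice."""
    return [sum(source_audio) / len(source_audio)]

def seq2seq_model(source_spec: Spectrogram,
                  speaker_embedding: List[float]) -> Spectrogram:
    """Attentive sequence-to-sequence network: map the source-language
    spectrogram directly to a target-language spectrogram, conditioned
    on the speaker embedding. No intermediate text is produced."""
    bias = speaker_embedding[0]
    return [frame + bias for frame in source_spec]

def neural_vocoder(target_spec: Spectrogram) -> str:
    """Convert the output spectrogram into a playable waveform
    (represented here as a string)."""
    return f"<waveform:{len(target_spec)} frames>"

def translatotron_sketch(source_audio: Spectrogram) -> str:
    embedding = speaker_encoder(source_audio)
    target_spec = seq2seq_model(source_audio, embedding)
    return neural_vocoder(target_spec)

print(translatotron_sketch([0.1, 0.2, 0.3]))  # <waveform:3 frames>
```

The key design point the sketch shows is that audio goes in and audio comes out of a single model, with the speaker embedding carried along so the synthesized waveform can keep the original voice.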
Although the output voice is still not very clear, as you can tell by listening to the translated samples, this is a promising result that can improve further in the future. So far, all tests of Translatotron have been done only on Spanish-to-English translation. You can hear all the translated voice clips by clicking here.
So that is everything about Google AI's Translatotron system, which provides end-to-end speech translation. If this experiment matures, then with Google Translate you may one day be able to translate your speech into another language without changing your original voice and tone. Hopefully, today's article proves useful to you, and if you have any questions about Google Translate's new Translatotron technology, you can share them with us in the comments section.