IIIT-Hyderabad model enables watching Telugu videos in Hindi without subtitles
To obtain a fully translated video with accurate lip synchronization, the researchers introduced a visual module called LipGAN.

HYDERABAD: Watching a Telugu video automatically translated into Hindi without subtitles may soon be a reality as a team of researchers from the International Institute of Information Technology, Hyderabad (IIIT-H) has developed a machine learning model that can automatically translate a video of any person speaking in one language to another.
Currently, translation systems for videos generate only translated speech output or textual subtitles, and an out-of-sync dubbed movie or other video content ruins the viewer's experience. To automate the translation of videos, the team of researchers led by Prof CV Jawahar, dean (research and development) at IIIT-H, developed an ML model that can perform face-to-face translation.
Using ML, the model can take a video of a person speaking in one language and deliver a video output of the same speaker in another language. For example, a video of a person speaking in Telugu can automatically be translated into any other desired language, such as Marathi, Hindi or Bengali, such that the voice style and lip movements match the target language.
"Earlier when we spoke about automatic translation, it used to be text-to-text, then came speech-to-speech translation. They do not handle the visual component. As a result, when such translated speech is overlaid on the video, the lip movements are typically out of sync with the audio," said Prajwal KR and Rudrabha Mukhopadhyay, students who worked on the model.
To obtain a fully translated video with accurate lip synchronization, the researchers introduced a visual module called LipGAN. The module can also correct lip movements in an original video to match the translated speech. For example, badly dubbed movies with out-of-sync lip movements can be corrected with LipGAN, which has been trained on large video datasets, making it possible to work for any voice, any language and any identity.
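The system described above is a cascade of stages: recognize the source speech, translate the text, synthesize speech in the target language, and then regenerate the speaker's lip movements to match the new audio. The following is a minimal structural sketch of that pipeline; every function here is a hypothetical stand-in, not the researchers' actual models.

```python
# Illustrative sketch of the face-to-face translation cascade described
# in the article. Each stage is a placeholder stub that only threads
# labels through, to show how the stages compose.

def recognize_speech(audio: str) -> str:
    """Stand-in ASR: transcribe source-language audio to text."""
    return f"transcript({audio})"

def translate_text(text: str, target_lang: str) -> str:
    """Stand-in machine translation into the target language."""
    return f"{target_lang}:{text}"

def synthesize_speech(text: str) -> str:
    """Stand-in text-to-speech for the translated text."""
    return f"audio({text})"

def sync_lips(video: str, audio: str) -> str:
    """Stand-in for the LipGAN step: regenerate the speaker's lip
    movements so they match the translated audio track."""
    return f"video({video}, lip-synced to {audio})"

def face_to_face_translate(video: str, audio: str, target_lang: str) -> str:
    """Compose the four stages into a single end-to-end call."""
    transcript = recognize_speech(audio)
    translated = translate_text(transcript, target_lang)
    dubbed_audio = synthesize_speech(translated)
    return sync_lips(video, dubbed_audio)

result = face_to_face_translate("telugu_clip", "telugu_audio", "hi")
print(result)
```

The key design point the article highlights is the final stage: earlier text-to-text and speech-to-speech systems stop after the synthesis step, which is why the overlaid audio drifts out of sync with the original lip movements.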
The research paper, titled 'Towards Automatic Face-to-Face Translation', was presented at the ACM International Conference on Multimedia in Nice, France, in October 2019. A comparative analysis of their tool vis-a-vis Google Translate for English-Hindi machine translation found the in-house tool to be more accurate.