IIT Madras faculty develop AI to process text in 11 Indian languages

They released AI models and datasets for the following languages: Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati, and Marathi

Published: 22nd September 2020 07:12 PM | Last Updated: 22nd September 2020 07:12 PM

IIT Madras (File photo | EPS)

By Express News Service

CHENNAI: Faculty at the Indian Institute of Technology Madras (IIT-M) have developed Artificial Intelligence (AI) models and datasets to process text in 11 Indian languages. According to a statement issued by the institute, the work was carried out jointly with ‘AI4Bharat,’ a platform for building AI solutions for local problems.

Elaborating on the initiative, Mitesh M Khapra of the Department of Computer Science and Engineering said, “As we move towards a digital economy, it is important that our languages find a space online. This requires a lot of innovation in creating input tools, datasets, and AI models for Indian languages.”

For example, imagine a learner who posts a question on an e-learning platform in Tamil, Hindi or another Indian regional language. There is a need for tools that can automatically process such questions written in Indian languages and classify them into specific topics.

The team released AI models and datasets for the following languages: Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati, and Marathi. The statement added that an accompanying research paper describing the research methodologies and evaluation has been accepted at EMNLP-Findings, a companion publication of one of the top Natural Language Processing conferences.

AI4Bharat is an initiative co-founded by Khapra and Pratyush Kumar of IIT-M that works to solve India-specific problems in a community-driven, open-source manner. Both are also associated with the Robert Bosch Centre for Data Science and Artificial Intelligence.

Over the past year, a team of researchers comprising students, faculty and volunteers from IIT Madras and AI4Bharat worked on collecting data and training powerful models for processing text written in Indian languages, the statement said, adding that the models took advantage of the similarities between Indian languages to make efficient use of data. These open-source models are freely available and can be downloaded from a GitHub repository (https://indicnlp.ai4bharat.org/).
