Asia

India’s many languages pose a challenge to the development of its large language model

NEW DELHI: India is building its own large language model, which it hopes may one day rival OpenAI's chatbot ChatGPT, but the country’s sheer number of languages and dialects has made training it a challenge.

India has 22 officially recognised languages and more than 10,000 local languages.

Some languages, such as Marathi, share common roots with others like Hindi and Gujarati, while the languages of South India, including Kannada, Telugu, Tamil and Malayalam, belong to an entirely separate language family, the Dravidian family.

A large language model has to handle all of these languages seamlessly, and building one that can understand most of them, if not all, remains difficult.

TRAINING AI ON LOCAL LANGUAGES

One challenge that BharatGen, a government-funded consortium, faces in training its large language model is the lack of online content in Indian languages.

The consortium said that while roughly half of all the data available on the internet is in English, Indian languages make up barely 1 per cent.

Literary works in many Indian languages have never been digitised, and a wealth of cultural and traditional knowledge has been passed down orally for generations without ever being stored online.

On a more positive note, experts said that the diversity of languages and data collected from local sources could help create AI models with fewer biases.

Ganesh Ramakrishnan, a professor at the Indian Institute of Technology Bombay, told CNA that his work involves reaching out to magazines, data sources, foundations and non-governmental organisations that have been gathering data in their local languages.

“(We have been) making it possible to digitise and digitalise and reflect that in the foundational model … so this is a big opportunity,” said Ramakrishnan, who is part of the BharatGen consortium.

EXISTING CHATBOTS ARE INADEQUATE

Some small business owners, who have tried using AI as part of their operations, said they have faced language challenges when using existing chatbots.

Ghooran Yadav, a food cart owner in New Delhi, said that he used ChatGPT to enquire about the recipe of the food he sells, but received an underwhelming response.

The app understood his question in the local dialect of Bhojpuri but replied in Hindi.

Ghooran said foreign chatbots are less accurate and that he would prefer a locally made app.

“If it’s made in India, it’s more likely to give me correct information. Nothing could be better than that,” he added.

EASE OF USE

BharatGen also aims to use generative AI to solve everyday problems and, eventually, to help deliver public services, such as giving people information about welfare programmes.

An app called Krishi Saathi (“farming companion” in Hindi), which is powered by BharatGen’s Hindi language model, helps answer farmers’ questions about crop health and pest management.

The app can translate text into local languages. It also lets people who cannot read or write communicate by speaking to the app.

“Making sure that the most remotely inaccessible regions also benefit from AI - that is part of the vision here,” said Ramakrishnan.

Once trained to do so, the AI model can mimic a speaker's voice and tone, communicating with users as a real person would.

BharatGen, one of five major language-based AI projects currently supported by Indian Prime Minister Narendra Modi's government, has already rolled out 19 language models since its inception last year.

Source: CNA/kl(lt)