export AI to poor countries, in the hope it might improve everything from schools to health care. Researchers around the world are therefore working to make AI more multilingual. India’s government is particularly keen.
Many of its public services are already digitised, and it is keen to fortify them with AI. In September, for instance, it launched a chatbot to help farmers get information about state benefits. The bot works by welding two sorts of language model together, says Shankar Maruwada of the EkStep Foundation, a non-profit that helped build it.
Users can submit queries in their native tongues. (Eight are supported so far; five more are coming soon.) These are passed to a piece of machine-translation software developed at IIT Madras, an Indian academic institution, which translates them into English. The English version of the question is then fed to the LLM, and its response translated back into the user’s mother tongue.
The system seems to work. But translating queries into an LLM’s preferred language is a rather clumsy workaround. After all, language is a vehicle for worldviews and culture as well as just meaning, notes the boss of one Indian AI firm.
A paper by Rebecca Johnson, a researcher at the University of Sydney, published in 2022, found that ChatGPT-3 gave replies on topics such as gun control and refugee policy that aligned most with the values displayed by Americans in the World Values Survey, a global questionnaire of public opinion. Many researchers are therefore trying to make LLMs themselves more fluent in less widely spoken languages. One approach is to modify the tokeniser, the part of an LLM that chops words into smaller chunks for the rest of the model to manipulate.
Read more on livemint.com