Google expands language campaign to serve non-English speakers in India
There are over 600 million internet users in India, but only a fraction of that population is fluent in English. However, most online services and much of the content on the web are currently available exclusively in English.
This language barrier continues to contribute to a digital divide in the world’s second-largest internet market, which has limited hundreds of millions of users’ experience of the World Wide Web to a few select websites and services.
So it’s no surprise that the US tech giants, which rely on emerging markets like India for continued growth, are increasingly trying to make the web and their services accessible to more people.
Case in point: a Google feature that quickly translates web page content from English into Indian languages has been used more than 17 billion times by users in India over the past year.
To that end, the company today announced a series of changes it is rolling out to some of its services to make them speak more local languages, and unveiled a whole new approach it is taking to translating languages.
Users will now be able to see search results for their queries in Tamil, Telugu, Bengali and Marathi, in addition to English and Hindi, which are currently available. The addition comes four years after Google added the Hindi tab to the search page in India. The company said that the volume of search queries in Hindi increased by more than 10 times after the introduction of this tab. If someone prefers to see results for their query in Tamil, for example, they will now be able to set a Tamil tab next to English and quickly switch between the two.
Getting search results in a local language is helpful, but often people want to search in those languages as well. Google says it has discovered that typing in a language other than English is another challenge that users face today. “As a result, many users search in English even if they really prefer to see results in a local language they understand,” the company said.
To meet this challenge, Google Search will start showing relevant content in supported Indian languages, where applicable, even if the local language query is entered in English. The feature, which the company plans to roll out over the next month, supports five Indian languages: Hindi, Bengali, Marathi, Tamil, and Telugu.
Google also allows users to quickly change the preferred language in which they see results in an app without changing the device’s language settings. The feature, currently available in Discover and Google Assistant, will now be rolled out in Maps. Maps supports nine Indian languages.
Likewise, Google Lens’s Homework feature, which allows users to take a photo of a math or science problem, then provides the answer and guides students through the steps to get there, now supports Hindi. India is the biggest market for Google Lens, said Nidhi Gupta, senior product manager at Google India, at the event.
Jayanth Kolla, chief analyst at consultancy firm Convergence Catalyst, said the new Google Lens feature could pose a threat to some Indian startups such as Sequoia Capital-backed Doubtnut, which operates in a similar space.
Google executives also detailed a new language AI model, which they call Multilingual Representations for Indian Languages (MuRIL), which offers more efficiency and precision in handling transliteration, spelling variations, mixed languages and other nuances of languages. MuRIL supports transliterated text, such as Hindi written in Roman script, which previous models of its kind lacked, Google Research India researcher Partha Talukdar said at a virtual event Thursday.
The company said it trained the new model with articles on Wikipedia and text from a dataset called Common Crawl. It also trained the model on transliterated text from, among other sources, Wikipedia (fed through Google’s existing neural machine translation models). The result is that MuRIL handles Indian languages better than previous, more general language models and can handle letters and words that have been transliterated, that is, written using the closest corresponding letters of a different alphabet or script.
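To make the idea of transliteration concrete, here is a minimal Python sketch that maps a Hindi word in Devanagari script to Roman letters. It is a toy illustration only and has nothing to do with how MuRIL or Google’s translation models work internally: the mapping table, the `transliterate` function name and the romanization choices are all assumptions made for this example, and the table covers just the letters of one word.

```python
# Toy Devanagari-to-Roman transliteration (illustrative assumption, not MuRIL).
# The table is hypothetical and covers only the letters in the example word.
CONSONANTS = {"न": "n", "म": "m", "स": "s", "त": "t"}
VOWEL_SIGNS = {"\u0947": "e", "\u093e": "aa", "\u093f": "i"}  # dependent vowel signs
VIRAMA = "\u094d"  # sign that removes a consonant's implicit "a"


def transliterate(text: str) -> str:
    """Map each Devanagari letter to its closest Roman counterpart."""
    out = []
    inherent = False  # True when the last piece ends with an implicit "a"
    for ch in text:
        if ch in CONSONANTS:
            # A bare consonant carries an implicit "a" vowel.
            out.append(CONSONANTS[ch] + "a")
            inherent = True
        elif ch in VOWEL_SIGNS and inherent:
            # A vowel sign replaces the implicit "a".
            out[-1] = out[-1][:-1] + VOWEL_SIGNS[ch]
            inherent = False
        elif ch == VIRAMA and inherent:
            # The virama drops the implicit "a" entirely.
            out[-1] = out[-1][:-1]
            inherent = False
        else:
            # Anything outside the toy table is passed through unchanged.
            out.append(ch)
            inherent = False
    return "".join(out)


print(transliterate("नमस्ते"))
```

Real-world Romanized Hindi is far less regular than this table suggests, since people spell the same word in many different ways, which is part of why spelling variations and transliterated text are hard for language models.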
Talukdar noted that the previous model Google relied on turned out not to be scalable because the company had to create models for each language separately. “Creating such language-specific modeling for each task is not resource efficient, as we often don’t have training data for tasks like this,” he said. MuRIL significantly outperforms the previous model, by 10% on native text and 27% on transliterated text. MuRIL, which was developed by Google executives in India and has been in use for about a year, is now open source.
MuRIL supports 16 Indian languages and English. Image: Google
One of the many tasks MuRIL is good at is determining the sentiment of a sentence. For example, “Achha hua account bandh nahi hua” was previously interpreted as having a negative meaning, but MuRIL correctly identifies it as a positive statement, Talukdar said. Or take the ability to distinguish a person from a place: “Shirdi ke sai baba” would previously be interpreted as a place, which is wrong, but MuRIL correctly interprets it as a person.