Monday, 5 November 2012

The Language of Content

According to Wikipedia:

“India is home to several hundred languages. Most Indians speak a language belonging either to:
  • the Indo-European (ca. 74%)
  • the Dravidian (ca. 24%)
  • the Austroasiatic (Munda) (ca. 1.2%)
  • the Tibeto-Burman (ca. 0.6%) families, with some languages of the Himalayas still unclassified
The SIL Ethnologue lists 415 living languages for India.”

The table below ranks India’s languages by the percentage of the population who are native speakers:

Rank   Language   Native Speakers (% of Population)
1      Hindi      41.0%
2      Bengali    8.1%
3      Telugu     7.2%
...
42     English    0.027%

Even Hindi has a number of different dialects.

The variety of languages in the country is probably one of the biggest reasons it is difficult to find digital content in regional languages. This creates a number of issues for us.

The biggest is that, as we plan to roll out the project across the country, we struggle to launch in some states because of the language barrier. For the first phase, we started with English content, which limited us to:
  1. Going to private schools, as most government schools are not English-medium
  2. Having a larger urban presence, because outside the larger cities English-medium private schools are difficult to find

[Map: The Hindi Belt]
For the next phase, we are creating Hindi content. Even so, we will have to concentrate on the northern states, known as the Hindi Belt, highlighted in red in the map to the right.

This language restriction automatically excludes a large part of the country for us. 

Creating digital content in many languages is expensive. Still, it is important that more localized digital content is created in the future: as infrastructure improves worldwide, not just in India, relatable content will be crucial for uptake and impact.