Unicode Devnagari Issues: January 2010

Adding more mud to the water!
देवनागरी is perhaps more versatile than any other Indian script!

This reminds me of a very old cartoon film about four blind men arguing over what an elephant looks like! Each one is correct and truthful from where he stands, but depending on where he is touching the elephant, one finds it to be like a pillar (the one touching a leg) and another like a rope (the one touching the tail). Once their blindfolds are removed, they can see the entire elephant and appreciate why the others were saying what they were saying!

Many know that देवनागरी is used to write languages like Marathi, Hindi and Sanskrit.

However, many do not know that it is also used to write Nepali, Konkani, Sindhi and many other lesser-known languages!

The demographics of each of these languages are different, and so is the usage of certain characters or symbols of Devanagari.

This makes a single authoritative decision-making body impossible.

Statistically, the biggest user groups of देवनागरी are the Hindi and Marathi communities. But they spend their energy fighting over an issue that cannot be resolved, because both parties are telling the truth according to THEIR usage of the script and language.

Just to give an example, the character ज्ञ is pronounced differently in Hindi than in Marathi. I have no clue how it is pronounced in other languages (Nepali, Sindhi, etc.). Moreover, in Marathi this character is a valid consonant in its own right, whereas in Hindi it is regarded as a joint letter (conjunct).
Of course, this is according to my poor man's knowledge, and different people have different opinions about it.

In spite of all the controversy, Unicode does not have a separate code point for this character. It is always expressed as a joint letter: ज, followed by the virama (्), followed by ञ. Surprisingly, the SOUND of this character in Hindi is different from its sound in Marathi, and in neither case is it anywhere close to the individual pronunciations of these two letters. In Hindi, it may be better described as the joint letter ग्य, as in विज्ञापन.
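To make the point concrete, here is a small Python sketch that lists the code points behind ज्ञ. It uses only the standard `unicodedata` module. The helper name `describe` is made up for this illustration, and the names printed come from whatever Unicode data your Python build ships with.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The conjunct is written with escapes so the sequence is unambiguous:
# JA + VIRAMA + NYA. There is no single code point for it.
jnya = "\u091C\u094D\u091E"

for code_point, name in describe(jnya):
    print(code_point, name)
```

Whether a language treats the result as one letter or as a joint letter, the stored text is the same three code points. The single glyph you see is produced by the font and the shaping engine.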

The case of the controversial character ऍ is similar. This character is used only in Hindi, perhaps to express foreign-language words like APPLE. However, it is not a valid Marathi character. In Marathi, the same pronunciation is represented as अॅ. On some systems this may not display correctly, so to get an idea, it looks the same as the character ऑ without the extra 'KANA'. Now in my opinion, this should have been left to the FONT implementer rather than introducing another code! Since Unicode 5.1, the latter character has been assigned its own code point, U+0972, and in my opinion there are now 2 Unicode CODE POINTS representing the same letter.
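The encodings mentioned above can be compared directly. The sketch below assumes that the Marathi spelling is typed as अ followed by the vowel sign U+0945, which is how it is commonly entered. The function name `same_after_normalization` is invented for this example. It checks whether Unicode normalization treats any of these forms as equivalent.

```python
import unicodedata

candra_e = "\u090D"           # single code point, the Hindi-style letter
a_plus_sign = "\u0905\u0945"  # letter A followed by vowel sign candra E
candra_a = "\u0972"           # single code point, the Marathi-style letter

def same_after_normalization(a, b, form="NFC"):
    """True if a and b become identical strings under the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

for label, text in [("candra_e", candra_e),
                    ("a_plus_sign", a_plus_sign),
                    ("candra_a", candra_a)]:
    names = [unicodedata.name(ch) for ch in text]
    print(label, [f"U+{ord(ch):04X}" for ch in text], names)

# Normalization does not unify any pair, so search and comparison
# treat them as different text even where readers see the same letter.
print(same_after_normalization(a_plus_sign, candra_a))
print(same_after_normalization(candra_e, candra_a))
```

So software that wants to treat these spellings as the same letter has to do its own folding; the standard normalization forms will not do it.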

For example, in some English fonts the small 'a' has a different shape than in others (Arial vs Comic Sans), but that does not mean the two shapes should have different code numbers. Both represent the first English letter 'a'! How a particular character should look is left to the creative freedom of the font designer.

However, for Devanagari there is no definitive baseline language that includes all of the Devanagari characters, and there is no reasonable basis for confirming that a character representation used in one language is equivalent to another representation in another language. The result is an utter mess.

Unicode is supposed to be a standard, but I am not aware of any specific guideline stating the criteria a character must meet to get its own character code, which is called a CODE POINT. Should a different look of a character get an assignment, or is it a different "SOUND" of a character that gets a new assignment? Confusing.

This raises the question of whether Unicode should have assigned CODE RANGES based on SPOKEN languages or on SCRIPTS.

No doubt all Indian languages are phonetic! What is a phonetic language? It is one whose SCRIPT gives an EXACT representation of the sound.
In other words, if you know how to "SOUND" a particular word, you know its "SPELLING" as well. This isn't true for English. For example, "DO" is spelled similarly to "GO", but the pronunciations of these two words are totally different.

In almost all Indian languages, how a word is READ is almost entirely predictable just from how it is WRITTEN. In computer terms, this is called WYSIWYG = What You See Is What You Get.

But is it 100% true for all Indian languages and for their entire vocabulary?
Unfortunately NO!
I can cite some examples, but you can think of some yourself.

But I would say the biggest mistake in the Unicode encoding for Devanagari was basing it on the assumption that this is 100% true, which led to non-standard workarounds for language-specific issues.

All the half 'r' related discrepancies stem from this particular issue.
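As one illustration of what a half 'r' discrepancy can look like at the code level, the sketch below compares three ways of writing an 'r' joined to a following य. This is only my assumption about the kind of issue meant here: the ordinary RA + VIRAMA sequence (normally shown as a reph above the next letter), and the two sequences described in the Unicode Standard for the "eyelash" half-r used in Marathi. The variable names are invented for the example, and how each sequence actually renders depends on the font and shaping engine.

```python
import unicodedata

RA, RRA, VIRAMA, ZWJ, YA = "\u0930", "\u0931", "\u094D", "\u200D", "\u092F"

reph_form = RA + VIRAMA + YA               # usually rendered with a reph above YA
eyelash_with_rra = RRA + VIRAMA + YA       # eyelash half-r written with RRA
eyelash_with_zwj = RA + VIRAMA + ZWJ + YA  # eyelash half-r written with RA and a joiner

sequences = {
    "reph_form": reph_form,
    "eyelash_with_rra": eyelash_with_rra,
    "eyelash_with_zwj": eyelash_with_zwj,
}

for label, text in sequences.items():
    print(label, " ".join(f"U+{ord(ch):04X}" for ch in text))

# After NFC normalization the three strings are still all different,
# so text that may look alike on screen does not compare as equal.
normalized = {unicodedata.normalize("NFC", s) for s in sequences.values()}
print(len(normalized))
```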

