Picture yourself strolling through a library or local bookstore whose shelves are packed with fascinating Hebrew books, but every cover shows a tongue-twisting Latin transcription of the title. For example, Mishehu Larutz Ito, Hatzotzra BaWadi, or Sipur Al Ahava Vechoshech.
You’ve probably already thought of several alternative transcriptions of the word Hatzotzra (trumpet), but what about words like Hilazon (snail), Mitriya (umbrella), or Agvaniyot (tomatoes)?
What does this story have to do with the web?
Every device connected to the internet has a unique address consisting of numbers, which allows other devices to reach it. That’s how computers, tablets, smartphones, etc., communicate across the internet. Due to the huge number of connected devices, each is assigned a long string (up to 12 digits) known as an IP (Internet Protocol) address, for example, 192.115.211.45.
Humans find it hard to remember these long number-based addresses, so the DNS (Domain Name System) maps them to standard alphabetic names that we can understand and memorize. These are domain names (for example, isoc.org.il or איגוד-האינטרנט.ישראל).[1]
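As a rough illustration of what such an address looks like to software, the sketch below parses the example address with Python's standard ipaddress module. The variable names are illustrative only.

```python
import ipaddress

# Parse the dotted example address from the text.
addr = ipaddress.ip_address("192.115.211.45")

# An IPv4 address is four numbers, each between 0 and 255,
# which is why the written form has at most 12 digits.
octets = list(addr.packed)

print(octets)        # [192, 115, 211, 45]
print(addr.version)  # 4
```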
A short history of domain names
The origins of domain names – a core part of the internet – are intertwined with the web. The internet is an evolution of ARPANET (Advanced Research Projects Agency Network), developed in the United States in the 1960s.[2] Because of this American origin, domain names were first written using only Latin letters (a–z), digits (0–9), and the hyphen (-), which can be combined into diverse naming variations. The standard computer encoding of these characters is known as ASCII.
But the web has since expanded, including to non-English speaking regions, where command of the Latin writing system is low or limited. This prevented speakers of other languages from being netizens with equal rights.
Work on technical standards that support non-Latin domain names began in the mid-1990s. Such domains are called IDNs (Internationalized Domain Names). Implementation of IDNs began in 2000 at the second level (under .com and .net) and in 2001 under .jp. In the ten years that followed, several ccTLDs deployed IDNs, primarily supporting local language character sets. Punycode-based IDN technology has proven to be the most successful mechanism.
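The letters-digits-hyphen rule can be sketched as a small check. This is a minimal illustration, not a full hostname validator: the function name is hypothetical, and two extra rules are assumed from the DNS specifications rather than from this article (a label is 1 to 63 characters long and does not start or end with a hyphen).

```python
import re

# Letters, digits and hyphen only. Assumed extra rules (not stated in the
# article): 1-63 characters, no leading or trailing hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_label(label):
    """Return True if the label uses only the classic ASCII repertoire."""
    return LDH_LABEL.fullmatch(label) is not None

print(is_ldh_label("isoc"))    # True
print(is_ldh_label("bücher"))  # False
```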
What is Punycode?
Punycode is an algorithm that converts a Unicode string into an ASCII string. The converted ASCII string is prefixed with “xn--” to indicate that the domain name is a Punycode translation of a Unicode string. The result is called A-Label or ACE (ASCII Compatible Encoding), which the DNS can interpret.
For more information, see section 2.3 in RFC 5890.
This allows the conversion of any character written in Unicode encoding (unrecognized by the DNS) into the supported ASCII encoding. For example, any Hebrew text (here, a series of non-ASCII characters) will be converted into a string of standard ASCII characters prefixed with “xn--”. The result might look like gibberish, but it allows us to write domains in various languages while the DNS itself remains fundamentally unchanged.[3]
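The conversion can be tried out with the "punycode" codec that ships with Python. The sketch below is a simplified illustration: the helper names are hypothetical, and it skips the normalization and validation steps that a real IDN implementation performs before encoding.

```python
ACE_PREFIX = "xn--"

def to_a_label(u_label):
    """Encode one Unicode label as an A-label (no normalization applied)."""
    if u_label.isascii():
        return u_label  # plain ASCII labels are left untouched
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label):
    """Decode an A-label back to Unicode; other labels pass through."""
    if a_label.lower().startswith(ACE_PREFIX):
        return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return a_label

print(to_a_label("bücher"))  # xn--bcher-kva

hebrew = to_a_label("ישראל")
print(hebrew)                         # an ASCII string starting with xn--
print(to_u_label(hebrew) == "ישראל")  # True
```

The round trip at the end shows that nothing is lost: the ASCII form is only a different spelling of the same label.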
Worldwide IDN deployment
Hybrid IDNs – domain names that mix Unicode SLD and ASCII TLD (for example, איגודהאינטרנט.co.il) – have been available for registration for almost two decades. They accommodate Latin-based scripts, where the IDN element usually reflects diacritical marks. In German, for example, the word bücher fits elegantly and conveniently into a hybrid IDN (for example, bücher.com).
But it’s less than optimal for non-Latin scripts such as Chinese, and especially for those written from right to left (including Hebrew and Arabic). The solution was a bidirectional domain name that assumes users know their local language and are familiar with Latin characters. This format also requires people to switch languages while typing a single web address and might confuse the strict hierarchy of the DNS.[3]
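To make the hybrid format concrete, the sketch below converts whole domain names with Python's built-in "idna" codec. The function name is hypothetical. Note that this built-in codec follows an older version of the IDN standard, so treat it as an illustration rather than a reference implementation.

```python
def domain_to_ascii(domain):
    """Convert each label of a domain name to its ASCII form."""
    return domain.encode("idna").decode("ascii")

# Only the non-ASCII label is converted; the ASCII TLD stays as typed.
print(domain_to_ascii("bücher.com"))  # xn--bcher-kva.com

# A Hebrew label under an ASCII suffix, as in the example above.
hybrid = domain_to_ascii("איגודהאינטרנט.co.il")
print(hybrid)
```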
In 2007–2008, the ccTLD community convinced ICANN (the Internet Corporation for Assigned Names and Numbers) to introduce a fast-track process for creating IDN ccTLDs. The next milestone was in 2010, when ICANN approved four new IDN ccTLDs in Egypt, Saudi Arabia, the Russian Federation, and the United Arab Emirates, introducing support for domains written in local scripts, such as .مصر or .рф web addresses.[3]
Since the adoption of the IDN standard, registration has significantly expanded. As of 2021, there are approximately 8.6 million listed IDNs,[4] with China leading the way, followed by Russia and Germany.[5]
The annual number of IDNs (source: IDN World Report)
IDN as a promoter of a multilingual internet
Many consider IDN a catalyst of a multilingual internet. According to UNESCO’s data, English was the dominant language on the web in 2008 (72% of online content), and only 12 languages accounted for 98% of all web pages. Recent studies show an increase in other languages: in 2010, for example, 20% of the articles on Wikipedia were in English, but by December 2018, English had dropped to less than 12% of all content in the online encyclopedia.[3]
IDN supporters believe that allowing users to navigate and browse the web in their native languages will increase linguistic diversity among netizens. The IDN World Report confirms a strong relationship between IDN and local content written in different languages. The graph below shows that websites with IDNs feature more content in their respective languages than those with a Latin domain, where more than half of the content is in English. The top graph compares these two figures to the distribution of speakers of various languages among the world’s general population.
IDN and linguistic diversity (source: IDN World Report)
Making the web more accessible to speakers of different languages leads to a more inclusive and diverse community of internet users, with all the benefits it brings: access to online commerce, extensive business opportunities, connection to local communities, as well as support and preservation of local cultures and traditions through language. The Universal Acceptance Steering Group’s (UASG) 2017 study [6] shows that fully implementing IDN corresponds to $9.8 billion in business opportunities, and that’s a conservative estimate. In other words, businesses that adopt IDN will be strategically positioned to attract diverse audiences – international and local – and maximize potential revenue from current internet users and the next billion expected to join.
In a survey conducted in Israel by the Central Bureau of Statistics (CBS)[7], 12%–13% of the respondents couldn’t speak, read, or write in English. The survey also revealed an English literacy gap between Arabs and Jews: 28% of the Arab population said they don’t speak English, compared to 8% among the Jewish population. 19% of ultra-Orthodox also said they speak no English, compared to only 5% among secular Jews. The survey also revealed that young educated people have a higher English literacy than older people or those without a high school or higher education.
These findings suggest that Hebrew domain names can help reduce the digital gap in Israel, making the web accessible to broader segments of the population whose level of English literacy is low. This can open up employment opportunities and help people integrate into emerging digital communities and benefit from other online advantages.
The remaining technological challenges
Although IDN has been supported for over 20 years, some technical challenges have yet to be resolved. Many platforms and apps, including email software, still struggle to recognize IDNs as valid domains. Some apps are outdated and don’t support the latest standards, but even new services sometimes fail because awareness among developers is low. Furthermore, there are technical difficulties, like variants in different languages (for example, the Arabic letters ا and أ).[8]
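The variant problem can be illustrated with the two Arabic letters mentioned above. The sketch below uses two hypothetical labels that differ only in that letter and shows that, to a computer, they are different strings that produce different ASCII forms.

```python
import unicodedata

ALEF = "\u0627"        # ا
ALEF_HAMZA = "\u0623"  # أ

# The two letters look similar but are separate Unicode characters.
for ch in (ALEF, ALEF_HAMZA):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Two hypothetical labels that differ only in the first letter.
label_plain = ALEF + "\u062d\u0645\u062f"
label_hamza = ALEF_HAMZA + "\u062d\u0645\u062f"

ascii_plain = "xn--" + label_plain.encode("punycode").decode("ascii")
ascii_hamza = "xn--" + label_hamza.encode("punycode").decode("ascii")

# Different ASCII forms mean two separate domain names in the DNS.
print(ascii_plain != ascii_hamza)  # True
```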
Universal Acceptance refers to the ideal situation where all apps and services support all domain names and email addresses in every language.
Several international groups, including UASG and the IDN World Report, advocate for this vision of global compatibility across the international internet community. These forums investigate the interoperability of online infrastructures and promote Universal Acceptance by raising awareness among service providers and software developers (browser companies, email providers, programming language developers, etc.). Some of these initiatives are funded by ICANN, which recently signed a memorandum with the European Registry (EURid) to cooperate in promoting IDN and non-Latin languages online.
Browsers
UASG’s tests of popular browsers running on various operating systems show that most support IDN-based display and navigation. Some domains in right-to-left scripts, such as Hebrew and Arabic, were displayed incorrectly, but this may be resolved by changing the display settings.
Most browsers also support IDN bookmarks, but some store the domain in the less friendly Punycode format (the converted address with the xn-- prefix). The report also suggests that mobile browsers offer limited support compared to their desktop counterparts.
In this context, we should note that mixing characters from different languages also facilitates domain spoofing (deliberately misleading and deceiving users). This issue is less relevant to Hebrew and more common in scripts that share letter shapes with other writing systems, such as Latin and Cyrillic. For example, the Cyrillic letter Er looks identical to the Latin letter P, so a malicious actor might use this character to spoof a legitimate domain (like apple.com) and direct unsuspecting users to a fraudulent website.
Luckily, modern browsers protect users via various automatic domain validation methods, including a mechanism that verifies that the domain doesn’t contain characters from different writing systems. A browser could display a suspicious domain in Punycode or issue a warning message alerting the user before visiting the spoofed website.[9][10]
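A much simplified version of such a mixed-script check might look like the sketch below. It is not how any particular browser works: the function names are hypothetical, and the writing system of each character is approximated by the first word of its Unicode name, an assumption that is good enough for Latin, Cyrillic, and Hebrew letters but not a full script lookup.

```python
import unicodedata

def scripts_in_label(label):
    """Approximate the set of writing systems used in one label."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared by all scripts
        name = unicodedata.name(ch, "UNKNOWN")
        scripts.add(name.split(" ")[0])  # e.g. "LATIN", "CYRILLIC"
    return scripts

def is_mixed_script(domain):
    """True if any single label mixes more than one writing system."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))

def display_form(domain):
    """Show suspicious domains in their ASCII form, others as typed."""
    if is_mixed_script(domain):
        return domain.encode("idna").decode("ascii")
    return domain

# The two "p" letters here are the Cyrillic letter Er (U+0440).
spoofed = "a\u0440\u0440le.com"
print(is_mixed_script(spoofed))      # True
print(is_mixed_script("apple.com"))  # False
print(display_form(spoofed))
```

A check like this does not catch a domain written entirely in look-alike letters from a single script, which is one reason real browsers combine several rules.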
Finally, in the age of IoT (Internet of Things), when a multitude of new devices connect to the internet, browsers are now embedded in cars, watches, and household appliances. These might not support IDN as well as desktop or mobile browsers.[11]
For an overview of the findings, including the details of the services that support IDN, see the UASG report.
Email
Email addresses written in non-Latin scripts are based on EAI (Email Address Internationalization). Two main challenges limit wide-scale adoption:
- On the client side (the email app) – the software should be able to display, process, and store international addresses. For example, it may present an EAI address in Unicode characters but transmit the domain name to the mail server in Punycode.
- On the server side – the software should support EAI and allow transmission in a way that retains the EAI format.
Therefore, even if one email app can send messages to and receive messages from a Hebrew domain, another app might not be able to do so.[12]
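The client-side step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the flag it returns refers to the SMTPUTF8 mail extension (RFC 6531), which this article does not discuss but which servers need in order to carry a non-ASCII mailbox name.

```python
def prepare_for_transport(address):
    """Split an address and convert only the domain part to ASCII.

    Returns (local_part, ascii_domain, needs_utf8_support).
    """
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError("not an email address: " + repr(address))
    # The domain can always be expressed in Punycode form.
    ascii_domain = domain.encode("idna").decode("ascii")
    # The part before "@" cannot; it must travel as Unicode, so both
    # the sending and receiving servers have to support that.
    needs_utf8_support = not local_part.isascii()
    return local_part, ascii_domain, needs_utf8_support

print(prepare_for_transport("info@bücher.com"))
# ('info', 'xn--bcher-kva.com', False)

local, domain, needs_utf8 = prepare_for_transport("דואר@איגוד-האינטרנט.ישראל")
print(domain, needs_utf8)
```

The second call shows why server support matters: the domain converts cleanly to ASCII, but the Hebrew mailbox name does not, so the message can only be delivered if every server along the way accepts it.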
For an overview of the findings, including the details of the services that support EAI, see the UASG report (slide #23).
Social networks and messaging apps
UASG’s latest findings indicate that none of the social media platforms and messaging apps allows users to sign up using EAI. A slightly more optimistic (albeit complex) picture emerges when examining support for clickable IDN links: The report suggests that Facebook, Instagram, WhatsApp, YouTube, LinkedIn, Telegram, and TikTok offer at least partial support of IDN hyperlinks that direct users to the correct address, with Telegram offering the most comprehensive support of IDN, followed closely by LinkedIn, Facebook, and Twitter.
UASG encountered disparities in IDN support between mobile and desktop apps, with the platforms’ mobile apps offering better support. For an overview of the findings, including the details of the services that support IDN, see the UASG report (slide #6).
The UASG report also details the compatibility of programming languages (C, C#, Java, Python, Rust, and others), frameworks, and development environments on Linux, Windows, iOS, and Android.
Sources
[1] What is a Domain Name? https://www.isoc.org.il/domain-name-registry/domain-name
[2] ARPANET, Wikipedia https://he.wikipedia.org/wiki/ARPANET
[3] Introduction, IDN World Report https://idnworldreport.eu/about/introduction/
[4] IDN totals by year https://idnworldreport.eu/charts/idn-numbers/idn-totals-by-year
[5] Top 20 IDN spaces https://idnworldreport.eu/charts/idn-numbers/top-20-idn-spaces/
[6] UASG 038 Universal Acceptance (UA) Messaging for Social Relevancy, Business Opportunities and Career Opportunities EN https://uasg.tech/download/uasg-038-universal-acceptance-ua-messaging-for-social-relevancy-business-opportunities-and-career-opportunities/
[7] Level of Command of the English Language in Israel: Selected Data from the Israeli Survey of Adult Skills (PIAAC), 2014-2015 https://www.cbs.gov.il/he/mediarelease/DocLib/2017/308/06_17_308b.pdf
[8] Internationalized Domain Names (IDNs) Where Are We Now? (ICANN) https://www.icann.org/en/system/files/files/idns-where-are-we-now-16jun21-en.pdf
[9] Internationalized Domain Names (IDN) in Google Chrome https://chromium.googlesource.com/chromium/src/+/main/docs/idn.md
[10] IDN Display Algorithm https://wiki.mozilla.org/IDN_Display_Algorithm
[11] Universal Acceptance – Browsers https://idnworldreport.eu/universal-acceptance/browsers/
[12] Universal Acceptance – Email https://idnworldreport.eu/universal-acceptance/email/
Further References
- UASG 036A UA-Readiness of Browsers EN https://uasg.tech/download/uasg-036a-ua-readiness-of-browsers-en/
- Configuring for Internationalized Email Addresses (EAI) https://community.icann.org/download/attachments/171835736/NARALO-ICANN%20EAI%20Training%20-20220127_Sarmad_2.pdf?version=2&modificationDate=1643304634000&api=v2
- UASG 035A UA Readiness of Social Media Platforms EN https://uasg.tech/download/uasg-035a-ua-readiness-of-social-media-platforms-en/
- UASG 037A UA-Readiness of Some Programming Language Libraries and Frameworks EN https://uasg.tech/download/uasg-037a-ua-readiness-of-some-programming-language-libraries-and-frameworks-en/