Tea and Cake

The adventures of a small spotted skunk.

A different approach to internationalisation

written by pomke, on Mar 22, 2011 6:13:00 PM.

Update: I have started a project to implement this over at GitHub.
I’d like to preface this entire post with two important facts. Firstly, and regretfully, I only speak English. Language was not something that kept my attention in my early education, and I live in a country which, while considered very multicultural, has only one common language, spoken by almost the entire population (en-AU). Secondly, I do not consider myself an expert in internationalisation or localisation tools; this post is based entirely on my own experience developing FOSS-based web applications in a combination of commercial and community environments over the last decade.

A complaint about GNU Gettext

Internationalisation of FOSS-dependent web applications often relies on GNU Gettext. Gettext is a handy tool, but it seems to be based on the assumption that there is always a default locale (usually en-US) embedded in the software, which then needs to be translated into other languages.

While it is often the case that an application will have been developed first in American English, many applications today are moving internationalisation right up to the forefront as a first-class consideration for application development, and in many instances (especially within distributed FOSS projects) the development team can consist of members with diverse language backgrounds.

A failed ‘solution’

The reliance on a default translation, which is intrinsic to the design of GNU Gettext, is something that has niggled at me for many years. In past projects I have attempted to subvert this principle by replacing the usual string of default text with a token, albeit still in English (regretfully the only language I know).

_("Thank you for completing this survey!")

becomes:

_("SURVEY_COMPLETION_MESSAGE")

This has several major benefits right up front:

  • ALL translations now come from a .po file; the default locale is no longer an exception to the rule.
  • Msgids in the .po files are now tokens which are unlikely to change. In the traditional model an issue arises when the default-locale text embedded in the source needs to be modified and the msgids no longer match the translations (this happens more often than you’d think).
  • Missing translations become blatantly obvious when your UI has SURVEY_COMPLETION_MESSAGE blazed across the screen.

It also has some major disadvantages:

  • It is not immediately obvious from the msgid what the intent behind the message should be.
  • Msgids can no longer contain placeholders, which results in text being split into strange fragments within the code:
    _("Thank you %s for completing our survey") % (user.firstname,)

    becomes:

    "%s %s %s" % (_("SURVEY_COMPLETION_MESSAGE_1"),
                  user.firstname,
                  _("SURVEY_COMPLETION_MESSAGE_2"))

In the long term the complexities of maintaining tokens with a standard gettext solution outweighed the benefits and added to confusion across the project I tried this in.

What next?

So where does this leave my token (pun intended) attempts at fixing my niggles with gettext? I am currently working on two new projects of my own, both Python web applications, and I have the flexibility to pick and choose the libraries I use. I have been considering abandoning gettext altogether.

I would like to try out a method for combining tokens with example text and qualitative descriptions of available substitutions to assist in making a proper translation, by providing useful tools for translating an application in situ.

Firstly, I would like to be able to provide as much flexibility to the translator as possible. Some translations may require more or less formal information depending on context, so it should be clear to the translator which substitutions are available:

_("SURVEY_COMPLETION_MESSAGE", "Thank you {{firstname}} for completing our survey.",
  {"firstname": user.firstname, "lastname": user.lastname, "title": user.title})

Thank you Melanie for completing our survey.
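
To make the idea concrete, here is a minimal sketch of how such a call could behave. Everything in it is an assumption for illustration: the TRANSLATIONS store, the render helper, the locale parameter, and the sample values "Stebbing" and "Ms" are hypothetical, and the {{name}} syntax is handled with a simple regular expression rather than a real templating engine.

```python
import re

# Hypothetical per-locale store: {locale: {token: template}}.
TRANSLATIONS = {}

# Matches {{name}}, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, substitutions):
    """Replace each {{name}} with its value; unknown names are left as-is."""
    def replace(match):
        name = match.group(1)
        return str(substitutions.get(name, match.group(0)))
    return PLACEHOLDER.sub(replace, template)


def _(token, example, substitutions=None, locale="en-AU"):
    """Look up the token for a locale, falling back to the example text."""
    template = TRANSLATIONS.get(locale, {}).get(token, example)
    return render(template, substitutions or {})


message = _(
    "SURVEY_COMPLETION_MESSAGE",
    "Thank you {{firstname}} for completing our survey.",
    {"firstname": "Melanie", "lastname": "Stebbing", "title": "Ms"},
)
print(message)  # Thank you Melanie for completing our survey.
```

Until a translator supplies a template for the token, the developer's example text is what gets rendered, so the page is never left with a bare token.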

Sometimes as a developer you do not have any placeholder text at all, perhaps because the content is still being written by your content team. In such a case, specifying a number of words of lorem ipsum to include might be appropriate. With all the values substituted in place as an example, this gives you an immediate sense of page layout whilst providing a feature-complete piece of software just waiting for content:

_("LOGIN_MESSAGE", 45, {"username": user.username, "firstname": user.firstname})

Lorem ipsum dolor sit amet, consectetur Melanie elit. Donec fermentum 
rhoncus neque ut ornare. In ac sollicitudin est. Ut gravida urna quis neque 
Pomke sit amet luctus tortor molestie. Maecenas sem quam, porttitor vitae 
porttitor a, euismod a neque. Stebbing pharetra imperdiet augue in rutrum.
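
One possible way to generate that kind of filler is sketched below. The placeholder_text function, the short word list, and the even spacing of the substituted values are all assumptions of mine; a real version would presumably be called from inside _() whenever the second argument is a number rather than a string.

```python
from itertools import cycle, islice

# A short, repeating word list; a real tool might ship a longer passage.
LOREM_WORDS = (
    "lorem ipsum dolor sit amet consectetur adipiscing elit donec "
    "fermentum rhoncus neque ut ornare"
).split()


def placeholder_text(word_count, substitutions):
    """Return word_count filler words with the substitution values mixed in.

    Values are assumed to be single words; a value containing spaces
    would change the final word count.
    """
    if word_count <= 0:
        return ""
    words = list(islice(cycle(LOREM_WORDS), word_count))
    values = [str(value) for value in substitutions.values()]
    if values:
        # Spread the values evenly through the text.
        step = max(1, word_count // (len(values) + 1))
        for index, value in enumerate(values, start=1):
            position = min(index * step, word_count - 1)
            words[position] = value
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."


sample = placeholder_text(45, {"username": "Pomke", "firstname": "Melanie"})
print(sample)
```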

Initially this seems like a lot of extra work for little return. How exactly does this help the translator? Consider a few examples of alternate translations (please excuse my abuse of Google Translate):

“Go raibh maith agat as comhlánú ár suirbhé {{title}} {{lastname}}”

“Félicitations pour la fin de notre enquête {{firstname}}”

Already a translator can make use of a more flexible list of substitutions to translate in a language-appropriate manner.
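
A small sketch of how the two translations above could be rendered, and how a tool might warn a translator who uses a substitution that is not on offer. The locale codes, the CATALOG layout, the helper names and the sample user values are hypothetical; only the two translated strings come from the examples above.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Translations quoted from the examples above; locale codes are assumed.
CATALOG = {
    "ga": "Go raibh maith agat as comhlánú ár suirbhé {{title}} {{lastname}}",
    "fr": "Félicitations pour la fin de notre enquête {{firstname}}",
}


def unknown_placeholders(template, available):
    """Return placeholder names used in template that are not available."""
    return sorted(set(PLACEHOLDER.findall(template)) - set(available))


def render(template, substitutions):
    """Replace each {{name}} with its value from substitutions."""
    return PLACEHOLDER.sub(lambda m: str(substitutions[m.group(1)]), template)


# Illustrative values only.
user = {"firstname": "Melanie", "lastname": "Stebbing", "title": "Ms"}

for locale, template in CATALOG.items():
    problems = unknown_placeholders(template, user)
    if problems:
        print(locale, "uses unavailable substitutions:", problems)
    else:
        print(locale, "->", render(template, user))

# A translation that asks for something the developer did not provide.
print(unknown_placeholders("Hello {{nickname}}", user))  # ['nickname']
```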

Given that I am a web designer/developer/whatever you would like to call me, I would be implementing a javascript/html5/client-side storage?/buzzwords editor that could be enabled on a page to allow translating the content in the context it would be delivered in.

Essentially, integration with a templating engine/framework would provide an l10n-mode that could be turned on during development and would output HTML like:

<div class="_i18n_text" id="LOGIN_MESSAGE">Lorem ipsum dolor sit amet, 
consectetur Melanie elit. Donec fermentum rhoncus neque ut ornare. In ac 
sollicitudin est. Ut gravida urna quis neque Pomke sit amet luctus tortor 
molestie. Maecenas sem quam, porttitor vitae porttitor a, euismod a neque. 
Stebbing pharetra imperdiet augue in rutrum.</div>
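
On the server side, the l10n-mode could be as small as a wrapper applied to every rendered string. The sketch below assumes a hypothetical wrap_for_l10n helper and an l10n_mode flag; the class name and id convention are taken from the markup above, and escaping is done with the standard library's html.escape.

```python
from html import escape


def wrap_for_l10n(token, text, l10n_mode=False):
    """Return escaped text, wrapped in a tagged <div> when l10n-mode is on."""
    if not l10n_mode:
        return escape(text)
    # The id carries the token so a client-side editor can find the string.
    return '<div class="_i18n_text" id="{}">{}</div>'.format(
        escape(token, quote=True), escape(text)
    )


print(wrap_for_l10n("LOGIN_MESSAGE", "Lorem ipsum dolor sit amet", l10n_mode=True))
# <div class="_i18n_text" id="LOGIN_MESSAGE">Lorem ipsum dolor sit amet</div>
```

With the flag off, the same call returns just the escaped text, so production pages carry no extra markup.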

It would also embed a JavaScript client which would tag these strings with a floating [translate] button that launches a translation tool. This tool would display any default explanatory text and the possible substitutions, and would allow the translator to easily provide translations in various languages and see the results reflected immediately on the page. The translations could be written to the server via a simple API, or alternatively stored on the client in client-side storage for uploading at a later date.

I thought that before I launched into writing this I’d throw the idea out to you, my friends and peers, for comment. Am I missing some integral part of the GNU Gettext API that provides these features already? Are there projects out there using tools which already solve these problems? What experiences have you had in this area? Please leave comments and let me know what you think.

Best Wishes,

Pomke