Tea and Cake

The adventures of a small spotted skunk.

Entries tagged “python”

A different approach to internationalisation

written by pomke, on Mar 22, 2011 6:13:00 PM.

update: I have started a project to implement this over at github.
I’d like to prefix this entire post with two important facts. Firstly, and regretfully, I only speak English. Language was not something that kept my attention in my early education, and I live in a country which, while considered very multicultural, has only one common language spoken by almost the entire population (en-AU). Secondly, I do not consider myself an expert in internationalisation or localisation tools; this post is based entirely on my own experience developing FOSS-based web applications in a mix of commercial and community environments over the last decade.

A complaint about GNU Gettext

Internationalisation of FOSS-dependent web applications often relies on GNU Gettext. Gettext is a handy tool, but it seems to be based on the assumption that there is always a default locale (usually en-US) embedded in the software, which then needs to be translated into other languages.

While it is often the case that an application will have been developed first in American English, many applications today are moving internationalisation right up to the forefront as a first class consideration for application development, and in many instances (especially within distributed FOSS projects) the development team can consist of members with diverse language backgrounds.

A failed ‘solution’

The reliance on a default translation which is intrinsic in the design of GNU gettext is something that has niggled at me for many years, and in past projects I have attempted to subvert this principle by replacing the usual string of default text with a token, albeit still in English (regretfully the only language I know).

_("Thank you for completing this survey!")

becomes:

_("SURVEY_COMPLETION_MESSAGE")

This has several major benefits right up front:

  • ALL translations now come from a .po file, the default locale is no longer an exception to the rule.
  • Msgids in the .po files are now tokens which are unlikely to change. In the traditional model an issue arises when the default locale text embedded in the source is modified and the msgids no longer match the translations (this happens more often than you’d think).
  • Missing translations become blatantly obvious when your UI has SURVEY_COMPLETION_MESSAGE blazed across the screen.

It also has some major disadvantages:

  • It is not immediately obvious from the msgid what the intent behind the message should be.
  • Msgids can no longer contain placeholders, which results in text being split into strange fragments within the code:
    _("Thank you %s for completing our survey") % (user.firstname,)

    becomes:

    "%s %s %s" % (_("SURVEY_COMPLETION_MESSAGE_1"),
                  user.firstname,
                  _("SURVEY_COMPLETION_MESSAGE_2"))

In the long term the complexities of maintaining tokens with a standard gettext solution outweighed the benefits and added to confusion across the project I tried this in.

What next?

So where does this leave my token (pun intended) attempts at fixing my niggles with gettext? I am currently working on two new projects of my own which are Python web applications, and I have the flexibility to pick and choose the libraries I use. I have been considering abandoning gettext altogether.

I would like to try out a method for combining tokens with example text and qualitative descriptions of available substitutions to assist in making a proper translation, by providing useful tools for translating an application in situ.

Firstly, I would like to be able to provide as much flexibility to the translator as possible. Some translations may require more or less formal wording depending on context, so it should be clear which substitutions are available to the translator:

_("SURVEY_COMPLETION_MESSAGE", "Thank you {{firstname}} for completing our survey.",
  {"firstname": user.firstname, "lastname": user.lastname, "title": user.title})

Thank you Melanie for completing our survey.
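
A minimal sketch of how such a call might behave. Everything here is an assumption made for illustration: the helper name `translate`, the in-memory `TRANSLATIONS` catalogue, the sample user data and the fallback rule are not part of gettext or any existing library. The token is looked up for the requested locale, the example text is used when no translation exists, and `{{name}}` placeholders are filled from the substitutions.

```python
import re

# Hypothetical in-memory catalogue: locale -> token -> translated template.
# The French text is the example translation quoted later in the post.
TRANSLATIONS = {
    "fr": {
        "SURVEY_COMPLETION_MESSAGE":
            "Félicitations pour la fin de notre enquête {{firstname}}",
    },
}

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def translate(token, example, substitutions, locale="en-AU"):
    """Look up token for locale, falling back to the example text."""
    template = TRANSLATIONS.get(locale, {}).get(token, example)

    def fill(match):
        name = match.group(1)
        # Unknown placeholders are left visible so mistakes are easy to spot.
        return str(substitutions.get(name, match.group(0)))

    return _PLACEHOLDER.sub(fill, template)


_ = translate  # the short alias used in the post

# Made-up sample data standing in for a user object.
user = {"firstname": "Melanie", "lastname": "Stebbing", "title": "Ms"}
print(_("SURVEY_COMPLETION_MESSAGE",
        "Thank you {{firstname}} for completing our survey.",
        user))
```

Passing the full set of substitutions, even those the example text does not use, is what would let a translator pick `{{title}}` and `{{lastname}}` where a language calls for a more formal address.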

Sometimes as a developer you do not have any placeholder text at all, perhaps because the content is still being written by your content team. In such a case, specifying a number of words of lorem ipsum to include might be appropriate. With all values substituted in place as an example, this gives you an immediate sense of page layout while providing a feature-complete piece of software just waiting for content:

_("LOGIN_MESSAGE", 45, {"username": user.username, "firstname": user.firstname})

Lorem ipsum dolor sit amet, consectetur Melanie elit. Donec fermentum 
rhoncus neque ut ornare. In ac sollicitudin est. Ut gravida urna quis neque 
Pomke sit amet luctus tortor molestie. Maecenas sem quam, porttitor vitae 
porttitor a, euismod a neque. Stebbing pharetra imperdiet augue in rutrum.
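
One way the filler text could be produced, as a rough sketch only. The function name `placeholder_text`, the short word list and the rule of dropping values in at evenly spaced positions are all assumptions; the post does not say how the substitutions would be placed.

```python
from itertools import cycle, islice

# A short, fixed word list keeps the output deterministic (an assumption;
# a real tool might draw from a longer passage).
LOREM = ("lorem ipsum dolor sit amet consectetur adipiscing elit donec "
         "fermentum rhoncus neque ut ornare").split()


def placeholder_text(word_count, substitutions):
    """Return word_count filler words with each substitution value
    replacing one word at an evenly spaced position."""
    words = list(islice(cycle(LOREM), word_count))
    if words:
        words[0] = words[0].capitalize()
    values = [str(value) for value in substitutions.values()]
    if values and word_count:
        step = max(1, word_count // (len(values) + 1))
        for index, value in enumerate(values, start=1):
            # Clamp so that short texts never index past the end.
            position = min(index * step, word_count - 1)
            words[position] = value
    return " ".join(words)


print(placeholder_text(45, {"username": "pomke", "firstname": "Melanie"}))
```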

Initially this seems like a lot of extra work for little return. How exactly does this help the translator? Consider a few examples of alternate translations (please excuse my abuse of Google Translate):

“Go raibh maith agat as comhlánú ár suirbhé {{title}} {{lastname}}”

“Félicitations pour la fin de notre enquête {{firstname}}”

Already a translator can make use of a more flexible list of substitutions to translate in a language-appropriate manner.

Given that I am a web designer/developer/whatever you would like to call me, I would be implementing a javascript/html5/client-side storage?/buzzwords editor that could be enabled on a page to allow translating the content in the context it would be delivered in.

Essentially, integration with a templating engine/framework would provide an l10n mode that could be turned on during development and would output HTML like:

<div class="_i18n_text" id="LOGIN_MESSAGE">Lorem ipsum dolor sit amet, 
consectetur Melanie elit. Donec fermentum rhoncus neque ut ornare. In ac 
sollicitudin est. Ut gravida urna quis neque Pomke sit amet luctus tortor 
molestie. Maecenas sem quam, porttitor vitae porttitor a, euismod a neque. 
Stebbing pharetra imperdiet augue in rutrum.</div>
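
The wrapping step could be sketched as below. The function name `render_text` and the `l10n_mode` flag are hypothetical; only the `_i18n_text` class and the token-as-id convention come from the example above.

```python
from html import escape


def render_text(token, text, l10n_mode=False):
    """Return escaped text; in l10n mode, wrap it in a marker element
    so that a client-side editor can find it on the page."""
    safe_text = escape(text)
    if not l10n_mode:
        return safe_text
    return '<div class="_i18n_text" id="{}">{}</div>'.format(
        escape(token, quote=True), safe_text)


print(render_text("LOGIN_MESSAGE", "Lorem ipsum dolor sit amet", l10n_mode=True))
```

Note that a token rendered twice on the same page would produce duplicate ids, so a real integration might prefer a data attribute for the token.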

It would also embed a JavaScript client which would tag these strings with a floating [translate] button that launches a translation tool. This tool would display any default explanatory text and the possible substitutions, and would allow the translator to easily provide translations in various languages and see the results reflected immediately on the page. The translations could be written to the server via a simple API, or alternatively kept in client-side storage for uploading at a later date.

I thought that before I launched into writing this I’d throw the idea out to you, my friends and peers, for comment. Am I missing some integral part of the GNU gettext API that provides these features already? Are there projects out there using tools which already solve these problems? What experiences have you had in this area? Please leave comments and let me know what you think.

Best Wishes,

Pomke

Merge Merge Revolution!

written by pomke, on Jan 26, 2011 11:01:00 PM.

At work we use Git for managing our source and Buildbot for tracking our tests, be they unit, functional, acceptance, or simply lint-like tests such as pyflakes. The development life-cycle I have implemented, which is serving us well, is modelled on the Ultimate Quality Development System. Basically:

  1. A ticket is raised in the ticket system (in our case rally).
  2. The ticket is accepted into a sprint.
  3. A branch is created, and the work is done
  4. The branch is peer-reviewed and feedback is given
  5. The branch is accepted by the business*
  6. The branch is merged back into master
Fairly straightforward. *I grouped QA/AT/etc. into ‘accepted by the business’; being a large organisation, we have some extra hoops to jump through which I won’t bore you with.

As part of the peer review process we have our Definition of Done which includes items such as unit test coverage, functional test coverage, coding standard compliance, code lint and in the future scalability regression testing and other stuff.

Now the idea of the peer review is to communicate the change to another developer, to get some new eyes onto the change and have it critically reviewed. Yet ensuring the DoD is met has become the focus of the exercise for us, detracting from what should really be going on in a peer review. That is sad, especially given that many of these tasks should really be automated anyway.

On top of this (and I won’t go into my opinions on off-shoring here), our management have seen fit to bring in some ‘programmers for hire’, who will be working from an entirely different location (and timezone) and doing their own peer review. So I decided it was time both to take the mundane tasks away from the peer review sessions and to ensure that our DoD would be met, irrespective of who was making the change.



Merge Merge Revolution! took about three days to glue together. The tools I used were Flask (yes, I know I have my own Werkzeug/Jinja2/etc. framework, but I’m not sharing that with work at this moment as it is yet to be licenced c_c), SQLite, my very own pomke.js, and jQuery.

MMR combines Git via the command line and Buildbot via its JSON API to provide a (hopefully intuitive) interface which automatically populates Buildbot with all --no-merged branches off master, and a mechanism to merge those branches only when Buildbot passes.
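
As a rough illustration of the Git half (this is not MMR’s code, and the function names are made up), the unmerged branches could be collected by shelling out to `git branch --no-merged` and parsing the result:

```python
import subprocess


def parse_branch_list(output):
    """Turn the text printed by `git branch` into a list of branch names."""
    branches = []
    for line in output.splitlines():
        name = line.strip()
        # "* " marks the current branch, "+ " one checked out in a worktree.
        if name.startswith(("* ", "+ ")):
            name = name[2:]
        # Skip blank lines and the "(HEAD detached at ...)" pseudo-entry.
        if name and not name.startswith("("):
            branches.append(name)
    return branches


def unmerged_branches(repo_path, target="master"):
    """List local branches not yet merged into target."""
    result = subprocess.run(
        ["git", "branch", "--no-merged", target],
        cwd=repo_path, capture_output=True, text=True, check=True)
    return parse_branch_list(result.stdout)
```

Each name returned by `unmerged_branches` would then be handed to whatever registers a build; that part depends on the Buildbot setup and is left out here.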

Combined, these things provide a form of continuous integration. From the one location a developer can quickly identify whether their branch is passing (you can click a build for a breakdown of steps, see above), force a build (the steps/ETA/% done of which are updated in real time on the screen), or attempt a merge. All the while there is the assurance that nothing can be merged back into master unless the DoD is complete: all forms of tests are passing, adequate coverage exists for the changed code, and all forms of code checking have been run.

Further to this, we have a build-step which checks if the branch needs to be merged-forward, that is, another branch has been committed to master since the current branch was last pulled.
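
One possible way to implement such a check, sketched under the assumption that counting commits reachable from master but not from the branch is an acceptable test. The function names and the injectable `run` parameter (there only to make the sketch testable without a repository) are hypothetical.

```python
import subprocess


def commits_behind(repo_path, branch, target="master", run=subprocess.run):
    """Count commits that are on target but not yet on branch."""
    result = run(
        ["git", "rev-list", "--count", f"{branch}..{target}"],
        cwd=repo_path, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def needs_merge_forward(repo_path, branch, target="master", run=subprocess.run):
    """True when target has moved on since the branch last picked it up."""
    return commits_behind(repo_path, branch, target, run) > 0
```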

Here is a basic list of what I would like to see run before MMR allows a merge (we have about 80% of these steps already):
  • Unit tests
  • Functional tests
  • Code lint
  • Test coverage
  • i18n string coverage
  • Scalability regression tests
  • Branch tracking master


I still need to get approval to release this tool on GitHub under a BSD license, as I built it on work time. Once that minor bit of red tape is out of the way I’d like to make MMR a bit more modular, providing a basic interface so it can be used with any SCM, and release it into the wild.

Would this be useful to you at all? Please let me know.

Lots of Love, Pomke Nohkan.

WSGI path magic

written by pomke, on Oct 17, 2010 8:12:00 PM.

It seems that every Python web framework has some sort of helper for dealing with paths. You know, the kind of helper you’d use when doing a redirect after handling a form submission. They often take the form of a complex class with __getattribute__, __getitem__ and __call__ overridden, with APIs along the lines of
path.up().child('flibble').child('bop')
or
path[:-2]('/foo/bar')
or any number of painful syntax hacks. I’ve seen several that won’t even traverse upward!

I caught myself going down the same path (har har) today with skunkpad, writing a wrapper around the request.path that would allow slicing and extending, much like the second example above. After half an hour of this I decided there had to be a simpler way, and it turns out there was. Here is my one line path munging helper:

from os.path import join, normpath        

...

request.relative = lambda p: normpath(join(request.path, p)).replace("\\","/")


And there you have it. The replace is there because normpath on Windows uses ‘\’ rather than ‘/’.
request.relative("..")
request.relative("/foo/bar")
request.relative("../baz/bop")
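
A variant worth considering (not the helper above, just a sketch with a made-up base path): since URL paths always use ‘/’, the `posixpath` module gives the same behaviour on every platform without needing the replace.

```python
import posixpath


def relative(base_path, target):
    """Resolve target against base_path using URL-style (POSIX) rules."""
    return posixpath.normpath(posixpath.join(base_path, target))


base = "/notes/2010/october"  # stand-in for request.path
print(relative(base, ".."))          # /notes/2010
print(relative(base, "/foo/bar"))    # /foo/bar
print(relative(base, "../baz/bop"))  # /notes/2010/baz/bop
```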
go nuts ^_^

- Pomke