The Invent with Python Blog

Sat 20 December 2014

Translate Your Python 3 Program with the gettext Module

Posted by Al Sweigart in python   

You've written a Python 3 program and want to make it available in other languages. You could duplicate the entire code-base, then go painstakingly through each .py file and replace any text strings you find. But this would mean you have two separate copies of your code, which doubles your workload every time you need to make a change or fix a bug. And if you want your program in other languages, it gets even worse.

Fortunately, Python provides a solution with the gettext module.

A Hack Solution

You could hack together your own solution. For example, you could replace every string in your program with a function call (with the function name being something simple, like _())) which will return the string translated into the correct language. For example, if your program was:

print('Hello world!')

...you could change this to:

print(_('Hello world!'))

...and the _() function could return the translation for 'Hello world!' based on what language setting the program had. For example, if the language setting was stored in a global variable named LANGUAGE, the _() function could look like this:

def _(s):
    spanishStrings = {'Hello world!': 'Hola Mundo!'}
    frenchStrings = {'Hello world!': 'Bonjour le monde!'}
    germanStrings = {'Hello world!': 'Hallo Welt!'}

    if LANGUAGE == 'English':
        return s
    if LANGUAGE == 'Spanish':
        return spanishStrings[s]
    if LANGUAGE == 'French':
        return frenchStrings[s]
    if LANGUAGE == 'German':
        return germanStrings[s]

This would work, but you'd be reinventing the wheel. This is pretty much what Python's gettext module does. gettext is a set of tools and file formats created in the early 1990s to standardize software internationalization (also called I18N). gettext was designed as a system for all programming languages, but we'll focus on Python in this article.

The Example Program

Say you have a simple "Guess the Number" game written in Python 3 that you want to translate. The source code to this program is here. There are four steps to internationalizing this program:

  1. Modify the .py file's source code so that the strings are passed to a function named _().
  2. Use the pygettext.py script that comes installed with Python to create a "pot" file from the source code.
  3. Use the free cross-platform Poedit software to create the .po and .mo files from the pot file.
  4. Modify your .py file's source code again to import the gettext module and set up the language setting.

Step 1: Add the _() Function

First, go through all of the strings in your program that will need to be translated and replace them with _() calls. The gettext system for Python uses _() as the generic name for getting the translated string since it is a short name.

Note that using string formatting instead of string concatenation will make your program easier to translate. For example, using string concatenation your program would have to look like this:

print('Good job, ' + myName + '! You guessed my number in ' + guessesTaken + ' guesses!')

print(_('Good job, ') + myName + _('! You guessed my number in ') + guessesTaken + _(' guesses!'))

This results in three separate strings that need to be translated, as opposed to the single string needed in the string formatting approach:

print('Good job, %s! You guessed my number in %s guesses!' % (myName, guessesTaken))

print(_('Good job, %s! You guessed my number in %s guesses!') % (myName, guessesTaken))

When you've gone through the "Guess the Number" source code, it will look like this. You won't be able to run this program since the _() function is undefined. This change is just so that the pygettext.py script can find all the strings that need to be translated.

Step 2: Extract the Strings Using pygettext.py

In the Tools/i18n of your Python installation (C:\Python34\Tools\i18n on Windows) is the pygettext.py script. While the normal gettext unix command parse C/C++ source code for translatable strings and the xgettext unix command can parse other languages, pygettext.py knows how to parse Python source code. It will find all of these strings and produce a "pot" file.

On Windows, I've run this script like so:

C:\>py -3.4 C:\Python34\Tools\i18n\pygettext.py -d guess guess.py

This creates a pot file named guess.pot. This is just a normal plaintext file that lists all the translated strings it found in the source code by search for _() calls. You can view the guess.pot file here.

Step 3: Translate the Strings using Poedit

You could fill in the translation using a text editor, but the free Poedit software makes it easier. Download it from http://poedit.net. Select File > New from POT/PO file... and select your guess.po file.

Poedit will ask what language you want to translate the strings to. For this example, we'll use Spanish:

Then fill in the translations. (I'm using http://translate.google.com, so it probably sounds a bit odd to actual Spanish-speakers.)

And now save the file in it's gettext-formatted folder. Saving will create the .po file (a human-readable text file identical to the original .pot file, except with the Spanish translations) and a .mo file (a machine-readable version which the gettext module will read. These files have to be saved in a certain folder structure for gettext to be able to find them. They look like this (say I have "es" Spanish files and "de" German files):

./guess.py
./guess.pot
./locale/es/LC_MESSAGES/guess.mo
./locale/es/LC_MESSAGES/guess.po
./locale/de/LC_MESSAGES/guess.mo
./locale/de/LC_MESSAGES/guess.po

These two-character language names like "es" for Spanish and "de" for German are called ISO 639-1 codes and are standard abbreviations for languages. You don't have to use them, but it makes sense to follow that naming standard.

Step 4: Add gettext Code to Your Program

Now that you have the .mo file that contains the translations, modify your Python script to use it. Add the following to your program:

import gettext
es = gettext.translation('guess', localedir='locale', languages=['es'])
es.install()

The first argument 'guess' is the "domain", which basically means the "guess" part of the guess.mo filename. The localedir is the directory location of the locale folder you created. This can be either a relative or absolute path. The 'es' string describes the folder under the locale folder. The LC_MESSAGES folder is a standard name

The install() method will cause all the _() calls to return the Spanish translated string. If you want to go back to the original English, just assign a lambda function value to _ that returns the string it was passed:

import gettext
es = gettext.translation('guess', localedir='locale', languages=['es'])
print(_('Hello! What is your name?'))  # prints Spanish

_ = lambda s: s

print(_('Hello! What is your name?')) # prints English

You can view the translation-ready source code for the "Guess the Number". If you want to run this program, download and unzip this zip file with it's locale folders and .mo file set up.

Further Reading

I am by no means an expert on I18N or gettext, and please leave comments if I'm breaking any best practices in this tutorial. Most of the time your software will not switch languages while it's running, and instead read one of the LANGUAGE, LC_ALL, LC_MESSAGES, and LANG environment variables to figure out the locale of the computer it's running on. I'll update this tutorial as I learn more.

Comments