Translate Your Python 3 Program with the gettext Module

You've written a Python 3 program and want to make it available in other languages. You could duplicate the entire code-base, then go painstakingly through each .py file and replace any text strings you find. But this would mean you have two separate copies of your code, which doubles your workload every time you need to make a change or fix a bug. And if you want your program in other languages, it gets even worse.

Fortunately, Python provides a solution with the gettext module.

A Hack Solution

You could hack together your own solution. For example, you could replace every string in your program with a function call (with the function name being something simple, like _()) that returns the string translated into the correct language. For example, if your program was:

print('Hello world!')

...you could change this to:

print(_('Hello world!'))

...and the _() function could return the translation for 'Hello world!' based on what language setting the program had. For example, if the language setting was stored in a global variable named LANGUAGE, the _() function could look like this:

def _(s):
    spanishStrings = {'Hello world!': 'Hola Mundo!'}
    frenchStrings = {'Hello world!': 'Bonjour le monde!'}
    germanStrings = {'Hello world!': 'Hallo Welt!'}

    if LANGUAGE == 'English':
        return s
    if LANGUAGE == 'Spanish':
        return spanishStrings[s]
    if LANGUAGE == 'French':
        return frenchStrings[s]
    if LANGUAGE == 'German':
        return germanStrings[s]

This would work, but you'd be reinventing the wheel. This is pretty much what Python's gettext module does. gettext is a set of tools and file formats created in the early 1990s to standardize software internationalization (also called I18N). gettext was designed as a system for all programming languages, but we'll focus on Python in this article.

The Example Program

Say you have a simple "Guess the Number" game written in Python 3 that you want to translate. The source code to this program is here. There are four steps to internationalizing this program:

  1. Modify the .py file's source code so that the strings are passed to a function named _().
  2. Use the pygettext.py script that comes installed with Python to create a "pot" file from the source code.
  3. Use the free cross-platform Poedit software to create the .po and .mo files from the pot file.
  4. Modify your .py file's source code again to import the gettext module and set up the language setting.

Step 1: Add the _() Function

First, go through all of the strings in your program that will need to be translated and replace them with _() calls. The gettext system for Python uses _() as the generic name for getting the translated string since it is a short name.

Note that using string formatting instead of string concatenation will make your program easier to translate. For example, using string concatenation your program would have to look like this:

print('Good job, ' + myName + '! You guessed my number in ' + guessesTaken + ' guesses!')

print(_('Good job, ') + myName + _('! You guessed my number in ') + guessesTaken + _(' guesses!'))

This results in three separate strings that need to be translated, as opposed to the single string needed in the string formatting approach:

print('Good job, %s! You guessed my number in %s guesses!' % (myName, guessesTaken))

print(_('Good job, %s! You guessed my number in %s guesses!') % (myName, guessesTaken))

When you've gone through the "Guess the Number" source code, it will look like this. You won't be able to run this program since the _() function is undefined. This change is just so that the pygettext.py script can find all the strings that need to be translated.

Step 2: Extract the Strings Using pygettext.py

In the Tools/i18n folder of your Python installation (C:\Python34\Tools\i18n on Windows) is the pygettext.py script. While the xgettext Unix command extracts translatable strings from C/C++ source code and from other languages, pygettext.py knows how to parse Python source code. It will find all the strings passed to _() and produce a "pot" file.

On Windows, I've run this script like so:

C:\>py -3.4 C:\Python34\Tools\i18n\pygettext.py -d guess guess.py

This creates a pot file named guess.pot. This is just a normal plaintext file that lists all the translatable strings it found in the source code by searching for _() calls. You can view the guess.pot file here.
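To get a feel for what "searching for _() calls" means, here is a rough sketch that uses Python's ast module to collect the string literals passed to _(). This is not how pygettext.py is actually implemented. The function name find_translatable_strings and the sample source are assumptions made up for this illustration.

```python
import ast


def find_translatable_strings(source):
    """Return the string literals passed as the first argument to _() calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == '_'
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append(node.args[0].value)
    # ast.walk() does not promise a particular order, so sort for stable output.
    return sorted(found)


sample = "print(_('Hello world!'))\nprint(_('Good job, %s!') % name)\n"
strings = find_translatable_strings(sample)
print(strings)
```

Note that the source is only parsed, never executed, which is why it does not matter that _() is still undefined at this stage.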

Step 3: Translate the Strings using Poedit

You could fill in the translation using a text editor, but the free Poedit software makes it easier. Download it from http://poedit.net. Select File > New from POT/PO file... and select your guess.pot file.

Poedit will ask what language you want to translate the strings to. For this example, we'll use Spanish.

Then fill in the translations. (I'm using http://translate.google.com, so it probably sounds a bit odd to actual Spanish-speakers.)

And now save the file in its gettext-formatted folder. Saving will create the .po file (a human-readable text file identical to the original .pot file, except with the Spanish translations) and a .mo file (a machine-readable version which the gettext module will read). These files have to be saved in a certain folder structure for gettext to be able to find them. They look like this (say I have "es" Spanish files and "de" German files):

./guess.py
./guess.pot
./locale/es/LC_MESSAGES/guess.mo
./locale/es/LC_MESSAGES/guess.po
./locale/de/LC_MESSAGES/guess.mo
./locale/de/LC_MESSAGES/guess.po

These two-character language names like "es" for Spanish and "de" for German are called ISO 639-1 codes and are standard abbreviations for languages. You don't have to use them, but it makes sense to follow that naming standard.
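The folder layout above follows a fixed pattern: locale folder, then language code, then LC_MESSAGES, then the domain name with a .mo extension. A small sketch of that pattern is below. The helper name expected_mo_path is hypothetical, and PurePosixPath is used only so the output looks the same on every operating system.

```python
from pathlib import PurePosixPath


def expected_mo_path(domain, language, localedir='locale'):
    """Build the relative path where gettext looks for a compiled catalog."""
    return PurePosixPath(localedir) / language / 'LC_MESSAGES' / (domain + '.mo')


spanish_path = expected_mo_path('guess', 'es')
german_path = expected_mo_path('guess', 'de')
print(spanish_path)
print(german_path)
```

If you want Python to do the real lookup on disk, the gettext module's find() function takes the same domain, localedir, and languages values and returns None when no matching .mo file exists.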

Step 4: Add gettext Code to Your Program

Now that you have the .mo file that contains the translations, modify your Python script to use it. Add the following to your program:

import gettext
es = gettext.translation('guess', localedir='locale', languages=['es'])
es.install()

The first argument, 'guess', is the "domain", which basically means the "guess" part of the guess.mo filename. The localedir is the directory location of the locale folder you created. This can be either a relative or absolute path. The 'es' string names the folder under the locale folder. LC_MESSAGES is a standard folder name that gettext expects inside each language folder.
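One thing to be aware of: if gettext.translation() cannot find a matching .mo file, it raises an error (an OSError). Passing fallback=True makes it return a NullTranslations object instead, which simply hands back the original strings. Here is a minimal sketch of that behavior. The load_translation helper and the deliberately missing folder name are assumptions for illustration.

```python
import gettext


def load_translation(language, localedir='locale'):
    """Load the 'guess' catalog, falling back to untranslated strings."""
    # fallback=True avoids an exception when no .mo file is found.
    return gettext.translation('guess', localedir=localedir,
                               languages=[language], fallback=True)


# 'no_such_locale_dir' is assumed not to exist, so the fallback kicks in.
translation = load_translation('xx', localedir='no_such_locale_dir')
greeting = translation.gettext('Hello! What is your name?')
print(greeting)
```

This can be handy while the translations are still incomplete, since the program keeps running in English rather than crashing on a missing language.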

The install() method will cause all the _() calls to return the Spanish translated string. If you want to go back to the original English, just assign a lambda function value to _ that returns the string it was passed:

import gettext
es = gettext.translation('guess', localedir='locale', languages=['es'])
es.install()  # defines _() so the calls below work
print(_('Hello! What is your name?'))  # prints Spanish

_ = lambda s: s

print(_('Hello! What is your name?')) # prints English
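An alternative to the lambda is to install a gettext.NullTranslations object. It has no catalog, so its gettext() method returns every string unchanged, and its install() method rebinds the built-in _() in the same way the Spanish translation object does. This is only a sketch of one possible approach, not necessarily the recommended one.

```python
import gettext

# NullTranslations carries no translations, so every string comes back as-is.
english = gettext.NullTranslations()
english.install()  # rebinds the built-in _() to english.gettext

message = _('Hello! What is your name?')
print(message)
```

Keep in mind that if your module has already assigned its own _ (as the lambda example does), that module-level name takes precedence over the built-in one.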

You can view the translation-ready source code for the "Guess the Number" game. If you want to run this program, download and unzip this zip file with its locale folders and .mo file set up.

Further Reading

I am by no means an expert on I18N or gettext, so please leave comments if I'm breaking any best practices in this tutorial. Most of the time your software will not switch languages while it's running, and will instead read one of the LANGUAGE, LC_ALL, LC_MESSAGES, and LANG environment variables to figure out the locale of the computer it's running on. I'll update this tutorial as I learn more.
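As a rough illustration of that environment-variable lookup, the sketch below checks the four variables in the order listed and reduces a value like es_ES.UTF-8 to a bare language code. It is a simplification (for example, it does not treat the "C" locale specially), and the guess_language name and the 'en' default are assumptions. In practice, calling gettext.translation() without the languages argument performs this search for you.

```python
import os


def guess_language(environ=None, default='en'):
    """Return a language code taken from the usual locale environment variables."""
    if environ is None:
        environ = os.environ
    for name in ('LANGUAGE', 'LC_ALL', 'LC_MESSAGES', 'LANG'):
        value = environ.get(name)
        if value:
            # LANGUAGE may hold a colon-separated list; take the first entry.
            first = value.split(':')[0]
            # Strip the encoding and territory: 'es_ES.UTF-8' becomes 'es'.
            return first.split('.')[0].split('_')[0]
    return default


print(guess_language({'LANG': 'es_ES.UTF-8'}))
```

Passing a plain dictionary instead of os.environ, as in the last line, makes the function easy to try out without changing your real environment.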

13 thoughts on “Translate Your Python 3 Program with the gettext Module”

  1. Gettext is really a useful tool for internationalization :D
    To make sure that a string is translated with the right characters, I format my string like this:
    print(_(u"My_string"))
    To encode it in UTF-8 (to have accents in some languages like French or Spanish).
    Thanks for sharing =)

  2. Good introduction. That being said ... when you wrote "Note that using string formatting instead of string concatenation will make your program easier to translate" ... you were unfortunately wrong. Your statement assumes that a translation in another language can be done correctly by breaking up a sentence into small phrases that can be translated individually and then put back together in the same order as it is done in English. For example, suppose you wish to translate

    "You have a {color} car."

    Doing it your way, would be something like

    "You have a" + color + "car".

    However, in French, it would be

    "Vous avez une" + "automobile" + color

    So, the proper way is to use a single string. This would then be:

    "You have a {color} car" and "Vous avez une automobile {color}".

      1. If car and color are variables, using %s will be problematic.

        "You have a {color} {car}" and "Vous avez une {car} {color}"

        So using .format(...) should really be the recommended way :-)

  3. Minor typo:

    "This is pretty much what Python's gettext module." ->
    "This is pretty much what Python's gettext module does."

  4. Nice tutorial !
    But is there a way to translate differently two identical strings?
    I saw something like msgctxt in gettext, but I didn't find how to use it with Python3.

  5. I have a query regarding gettext usage:

    In the examples I see everywhere, the function call to _() looks like this:

    _("some string %s") %var

    I don't understand how the variable value is getting inside the string. Is it some basic Python concept which I am missing?

    1. That's because the _() function will return the translated string, and the translated string will have the %s in it. It's that *translated* string which has the variables inserted into it.

      So if the string was 'Hello %s. How are you?' and I translated it to Spanish, the _() function would return 'Hola %s. Como estas?'. The %s after Hola would have the name inserted into it.

  6. Hey, great tut! I've got questions about the workflow with several files and files in subfolders. Do you first have to create the pot files for each source file individually or can you generate a pot for all files in an application simultaneously?

    And how do you combine them in one file? I just did that manually by copy pasting from one pot into another and it worked, but maybe there's a better way.
