Pythonology

Python is an amazing programming language that makes software development productive and fun. Python is open source, was created by a community of thousands of developers world-wide, and is used by about 14% of all programmers today. These are my thoughts as a user, advocate, co-author of an IDE for Python, and a director of the Python Software Foundation.

Tuesday, February 24, 2009

Making code run on Python 2.0 through 3.0

I recently had to make a C extension module and its associated Python code work with every Python version from 2.0 through 3.0. I initially expected this to be very difficult, but it turned out to be manageable. It took about a week to get 10K lines each of Python and C into this supports-all-versions state. Here is what I ran into:

For the extension modules

Most of the changes were due to the disappearance of the PyString_* and PyInt_* calls. For the most part I was able to use macros that map these to their PyUnicode_* and PyLong_* equivalents under Python 3.0.

I treated all C strings as UTF-8 encoded, since I controlled the encoding from outside the module. This made it possible to use _PyUnicode_AsString as a replacement for PyString_AsString (via a utility call that contains #if PY_VERSION_HEX checks). That saved some time, since Python caches and later deallocates the UTF-8 bytes object that backs the value this call returns. Before I found it, I had all sorts of convoluted code to handle deallocation in places where the Python 2.x code did not have to worry about it, because the return value of PyString_AsString is just borrowed memory. However, _PyUnicode_AsString does allocate memory where none was allocated before, and it depends on the string being successfully converted to UTF-8, so it (a) adds some memory overhead and (b) adds a potential point of failure. I was lucky: I could treat a failure as something to log and continue from. That may not be the case in other code.

Extension modules are also initialized differently under Python 3.0, although that was easily handled with another #if PY_VERSION_HEX check.

For the Python code

Fortunately, this code used a logging implementation, so it had very few print statements. For those that remained, and for other code that would now cause a syntax error or other fatal error, I wrote a module pyutils.py that selectively imports either py2utils.py or py3utils.py based on the value of sys.hexversion. That allowed me to put "bad" code where the Python versions that reject it would never see it. The two modules contained utilities to replace things like:

  • print, unicode(), callable(), xrange(), etc.
  • Use of removed modules like new or heavily pruned modules like types
  • ''.join(x) where x is a sequence of strings under Python 2.x and of bytes objects under Python 3.x (so b''.join(x) is needed)
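
To make the idea concrete, here is a minimal sketch of what such version-dependent helpers could look like. It is not the actual contents of pyutils.py, py2utils.py, or py3utils.py: every function and variable name below (utils_module_name, xrange_compat, callable_compat, print_compat, join_chunks) is hypothetical, and the helpers are collapsed into one file only because none of them needs version-specific syntax.

```python
import sys


def utils_module_name(hexversion=sys.hexversion):
    """Name of the version-specific helper module a dispatcher would import.

    sys.hexversion packs the interpreter version into a single integer,
    so one comparison separates Python 2.x from Python 3.x.
    """
    if hexversion >= 0x03000000:
        return "py3utils"
    return "py2utils"


# A dispatcher such as pyutils.py could then do something like:
#     _impl = __import__(utils_module_name())
# This stays a comment because py2utils and py3utils are not defined here,
# and it assumes both are importable as top-level modules.

try:
    xrange_compat = xrange      # Python 2.x
except NameError:
    xrange_compat = range       # Python 3.x: range is already lazy

try:
    callable_compat = callable  # the builtin is absent in Python 3.0 and 3.1
except NameError:
    def callable_compat(obj):
        # Hypothetical fallback: look for __call__ on the object's type.
        return hasattr(type(obj), "__call__")


def print_compat(*args):
    """Stand-in for the print statement that parses on both 2.x and 3.x."""
    sys.stdout.write(" ".join([str(a) for a in args]) + "\n")


def join_chunks(chunks):
    """Join a non-empty list of str chunks (2.x) or bytes chunks (3.x)."""
    empty = chunks[0][:0]       # empty value of the same type as the chunks
    return empty.join(chunks)
```

Code that genuinely needs syntax one version rejects, such as a b'' literal or a print statement, still has to live in the version-specific module so that the other interpreter never compiles it.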

In addition to this, much of this code was originally written against Python 1.5.2, so I had to do away with old uses of the string module (string.join and so forth). Since I didn't have to support Python 1.5.2 itself, it was easy to replace them with code that works from Python 2.0 through 3.0.
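
As an illustration of that kind of cleanup, the sketch below shows string-module calls rewritten as the string methods that have been available since Python 2.0. The sample data and variable names are made up for the example.

```python
words = ["spam", "eggs", "ham"]

# Python 1.5.2 style:  string.join(words, ", ")
joined = ", ".join(words)

# Python 1.5.2 style:  string.split(joined, ", ")
parts = joined.split(", ")

# Python 1.5.2 style:  string.upper(joined)
shouted = joined.upper()
```

The method forms read the same under 2.x and 3.x, whereas string.join and its sibling functions no longer exist in the Python 3.0 string module.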

Remarkably, that was about it! After all I'd heard about how hard it is to make the same code work with Python 2.x and 3.0, this came as a pleasant surprise. In the end, I spent far more time tracking down an obscure threading deadlock that some of my replacement code had accidentally introduced than I spent on the actual compatibility changes.