Pythonology

Python is an amazing programming language that makes software development productive and fun. Python is open source, was created by a community of thousands of developers world-wide, and is used by about 14% of all programmers today. These are my thoughts as a user, advocate, co-author of an IDE for Python, and a director of the Python Software Foundation.

Tuesday, February 24, 2009

Making code run on Python 2.0 through 3.0

I recently had to make a C extension module and associated Python code work with all Python versions from Python 2.0 through 3.0. Initially, I thought this was going to be very difficult, but as it turned out I was able to get it working without too much trouble. It took about a week to get 10K lines each of Python and C wrangled into this supports-all-versions state. Here is what I ran into:

For the extension modules

Most of the changes were due to disappearance of the PyString_* and PyInt_* calls. For the most part I was able to use macros to replace these with calls to PyUnicode_* and PyLong_* equivalents under Python 3.0.

I treated all C strings as UTF-8 encoded, since I had control of this from the outside of the module. This made it possible to use _PyUnicode_AsString as a replacement for PyString_AsString (via a utility call that contains #if PY_VERSION_HEX checks). That saved some time since Python caches and deallocates the UTF-8 bytes object returned by this call. Before I found it, I had all sorts of convoluted code trying to deal with deallocation in code that under Python 2.x did not have to worry about this because the return value of PyString_AsString is just borrowed memory. However, using _PyUnicode_AsString does allocate memory where before none was allocated, and it depends on the string being successfully converted to UTF-8, so it (a) adds some memory overhead, and (b) adds a potential point of failure. I was lucky and was able to treat the failure as something I could just log and continue from. That may not be the case in other code.

Extension modules are also initialized differently under Python 3.0, although that was easily taken care of with another #if PY_VERSION_HEX check.

For the Python code

Fortunately, this code used a logging implementation, so it had very few print statements. For those that remained, and for other similar code that would now cause a syntax or other fatal error, I wrote a module pyutils.py that selectively imports either py2utils.py or py3utils.py based on the value of sys.hexversion. That allowed me to put "bad" code in a place where the disagreeing Python versions would not see it. These modules contained utilities to replace things like:

  • print, unicode(), callable(), xrange(), etc.
  • Use of removed modules like new or heavily pruned modules like types
  • ''.join(x) where x contains str objects under Python 2.x but bytes objects under Python 3.x (so b''.join(x) is needed)
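The real pyutils.py is not public, so the following is only a rough sketch of the version-dispatch idea, collapsed into one file so it can be run as-is. The helper names (text_type, int_range, join_bytes, is_callable, write_line) are made up for illustration. In the layout described above, each branch would instead be an import of py2utils or py3utils, which is what keeps version-specific syntax out of sight of the other interpreter.

```python
import sys

# sys.hexversion packs the interpreter version into a single integer.
PY3 = sys.hexversion >= 0x03000000

if PY3:
    # Stand-in for "from py3utils import *" (hypothetical module contents).
    text_type = str
    int_range = range

    def join_bytes(chunks):
        # bytes().join avoids the b'' literal, which older 2.x parsers reject.
        return bytes().join(chunks)

    def is_callable(obj):
        # callable() is missing from Python 3.0, so look for __call__ instead.
        return hasattr(obj, '__call__')
else:
    # Stand-in for "from py2utils import *" (hypothetical module contents).
    text_type = unicode  # only evaluated when running under Python 2
    int_range = xrange
    is_callable = callable

    def join_bytes(chunks):
        return ''.join(chunks)


def write_line(text, stream=None):
    # Replacement for the print statement that parses on every version.
    if stream is None:
        stream = sys.stdout
    stream.write(text + '\n')
```

Calling code would then use pyutils.join_bytes(chunks) or pyutils.write_line("message") and never mention the version-specific spelling directly.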
In addition to this, since much of this code was originally written against Python 1.5.2, I had to do away with old uses of the string module (string.join and so forth) but since I didn't have to support Python 1.5.2 as well, it was easy to do that with code that would work from Python 2.0 through 3.0.
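As a small illustration of that last change (the variable names are invented for the example), string methods have been available since Python 2.0, so they can stand in for the old string module functions on every version in the supported range:

```python
words = ["alpha", "beta", "gamma"]

# Old style, via string module functions that no longer exist in Python 3:
#     import string
#     joined = string.join(words, ", ")
#     upper = string.upper(joined)
#     parts = string.split(joined, ", ")

# The method forms behave the same way under both 2.x and 3.x.
joined = ", ".join(words)
upper = joined.upper()
parts = joined.split(", ")
```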

Remarkably, that was about it! After all I'd heard about how hard it is to make the same code work with Python 2.x and 3.0, this certainly came as a pleasant surprise. In the end, I spent far more time trying to trace down an obscure threading deadlock issue that was accidentally caused by some of my replacement code than I worked on the actual compatibility changes.

7 Comments:

At 1:54 PM, Blogger Brett said...

Just so you know, Stephan, I linked to this post in the Python wiki for porting to Py3K: http://wiki.python.org/moin/PortingToPy3k

 
At 7:21 PM, Blogger Kent Johnson said...

Is this code by any chance publicly available? It sounds like a good example.

 
At 7:31 PM, Blogger Stephan Deibel said...

Unfortunately it's not publicly available. It's the debugger code for Wing IDE, which is a commercial product. If you happen to have a Wing IDE Pro license you can get the source code under a non-disclosure agreement. However, the code's complexity obscures the relative ease of doing the port, and as a result I suspect it may not be such a useful example to try to read through.

 
At 5:52 AM, Blogger ptmcg said...

How did you resolve the problem with the incompatible "except" syntax? Surely your code has exception handling.

 
At 6:04 AM, Blogger Stephan Deibel said...

Instead of the version-specific except clause forms I use just 'except SomeException:' or 'except:' and then use sys.exc_info() inside the except block to get at the exception information. That returns a three-value tuple, the second element of which is the same thing you would get for 'e' in 'except SomeException, e:' or 'except SomeException as e:'.
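A minimal sketch of that pattern follows. The function name describe_failure and the use of int() as the failing call are only there for illustration and are not taken from the actual debugger code.

```python
import sys


def describe_failure(text):
    """Return None if text parses as an int, else a short error description."""
    try:
        int(text)
    except ValueError:
        # No name is bound in the except clause, so this line parses under
        # both 2.x and 3.x. The active exception is fetched explicitly.
        exc_type, exc_value = sys.exc_info()[:2]
        return "%s: %s" % (exc_type.__name__, exc_value)
    return None
```

Only the first two items of the tuple are unpacked here. Holding on to the third item (the traceback) in a local variable can create a reference cycle, so it is best left alone unless it is really needed.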

 
At 6:46 AM, Blogger ptmcg said...

Thanks for the quick reply, Stephan. Unfortunately for me, pyparsing has performance issues enough, without adding more function calls to work around this syntax problem. My rough timing shows that calling exc_info to extract the exception variable incurs a ~30% performance penalty over using "except exc_type, exc_var:". I can certainly use this technique in those parts of code that are not performance-critical, but for those that I need to keep as speedy as possible, I'm still looking for a solution. I wholeheartedly agree that having a single code base is better than dual maintenance of Py2.x and Py3.x code.

Thanks!

 
At 7:03 AM, Blogger Stephan Deibel said...

If you are using exceptions a lot as a way to handle normal conditions in your program, you may want to rework to not use exceptions for those cases. In my case, an exception really is relatively unusual so the performance hit doesn't matter.

Failing that, you may be able to move this code to the files that are imported selectively only in Python 2.x or 3.x. If there are a lot of instances, that won't help much, of course.

It does sound like you're stuck restructuring your code one way or the other.

 
