I'm a die-hard emacs-and-command-line hacker. Once a year I try to evolve past my 1991 toolset and use an IDE for coding, hoping to find something as transformative as the Smalltalk 80 environment. The various Java IDEs never worked for me, even VisualAge. Visual Studio is great, but I don't program for Windows much. Most of my coding is Python. So for my latest project I decided to finally try a Python IDE. After an abortive attempt with Eclipse + PyDev I settled on good ol' Wing IDE. Which is great, and I'm ready to pony up $180 for a license.

The key thing about Wing is that it works very well for Python and Python alone. This is not the IDE for a multilanguage product. But if you want a simple path into running a quick Python hack, with room to expand to complex Python projects, it's very good. Note you need the full professional version to get the important features.

The key feature of Wing is good integration with the Python interpreter. There's a Python shell built right in for interactive hacking. And you can interrupt or breakpoint a running program to examine variables on the stack or execute new code in the process context (great for exploring state). The underlying integration is built right into Python and I'm sure emacs, Eclipse, etc can drive it too. But I could never make it work productively for me, whereas with Wing it works right out of the box.

Of course Wing has all the basic IDE stuff: indenting editor, syntax highlighting, code completion, etc. There's code analysis for contextual help, although without static typing it's a bit awkward. There's also some tacked-on unit testing and revision control support, adequate but not great. Honestly the whole IDE suffers a bit from having a Python hacker's idea of good user interface, but the quality of the interpreter integration is good enough to make up for any rough edges in the UI.

I still need a command line. For Windows Vista it's good ol' Cygwin for a Unix-like environment along with PuTTYcyg for a terminal emulator. (Note that stock Python doesn't work well with Cygwin TTYs, but it's usable.) I finally have a way to hack as efficiently as I used to in Unix, but driven mostly from the Windows machine in front of me instead of via remote sessions. It's pretty nice.

  2009-05-24 19:01 Z
One of the most common data analysis things I do in Unix is something like
cat wines | sort | uniq -c | sort -nr
Given an input file with a million bottles of wine in it, this shows me how many bottles of each type I have. It works for other things besides wine. In fact, it works for a lot of things, and I've been doing this for 15 years.

But the first sort is really inefficient, just something you have to do to make uniq work. So for big inputs I use a little Python script, countuniq.py. It does the same thing but more efficiently. Remarkably useful tool.
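The script itself isn't shown in the post, but the single-pass idea is simple. Here's a minimal sketch of how a countuniq-style tool could work (this is my reconstruction, not the author's actual countuniq.py):

```python
from collections import defaultdict

def count_uniq(lines):
    # One pass over the input: count each distinct line,
    # no O(n log n) pre-sort of a million rows needed
    counts = defaultdict(int)
    for line in lines:
        counts[line.rstrip("\n")] += 1
    # Sort only the distinct values, most frequent first
    return sorted(counts.items(), key=lambda kv: -kv[1])

# Same shape of output as sort | uniq -c | sort -nr
for value, count in count_uniq(["merlot\n", "syrah\n", "merlot\n"]):
    print("%7d %s" % (count, value))
```

Only the distinct values get sorted, so for heavily repeated data this does far less work than sorting the whole input first.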

  2005-09-23 07:09 Z
I love the (?P<foo>) named regexp groups in Python. They make the code so much more readable! But are they slower? Not much.
timeit.py -r 50
  -s 'import re; r = re.compile("foo (?P<x>bar)")'
  'm = r.match("foo bar"); g = m.group("x")'
100000 loops, best of 50: 3.34 usec per loop

timeit.py -r 50
  -s 'import re; r = re.compile("foo (bar)")'
  'm = r.match("foo bar"); g = m.group(1)'
100000 loops, best of 50: 3.14 usec per loop
The named groups version is about 6% slower. Consistent, but not very significant.
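For reference, here's what the readability win looks like in practice (a made-up date pattern, not from the timing test above):

```python
import re

# Named groups document what each piece of the match means,
# instead of opaque group(1), group(2), group(3)
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date.match("2005-04-09")
print(m.group("year"), m.group("month"), m.group("day"))
```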
  2005-04-09 19:58 Z
I've been doing a lot of MySQL hacking in Python. And like all Python projects I do, I start by stuffing data into anonymous lists and remembering "oh yeah, foo[3] is the name of the wine, and foo[1] is the year". This doesn't scale well, and fortunately MySQLdb has a better way.
import MySQLdb, MySQLdb.cursors
db = MySQLdb.connect(db="wine")
c = db.cursor(cursorclass=MySQLdb.cursors.DictCursor)
c.execute("""
  select name, type, year from wine
  where color = %(color)s and year < %(year)s
  """, { "color": "Red", "year": 1972 })

for row in c.fetchall():
  print row['year'], row['name']
The snippet above uses dictionaries everywhere, both when forming the query and when handling the response. This lets me name parameters so that if I add a new condition to the where clause or a new field to the select, the rest of my code doesn't break.

I'm taking advantage of two MySQLdb features that go beyond the standard Python DB API. The first is simple; the magic query construction of execute() handles dictionary style substitution just like you'd expect.
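The dictionary substitution itself is just Python's %(name)s string formatting; MySQLdb layers proper SQL quoting and escaping on top of it. A rough illustration of the formatting half alone (the quoting here is done by hand, only to show the shape):

```python
# Plain %-formatting with a dict; MySQLdb does this plus
# proper escaping of each value before substitution
query = "select name from wine where color = %(color)s"
print(query % {"color": "'Red'"})
```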

The second is more subtle. MySQLdb supports different cursor classes that extend the basic "tuple of tuples" datatype you usually get from fetchall(). I'm using DictCursor, which builds a dictionary from the names in the description field of the cursor. There are also server-side cursors for efficiency with large result sets. It's all implemented via mixins for flexibility.

I'm particularly looking forward to Andy's 2.0 plan to have a "row object that can behave like a sequence or a mapping (dictionary)", giving you the best of both worlds. Combine that with iterators and you could really have something.

PS: if you search for MySQLdb docs, you quickly land at the obsolete module docs. I used these docs for two years! The MySQLdb project has moved to SourceForge and the MySQLdb docs are nicely hosted there.

  2005-01-31 17:08 Z
Many thanks to Uche for his thoughts and code responding to my frustration working with XML in Python. If you're reading this because you want to write good XML code in Python, read his stuff! He knows much better than I. And he gives clear guidance: use his Amara if you want something Pythonic that can deal with XML.

But reading Uche's posts confirms my main point. There are too many XML choices in Python. And the obvious ones aren't right. Apparently PyXML isn't what I'm supposed to be using (despite it being the default when I type import xml on my Debian box), and if you use it the way the docs say to you're wrong. Urgh!

And while I like what Uche says about Amara, is this the easy way to say "parse an XML document"?

from amara import binderytools
rule = binderytools.preserve_attribute_details(u'*')
doc = binderytools.bind_file("foo.opml", rules=[rule])
He explains why all this is necessary for this example (Amara by default doesn't support XPath attributes), but it's just this kind of complexity that frustrates me. Python's strength is that there's a clear, obvious way to do simple things. But not with XML.

See this response from Uche, with lots of good samples and comments.
  2005-01-15 16:47 Z
I hate working with XML. It's easy to extract data from simple text files or CSV files, but XML is all nested, and has entities, and lots of pointy brackets. Regexp just doesn't cut it, you really need an XML parser. And for some reason Python is not so great at XML.

Python has too many XML choices. There's the stock Python install, which barely does anything. Then there's what you probably should use, PyXML, which has an ugly hack to confusingly install on top of the default Python libraries. But if you follow the advice of Python's most visible XML expert, Uche Ogbuji, you may think there's something wrong with PyXML and install 4Suite instead, which is the same as PyXML only different. Or should you use Amara instead? Then there's ElementTree which is brilliantly fast and simple to use, but limited, or xmltramp, which is even more hacky. On the other extreme there's libxml2, which is fast and powerful but has an awful API.

Mind you, this is all for the basic stuff, like parsing XML. There's lots more Python XML options too. But what's missing is a clear single simple library to use. PyXML seems the most standard, but it seems very slow and it tries to be more DOM-like than Python-like. I hate DOM.
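For contrast, the ElementTree style mentioned above is about this terse. (Sketched here with the module's modern stdlib location, xml.etree; in 2005 ElementTree was a separate install.)

```python
import xml.etree.ElementTree as ET

# Parse a small document and walk its elements
doc = ET.fromstring('<cellar><wine year="1970">Port</wine></cellar>')
for wine in doc.findall("wine"):
    print(wine.get("year"), wine.text)
```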

All of this is a long-winded preamble to my attempt to do something simple with XPath in Python.


  2005-01-14 15:32 Z
Working with time in Python is confusing. There are three different standard types for representing time: seconds since epoch, tuples, and the datetime module. And there are common add-ons like mxDateTime and database times.

I was having a heck of a time parsing RFC 822 strings like you see in HTTP headers and email. The problem is timezones are not supported by strptime() or the tuple format. But the Web is my programmer:

import calendar, time

def parseRFC822Time(t):
    return calendar.timegm(
      time.strptime(t, "%a, %d %b %Y %H:%M:%S %Z"))
The magic here is the calendar module which has the timegm() function missing from the time module.
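A quick check of the helper, using a GMT-stamped date of the kind HTTP sends (GMT happens to be one of the few zone names strptime's %Z accepts):

```python
import calendar, time

def parseRFC822Time(t):
    # strptime parses the fields; timegm interprets the
    # resulting tuple as UTC, which is what GMT stamps need
    return calendar.timegm(
        time.strptime(t, "%a, %d %b %Y %H:%M:%S %Z"))

print(parseRFC822Time("Sun, 19 Sep 2004 17:03:00 GMT"))
```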

Thanks to a couple of readers for pointing out there's also a rfc822.parsedate() function.
  2004-09-19 17:03 Z
Python has a fancy CSV module. But near as I can tell, despite all its support for formats and headers and DictReaders it doesn't have a simple way to say "give me my data in a list of dictionaries with headers as keys". Here's the best I could do:
# Grab the headers first
headerReader = csv.reader(fp)
headers = headerReader.next()

# Now construct a second reader on the same 
# file stream to get the actual data
dataReader = csv.DictReader(fp, headers)
for d in dataReader:
  print d
That feels spooky, but it works.
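The same trick, made self-contained with an in-memory stream so it can be run directly (in current Python it's next(reader) rather than reader.next()):

```python
import csv, io

fp = io.StringIO("name,year\nmerlot,1999\nsyrah,2001\n")

# Grab the headers first, then hand the same stream to
# DictReader, which picks up where the first reader stopped
headers = next(csv.reader(fp))
rows = list(csv.DictReader(fp, headers))
print(rows[0]["name"], rows[0]["year"])
```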
  2004-08-29 23:20 Z
One nice thing about Python is that triple-quotes and string substitution make writing templates really simple:

import time

page = '''
<title>%(title)s</title>
The time is %(time)s.
'''

print page % { 'title': "Time of day",
               'time': time.asctime() }

The HTML is all by itself with only the simplest Python in the middle of it. And the substitutions are named, not positional, so it's self-documenting. You can substitute the same text more than once. And if you want to be clever, you can use locals() in place of the hand-crafted dictionary to directly substitute Python symbols.
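The locals() variant mentioned above, sketched as a tiny function (the names here are made up for illustration):

```python
import time

def time_page():
    title = "Time of day"
    now = time.asctime()
    # locals() is {'title': ..., 'now': ...}, so the template
    # can name local variables directly, no dict by hand
    return "%(title)s: %(now)s" % locals()

print(time_page())
```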

  2004-08-21 16:38 Z
It's way harder than it should be to have a CGI script do something asynchronously in Apache. The root of the problem is that it's not enough to fork a child, you have to close stdin, stdout, and stderr. Only you can't really close them, you have to reassign them.
import sys, os, time

print "Content-Type: text/plain\n\n",
print "Script started"

if os.fork() == 0:
    # Reassign stdin, stdout, stderr for child
    # so Apache will ignore it
    si = file('/dev/null', 'r')
    so = file('/dev/null', 'a+')
    se = file('/dev/null', 'a+', 0)
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())
    # Do whatever you want asynchronously
    os.execv('/bin/sleep', ['sleep', '5'])

print "Process was forked"
This is explained pretty well in Perl and in Python. It's a shame that sys.stdin.close() doesn't work.

I still haven't seen a good explanation for why Apache doesn't send partial output from a CGI: Apache says it doesn't buffer and neither does python -u. Grr. Ah ha, mod_gzip does buffer, unsurprisingly.

Thanks to Marc for research help
  2004-02-21 21:09 Z