I'm a die-hard emacs-and-command-line hacker. Once a year I try to evolve past my 1991 toolset and use an IDE for coding, hoping to find something as transformative as the Smalltalk 80 environment. The various Java IDEs never worked for me, even VisualAge. Visual Studio is great, but I don't program for Windows much. Most of my coding is Python. So for my latest project I decided to finally try a Python IDE. After an abortive attempt with Eclipse + PyDev I settled on good ol' Wing IDE. Which is great, and I'm ready to pony up $180 for a license.

The key thing about Wing is that it works very well for Python and Python alone. This is not the IDE for a multilanguage product. But if you want a simple path into running a quick Python hack, with room to expand to complex Python projects, it's very good. Note you need the full professional version to get the important features.

The key feature of Wing is good integration with the Python interpreter. There's a Python shell built right in for interactive hacking. And you can interrupt or breakpoint a running program to examine variables on the stack or execute new code in the process context (great for exploring state). The underlying integration is built right into Python and I'm sure emacs, Eclipse, etc can drive it too. But I could never make it work productively for me, whereas with Wing it works right out of the box.

Of course Wing has all the basic IDE stuff: indenting editor, syntax highlighting, code completion, etc. There's code analysis for contextual help, although without static typing it's a bit awkward. There's also some tacked-on unit testing and revision control support, adequate but not great. Honestly the whole IDE suffers a bit from having a Python hacker's idea of good user interface, but the quality of the interpreter integration is good enough to make up for any rough edges in the UI.

I still need a command line. For Windows Vista it's good ol' Cygwin for a Unix-like environment along with PuTTYcyg for a terminal emulator. (Note that stock Python doesn't work well with Cygwin TTYs, but it's usable.) I finally have a way to hack as efficiently as I used to in Unix, but driven mostly from the Windows machine in front of me instead of via remote sessions. It's pretty nice.

  2009-05-24 19:01 Z
One of the most common data analysis things I do in Unix is something like
cat wines | sort | uniq -c | sort -nr
Given an input file with a million bottles of wine in it, this shows me how many bottles of each type I have. It works for other things besides wine. In fact, it works for a lot of things, and I've been doing this for 15 years.

But the first sort is really inefficient, just something you have to do to make uniq work. So for big inputs I use a little Python script, countuniq.py. It does the same thing but more efficiently. Remarkably useful tool.
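The script itself isn't shown in the post, but the single-pass idea is simple. Here's a minimal sketch of how a countuniq-style tool could work (this is my reconstruction, not the author's actual countuniq.py):

```python
from collections import defaultdict

def count_uniq(lines):
    # One pass over the input: count each distinct line,
    # no O(n log n) pre-sort of a million rows needed
    counts = defaultdict(int)
    for line in lines:
        counts[line.rstrip("\n")] += 1
    # Sort only the distinct values, most frequent first
    return sorted(counts.items(), key=lambda kv: -kv[1])

# Same shape of output as sort | uniq -c | sort -nr
for value, count in count_uniq(["merlot\n", "syrah\n", "merlot\n"]):
    print("%7d %s" % (count, value))
```

Only the distinct values get sorted, so for heavily repeated data this does far less work than sorting the whole input first.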

  2005-09-23 07:09 Z
I love the (?P<foo>) named regexp groups in Python. They make the code so much more readable! But are they slower? Not much.
timeit.py -r 50
  -s 'import re; r = re.compile("foo (?P<x>bar)")'
  'm = r.match("foo bar"); g = m.group("x")'
100000 loops, best of 50: 3.34 usec per loop

timeit.py -r 50
  -s 'import re; r = re.compile("foo (bar)")'
  'm = r.match("foo bar"); g = m.group(1)'
100000 loops, best of 50: 3.14 usec per loop
The named groups version is about 6% slower. Consistent, but not very significant.
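For reference, here's what the readability win looks like in practice (a made-up date pattern, not from the timing test above):

```python
import re

# Named groups document what each piece of the match means,
# instead of opaque group(1), group(2), group(3)
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
m = date.match("2005-04-09")
print(m.group("year"), m.group("month"), m.group("day"))
```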
  2005-04-09 19:58 Z
I've been doing a lot of MySQL hacking in Python. And like all Python projects I do, I start by stuffing data into anonymous lists and remembering "oh yeah, foo[3] is the name of the wine, and foo[1] is the year". This doesn't scale well, and fortunately MySQLdb has a better way.
import MySQLdb, MySQLdb.cursors
db = MySQLdb.connect(db="wine")
c = db.cursor(cursorclass=MySQLdb.cursors.DictCursor)
c.execute("""
  select name, type, year from wine
  where color = %(color)s and year < %(year)s
  """, { "color": "Red", "year": 1972 })

for row in c.fetchall():
  print row['year'], row['name']
The snippet above uses dictionaries everywhere, both when forming the query and when handling the response. This lets me name parameters so that if I add a new condition to the where clause or a new field to the select, the rest of my code doesn't break.

I'm taking advantage of two MySQLdb features that go beyond the standard Python DB API. The first is simple; the magic query construction of execute() handles dictionary style substitution just like you'd expect.
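The dictionary substitution itself is just Python's %(name)s string formatting; MySQLdb layers proper SQL quoting and escaping on top of it. A rough illustration of the formatting half alone (the quoting here is done by hand, only to show the shape):

```python
# Plain %-formatting with a dict; MySQLdb does this plus
# proper escaping of each value before substitution
query = "select name from wine where color = %(color)s"
print(query % {"color": "'Red'"})
```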

The second is more subtle. MySQLdb supports different cursor classes that extend the basic "tuple of tuples" datatype you usually get from fetchall(). I'm using DictCursor, which builds a dictionary from the names in the description field of the cursor. There are also server-side cursors for efficiency with large result sets. It's all implemented via mixins for flexibility.

I'm particularly looking forward to Andy's 2.0 plan to have a "row object that can behave like a sequence or a mapping (dictionary)", giving you the best of both worlds. Combine that with iterators and you could really have something.

PS: if you search for MySQLdb docs, you quickly land at the obsolete module docs. I used these docs for two years! The MySQLdb project has moved to SourceForge and the MySQLdb docs are nicely hosted there.

  2005-01-31 17:08 Z
Many thanks to Uche for his thoughts and code responding to my frustration working with XML in Python. If you're reading this because you want to write good XML code in Python, read his stuff! He knows much better than I. And he gives clear guidance: use his Amara if you want something Pythonic that can deal with XML.

But reading Uche's posts confirms my main point. There are too many XML choices in Python. And the obvious ones aren't right. Apparently PyXML isn't what I'm supposed to be using (despite it being the default when I type import xml on my Debian box), and if you use it the way the docs say to you're wrong. Urgh!

And while I like what Uche says about Amara, is this the easy way to say "parse an XML document"?

from amara import binderytools
rule = binderytools.preserve_attribute_details(u'*')
doc = binderytools.bind_file("foo.opml", rules=[rule])
He explains why all this is necessary for this example (Amara by default doesn't support XPath attributes), but it's just this kind of complexity that frustrates me. Python's strength is that there's a clear, obvious way to do simple things. But not with XML.

See this response from Uche, with lots of good samples and comments.
  2005-01-15 16:47 Z
I hate working with XML. It's easy to extract data from simple text files or CSV files, but XML is all nested, and has entities, and lots of pointy brackets. Regexp just doesn't cut it, you really need an XML parser. And for some reason Python is not so great at XML.

Python has too many XML choices. There's the stock Python install, which barely does anything. Then there's what you probably should use, PyXML, which has an ugly hack to confusingly install on top of the default Python libraries. But if you follow the advice of Python's most visible XML expert, Uche Ogbuji, you may think there's something wrong with PyXML and install 4Suite instead, which is the same as PyXML only different. Or should you use Amara instead? Then there's ElementTree which is brilliantly fast and simple to use, but limited, or xmltramp, which is even more hacky. On the other extreme there's libxml2, which is fast and powerful but has an awful API.

Mind you, this is all for the basic stuff, like parsing XML. There's lots more Python XML options too. But what's missing is a clear single simple library to use. PyXML seems the most standard, but it seems very slow and it tries to be more DOM-like than Python-like. I hate DOM.
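For contrast, the ElementTree style mentioned above is about this terse. (Sketched here with the module's modern stdlib location, xml.etree; in 2005 ElementTree was a separate install.)

```python
import xml.etree.ElementTree as ET

# Parse a small document and walk its elements
doc = ET.fromstring('<cellar><wine year="1970">Port</wine></cellar>')
for wine in doc.findall("wine"):
    print(wine.get("year"), wine.text)
```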

All of this is a long-winded preamble to my attempt to do something simple with XPath in Python.


  2005-01-14 15:32 Z
Working with time in Python is confusing. There are three different standard types for representing time: seconds since epoch, tuples, and the datetime module. And there are common add-ons like mxDateTime and database times.

I was having a heck of a time parsing RFC 822 strings like you see in HTTP headers and email. The problem is timezones are not supported by strptime() or the tuple format. But the Web is my programmer:

import calendar, time

def parseRFC822Time(t):
    return calendar.timegm(
      time.strptime(t, "%a, %d %b %Y %H:%M:%S %Z"))
The magic here is the calendar module which has the timegm() function missing from the time module.
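A quick check of the helper, using a GMT-stamped date of the kind HTTP sends (GMT happens to be one of the few zone names strptime's %Z accepts):

```python
import calendar, time

def parseRFC822Time(t):
    # strptime parses the fields; timegm interprets the
    # resulting tuple as UTC, which is what GMT stamps need
    return calendar.timegm(
        time.strptime(t, "%a, %d %b %Y %H:%M:%S %Z"))

print(parseRFC822Time("Sun, 19 Sep 2004 17:03:00 GMT"))
```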

Thanks to a couple of readers for pointing out there's also a rfc822.parsedate() function.
  2004-09-19 17:03 Z
Python has a fancy CSV module. But near as I can tell, despite all its support for formats and headers and DictReaders it doesn't have a simple way to say "give me my data in a list of dictionaries with headers as keys". Here's the best I could do:
# Grab the headers first
headerReader = csv.reader(fp)
headers = headerReader.next()

# Now construct a second reader on the same 
# file stream to get the actual data
dataReader = csv.DictReader(fp, headers)
for d in dataReader:
  print d
That feels spooky, but it works.
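The same trick, made self-contained with an in-memory stream so it can be run directly (in current Python it's next(reader) rather than reader.next()):

```python
import csv, io

fp = io.StringIO("name,year\nmerlot,1999\nsyrah,2001\n")

# Grab the headers first, then hand the same stream to
# DictReader, which picks up where the first reader stopped
headers = next(csv.reader(fp))
rows = list(csv.DictReader(fp, headers))
print(rows[0]["name"], rows[0]["year"])
```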
  2004-08-29 23:20 Z
One nice thing about Python is that triple-quotes and string substitution make writing templates really simple:

import time

page = '''
<title>%(title)s</title>
The time is %(time)s.
'''

print page % { 'title': "Time of day",
               'time': time.asctime() }

The HTML is all by itself with only the simplest Python in the middle of it. And the substitutions are named, not positional, so it's self-documenting. You can substitute the same text more than once. And if you want to be clever, you can use locals() in place of the hand-crafted dictionary to directly substitute Python symbols.
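The locals() variant mentioned above, sketched as a tiny function (the names here are made up for illustration):

```python
import time

def time_page():
    title = "Time of day"
    now = time.asctime()
    # locals() is {'title': ..., 'now': ...}, so the template
    # can name local variables directly, no dict by hand
    return "%(title)s: %(now)s" % locals()

print(time_page())
```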

  2004-08-21 16:38 Z
It's way harder than it should be to have a CGI script do something asynchronously in Apache. The root of the problem is that it's not enough to fork a child, you have to close stdin, stdout, and stderr. Only you can't really close them, you have to reassign them.
import sys, os, time

print "Content-Type: text/plain\n\n",
print "Script started"

if os.fork() == 0:
    # Reassign stdin, stdout, stderr for child
    # so Apache will ignore it
    si = file('/dev/null', 'r')
    so = file('/dev/null', 'a+')
    se = file('/dev/null', 'a+', 0)
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())
    # Do whatever you want asynchronously
    os.execv('/bin/sleep', ['sleep', '5'])

print "Process was forked"
This is explained pretty well in Perl and in Python. It's a shame that sys.stdin.close() doesn't work.

I still haven't seen a good explanation for why Apache doesn't send partial output from a CGI: Apache says it doesn't buffer and neither does python -u. Grr. Ah ha, mod_gzip does buffer, unsurprisingly.

Thanks to Marc for research help
  2004-02-21 21:09 Z