About once every six months I do some hack which involves caching web pages on disk. While hacking I always just write stuff to a cache directory so I can load it / replay it quickly. And inevitably I forget about the crappy cache when I run the job for real, only remembering three days later when the directory has an unwieldy 200,000 files in it.

Operating systems fail in all sorts of charming ways when you have a directory with "a lot" of files, typically over 10,000. Both the Vista and Linux kernels no longer seem to have O(n) operations on directories, so deleting all the files is no longer O(n^2). But the tools still freak out. For example, rm * doesn't work if the glob expands past the command line length limit.
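For the curious, that limit is a kernel cap on the total byte size of the argument list handed to an exec call; a 200,000-file glob blows right past it and the shell reports "Argument list too long". Here's a rough way to see the cap, assuming a Unix-ish Python where os.sysconf exposes SC_ARG_MAX:

import os
# Total bytes allowed for argv plus the environment on an exec call;
# typically a couple of megabytes on Linux.
print(os.sysconf('SC_ARG_MAX'))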

Vista has a host of joys associated with giant directories. Opening the directory in the Explorer shell actually works. Selecting all the files and deleting them doesn't, though, and the entire UI becomes unresponsive on a directory with even 20,000 or so files. del * from a command line does seem to work, but is awfully slow. I finally wrote some custom Python to unlink the files quickly, only to find they were in a search-indexed directory; the entire deletion process would freeze for 20 seconds at a time while the indexer chewed over the removals. Ugh.

By the way, if you ever need to remove a bunch of files, the lower the level you do it at, the better. Even rm does more examination of each file than you want. Here's a quick Python hack that seems pretty efficient.

import os

# Unlink everything in the current directory; just report anything that fails.
for f in os.listdir('.'):
  try: os.unlink(f)
  except OSError as e: print(e)
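If the directory is really huge, a streaming variant might do a little better; this is just a sketch assuming Python 3.6 or later, where os.scandir iterates the directory without building the whole 200,000-entry list first:

import os

# Sketch, assuming Python 3.6+: os.scandir streams directory entries
# instead of materializing the whole listing in memory up front.
with os.scandir('.') as it:
  for entry in it:
    try: os.unlink(entry.path)
    except OSError as e: print(e)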
techbad
  2009-05-10 21:40 Z