One of the most common data analysis tasks I do in Unix is something like:
cat wines | sort | uniq -c | sort -nr
Given an input file with a million bottles of wine in it, this shows me how many bottles of each type I have. It works for other things besides wine. In fact, it works for a lot of things, and I've been doing this for 15 years.

But the first sort is really inefficient: it orders the entire input only because uniq compares adjacent lines, so duplicates have to sit next to each other before it can count them. So for big inputs I use a little Python script, countuniq.py. It does the same thing but more efficiently. It's a remarkably useful tool.
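
For illustration, here is a minimal sketch of how a script like this could work. It is not the actual countuniq.py; the function names count_unique and format_counts are made up for this example. The idea is to tally lines in a hash table (Python's collections.Counter) in a single pass, so only the distinct values ever get sorted, not the whole input.

```python
from collections import Counter


def count_unique(lines):
    """Tally how often each line occurs, ignoring the trailing newline.

    Hypothetical helper: one pass over the input, with memory
    proportional to the number of distinct lines.
    """
    return Counter(line.rstrip("\n") for line in lines)


def format_counts(counts):
    """Format the tallies like `uniq -c | sort -nr`, highest count first.

    Hypothetical helper: ties may come out in a different order than
    the shell pipeline would print them.
    """
    return [f"{n:7d} {value}" for value, n in counts.most_common()]
```

To use it in a pipeline, add `import sys` and print each element of `format_counts(count_unique(sys.stdin))`. The savings are biggest when the input has many lines but few distinct values, which is the million-bottles case.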

techpython
  2005-09-23 07:09 Z