I've been a faithful spamassassin user for a long time. Never thought much about how it worked, just been happy. But a lot of spam is leaking through the last month, so I went looking for a tuneup.

Vipul's Razor
Collaborative spam database. On Debian just do apt-get install razor and you're razoring. More work required if you want to report spam.
Pyzor
Another collaborative spam database. apt-get install pyzor.
Bayesian filter
Bayesian spam filtering implementation for spamassassin. Requires 1000+ training messages. sa-learn --ham --mbox archive.mbox
I feel nervous mixing all these methods; I just hope SpamAssassin sorts them out. This spam detection rate discussion is vaguely interesting.
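
For what it's worth, SpamAssassin doesn't pick one method over another: every rule that fires adds its score, and the message is tagged as spam once the total reaches the required score (5.0 by default). A minimal sketch of that idea follows. The rule names are real SpamAssassin rules, but the scores are made-up placeholders, not the values shipped in any release, and the helper functions are hypothetical.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default threshold

# Placeholder scores for illustration only; real values differ by version
RULE_SCORES = {
    "RAZOR2_CHECK": 1.5,  # message is listed in Razor
    "PYZOR_CHECK": 2.0,   # message is listed in Pyzor
    "BAYES_99": 3.5,      # Bayesian filter is nearly certain it's spam
}

def total_score(hits, scores=RULE_SCORES):
    """Add up the scores of every rule that fired on a message."""
    return sum(scores[name] for name in hits)

def is_spam(hits, required=REQUIRED_SCORE):
    """A message is spam once its combined score reaches the threshold."""
    return total_score(hits) >= required

print(is_spam(["RAZOR2_CHECK", "PYZOR_CHECK"]))              # two weak signals
print(is_spam(["RAZOR2_CHECK", "PYZOR_CHECK", "BAYES_99"]))  # all three agree
```

With these placeholder numbers no single method can condemn a message on its own, which is the reassuring part of mixing them.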

I tried my new spamassassin setup on 594 emails my old spamassassin setup said were not spam. The new setup correctly identified 342 as spam and 247 as non-spam. It identified 5 messages as non-spam when in fact they were spam, and a reassuring 0 messages as spam which were not spam. This is all excellent!
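
The counts above boil down to the usual detection-rate figures. This is just the arithmetic, using only the numbers in this post; the variable names are mine.

```python
# Counts from the 594 messages the old setup had passed as non-spam
true_pos = 342   # spam, flagged as spam by the new setup
true_neg = 247   # non-spam, correctly left alone
false_neg = 5    # spam that still got through
false_pos = 0    # non-spam wrongly flagged as spam

total = true_pos + true_neg + false_neg + false_pos
recall = true_pos / (true_pos + false_neg)      # share of the spam that was caught
precision = true_pos / (true_pos + false_pos)   # share of flagged mail that was spam
accuracy = (true_pos + true_neg) / total

print(f"messages:  {total}")
print(f"recall:    {recall:.1%}")
print(f"precision: {precision:.1%}")
print(f"accuracy:  {accuracy:.1%}")
```

The new setup catches 342 of the 347 spam messages in this batch, about 98.6%, without flagging any legitimate mail.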

Of the 342 newly-found spam, 303 were caught by the Bayesian filter, 172 by Pyzor, and 160+ by Razor.

Update 2004-01-16: I made a boneheaded mistake in this evaluation. I trained the Bayesian filter on some data, then tested it on the same data. D'oh. The reality is the Bayesian filter is still much better than without, but not quite as stellar.
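
The fix for that kind of mistake is to hold some messages back before training and score only on those. A minimal sketch, assuming the messages are already loaded into a list; split_messages is a hypothetical helper, not part of SpamAssassin or sa-learn.

```python
import random

def split_messages(messages, test_fraction=0.2, seed=0):
    """Shuffle the messages and hold back a fraction the filter never trains on."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

# Stand-in message ids; real use would pass the parsed mbox messages
train, test = split_messages(range(594))
print(len(train), "for training,", len(test), "held out for testing")
```

Train on the first list, evaluate on the second, and the numbers mean something.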

  2003-12-31 00:38 Z
After surveying BitTorrent download traffic for Matrix: Revolutions last month I thought I'd check out download traffic for a movie that didn't suck.
Return of the King traffic is about the same as Matrix Revolutions: lots of traffic early, then the tracker crashes. People were successful though: several thousand probably got the whole file before the tracker died, patient ones could finish when the tracker was restarted, and there were at least 10 other torrents out there.

10,000 people grabbing a 2 gig file is a lot of video on demand. Still, crappy quality. The one sample I saw had decent sound and the picture didn't shake but the colour was all washed out. The cinematography in ROTK is so beautiful $9 is a bargain.

  2003-12-30 23:32 Z
The XBox hacking scene is impressive. The XBox is a $150 multimedia PC with DVD, hard drive, TV output (including HDTV), 100Mb ethernet, a really nice game controller, and enough CPU and RAM to run interesting programs.

Folks have found various ways to install custom software on it. There are two basic routes: a full Linux install or adding homebrew programs to the native OS (a stripped down Win2k).

Of the homebrew programs XBox media center has gotten the most attention. It's a multimedia suite that plays movies, music, photos, etc from the local hard drive, from DVD, or streamed from a server.

XPort is also impressive. One individual has ported about 20 different emulators from Windows to the XBox. Gameboy, Nintendo, Playstation, Atari 2600, Intellivision, Apple ][, etc etc, all running neatly. Quite an achievement.

Many folks have speculated that Microsoft's plan with XBox is to slowly move Windows into the living room. The existing XBox hardware is already sufficient to do this; it's just that the software isn't readily available yet. The hacker scene is about a year ahead.

  2003-12-27 23:32 Z
Sick of waiting for 20+ minutes when you call AT&T Wireless customer care? Call the secret number 1-888-799-1305 and enjoy no hold times. Just tell the robot you're a 3G site, English, and you're on the same high priority queue the AT&T stores enjoy. This probably won't work for very long. No longer works (2004-01-18).

I'm still trying to fix my account after November's customer service meltdown. For the last six weeks customer care has "been experiencing heavy call volume" with "wait times longer than 20 minutes". The worst thing is the damned ads they play at you while you're on hold.

As seen on Howard Forums
  2003-12-26 16:57 Z
I bought the hype that 802.11g is 54 Megabits/second. I paid extra for expensive 802.11g gear. I had this stupid idea that 802.11g's 54Mbps was close enough to 100Mbps ethernet that I didn't need to run wires in my house. It says so right on the box of my Linksys WET54g and in the product sheet: "Wireless-G (54 Mbps)".

This is false advertising. The fastest 802.11g will go is 20Mbps, not 54Mbps. And in a mixed 802.11b/g network the fastest two 802.11g devices can go is 14Mbps. I verified this with a WET54g sitting right next to a WAP54g. Throughput on an FTP? 12.8Mbps. And this is best-case, quiet network with devices right next to each other. In a real deployment I get 30% packet loss.

Between speed, security concerns, and general flakiness wireless is really not a reasonable option for regularly copying 2 gig files around. Good thing Amazon has a generous return policy. At least I get my $150 back.
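
To put the numbers in perspective, here is the back-of-the-envelope transfer time for one of those 2 gig files. It's a rough sketch: it assumes 2 gig means 2×10^9 bytes and ignores protocol overhead and packet loss, so real copies would be slower.

```python
FILE_BYTES = 2 * 10**9  # assumption: "2 gig" taken as 2 x 10^9 bytes

def transfer_seconds(size_bytes, megabits_per_second):
    """Ideal transfer time: bits to move divided by bits per second."""
    return size_bytes * 8 / (megabits_per_second * 1_000_000)

rates = [
    ("measured 802.11g FTP", 12.8),
    ("802.11g on the box", 54),
    ("wired 100Mb ethernet", 100),
]
for label, mbps in rates:
    minutes = transfer_seconds(FILE_BYTES, mbps) / 60
    print(f"{label:>22}: {minutes:5.1f} min")
```

At the measured 12.8Mbps that's roughly 21 minutes per file, against under 3 minutes on the wire.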

  2003-12-25 00:29 Z
One of the Bizarro-world realities of today is that the same White House folks who conduct the war on Iraq were sucking up to Hussein as an ally 20 years ago. Two items on this. First, a lovely Mike Luckovich cartoon.
Second, a story in today's NYT: Rumsfeld Made Iraq Overture in '84 Despite Chemical Raids.
As a special envoy for the Reagan administration in 1984, Donald H. Rumsfeld, now the defense secretary, traveled to Iraq to persuade officials there that the United States was eager to improve ties with President Saddam Hussein despite his use of chemical weapons ...

"The Iraqi leadership was extremely pleased with Amb. Rumsfeld's visit," the memo said. "Tariq Aziz had gone out of his way to praise Rumsfeld as a person."

Dec 20 was the 20th anniversary of Rumsfeld and Hussein's handshake. More: National Security Archive.

  2003-12-23 16:37 Z
Clay Shirky's The RIAA Succeeds Where the Cypherpunks Failed is worth reading.
The music industry's attempts to force digital data to behave like physical objects has had two profound effects, neither of them about music. The first is the progressive development of decentralized network models, loosely bundled together under the rubric of peer-to-peer. ... And the second effect, of course, is the long-predicted and oft-delayed spread of encryption.
The cypherpunks movement is a very powerful set of ideas. But they all slammed into the wall of consumer indifference. I think Clay overstates the case a bit, but I agree with him that the RIAA is driving crypto.

The other place that the RIAA is setting the cypherpunk vision in motion is their own DRM technologies. Watermarks, locked media, Palladium: it's like a cypherpunk's wet dream. Only it's a nightmare: the cryptokeys are in the hands of just a few people.

  2003-12-21 18:20 Z
Jon Carroll has an insightful and amusing column on the idea that Howard Dean is a wild-eyed liberal.
So what Howard Dean said is not radical or remarkable or innovative. I mean, he's an interesting guy, and I might even find myself voting for him, but he's not Roosevelt or anything. He just hasn't signed on to the Official Bush/Cheney/Wolfowitz worldview, which makes him a suspicious character indeed.
  2003-12-19 17:39 Z
The Bush administration's readiness to detain hundreds of people with no charges, no access to lawyers, and no due process should frighten you. Finally there's some good news: two courts have ruled that the Bush folks can't just lock people up forever with no due process.

The detention of hundreds of people at Guantánamo is pretty bad. The Bush Administration has been arguing that since the detainees are in Cuba, they have no rights under US or, presumably, any law. Nice! Fortunately the Ninth Circuit said that was ridiculous since the US runs the camp in Guantánamo.

While Guantánamo is bad, the case of Jose Padilla is horrible. Here we have a US citizen, arrested in America, and the government has been claiming he has no rights. Secret detention by the US government: no lawyer, no charges, nothing. It's absolutely outrageous, and finally a court said so. Even if the man is guilty of everything he's accused of, that's no excuse.

Americans have died defending our freedoms for over two hundred years. Bush's Justice Department seems happy to trample all over that; the courts are finally responding.

  2003-12-19 17:34 Z
Fox's new 'reality' show, The Simple Life, is unwatchably offensive. I had some hope: fish out of water is a good formula, I find Paris Hilton strangely compelling, and it's always fun to laugh at rich airheads. But the show is just mean. It's not making fun of the rich girls who volunteered to be in the show; the butt of the joke is hard working people in rural Arkansas.

Watch Paris and Nicole show up an hour late to milk the cows and spill the milk everywhere. That's OK, they'll just 'work' at the fast food place tomorrow! Watch the well-meaning Sonic manager try to train the rich girls for a wage slave job. Watch the rich girls ridicule the job behind the manager's back. That's OK, they'll just 'work' somewhere else tomorrow, and after the show is over they'll go back to being rich.

After 'the girls' are back in their vapid lives the farmer is still going to be working his ass off trying to make a living with dairy cows and the fast food manager is still going to be working as hard as she can at $7 an hour to make ends meet. And that's going to be the rest of their lives. Rather than sympathizing with the hard and honourable realities of being lower middle class in rural America, the show turns it all into a cheap disrespectful joke.

  2003-12-17 16:35 Z
The media is stoking the fear that this year's flu season is going to be worse than ever. Remember 1918?
Lovely bit of alarmist infographics. Does the line continue to go up? Is it reporting bias or a real trend?

My friend Marc points out that in this infographic the flu is Republican.

  2003-12-12 17:32 Z
In today's New York Times is an all-too-predictable article about Halliburton's getting rich off the war in Iraq at the expense of US taxpayers. Here's the cost of a gallon of gas imported from Kuwait by three different organizations:

$2.64  Halliburton imported gas
$1.19  Pentagon imported gas
$0.96  Iraqi imported gas

Of Halliburton's $2.64/gal, $1.17 is the price they pay in Kuwait, $1.21 is the cost of Halliburton transport, and $.26 is Halliburton's explicit markup. This is just a tiny example of the cost of oil and defense companies owning the White House.

The Houston Chronicle has Halliburton's story.

  2003-12-10 17:14 Z
I'm with Steve: HTTP already has plenty of ways to handle caching, so don't invent something new for RSS/Atom aggregators. If they just follow Mark's rules (handy tests and instructions), life will be fine.

I worked hard to help HTTP caching on my blog. It's complicated, particularly with the pastiche of dynamic content I have. Used to be 40% of my weblog requests were answered with a bandwidth-saving 304. When I added my linkblog it went down to 25%, probably both because the HTML view changes more often and because I removed ETags support.
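
For reference, the decision behind a 304 is small. This is a minimal sketch of a server-side conditional GET check, not the code running this blog: conditional_status is a hypothetical helper, and it assumes the request headers arrive as a dict keyed with exactly these header names. If-None-Match (ETags) takes precedence over If-Modified-Since when a client sends both.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def _strip_weak(tag):
    """Drop the W/ prefix so weak and strong ETags compare equal."""
    return tag[2:] if tag.startswith("W/") else tag

def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [_strip_weak(t.strip()) for t in if_none_match.split(",")]
        if "*" in candidates or _strip_weak(etag) in candidates:
            return 304
        return 200  # ETag was sent and didn't match: ignore the date header

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            if last_modified <= parsedate_to_datetime(if_modified_since):
                return 304
        except (TypeError, ValueError):
            pass  # unparseable date: fall through and send the full page
    return 200

# Made-up values for illustration
modified = datetime(2003, 12, 7, 20, 52, tzinfo=timezone.utc)
print(conditional_status({"If-None-Match": '"v42"'}, '"v42"', modified))
print(conditional_status({"If-Modified-Since": "Sat, 06 Dec 2003 00:00:00 GMT"}, '"v42"', modified))
```

An aggregator that sends neither header gets the full page every time, which is where the wasted bandwidth comes from.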

Most aggregators do fine. Radio Userland is having trouble since I turned off ETags. And NetNewsWire has a surprisingly low number of 304s, although a quick inspection doesn't show anything obvious.

  2003-12-07 20:52 Z
The 'Territories' of the Homeless and A Sense of Place are two more articles in the SF Chron's series on homelessness in SF. Along with the handy map, it's a guide to the various neighbourhoods of the homeless.
Recognizing distinctions such as the "territories" helps show the human face of a population that is now — to most San Franciscans — both extremely familiar and painfully foreign at the same time.
From Car Nation to the Heroin Zone to the Service / Crack Zone, it's all there. Fodor's 2004 should include this info.
  2003-12-04 04:23 Z
Thanks to Impacket I now have a bit of fascinating news: most of my blog readers have an MTU of 1500 bytes. The Maximum Transmission Unit is the largest packet, IP and TCP headers included, that a link will carry in one piece. You want this to be as big as possible. 1500 is generally the limit on the Internet (it's the Ethernet limit), but smaller sizes may be better depending on your net connection.
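
The MTU covers the IP and TCP headers as well as the data, so the payload per packet is a bit smaller. A small sketch of the arithmetic, assuming IPv4 and TCP headers with no options (20 bytes each); max_segment_size is a name made up for this example.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header with no options
TCP_HEADER_BYTES = 20   # TCP header with no options

def max_segment_size(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES

for name, mtu in [("Ethernet", 1500), ("PPPoE over Ethernet", 1492)]:
    print(f"{name}: MTU {mtu}, TCP payload {max_segment_size(mtu)}")
```

This is why a TCP connection from a plain Ethernet host typically advertises a maximum segment size of 1460.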

More inside ...

  2003-12-04 03:18 Z
Pcapy and Impacket are good software. They're Python libraries to make it easy to sniff packets and parse them, as well as create packets. Think of it like an ethereal you can easily program.

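# Print out sizes of IP packets
import pcapy
import impacket.ImpactDecoder

decoder = impacket.ImpactDecoder.EthDecoder()
# For a live capture instead: interface, snaplen, promiscuous flag, timeout in ms
# packets = pcapy.open_live("eth0", 1500, 0, 100)
packets = pcapy.open_offline('/tmp/cap/capture')
for i in range(100):
    # next() returns the pcap header and the raw frame bytes
    (header, data) = packets.next()
    eth = decoder.decode(data)   # parse the Ethernet frame
    ip = eth.child()             # the IP packet inside (assumes an IP-only capture)
    print(ip.get_ip_len())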

It's brand new. The docs are nearly nonexistent and the library isn't as Pythonic as one would hope. But it works pretty well! Compare also scapy (less libpcap-like).
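
Once a loop like the one above is producing lengths, tallying them takes only the standard library. A minimal sketch; summarize_lengths is a hypothetical helper and the sample lengths are made up, standing in for values collected from get_ip_len().

```python
from collections import Counter

def summarize_lengths(lengths):
    """Count how often each IP packet length occurs, most common first."""
    return Counter(lengths).most_common()

# Made-up sample; real values would come from a capture
sample_lengths = [40, 1500, 1500, 52, 1500, 40]
for length, count in summarize_lengths(sample_lengths):
    print(f"{length:5d} bytes: {count}")
```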

PS: I ran into a problem installing on Debian:

ImportError: /usr/lib/python2.3/site-packages/pcapy.so: undefined symbol: __gxx_personality_v0
The workaround was to link the .so with g++ instead of gcc. This is either a bug in gcc or Python distutils.
  2003-12-02 17:06 Z