There's lots written about scaling large, complex websites, but what does a casual sysadmin do to fix a simple Apache config? TechMeme picked up my Facebook blog post and suddenly hundreds of readers were arriving, peaking at an awesome 4 requests/second. My poor little blog got very slow: 30-second page loads. Wutdo?
At first I assumed my server was melting; my blog software is old and simple. But uptime and top showed no CPU problems, and free and vmstat showed no swapping. Blosxom's simplicity is a good thing: there's very little to go wrong, particularly with a little caching.
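If you'd rather script that sanity check than eyeball uptime and vmstat, here is a rough sketch of the same idea. It assumes a Linux box where /proc/vmstat exposes the pswpin and pswpout counters; the function names are mine, made up for illustration, not part of any tool mentioned above.

```python
import os
import time


def parse_counters(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Return (pages swapped in, pages swapped out) between two snapshots."""
    return (
        after.get("pswpin", 0) - before.get("pswpin", 0),
        after.get("pswpout", 0) - before.get("pswpout", 0),
    )


def read_vmstat(path="/proc/vmstat"):
    """Read the kernel counters. Assumes Linux; the path is an assumption."""
    with open(path) as f:
        return parse_counters(f.read())


def report(interval=5):
    """Print load averages and swap traffic over `interval` seconds."""
    load1, load5, load15 = os.getloadavg()
    first = read_vmstat()
    time.sleep(interval)
    second = read_vmstat()
    swapped_in, swapped_out = swap_activity(first, second)
    print(f"load average: {load1:.2f} {load5:.2f} {load15:.2f}")
    print(f"pages swapped in/out over {interval}s: {swapped_in}/{swapped_out}")
```

Calling report() on the server prints both numbers. Low load averages and zero swap traffic point away from CPU and memory as the bottleneck, which is what I saw.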
The little light went on in my head: connection queueing. Each web request gets its own Apache process, which has to hang around until the client at the other end of the Internet receives all the bytes. Apache has a limited number of workers, and one slow browser may be using several at once. I only had 20 workers; it doesn't take long for them all to be sitting around doing nothing except waiting for slow clients to download data. Once that happens, new requests have to queue up and wait 10+ seconds, and things go downhill quickly. You can diagnose queueing via netstat and Apache server-status.
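To make the server-status check concrete, here is a minimal sketch that reads the machine-readable form of the status page and counts what the workers are doing. It assumes mod_status is enabled and reachable at http://localhost/server-status?auto; that URL and the function names are my assumptions, so adjust them for your own config. In the scoreboard string, "_" is a worker waiting for a connection, "." is an open slot with no process, and anything else (for example "W" for sending a reply) is a busy worker.

```python
from collections import Counter
from urllib.request import urlopen


def parse_status(text):
    """Split 'Key: value' lines from server-status?auto into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def summarize(text):
    """Count busy and idle workers from the Scoreboard line."""
    board = Counter(parse_status(text).get("Scoreboard", ""))
    open_slots = board.pop(".", 0)   # slots with no process at all
    idle = board.pop("_", 0)         # workers waiting for a connection
    busy = sum(board.values())       # every remaining state is a busy worker
    return {
        "busy": busy,
        "idle": idle,
        "open_slots": open_slots,
        "sending_reply": board.get("W", 0),
        # Busy workers with none idle suggests new requests are queueing.
        "no_idle_workers": busy > 0 and idle == 0,
    }


def fetch_status(url="http://localhost/server-status?auto"):
    """Fetch the status text. The default URL is an assumption."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("ascii", errors="replace")
```

Running summarize(fetch_status()) during a traffic spike and seeing busy at your worker limit with idle at zero is the signature of the problem described above. As a rough back-of-envelope figure, 20 workers at 4 requests/second fill up once the average request holds a worker for about 5 seconds (20 / 4), which a handful of slow clients can easily manage.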