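It's way harder than it should be to have a CGI script do something asynchronously in Apache. The root of the problem is that it's not enough to fork a child. Apache keeps waiting for the script's output to end, and it won't see end-of-file while the forked child still holds the same pipes. So the child has to let go of stdin, stdout, and stderr. Only you can't really close them; you have to reassign them.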
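```python
import os
import sys
import time

print("Content-Type: text/plain\n")
print("Script started")
sys.stdout.flush()  # flush before forking so the child doesn't inherit unwritten output

if os.fork() == 0:
    # Reassign stdin, stdout, stderr for the child
    # so Apache will ignore it
    si = open('/dev/null', 'r')
    so = open('/dev/null', 'a+')
    se = open('/dev/null', 'a+')
    os.dup2(si.fileno(), sys.stdin.fileno())
    os.dup2(so.fileno(), sys.stdout.fileno())
    os.dup2(se.fileno(), sys.stderr.fileno())
    # Do whatever you want asynchronously
    time.sleep(2)
    os.execv('/bin/sleep', ['sleep', '5'])

# Only the parent gets here; execv replaced the child's process image.
print("Process was forked")
```

This is explained pretty well in Perl and in Python. It's a shame that sys.stdin.close() doesn't work: closing Python's file object for a standard stream leaves the underlying file descriptor open, which is why the code uses os.dup2 instead.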
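For a longer-running background job, a more defensive variant of the same trick is the classic Unix double fork: fork, call os.setsid() to start a new session, fork again, and only then point file descriptors 0 to 2 at /dev/null. This is a minimal sketch of that standard recipe, not the script above; the helper name detach() is made up for illustration.

```python
import os
import sys


def detach():
    """Hypothetical helper: double-fork into a background process.

    Returns False in the original (CGI) process and True in the
    detached grandchild.
    """
    # Flush Python's buffers so the children don't inherit a copy of
    # output the CGI has already produced.
    sys.stdout.flush()
    sys.stderr.flush()

    pid = os.fork()
    if pid > 0:
        # Original process: reap the short-lived intermediate child.
        os.waitpid(pid, 0)
        return False

    # Intermediate child: start a new session, leaving the CGI's
    # process group, then fork again and exit at once.
    os.setsid()
    if os.fork() > 0:
        os._exit(0)

    # Grandchild: point stdin, stdout and stderr at /dev/null so the
    # pipes Apache is reading from are no longer held open here.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
    return True
```

In a CGI script you'd write `if detach():`, do the slow work, and end that branch with `os._exit(0)` so the background process never falls through into the code that prints the page.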
I still haven't seen a good explanation for why Apache doesn't send partial output from a CGI: Apache says it doesn't buffer, and neither does python -u. Grr. Aha: mod_gzip does buffer, unsurprisingly, since it has to collect the output in order to compress it.
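If you want a CGI to dribble out progress, the script itself at least has to flush after every write, which is what python -u buys you. A minimal sketch of such a script; the function name and messages are hypothetical, and a buffering module like mod_gzip can still hold everything until the end.

```python
import sys
import time


def stream_progress(steps, out=None, delay=0.0):
    """Write a CGI response one line at a time, flushing after each line."""
    if out is None:
        out = sys.stdout
    out.write("Content-Type: text/plain\n\n")
    out.flush()
    for i in range(1, steps + 1):
        time.sleep(delay)  # stand-in for real work
        out.write("step %d of %d done\n" % (i, steps))
        out.flush()  # hand this line to Apache now, not at exit
```

Calling `stream_progress(5, delay=1.0)` from the CGI, once with and once without mod_gzip applied to it, is one way to check whether the buffering is coming from the compression module.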
Thanks to Marc for research help
I've been using mod_gzip on my weblog server to try to save bandwidth. Today I crunched some numbers and learned that gzip encoding only works for about one third of the web requests for HTML that I get. When it does work, it compresses to about 30% of the original size.
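Those two numbers combine into a fairly modest overall win. Treating "about one third" and "about 30%" as exact, the average HTML response is (2/3) * 1 + (1/3) * 0.30, or about 0.77 of its uncompressed size, so gzip saves roughly 23% of HTML bytes overall. The arithmetic as a tiny Python function:

```python
def blended_size(fraction_gzipped, gzip_ratio):
    """Average bytes sent per HTML request, as a fraction of uncompressed size.

    fraction_gzipped: share of requests that get gzip (about 1/3 here)
    gzip_ratio: compressed size / original size for those (about 0.30)
    """
    return (1 - fraction_gzipped) + fraction_gzipped * gzip_ratio


size = blended_size(1 / 3, 0.30)
print("bytes sent: %.0f%% of uncompressed, saving %.0f%%" % (size * 100, (1 - size) * 100))
# prints: bytes sent: 77% of uncompressed, saving 23%
```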
Turns out that while most user browsers support gzip encoding, most spiders don't. GoogleGuy says this may be because servers don't reliably serve gzip. I could believe that given the contortions I had to go through.
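Whether a client gets gzip comes down to its Accept-Encoding request header. A rough sketch of that decision follows; the function name is hypothetical, and real servers, mod_gzip included, have their own parsing rules. It treats a missing header as "no gzip", honors q=0 as an explicit refusal, and accepts a `*` wildcard.

```python
def accepts_gzip(accept_encoding):
    """Rough check of an Accept-Encoding header value for gzip support."""
    if not accept_encoding:
        return False  # no header: safest to send the page uncompressed
    weights = {}
    for item in accept_encoding.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        weights[coding] = q
    # An explicit gzip entry (or its old x-gzip alias) beats the wildcard.
    for coding in ("gzip", "x-gzip"):
        if coding in weights:
            return weights[coding] > 0
    return weights.get("*", 0) > 0
```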
RSS aggregators are mostly good about supporting gzip. They are good about handling 304 Not Modified, too. Good thing; RSS polling is such a huge source of traffic.
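On the client side, a polite aggregator remembers the ETag and Last-Modified values from its last fetch, sends them back as If-None-Match and If-Modified-Since, and asks for gzip. A minimal sketch using only the standard library; polling_headers() and poll() are hypothetical names, and error handling covers only the 304 case.

```python
import gzip
import urllib.error
import urllib.request


def polling_headers(cached):
    """Request headers for a conditional, gzip-friendly feed fetch.

    `cached` holds validators saved from the previous response, e.g.
    {"ETag": '"abc123"', "Last-Modified": "Sat, 29 Nov 2003 19:43:31 GMT"}.
    """
    headers = {"Accept-Encoding": "gzip"}
    if cached.get("ETag"):
        headers["If-None-Match"] = cached["ETag"]
    if cached.get("Last-Modified"):
        headers["If-Modified-Since"] = cached["Last-Modified"]
    return headers


def poll(url, cached):
    """Fetch url; return (body, validators), or (None, cached) on 304."""
    req = urllib.request.Request(url, headers=polling_headers(cached))
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            if resp.headers.get("Content-Encoding", "").lower() == "gzip":
                body = gzip.decompress(body)
            validators = {name: resp.headers[name]
                          for name in ("ETag", "Last-Modified")
                          if resp.headers.get(name)}
            return body, validators
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, cached  # feed unchanged; reuse the old copy
        raise
```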
I serve my blog via my dinky 128kbps upstream DSL link, so bandwidth is precious. Fixing the fiasco of mod_gzip triggering an MSIE bug helps a lot. Now I'm supporting If-Modified-Since and ETag headers on my blog contents, too. The magic is Bob Schumaker's lastmodified plugin, which pretty much Just Works. Thanks, Bob!
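The server side of that handshake is easy to sketch. This is a generic illustration, not taken from the lastmodified plugin; the function name is made up, and it assumes the page's current ETag and modification time are known. Per HTTP/1.1, If-None-Match is checked first and If-Modified-Since is ignored when both are present.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def _opaque(tag):
    """Drop a weak-validator prefix so W/"x" and "x" compare equal."""
    return tag[2:] if tag.startswith("W/") else tag


def not_modified(request_headers, etag, last_modified):
    """Return True if a conditional GET can be answered with 304.

    request_headers: dict keyed by canonical header names
    etag: the page's current entity tag, e.g. '"v42"'
    last_modified: timezone-aware datetime of the last change
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match wins; If-Modified-Since is ignored when both are sent.
        tags = {_opaque(t.strip()) for t in if_none_match.split(",")}
        return "*" in tags or _opaque(etag) in tags
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: send the full page
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        return last_modified.replace(microsecond=0) <= since
    return False
```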
Please tell me if you see any caching weirdness.
Today I learned that Internet Explorer isn't caching any images from my blog at all. Why? A nasty bug in MSIE that mod_gzip triggers. Gory details and a partial fix below.
The issue is that mod_gzip includes the following header in all responses:
```
Vary: Accept-Encoding
```

This helps prevent caches from serving gzipped data to browsers that can't handle it.
Unfortunately it also triggers a bug in MSIE: the browser won't cache any document with that header! So with mod_gzip, 95% of the world's browsers won't cache any pages from the server. Some bandwidth savings.
It'd be nice if mod_gzip were smart enough not to add the Vary: header to files it doesn't compress, but it's not. A partial workaround is to turn mod_gzip off for files it won't be compressing anyway, like images.
```apache
<FilesMatch "\.(gif|jpe?g|png)$">
mod_gzip_on No
</FilesMatch>
```

This fix is only partial; other files (say, HTML) still won't be cached. Three choices: stop using gzip, lose caching in IE, or drop the Vary: header and break caches.
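What the post wishes mod_gzip did can be sketched as a tiny response filter: send Vary: Accept-Encoding only for responses that could come back in more than one encoding, and leave everything else, like images, alone. This is a hypothetical illustration, not mod_gzip's behavior; the text/* rule mirrors the `mime text/` include, and the Accept-Encoding test is deliberately crude. It doesn't escape the three-way choice for HTML; it just automates the images workaround.

```python
import gzip


def compress_response(body, content_type, accept_encoding):
    """Hypothetical filter: returns (headers, body), adding Vary only when needed."""
    headers = {"Content-Type": content_type}
    if not content_type.startswith("text/"):
        # Never compressed, so never negotiated: no Vary, and per the
        # MSIE behavior described in this post, it stays cacheable.
        return headers, body
    # This resource has gzip and plain variants, so caches must be told,
    # whether or not this particular client gets the gzip one.
    headers["Vary"] = "Accept-Encoding"
    # Crude check; a real filter should parse q-values.
    if "gzip" in (accept_encoding or "").lower():
        headers["Content-Encoding"] = "gzip"
        body = gzip.compress(body)
    return headers, body
```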
Michael was kind enough to write me and comment on my mod_gzip notes. He suggests not specifying

```apache
mod_gzip_item_exclude reqheader "User-agent: Mozilla/4.0"
```

because it results in a Vary: User-Agent header, which makes life hard on proxy servers and only protects the minuscule few people who run old Netscape 4.0 versions. Isn't technology fun? He also says that Apache 2.0's mod_deflate does indeed make HTTP compression easier; Apache 2 was designed so that plugins can filter traffic as it is served.
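Why Vary: User-Agent hurts proxies: a shared cache keys each stored copy on the URL plus the request's values for every header named in Vary. A simplified sketch (the function and the example User-Agent strings are illustrative): Accept-Encoding takes only a handful of common values, but User-Agent strings differ by browser, version, OS and language, so the same page ends up stored many times over.

```python
def cache_key(url, request_headers, vary):
    """Simplified key a shared cache might store a response under."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)


# Two example browsers that both accept gzip.
ie6 = {"User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
       "Accept-Encoding": "gzip, deflate"}
gecko = {"User-Agent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20031007",
         "Accept-Encoding": "gzip, deflate"}

print(cache_key("/", ie6, "Accept-Encoding") == cache_key("/", gecko, "Accept-Encoding"))  # True: one shared copy
print(cache_key("/", ie6, "User-Agent") == cache_key("/", gecko, "User-Agent"))  # False: two copies
```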
In the spirit of saving bits I set up mod_gzip on my Apache 1.3 server. Now HTTP stuff is compressed in transit. Fetching my weblog went from 20384 bytes to 9717 bytes; even better, it went from 38 packets to 21 packets. This may seem silly but on an ADSL line upstream bandwidth is hugely limited; anything I can do to save bandwidth is welcome.
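The figures above work out to 9717 / 20384, or about 0.48, so the page shrank by just over half. To estimate this for a page before touching the server, Python's gzip module will do. gzip_report() is a hypothetical helper, and mod_gzip's compression level may not match level 9, so treat the result as a ballpark.

```python
import gzip


def gzip_report(data, level=9):
    """Return (original size, gzipped size, ratio) for a non-empty bytes object."""
    if not data:
        raise ValueError("need some data to compress")
    packed = gzip.compress(data, compresslevel=level)
    return len(data), len(packed), len(packed) / len(data)
```

For example, save a copy of a page as index.html (any filename will do) and run `gzip_report(open("index.html", "rb").read())`.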
Usability on mod_gzip is fairly low: the original site is offline and the docs are awful. Fortunately someone has taken on the task of making a decent support site. Even then the details of how it works are magic and opaque; honestly, this kind of server configuration should be much easier or automatic. Maybe it is in Apache 2.0.
Here's the magic I'm using:
```apache
LoadModule gzip_module /usr/lib/apache/1.3/mod_gzip.so
mod_gzip_item_exclude reqheader "User-agent: Mozilla/4.0"
mod_gzip_item_include handler ^cgi-script$
mod_gzip_item_include mime text/
```
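A rough Python model of how those rules decide whether a response gets compressed, useful mainly for reasoning about them. It is an assumption, not mod_gzip's code: here exclusions win, any single matching include is enough, and patterns are unanchored regex searches against a "User-agent: ..." string. One thing the model makes visible is that Internet Explorer 5 and 6 also send User-Agent strings starting with Mozilla/4.0 (compatible; MSIE ...), so if mod_gzip matches the way this sketch does, the exclude line turns off compression for IE as well as for Netscape 4. Worth checking against the real module before relying on it.

```python
import re

# Hypothetical model of the config above, not mod_gzip's implementation.
EXCLUDE_REQHEADER = [r"User-agent: Mozilla/4.0"]
INCLUDE_HANDLER = [r"^cgi-script$"]
INCLUDE_MIME = [r"text/"]


def would_gzip(user_agent, handler, mime):
    """Return True if this model would compress the response."""
    header_line = "User-agent: " + user_agent
    # Any matching exclusion vetoes compression outright.
    if any(re.search(p, header_line) for p in EXCLUDE_REQHEADER):
        return False
    # Otherwise one matching include (handler or MIME type) is enough.
    if any(re.search(p, handler) for p in INCLUDE_HANDLER):
        return True
    return any(re.search(p, mime) for p in INCLUDE_MIME)
```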