I've switched this blog to explicitly serve everything in UTF-8 encoding. Before I didn't think much about it and confined myself to ASCII and hoped for the best. Now I can just type "François Rabelais or Björk Guðmundsdóttir or 艾未未" directly and not HTML entity escape it. If you notice any errors, please let me know.
I was surprised to learn there are at least four ways for a web page to declare its encoding. (Or charset; the terms get used ambiguously.) Annoyingly, the web server's HTTP header declaration overrides whatever encoding the document itself declares via a <meta> tag. I think that's a holdover from the fantasy world where servers would negotiate content types and transcode on the fly. These days, unless you're writing in Chinese or are a super-duper expert, you should always be using UTF-8.
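Here's a rough Python sketch of that precedence, covering only the two declarations mentioned above (HTTP header and <meta> tag) and ignoring the others, such as byte order marks. The function names and the UTF-8 fallback are my own assumptions for illustration, not what any particular browser does.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Pull the charset parameter out of a Content-Type header value, if any."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()  # lower-cased charset, or None


def effective_encoding(http_content_type=None, meta_charset=None, default="utf-8"):
    """Pick an encoding: the HTTP header wins over the <meta> declaration."""
    if http_content_type:
        charset = charset_from_content_type(http_content_type)
        if charset:
            return charset
    if meta_charset:
        return meta_charset.lower()
    # Hypothetical fallback for this sketch only.
    return default
```

So a page that declares UTF-8 in its <meta> tag but is served with a Latin-1 header gets treated as Latin-1, which is exactly the annoying part.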
Unrelated, thanks to Aristotle for finding a bug in the updated tags on my Atom feed. I'd hacked some Perl code and forgot how Perl scoping works.
I've migrated this weblog to a new host. You shouldn't notice any changes; if you're reading this post you're already reading it on the new server. But if you notice something wrong (or if it works and you feel chatty) please email me at email@example.com.
I'm a bit disappointed I haven't yet switched weblog software. I'm still using Blosxom, an idiosyncratic if capable Perl script I've been running for 8 years now. The only real drawback is the lack of modern blog editing tools. The main reason I haven't switched is that none of the hosted services have a good way to keep my weird old URLs working. Also, I've recently needed a real server in a datacenter for some projects, so it's easier and cheaper to just move to a new host.
The new server is at 22.214.171.124, hosted by Wholesale Internet in Kansas City. They seem like a good small server option: $70/month for a decent dedicated, self-managed server. I also took the chance to move from Debian to Ubuntu Server; some of my reactions to that change are on my secret work blog. I sure wish I had a better way to manage a server than logging into it and modifying random config files all over /etc. Puppet and Chef are way too much work for a single casual server and Blueprint, while cool, is a bit too simple minded.
Hat tip to Adam Fast for recommending Wholesale Internet
Apparently it's news to almost every web developer out there, but in the real world people's names have spaces in them. My name is "Nelson Minar". It is not "Nelson_Minar" or "NelsonMinar" or "NelsonM" or "Nelson397" or any of the other nonsense I have to use to work with some website that's decided to constrain names to some 1980s software-friendly character subset.
The hardest part of signing up for a new site these days is picking a unique user name. It's annoying to have to remember different names. And it's really obnoxious when my janked up UserName is also used as my display name. The right way to do logins right now on the Web is use email address as the login name and let the user choose their own display name which does not need to be unique. That's not ideal (email addresses can change) but it works pretty well. If you absolutely have to not use email as the login name, please at least let my login name have a space in it.
While I'm delivering the news, here's something for you ignorant American backwoods motherfuckers. Some people's names have "special characters" in them. Like François Rabelais or Björk Guðmundsdóttir or 艾未未. It's 2011; the only software that can't handle Unicode properly is Perl. (As if you needed another reason not to use Perl.) Stop limiting your code; there are only two languages that can even be written in ASCII.
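To be concrete about how little checking a display name actually needs, here's a minimal Python sketch. The function name, the length limit, and the choice to reject only control characters are all assumptions for illustration; a real site will have its own rules.

```python
import unicodedata


def is_reasonable_display_name(name, max_length=100):
    """Accept spaces and any script; reject only empty, overlong, or
    control-character-laden names. The limit of 100 is arbitrary."""
    name = unicodedata.normalize("NFC", name).strip()
    if not name or len(name) > max_length:
        return False
    # "Cc" is the Unicode category for control characters (NUL, tab, etc.).
    return not any(unicodedata.category(ch) == "Cc" for ch in name)
```

Note there's no uniqueness check: the display name isn't the login, so two people can both be called "Nelson Minar".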
One thing people who hate SOAP say is that the XML for SOAP is ugly. That used to be a problem because of rpc/encoded style. But thanks mostly to WS-I the SOAP community has moved on to the simpler document/literal.
The nice thing about doc/lit is that it's really just any ol' XML message with two SOAP tags wrapped around it. SOAP says very little about what's inside your message, just that it should have a namespace and it should be describable via XML Schema. Here's an example:
    <?xml version='1.0' encoding='UTF-8'?>
The stuff in black is the app's data. The rest is what you need to turn some random XML into a SOAP message.
Even those two SOAP tags might seem like too much, but they give you a couple things. The headers give you a transport-neutral way to add header metadata to a message, and SOAP Faults (not shown) give you a structured way to indicate detailed errors.
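As an illustration of the wrapping (this is not the example message from this post), here's a small Python sketch that takes arbitrary namespaced XML and puts the SOAP tags around it. I'm assuming the SOAP 1.1 envelope namespace; the payload namespace urn:example:greeting and the element names are made up.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical application namespace, for illustration only.
APP_NS = "urn:example:greeting"


def wrap_in_soap(payload, headers=()):
    """Wrap an application element in a SOAP Envelope and Body.

    A Header element is added only if header elements are supplied.
    """
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    if headers:
        header = ET.SubElement(envelope, "{%s}Header" % SOAP_NS)
        for element in headers:
            header.append(element)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    body.append(payload)
    return envelope


def example_message():
    """Build a made-up doc/lit message and serialize it as UTF-8 bytes."""
    payload = ET.Element("{%s}greet" % APP_NS)
    ET.SubElement(payload, "{%s}name" % APP_NS).text = "Björk Guðmundsdóttir"
    return ET.tostring(wrap_in_soap(payload), encoding="utf-8")
```

Reading it back is ordinary XML parsing: find the Body element and whatever is inside is the app's data.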
If you're comfortable parsing XML, you're comfortable parsing doc/lit SOAP. But SOAP also offers the possibility of automatic data bindings (no parsing required) and WSDL (service description). Alas, those technologies still don't work so well in Perl, Python, or PHP where doc/lit support is weak. It does work pretty well in Java and .NET.
It's way harder than it should be to have a CGI script do something asynchronously in Apache. The root of the problem is that it's not enough to fork a child, you have to close stdin, stdout, and stderr. Only you can't really close them, you have to reassign them.
    import sys, os, time

    print "Content-Type: text/plain\n\n",
    print "Script started"
    if os.fork() == 0:
        # Reassign stdin, stdout, stderr for child
        # so Apache will ignore it
        si = file('/dev/null', 'r')
        so = file('/dev/null', 'a+')
        se = file('/dev/null', 'a+', 0)
        os.dup2(si.fileno(), sys.stdin.fileno())
        os.dup2(so.fileno(), sys.stdout.fileno())
        os.dup2(se.fileno(), sys.stderr.fileno())
        # Do whatever you want asynchronously
        time.sleep(2)
        os.execv('/bin/sleep', ['sleep', '5'])
    print "Process was forked"

This is explained pretty well in Perl and in Python. It's a shame that sys.stdin.close() doesn't work.
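For comparison, here's a sketch of the same idea on a newer Python (3.3 or later, POSIX only), using the subprocess module instead of a hand-rolled fork and dup2. This is a swapped-in technique, not the code above: it only covers the case where the background work is a separate command. The helper name and the example command are hypothetical.

```python
import subprocess
import sys

# Hypothetical background job for illustration: a Python one-liner that sleeps.
EXAMPLE_COMMAND = [sys.executable, "-c", "import time; time.sleep(5)"]


def spawn_detached(argv):
    """Start argv with stdin, stdout, and stderr pointed at /dev/null.

    The child does not inherit the CGI's pipes, so the web server has
    nothing left to wait on once the parent script exits.
    """
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # run in its own session, away from the parent
        close_fds=True,
    )
```

In a CGI you'd print the headers and body, call spawn_detached(EXAMPLE_COMMAND), and return without waiting on the child.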
I still haven't seen a good explanation for why Apache doesn't send partial output from a CGI: Apache says it doesn't buffer and neither does python -u. Grr. Ah ha, mod_gzip does buffer, unsurprisingly.
Thanks to Marc for research help
Since learning Python ten months ago I've been a much happier hacker. I'll never go back to Perl again, and I'm increasingly frustrated working with Java. Here's some of what I've written in ten months:
The Sony Ericsson T616 has a Bluetooth adapter that acts like a serial port. If you put the phone in serial mode, it understands a wide range of AT commands. Some useful references: R320s_WP_R1A.pdf, 888_r1d.pdf, AT Test commands, Google search for [CPBR ericsson].
I hope to use this to modify my contact list
    at+cpbr=1
I bet floAt's Mobile Agent uses this protocol. Other folks have hacked their phones to be remote controls (Python, Perl). Phonefront is a commercial control app.
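Here's a rough Python sketch of building that phonebook read command and picking apart a reply line. I'm assuming the reply looks like +CPBR: index,"number",type,"name", which is the general shape of the standard phonebook read response; I haven't confirmed the T616 sticks to it exactly, and the sample entry in the note below is made up. Actually talking to the serial port is left out.

```python
import re


def cpbr_read_command(first, last=None):
    """Build the AT command to read one phonebook entry or a range."""
    if last is None:
        return "AT+CPBR=%d" % first
    return "AT+CPBR=%d,%d" % (first, last)


# Assumed reply shape: +CPBR: <index>,"<number>",<type>,"<name>"
_CPBR_LINE = re.compile(r'^\+CPBR:\s*(\d+),"([^"]*)",(\d+),"([^"]*)"')


def parse_cpbr_line(line):
    """Parse one reply line into a dict, or return None if it doesn't match."""
    match = _CPBR_LINE.match(line.strip())
    if match is None:
        return None
    index, number, number_type, name = match.groups()
    return {
        "index": int(index),
        "number": number,
        "type": int(number_type),
        "name": name,
    }
```

For example, parse_cpbr_line('+CPBR: 1,"+15555550100",145,"Example Person"') gives back a dict with index 1 and the name and number split out. Lines that don't match, like a bare OK, come back as None.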
I read somewhere that you can read English even if the letters in each word are mixed up as long as the first and last letters are in the correct position. Try it! Read my scrambled blog. Sample:
    I raed swreemohe that you can raed Eignlsh eevn if the ltetres in each word are mexid up as long as the frsit and last lertets are in the crrcoet pooisitn. Try it! Raed my smrblaced blog.
My blog software, Blosxom, is so hackable that it is straightforward to add a plugin to do this. Plugins even chain nicely, like this scrambled search for Perl.
The code is quite a hack, using Perl Inline to let me write the actual text processing in Python like Rael's demo. There is one neat trick: Inline->bind() lets me defer the import of Python until it's actually needed, meaning there's no efficiency cost if the Python code isn't invoked.
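The text processing itself is tiny. Here's a minimal Python sketch of the scrambling step, not the actual plugin code: the function names are made up, and it only treats runs of ASCII letters as words.

```python
import random
import re


def scramble_word(word, rng=random):
    """Shuffle the interior letters of a word, keeping first and last in place."""
    if len(word) <= 3:
        return word  # nothing to shuffle
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, rng=random):
    """Scramble every run of ASCII letters; leave punctuation and spacing alone."""
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)
```

Pass a random.Random instance as rng if you want repeatable output; otherwise each call gives a different scramble.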
Update: thanks to Misha for fixing my mistake and pointing me to jwz's blog entry.
Finding the length of an array must be an unusual thing for Perl programs to do, because Perl doesn't have an operator for it. It does have the evil $#:
    You may find the length of array @days by evaluating $#days, as in csh. However, this isn't the length of the array; it's the subscript of the last element, which is a different value since there is ordinarily a 0th element.
Huh? What does this mean? And is Perl really modelled after csh? Let's try to do something simple, see how many arguments were passed to our program:
    #!/usr/bin/perl -w
    print "\$\#ARGV is $#ARGV\n";
Now let's run it...
    /tmp/argv.pl
    $#ARGV is -1
    /tmp/argv.pl two arguments
    $#ARGV is 1
Huh? -1? This must have been confusing to others too, because it's documented again in the docs for @ARGV:
    $#ARGV is generally the number of arguments minus one, because $ARGV[0] is the first argument, not the program's command name itself.
I realize the simple rule is 'the length of @array is $#array+1', but how dumb is that?
Update: a friend pointed out you can get length by evaluating @array in scalar context. Contexts are one of those horrible features in Perl that make me have to relearn the language every time I write a program. There's more than one way to do it but none of them are simple.
Mark did an interesting experiment with a WSDL for a GET-based web service, reprising a similar thing Paul did last year for the Google Web APIs.
Unfortunately, GET-based WSDL doesn't seem to work well. I can't make Mark's WSDL work with either Java's Apache Axis 1.1 or Perl's SOAP::Lite 0.55. Neither seems to find any methods to invoke. There's also an issue of getting .NET to do authentication; that may be solvable.
I'm seeing the same problem with Paul's Google wrapper. WSDL is one of those squishy specs where stuff may be 'correct' but it doesn't work with any of the tools. I suspect the issue here is that since no one is using the GET binding, it just doesn't work in many places. Frustrating situation.