I posed a question about embedding a subject URL in a request URL using percent encoding. Thank you for all the helpful replies, here's what I learned.

First, on the existential question, it seems in practice percent encoding really can create distinct names. Try:

In reality this is just a quirk of Apache's handling of %2F, but since it's the default behaviour for the #1 web server out there that's a strong example. As for the theory, this wikipedia article claims that percent encoded reserved characters does create distinct names whereas percent encoded unreserved characters is just aliasing the same name. So escaping / would make a new name but escaping something like 0 wouldn't. Confusing, huh?

As to the practical problem of PATH_INFO being unescaped basically everyone told me "yeah, CGI's a hack like that". So going with the hack I'll just use the REQUEST_URI variable Apache sets. It's not documented anywhere I can find but it seems to be an unadulterated literal copy of what the client requested, from which I can do careful parsing. For my service clients will need to know to percent-escape any / or ? in their URLs. And I'll just hope nothing else in the network decideds it's OK to unescape things on me.

Some other suggested workarounds: length-delimit the subject URL so you know where it ends, have a magic string delimiting the end of the subject URL that you hope doesn't appear in any legitimate subject URL, or put the subject URL at the end of the request URL so that it ends where it ends. Any of these solutions could be made to work, I was just looking for the principle.

Thanks to SethG, RyanB, MikeB, GregW, GordonM, and SamR
tech
  2006-08-23 01:25 Z