Nelson's Weblog: tech / urlsAboutUrls2

URLs about URLs, some answers

I posed a question about embedding a subject URL in a request URL using percent encoding. Thank you for all the helpful replies; here's what I learned.

First, on the existential question, it seems in practice percent encoding really can create distinct names. Try:

http://somebits.com/weblog/
http://somebits.com%2Fweblog/

In reality this is just a quirk of Apache's handling of %2F, but since it's the default behaviour for the #1 web server out there, that's a strong example. As for the theory, this Wikipedia article claims that percent-encoding a reserved character does create a distinct name, whereas percent-encoding an unreserved character just aliases the same name. So escaping / would make a new name but escaping something like 0 wouldn't. Confusing, huh?
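
A minimal sketch of that reserved/unreserved distinction, assuming the unreserved set from RFC 3986 (letters, digits, and - . _ ~). The function names and the example.com URLs are made up for illustration. It decodes only the escapes that are pure aliases and leaves everything else escaped, so two URLs name the same thing exactly when their normalized forms match.

```python
import re
import string

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_escapes(url: str) -> str:
    """Decode escapes of unreserved characters; keep every other escape."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # %30 is just another spelling of 0
        # Reserved (or non-ASCII) byte: stays escaped, hex digits uppercased.
        return "%" + match.group(1).upper()
    return _PCT.sub(fix, url)


def same_name(a: str, b: str) -> bool:
    """True if two URLs differ only by escaping of unreserved characters."""
    return normalize_escapes(a) == normalize_escapes(b)
```

Under this sketch, same_name("http://example.com/a%30", "http://example.com/a0") is true, while the %2F and / variants of a path compare as different names.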

As to the practical problem of PATH_INFO being unescaped, basically everyone told me "yeah, CGI's a hack like that". So, going with the hack, I'll just use the REQUEST_URI variable Apache sets. It's not documented anywhere I can find, but it seems to be an unadulterated literal copy of what the client requested, which I can then parse carefully. For my service, clients will need to know to percent-escape any / or ? in their URLs. And I'll just hope nothing else in the network decides it's OK to unescape things on me.
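
Here is a rough sketch of what that careful parsing could look like in a Python CGI script. The /about/<subject>/<action> layout, the function names, and the example.com URL are all assumptions for illustration, not the actual service. The key point is the order of operations: split the raw string on literal ? and / first, and only then unescape each piece, so an escaped %2F or %3F inside the subject URL never gets mistaken for a delimiter.

```python
import os
from urllib.parse import quote, unquote


def build_request_path(subject_url: str, action: str) -> str:
    """Client side: escape everything reserved in the subject URL, including / and ?."""
    return "/about/" + quote(subject_url, safe="") + "/" + action


def split_request_uri(request_uri: str):
    """Server side: return (decoded path segments, raw query string)."""
    path, _, query = request_uri.partition("?")
    # Split on literal slashes before decoding, so %2F stays inside its segment.
    segments = [unquote(seg) for seg in path.split("/") if seg]
    return segments, query


def segments_from_cgi_env():
    """Read the raw request from the REQUEST_URI variable, if the server sets it."""
    return split_request_uri(os.environ.get("REQUEST_URI", "/"))
```

For example, build_request_path("http://example.com/a?b=1", "links") gives a path whose middle segment contains no literal / or ?, and split_request_uri recovers the original subject URL as a single segment.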

Some other suggested workarounds: length-delimit the subject URL so you know where it ends, have a magic string delimiting the end of the subject URL that you hope doesn't appear in any legitimate subject URL, or put the subject URL at the end of the request URL so that it ends where it ends. Any of these solutions could be made to work; I was just looking for the principle.
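
Two of those workarounds can be sketched in a few lines. The path layouts and function names here are hypothetical, and both versions assume the path reaches the script exactly as the client sent it, with nothing in between unescaping or collapsing slashes.

```python
def encode_length_prefixed(subject_url: str, action: str) -> str:
    """Length-delimited: /<length>/<subject URL>/<action>."""
    return f"/{len(subject_url)}/{subject_url}/{action}"


def decode_length_prefixed(path: str):
    """Return (subject URL, rest of path) from a length-prefixed path."""
    length_text, _, rest = path.lstrip("/").partition("/")
    length = int(length_text)
    # The length says where the subject URL ends, whatever characters it contains.
    return rest[:length], rest[length:].lstrip("/")


def decode_trailing(path: str, prefix: str) -> str:
    """Subject URL last: everything after a fixed prefix is the subject URL."""
    if not path.startswith(prefix):
        raise ValueError("path does not start with the expected prefix")
    return path[len(prefix):]
```

With the length prefix, the subject URL can contain any number of slashes and still be separated cleanly from what follows; with the trailing form, no delimiter is needed at all because nothing follows it.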

Thanks to SethG, RyanB, MikeB, GregW, GordonM, and SamR.

tech
2006-08-23 01:25 Z


Nelson Minar nelson@monkey.org
Blog licensed under a Creative Commons License