A lot has been written about the emergence of the hashbang convention for web applications: URLs like https://twitter.com/#!/nelson that use # anchors to do fancy Javascript things. But most people I've talked to don't know the real purpose of the #! convention: making Google's life easier.

Googlebot is essentially blind to Javascript content. It (mostly) treats web pages as static text and refuses to execute any code on the page. That means Google is incapable of indexing pages that rely on Javascript to render correctly. Not just fancy AJAX web apps, either. Google can't index web pages that render their content client-side, or customize the display in Javascript, or do much of anything dynamically. My recent wind history project is essentially invisible to Google, for example.

Enter the hashbang convention, well documented by Google. The #! part is incidental; the spec is all about how site designers need to create a whole second set of statically generated HTML pages just for Googlebot, all behind URLs that include _escaped_fragment_ in their query strings. There's a 1:1 mapping between the #! URLs for humans and the _escaped_fragment_ URLs for robots. My Twitter page at https://twitter.com/#!/nelson, for example, also exists at http://twitter.com?_escaped_fragment_=/nelson. That's a special page just for bots, which Google dutifully translates back to #! URLs for humans. It's all a big, gross, well-intentioned hack to work around a fundamental limitation in Google's indexing technology.
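
To make that 1:1 mapping concrete, here's a minimal Python sketch of the translation in both directions. It's an illustration, not Google's implementation: the function names to_crawler_url and to_human_url are made up for this example, and the set of characters left unescaped is my assumption rather than a faithful copy of the spec's escaping rules.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

ESCAPED = "_escaped_fragment_"

# Assumption: characters left readable in the query value. The real spec
# defines its own escaping rules; this set is only illustrative.
SAFE = "/=!$'()*,;:@?"


def to_crawler_url(human_url: str) -> str:
    """Map a #! URL (for humans) to its _escaped_fragment_ form (for robots)."""
    parts = urlsplit(human_url)
    if not parts.fragment.startswith("!"):
        return human_url  # not a hashbang URL, nothing to translate
    value = quote(parts.fragment[1:], safe=SAFE)
    param = f"{ESCAPED}={value}"
    # Append to any query string that is already there.
    query = f"{parts.query}&{param}" if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def to_human_url(crawler_url: str) -> str:
    """Map an _escaped_fragment_ URL back to the #! URL shown to humans."""
    parts = urlsplit(crawler_url)
    kept, fragment = [], None
    for pair in parts.query.split("&"):
        name, _, value = pair.partition("=")
        if name == ESCAPED:
            fragment = "!" + unquote(value)
        elif pair:
            kept.append(pair)  # preserve unrelated query parameters
    if fragment is None:
        return crawler_url  # no escaped fragment, leave the URL alone
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), fragment)
    )


print(to_crawler_url("https://twitter.com/#!/nelson"))
print(to_human_url("https://twitter.com/?_escaped_fragment_=/nelson"))
```

The translation itself is trivial. The expensive part of the convention is on the server, which has to answer the _escaped_fragment_ request with real static HTML.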

I don't quite understand why Google hasn't tackled this core problem and figured out a way to run Javascript while indexing. It seems bad that Google can't see the web the way users see it. Google has an amazing Javascript engine, of course, and more than enough skilled engineers to apply it to web indexing. It may be a scaling problem; running code for a page would probably increase their indexing workload by 10x to 100x. Or it may simply be that Google feels that, with its market power, it can require every modern web site in the world to build a special version just for them. Hopefully I'm too pessimistic and Google is working on indexing Javascript content already.

tech
  2011-07-18 23:42 Z