752

I've just noticed that the long, convoluted Facebook URLs that we're used to now look like this:

http://www.facebook.com/example.profile#!/pages/Another-Page/123456789012345

As far as I can recall, earlier this year it was just a normal URL-fragment-like string (starting with #), without the exclamation mark. But now it's a shebang or hashbang (#!), which I've previously only seen in shell scripts and Perl scripts.

The new Twitter URLs now also feature the #! symbols. A Twitter profile URL, for example, now looks like this:

http://twitter.com/#!/BoltClock

Does #! now play some special role in URLs, like for a certain Ajax framework or something, since the new Facebook and Twitter interfaces are now largely Ajaxified?
Would using this in my URLs benefit my Web application in any way?

Arsen Khachaturyan
BoltClock
  • Hmm. Had to look up what `shebang` was... http://en.wikipedia.org/wiki/Shebang_%28Unix%29 – JYelton Jun 09 '10 at 19:57
  • Which is why I'm puzzled as to what it's doing in a Facebook URL. – BoltClock Jun 09 '10 at 19:59
  • FWIW, it's not just shell and Perl scripts, but any script run on a Unix-like system. The #! line tells the shell what the interpreter for that script is... of course, my comment has nothing to do with Facebook or Twitter – bluesmoon Oct 16 '10 at 22:57
  • [Thanks, Hacker News!](http://news.ycombinator.com/item?id=1798891) (leaving as a comment so I don't bump my question, don't see the need to) – BoltClock Oct 17 '10 at 06:07
  • The hashbang is glorified for all the wrong reasons: it breaks best practices and destroys the chance for progressive enhancement and graceful degradation. [Please use the other solutions out there.](https://github.com/balupton/history.js/wiki/Intelligent-State-Handling) – balupton Mar 07 '11 at 18:47
  • Note that as of October 2015 Google [deprecated the hashbang](http://googlewebmastercentral.blogspot.nl/2015/10/deprecating-our-ajax-crawling-scheme.html) they introduced [in 2009](https://developers.google.com/webmasters/ajax-crawling/docs/specification)! So for new applications you should no longer have to do this for SEO. Right now there's only a subtle remark in white at the top of Google's spec pages: "This recommendation is officially deprecated as of October 2015." – Bart Nov 14 '15 at 08:23

7 Answers

486

This technique is now deprecated.

This used to tell Google how to index the page.

https://developers.google.com/webmasters/ajax-crawling/

This technique has mostly been supplanted by the JavaScript History API that was introduced alongside HTML5. For a URL like www.example.com/ajax.html#!key=value, Google would check the URL www.example.com/ajax.html?_escaped_fragment_=key=value to fetch a non-AJAX version of the contents.
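As an illustration of the mapping (this is a sketch, not an official API; the helper name is hypothetical), the scheme simply moved everything after #! into an _escaped_fragment_ query parameter:

    // Sketch: map a hashbang URL to the crawlable URL that Google's
    // (now deprecated) scheme would fetch. toEscapedFragmentUrl is a
    // hypothetical helper; the real scheme also percent-escaped some characters.
    function toEscapedFragmentUrl(url) {
        var parts = url.split('#!');
        if (parts.length < 2) return url; // no hashbang present
        var separator = parts[0].indexOf('?') === -1 ? '?' : '&';
        return parts[0] + separator + '_escaped_fragment_=' + parts[1];
    }

    toEscapedFragmentUrl('http://www.example.com/ajax.html#!key=value');
    // -> 'http://www.example.com/ajax.html?_escaped_fragment_=key=value'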

ceejayoz
  • Ah, not sure how I missed that; looks like it's been around for quite a bit. Thanks! – BoltClock Jun 09 '10 at 21:33
  • Are you sure that is all there is to it? I often find that the page loading hangs on a shebang URL on Facebook (even after many reloads), but if you manually remove the #!, it works. **Not to mention** you often get "1.5 URLs" (i.e. the old URL remains, and just has the new part added to it, i.e. photo.php?id=... twice, but with different ids). **Not to mention** that "#!" is also added to Facebook-mail URLs, which probably aren't (and shouldn't be) indexable. In any case I find the shebang *extremely* annoying since it seems to be the reason for so many page faults on my slow home line. – Pedery Oct 15 '10 at 03:15
  • That Facebook has bugs doesn't make those bugs the fault of two characters in the URL. If the site is coded properly to understand and generate them, crawlable AJAX URLs are quite handy. Lots of other things on Facebook glitch out, too. – ceejayoz Oct 15 '10 at 03:19
  • @Pedery: I have only ever seen that issue with Facebook. I agree, it drives me up the (non-Facebook) wall all the time. – BoltClock Oct 15 '10 at 03:22
  • The old URL often remains because Facebook handles the initial request at that URL (i.e. a photo), but subsequent navigation is handled on that same page via AJAX. So, you might be viewing a profile on a page with the URL of `photo.php`, but that's because you clicked around. – ceejayoz Oct 15 '10 at 03:22
  • As for search engines, having an indexable AJAX URL doesn't make the page get indexed any more than having an indexable **non**-AJAX URL does. Facebook uses this URL format for more than just Google's benefit - it also makes pages accessed via AJAX on Facebook bookmarkable when they otherwise wouldn't be. – ceejayoz Oct 15 '10 at 03:24
  • Escaped fragments are a great idea (see http://mambopics.com), but until at least Bing (and Facebook - http://forum.developers.facebook.net/viewtopic.php?id=63698) implements it, I think there will be lower adoption, since everyone will have to do some sort of dual URL system: a hashbang URL for Google and another one for Bing (and others). – Amir Oct 16 '10 at 22:40
  • For some interesting caveats, also read this article: http://www.isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs – Michael Stum Feb 13 '11 at 01:22
  • The hashbang is glorified for all the wrong reasons: it breaks best practices and destroys the chance for progressive enhancement and graceful degradation. [Please use the other solutions out there.](https://github.com/balupton/history.js/wiki/Intelligent-State-Handling) – balupton Mar 07 '11 at 18:46
  • As the top + accepted answer, I think it would be worthwhile to update it with something more than just a link. – dayuloli Jan 16 '15 at 09:16
  • As of October 14, 2015, Google has deprecated the technique: http://googlewebmastercentral.blogspot.com/2015/10/deprecating-our-ajax-crawling-scheme.html – Trenton Dec 14 '15 at 23:54
219

The octothorpe/number-sign/hashmark has a special significance in a URL: it normally identifies the name of a section of a document. The precise term is that the text following the hash is the anchor portion of a URL. If you use Wikipedia, you will see that most pages have a table of contents and you can jump to sections within the document with an anchor, such as:

https://en.wikipedia.org/wiki/Alan_Turing#Early_computers_and_the_Turing_test

https://en.wikipedia.org/wiki/Alan_Turing identifies the page and Early_computers_and_the_Turing_test is the anchor. The reason that Facebook and other Javascript-driven applications (like my own Wood & Stones) use anchors is that they want to make pages bookmarkable (as suggested by a comment on that answer) or support the back button without reloading the entire page from the server.

In order to support bookmarking and the back button, you need to change the URL. However, if you change the page portion (with something like window.location = 'http://raganwald.com';) to a different URL, or don't specify an anchor, the browser will load the entire page from the URL. Try this in Firebug or Safari's Javascript console. Load http://minimal-github.gilesb.com/raganwald. Now in the Javascript console, type:

window.location = 'http://minimal-github.gilesb.com/raganwald';

You will see the page refresh from the server. Now type:

window.location = 'http://minimal-github.gilesb.com/raganwald#try_this';

Aha! No page refresh! Type:

window.location = 'http://minimal-github.gilesb.com/raganwald#and_this';

Still no refresh. Use the back button to see that these URLs are in the browser history. The browser notices that we are on the same page but just changing the anchor, so it doesn't reload. Thanks to this behaviour, we can have a single Javascript application that appears to the browser to be on one 'page' but to have many bookmarkable sections that respect the back button. The application must change the anchor when a user enters different 'states', and likewise if a user uses the back button or a bookmark or a link to load the application with an anchor included, the application must restore the appropriate state.
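A minimal sketch of that pattern, with a hypothetical, application-specific restoreState function standing in for whatever your application actually does:

    // restoreState is a hypothetical function: given an anchor string,
    // it puts the application into the corresponding 'state'.
    function restoreState(anchor) {
        console.log('Restoring state for anchor:', anchor);
    }

    // React to anchor changes (back button, bookmarks, in-app navigation).
    window.addEventListener('hashchange', function () {
        restoreState(window.location.hash.slice(1)); // '#and_this' -> 'and_this'
    });

    // Honour an anchor that arrives with the initial page load, too.
    restoreState(window.location.hash.slice(1));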

So there you have it: Anchors provide Javascript programmers with a mechanism for making bookmarkable, indexable, and back-button-friendly applications. This technique has a name: It is a Single Page Interface.

p.s. There is a fourth benefit to this technique: Loading page content through AJAX and then injecting it into the current DOM can be much faster than loading a new page. In addition to the speed increase, further tricks like loading certain portions in the background can be performed under the programmer's control.

p.p.s. Given all of that, the 'bang' or exclamation mark is a further hint to Google's web crawler that the exact same page can be loaded from the server at a slightly different URL. See Ajax Crawling. Another technique is to make each link point to a server-accessible URL and then use unobtrusive Javascript to change it into an SPI with an anchor.
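A sketch of that second technique (the `a.spi-link` selector and loadSection loader are hypothetical): each link keeps its real, server-accessible href, and Javascript upgrades clicks into anchor-based SPI navigation:

    // loadSection is a hypothetical function that fetches the content
    // for a path via AJAX and injects it into the current DOM.
    function loadSection(path) { /* application-specific */ }

    // Progressive enhancement: links work without Javascript; with it,
    // they become SPI navigation under a hashbang anchor.
    document.querySelectorAll('a.spi-link').forEach(function (link) {
        link.addEventListener('click', function (event) {
            event.preventDefault();
            var path = link.getAttribute('href');   // e.g. '/pages/Another-Page'
            window.location.hash = '!' + path;      // URL becomes ...#!/pages/Another-Page
            loadSection(path);
        });
    });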

Here's the key link again: The Single Page Interface Manifesto

Krenair
raganwald
  • "However an application without this optimization is still crawlable if the web crawler wishes to index it." Not really. The hash doesn't get sent to the server. – Chris Broadfoot Oct 17 '10 at 02:58
  • Just for information: `self.document.location.hash` provides the value of this hash – Kevin Oct 17 '10 at 10:19
  • *The hash doesn't get sent to the server.* Good catch! – raganwald Oct 17 '10 at 12:08
  • Whether the hash gets sent to the server is sadly client dependent. Granted, Google may not _currently_ send the hash to the server, and I'd be surprised if it did, but some things (and some clients) may send the # part anyway. (Just like some web clients will resolve the "../../" part of URLs client-side, and others will ship them to the web server as-is.) – Kent Fredric Oct 27 '10 at 09:47
  • And even though the hash may not be sent to the server, that doesn't stop Google from indexing it. If it sees `baz`, it might associate "baz" with "foobar" and only follow the "foobar" part, but there's still no reason it *can't* record the "#tag" "baz" association, or present that data in published links and search results. It may make assumptions that "foobar" and "foobar#tag" are equivalent, i.e. not index the true content of "foobar#tag", but it doesn't stop it being *useful* – Kent Fredric Oct 27 '10 at 09:50
  • This entire answer aside from the single-paragraph "pps" is redundant. – Lightness Races in Orbit Jan 03 '11 at 18:56
  • @TomalakGeret'kal I don't think so; I think for someone who wants to know how and why (including me), it's a perfectly crafted answer. Your comment adds no value to this answer as well. – dsignr Dec 28 '11 at 05:39
  • @imaginonic: I'm late, but as perfectly crafted as it is, 90% of it doesn't touch on the `#!` aspect of my question **at all**. That's why he said it's redundant. The number of upvotes here is likely due to the high traffic when my question made it to Hacker News, coupled with the sheer length alone of this answer. – BoltClock Feb 21 '12 at 18:58
  • To be precise, the entity that the `#` in the URL references is an HTML tag's `ID` attribute. The name `ID` refers to the fact that IDs are unique, meaning there is (supposed to be) only one tag with that `ID` on the entire website, so a URL with a hashbang (`#!some_page`) is as unique as a URL without a hashbang. – trysis Feb 24 '14 at 19:44
  • You basically just explain what the anchor tags for URLs are for. – mr5 May 29 '19 at 10:27
113

First of all: I'm the author of The Single Page Interface Manifesto cited by raganwald.

As raganwald has explained very well, the most important aspect of the Single Page Interface (SPI) approach used in Facebook and Twitter is the use of the hash # in URLs.

The character ! is added only for Google purposes; this notation is a Google "standard" for crawling AJAX-intensive web sites (in the extreme, Single Page Interface web sites). When Google's crawler finds a URL with #!, it knows that an alternative conventional URL exists providing the same page "state", but in this case at load time.

In spite of the #! combination being very interesting for SEO, it is only supported by Google (as far as I know); with some JavaScript tricks you can build SPI web sites that are SEO-compatible with any web crawler (Yahoo, Bing...).

The SPI Manifesto and demos do not use Google's format of ! in hashes; this notation could easily be added, and SPI crawling could be even easier (UPDATE: the ! notation is now used and remains compatible with other search engines).

Take a look at this tutorial; it is an example of a simple ItsNat SPI site, but you can pick up some ideas for other frameworks. This example is SEO-compatible with any web crawler.

The hard problem is to generate any (or selected) "AJAX page state" as plain HTML for SEO. In ItsNat this is very easy and automatic: the same site serves at the same time as SPI or page-based for SEO (or for when JavaScript is disabled, for accessibility). With other web frameworks you can always follow the double-site approach: one site is SPI-based and another is page-based for SEO. For instance, Twitter uses this "double site" technique.
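For frameworks without ItsNat's automatic support, here is a hypothetical Express-style sketch of the idea (renderSnapshotHtml and the routes are assumptions, not any framework's real API): serve the normal SPI page to browsers, and a plain-HTML snapshot when a crawler requests the _escaped_fragment_ version of a state:

    var express = require('express');
    var app = express();

    // renderSnapshotHtml is a hypothetical function that renders the
    // given "AJAX page state" as plain HTML for the crawler.
    function renderSnapshotHtml(state) {
        return '<html><body>Content for state: ' + state + '</body></html>';
    }

    app.get('/ajax.html', function (req, res) {
        var state = req.query._escaped_fragment_;
        if (state !== undefined) {
            res.send(renderSnapshotHtml(state));    // plain HTML snapshot for the crawler
        } else {
            res.sendFile(__dirname + '/ajax.html'); // the normal SPI entry page
        }
    });

    app.listen(3000);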

Shoe
jmarranz
  • What about the progressive enhancement principle? Websites shouldn't crash or fail due to disabled JavaScript. And trust me, JavaScript is disabled not just in outdated browsers but also by many security-aware users who do not like executing random JS. – Roman Royter Mar 29 '11 at 04:05
91

I would be very careful if you are considering adopting this hashbang convention.

Once you hashbang, you can’t go back. This is probably the stickiest issue. Ben’s post put forward the point that when pushState is more widely adopted then we can leave hashbangs behind and return to traditional URLs. Well, fact is, you can’t. Earlier I stated that URLs are forever, they get indexed and archived and generally kept around. To add to that, cool URLs don’t change. We don’t want to disconnect ourselves from all the valuable links to our content. If you’ve implemented hashbang URLs at any point then want to change them without breaking links the only way you can do it is by running some JavaScript on the root document of your domain. Forever. It’s in no way temporary, you are stuck with it.

You really want to use pushState instead of hashbangs, because making your URLs ugly and possibly broken -- forever -- is a colossal and permanent downside to hashbangs.
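A minimal sketch of the pushState alternative (the element id and the loadContent function are hypothetical; loadContent stands in for fetching a fragment of HTML and injecting it into the page):

    // loadContent is a hypothetical function that fetches the content
    // for a path via AJAX and injects it into the page.
    function loadContent(path) { /* application-specific */ }

    // Navigate without a reload while keeping a clean, permanent URL.
    document.getElementById('next-page').addEventListener('click', function (event) {
        event.preventDefault();
        history.pushState({ page: 2 }, '', '/page/2'); // real URL, no hashbang
        loadContent('/page/2');
    });

    // The back/forward buttons fire popstate; restore the matching content.
    window.addEventListener('popstate', function () {
        loadContent(window.location.pathname);
    });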

Jeff Atwood
  • I think your criticism of hashbangs is valid, but using just pushState as a substitute means that we would lose the ability to load content within a single-page app based on the URL. So then URLs can't be shared. – Luke Jul 23 '14 at 22:30
  • I had a similar issue in my work - we've taken to using Page.js (which uses pushState) for single-page navigation, where previously we used Hasher and Crossroads (hash-based). As a result, we needed to rescue paths like `/blah#foo/feep/baz?stuff=nonsense`. The new path equivalent would be `/blah/foo/feep/baz?stuff=nonsense` (note # replaced by /). I did that simply by having a route in my setup that caught `/blah` and checked if it had a hash; if so, appending that hash's content after a slash. Rescued. – Gert Sønderby Sep 08 '15 at 07:36
16

As a good follow-up to all this, Twitter - one of the pioneers of hashbang URLs and the single-page interface - admitted that the hashbang system was slow in the long run and that they have actually started reversing the decision and returning to old-school links.

The article about this is here.

M--
kingmaple
9

I always assumed the ! just indicated that the hash fragment that followed corresponded to a URL, with ! taking the place of the site root or domain. It could be anything, in theory, but it seems the Google AJAX Crawling API likes it this way.

The hash, of course, just indicates that no real page reload is occurring, so yes, it’s for AJAX purposes. Edit: Raganwald does a lovely job explaining this in more detail.

BoltClock
Alan H.
-1

The answers above describe well why and how it is used on Twitter and Facebook; what I missed is an explanation of what # does by default...

On a 'normal' page (not a single-page application) you can anchor to any element that has an id, by placing that element's id in the URL after the hash #.

Example:

(In Chrome) press F12, or right-click and choose Inspect element,


then take id="answer-10831233" and add it to the URL after a hash, like the following:

https://stackoverflow.com/questions/3009380/whats-the-shebang-hashbang-in-facebook-and-new-twitter-urls-for#answer-10831233

and you will get a link that jumps to that element on the page:

What's the shebang/hashbang (#!) in Facebook and new Twitter URLs for?
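The same default behaviour can be reproduced from script; a tiny sketch:

    // The browser's default: scroll to the element whose id matches the
    // URL fragment. The equivalent lookup done by hand:
    var target = document.getElementById(window.location.hash.slice(1));
    if (target) target.scrollIntoView();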

By using # in the way described in the answers above, you are introducing conflicting behaviour... although I wouldn't lose sleep over it... since Angular, it has become somewhat of a standard...

Matas Vaitkevicius
  • raganwald's answer contains the explanation you said you missed. Even so, I don't see how the question benefits from a tutorial on how # works - the question assumes the reader is already familiar with URL fragments, *and* that functionality isn't really relevant here anyway, except for your remark about conflicting behavior. – BoltClock Jul 12 '17 at 03:49
  • @BoltClock Hi BoltClock, but without explaining what the default behaviour is, saying that 'it will conflict' does not give the reader any idea of what's at stake, what sort of functionality is potentially being lost... I just like to give nice answers with pictures, as complete as I can make them, if I see that something is missing... – Matas Vaitkevicius Jul 16 '17 at 02:43