Cross-Domain Communication with IFrames

An update in the era of HTML5 (May 6, 2011)

This post has been heavily commented and linked to over the years, and continues to receive a ton of traffic, so I should make it clear that much of this is no longer relevant for modern browsers. On the one hand, they have adjusted and tightened up their security policies, making some of the techniques here no longer relevant. On the other hand, they have introduced technologies that make it easier to do cross-domain communication in the first place.

With modern browsers, you can and should be using postMessage for this purpose.

Library support is now available too. All of the following provide an interface to postMessage where it’s available, but if it’s not, fall back to the primordial techniques described in this article:

Now back to the original post … (from March 31, 2008)

This article explains iframe-to-iframe communication, when the iframes come from different domains. That you can do this effectively is only now becoming apparent to the community, and is now used in production by Google, Facebook, and others, and has powerful implications for the future of Ajax, mashups, and widgets/gadgets. I’ve been investigating the technique and working some demos, introduced in the article.

Background: Cross-Domain Communication

Ironic that in this world of mashups and Ajax, it’s not very easy to do both of them together. Ajax applications run in the browser and such applications were never intended to talk to anything but the server from whence they came. So it’s not easy to mash content from multiple sources, when everything must be squeezed through the originating web server. A few hacks have arisen over the years to deal with this, such as On-Demand Javascript, and the most recent one is a hack involving iframes, which I’ll explain in this article. As we’ll see later, the iframe technique is arguably more secure than On-Demand Javascript, and it’s also better places for communication within the browser, i.e. from one iframe to another.

Related to this article is a demo application and a couple of variants.

The first mention I’ve seen of this hack originated on James Burke’s Tagneto blog in June, 2006, though I’m fairly certain it’s been used in some quarters long before that. It’s now used in production by Google in Mapplets. It’s also used in Shindig for widget-container communication. The technique also happens to be the best way to make safe cross-domain calls from the browser directly to a third-party server, which is why it is employed by Facebook’s new Javascript Client Library.

The Demo

First, let’s see what we can do with this hack.

Demo

In this demo, we have a control on the top-level document affecting something in the iframe and vice-versa. This shows you can run communication in both directions with this technique. Typical of the technique, the communication is between two browser-side components from different domains (as opposed to browser-to-server communication, although there is actually server communication involved in making this happen).

The Laws of Physics: What you can do with IFrames

To understand the hack, we need to understand the “laws of physics” as they apply to iframes and domain policies within the browser. Once you appreciate the constraints in place, the pattern itself becomes trivial. This demo was created to explore and illustrate these constraints, and contains some simple code examples.

Definition I: A “window” refers either to an iframe or the top-level window (i.e. the “main” page). In our model, then, we have a tree-like hierarchy of windows.

Law I: Any window in the hierarchy can get a handle to any other window in the hierarchy. It doesn’t matter where they live within the hierarchy or which domain they come from – with the right commands, a window can always refer to any other window. Parent windows are accessed as “parent”, “parent.parent”, etc., or “top” for the top-level. Child windows are accessed as “window.frames[0]” or “window.frames[name]“. Note in this case that the name is not the iframe’s id, but rather the iframe’s name. (This reflects the legacy nature of all this stuff, relating back to ugly late-90s frames and framesets.) Thus, to get a sibling handle, you might use “parent.frames[1]“.

Law II: Windows can only access each others’ internal state if they belong to the same domain. This rather puts a kibosh on the whole cross-domain cross-iframe thing. All this would be so easy if iframe scripts could talk to each other directly, but that would cause all manner of security shenanigans. HTML 5 does define explicit communication between iframes, but until wide adoption, we have to think harder …

Law III: Any window in the hierarchy can set (but not read) any other window’s location/URL, even though (from Law II) browser security policies prevent different-domain iframes from accessing each other’s internal state. Note: Exact details for this law needs further investigation Again, it doesn’t matter which domain it comes from or its position in the hierarchy. It can always get a handle on another window and can always set the window’s URL, e.g. “parent.frames[1].location.href”. This establishes window URLs as the one type of information on the page which is shared across all windows, regardless of the domain they come from. It seems sensible that a parent can change its child windows’ URLs, BUT not vice-versa; how strange that a child window is allowed to alter its parent’s (or uncle’s, sibling’s, etc.) URLs! The only justification I know of is the old technique of escaping the frame trap, where a website, upon loading, ensures it’s not inside a frame by simply setting the top-level URL – if it’s different to itself – to its own URL. This would then cause the page to reload to its own URL. However, that’s a special case and hardly seems worth justifying this much leeway. So I don’t really know why you can do this, but lucky for us, you can!

Law IV: When you change the URL’s fragment identifier (the bit on the end starting with a #, e.g. http://example.com/blah#fragmentID), the page doesn’t reload. This will already be familiar to you if you’re familiar with another Ajax hack, Unique URLs to allow for bookmarkability and page history. Normally, changing a document’s “href” property causes it to reload, but if you only change the fragment identifier, it doesn’t. You can use this knowledge to change the URL symbolically – in a manner which allows a script to inspect it and make use of it – without causing any noticeable change to the page content.

Exploiting the Laws of Physics for Cross-Domain Fun and Profit – The Cross-Domain Hack (URL Polling version)

The laws above are all we need to get cross-domain communication happening. The technique is simply this, assuming Window A wants to control Window B:

  • Window A changes Window B’s fragment identifier.
  • Window B is polling the fragment identifier and notices the change, and updates itself according to the fragment identifier.

Ta-da!!! That’s the whole thing, in its glorious entirety. Of course, you had to know the laws of physics in order to understand why all this works. It simply relies on the fact that both Window A and Window B have one common piece of state – the URL – and the fact that we can change the URL unintrusively by manipulating only the fragment identifier. For example, in the demo, the iframe’s URL changes to http://ajaxpatterns.org/crossframe/#orange and once the iframe script notices it, it updates the colour.

A few observations:

  • This works in either direction. Parent to child, child to parent. As the demo illustrates.
  • It requires co-operation from both parties; it’s not some magic way to bypass browser security mechanisms. Once Window A changes Window B’s fragment identifier, it’s up to Window B to act on the change; and it’s up to Window B to be polling the fragment identifier in the first place.
  • Polling the fragment identifier happens to be exactly the same technique used in the Unique URLs pattern.

There are a couple of downsides: (a) Polling slows down the whole application; (b) Polling always involves some lag time (and there’s always a trade-off a and b – the faster the response, the more cycles you application uses up); (c) The URL visibly changes (assuming you want to manipulate the top-level window). We’ll now consider a second technique that addresses these (albeit in a way that introduces a different downside).

The Cross-Domain Hack (Marathon version)

Here’s a variant which no longer involves polling or changing any URLs. I learned of it from Juliene Le Comte’s blog, and he’s even packaged it as a library.

Looking back at Law II: “Windows can only access each others’ internal state if they belong to the same domain”. At the time, I made this sound like a bad thing, but as David Brent likes to say, “So, you know, every cloud …”. The law is bad if you state it as “cross-domain iframes can’t play with each others’ toys” (paraphrasing the informal version of Demeter’s law). But it’s good if you spin it as “well, at least same-domain iframes can play with each others’ toys”. That’s what we’re going to exploit here.

As for the demo, the functionality is the same, but since this one involves spawning iframes, I’ve left them intact, and made them visible, for your viewing delight. Normally, of course, they’d be invisible, and the application would look exactly the same as the previous demo.

Here’s how this technique works:

  • Every time Window A wants to call Window B, it spawns a child iframe, “Window B2″ in the same domain as Window B. The URL includes the command being issued (as a CGI parameter, fragment identifier, or any other URL pattern which will be recognised by the destination script).
  • When window B2 starts up, its Javascript inspects the URL, gets a handle on Window B, and updates Window B according to the URL (e.g. a CGI parameter).
  • Window B2 destroys itself in a puff of self-gratified logic.

So in this case, we create a new, short-lived, iframe for every message being passed. Because the iframe comes from the same domain as the window we’re trying to update, it’s allowed to change the window’s internal state. It’s only useful to us on startup, because after that we can no longer communicate with it (apart from by the previous fragment identifier trick, but we could do that directly on the original window).

Window B2 is sometimes called a proxy because it accepts commands from Window A and passes them to Window B. I like to think of it as Pheidippides of fame; it passes on a message and then undergoes a noble expiration. Its whole mission in life is to deliver that one message.

This technique comes with its own downside too. Quite obviously, the downside is that you must create a new iframe for every call, which requires a trip to the server. However, with caching in place, that could be avoided, since everything that must happen will happen inside the browser. So it would simply be the processing expense of creating and deleting an iframe element. Note that the previous variant never changed the DOM structure or invoked the server.

Also, note that in either versions of the hack, there is the awkward matter of having to express the request in string form, since in either pattern, you are required to embed the request on the window URL. There is an inspired extension of this hack that also has some untapped promise in this area. It involves setting up a subdomain and updating its DNS to point to a third-party website. When combined with the old document.domain hack, you end up with a situation where your iframe can communicate with a cross-domain iframe, without relying on iframe. (The technique described in the article is about browser-to-server communication, but I believe this iframe-to-iframe is possible too.)

A Third Hack Emerges: Window Size Monitoring

A newer third hack by Piers Lawson is based around the porous nature of window sizes and the use of window.resize(). Fragment IDs are used like in the first technique here, but instead of polling, window resize events are used to cause a more direct trigger.

Applications

Cross-Domain IFrame-to-IFrame Calls … and Widgets/Gadgets

In the world of mashups, iframes are a straightforward way to syndicate content from one place to another. The problem, though, is limited interaction between iframes; in pure form, you end up with a few mini web browsers on a single page. It gets better when the iframes can communicate with each other. For example, you can imagine having iGoogle open, with a contacts widget and a map widget. Clicking on a contact, the map widget notices and focuses on the contact’s location. This is possible via Gadget-To-Gadget communication, a form of publish-subscribe which works on the iframe hack described here. And speaking of maps, check out Google Mapplets, which are a special form of gadget that work on Google Maps, and also rely on this technique.

In terms of gadgets, another application is communication between a gadget and its container, and this is something I’ve been looking at wrt Shindig. For example, there is a dynamic-height feature gadgets can declare. This gives the gadget developer an API to say “I’ve updated, now please change my height”. Well, an iframe can’t change its own height; it must tell its parent to do that. And since the gadget lives in an iframe, on a different domain as the container (e.g. iGoogle), this requires a cross-domain, cross-iframe, message. And so, it uses this technique (“rpc” – remote procedural call – in shindig terminology) to pass a message to the container.

Cross-Domain Browser-to-Server Calls

The best known technique for calls from the browser to an third-party server is On-Demand Javascript, aka Javascript APIs aka JSON/JSONP. This was obscure in 2005, with Delicious being the best example. Now, it’s big time in Web 2.0 API land, and Yahoo! has exposed almost all of its APIs this way, and Google also provides data such as RSS content via JSON.

It works by spawning a new script element programmatically, an element pointing to an external Javascript URL. Since there’s no restriction on third-party Javascript running, the browser will faithfully fetch and execute the script, and so the script will typically be written to “return” by updating variable and/or calling an event handler.

There are two major security issues with On-Demand Javascript. Firstly, you have to trust the API provider (e.g. Yahoo!) a lot because you are letting them run a script on your own web page. And there’s no way to sanitise it, due to the script tag mechanism involved. If they are malicious or downright stoopid, your users may end up running an evil script which could ask for their password, send their data somewhere, or destroy their data altogether. That’s because whatever your web app can do, so can the third party’s script, even if you’re only trying to get a simple value back. The mechanism forces you to hand over the keys to the Ferrari when all you want is a new bumper sticker. Secondly, what’s to stop other websites also making use of the external Javascript? If your own site can embed the script to call a third-party JS API, so too can a malicious fourth-party. This is fine for public, read-only, data, but what if you’re relying on the user’s cookies to make privileged calls to the third-party? Then the fourth-party’s web app will be just as capable of issuing calls from the browser to the third-party, and they might well be more evil calls than you’re making, e.g. “transferFunds()” instead of “getBankBalance()”. The moral is: Javascript APIs can only be used for serving public data.

Whoa!!! Public data only? That’s a tragic restriction on our cross-domain API! For mashups to be truly useful, it must be personal. We’ll increasingly have OAuth-style APIs where users will tell Site X that Site Y is allowed to read its data. But how can that work in the browser? How can Site Y expose its data so that it’s usable from the browser, but only when Site X is running? It can’t work with On-Demand Javascript. Site Y could try reading the referrer headers to see where the call is coming from, but anyone could write a command-line client with fake headers.

In fact, the answer is to use the iframe hack described in this article. As I mentioned earlier, this is how Facebook gets the job done, with what is essentially the same “power of attorney” delegation model as OAuth (BTW thanks to my colleague Jeremy Ruston for the “power of attorney” OAuth analogy – albeit it was stated in a slightly different context from OAuth).

I haven’t looked too much into the mechanism involved with the Facebook API, but it looks like it’s essentially using a variant of the Marathon technique. From memory, there’s an ever-present invisible facebook.com iframe. Each time your web app make a Facebook call, the Facebook JS library spawns a new “proxy” iframe, which passes the message on to its same-domain ever-present frame, which makes a bog-standard XHR call to Facebook. So now we’re making an XHR call to another domain, which we can get away with because it’s coming from a separate iframe. Once the XHR call returns, I think the message is returned to your application (this happens via another same-domain iframe you must host on your server, though I think that’s unnecessary) and the proxy iframe disappears.

Note that all Facebook ever exposes is a standard web service that relies on the user being logged into Facebook – there’s no Javascript involved. The user must be logged in and must have given permission for the application to access its Facebook details. Effectively, the user is allowing a particular website URL to make Facebook calls, since the application developer must register the URL. If you look back at the iframe algorithms I described earlier, you’ll see that it’s straightforward for Facebook to ensure that only this application (and any other application the user trusts) can access the data. The Facebook.com iframe (whose behaviour is controlled by IFrame and can’t be tampered) simply has to inspect the URL of the parent window and pass it to the server as part of the XHR call. The server can then check that the logged-in user has authorised this application, using the URL to identify it.

As for the first concern of cross-domain Javascript – having to trust the third-party API provider – I believe the iframe technique overcomes this concern too. Facebook.com never gets to run arbitrary Javascript on your server. Of course, you have to trust the Facebook library and the Facebook-provided iframe you’re required to host on your domain, but those could be audited prior to installation. All those things are set up to do is call callback methods inside your top-level application; you could inspect the library and ensure that’s all that will ever happen.

Thus, cross-domain iframe-based communication solves both problems which have plagued On-Demand Javascript. It is slightly more complicated, however.

Conclusions

Yes, this is a somewhat complicated technique. Actually understanding the problem it solves is really the hard part! Once you understand that, and once you understand those laws of physics, the trick is actually quite straightforward (either version of it).

The technique will be critical for gadget containers such as Shindig. As OAuth takes off, we’ll also see the technique used a lot more in mainstream applications and APIs.

With HTML 5, cross-frame messaging will render the hack unnecessary for iframe-to-iframe communication. Indeed, the aforementioned Cross-Domain library uses that technique already for Opera, in a fortuitous twist of fate since Opera doesn’t actually support everything this hack requires. However, the notion of using iframes for cross-domain calls will still be present, no matter how the windows talk to each other.

Dogfooding considered solipsistic

Jeff Attwood encourages developers to eat their own dogfood:

I’ve found that much of the best software is the best because the programmers are the users, too. It is UsWare. It behooves software developers to understand users, to walk a mile in their shoes. If we can bridge the gap between users and ourselves– even if only a little– we start slowly converting our mediocre ThemWare into vastly superior UsWare. To really care about the software you’re writing, you have to become a user, at least in spirit.

This is good if you’re writing an IDE, not so good if you’re writing a word processor and actually kind of plain wrong if you’re writing a medical or trading app where developers lack the knowledge and experience of typical users. As Jeff himself has noted elsewhere, “If the application you’re writing isn’t intended for expert users, having the developers dogfood it won’t necessarily buy you much, because developers are highly unrepresentative of typical users. Beyond fixing critical bugs, it could even hurt the application: developers tend to add advanced, complicated features that are useful to them.”

At this point, I’m supposed to launch into a tirade about how the inmates are running the asylum, but I actually think that’s a pretty condescending attitude, albeit one that has includes some arguments to be aware of. It’s also somewhat of a self-fulfilling prophecy. Professional software developers have an important role to play in working towards usability, especially in the very real situation where usability professionals are simply not available.

How do you improve usability when you can’t eat your own dogfood? The solution is to become an anthopologist and watch actual users eat your dogfood. There are a lot of variants of this:

  • Paper prototyping. For early design feedback. Incidentally, paper prototyping is a lot more than just lo-fi, hand-drawn, sketches. “Real” paper prototypes are interactive, like a board game, with a little real-time prodding from the researcher and possibly an assistant domain expert.
  • Contextual analysis. This goes by several names, but the idea is to sit in the real-world environment of users and see how they interact with the system, as well as other systems, other people, and the general environment. The only problem with this approach is that you can’t do it with an in-progress system; it either has to be the old, extant, system, or the new, almost-complete, system.
  • Interviews and walkthroughs. The user sits in a “usability lab” (but probably just an office) and runs a test version of the system, discussing it with the researcher. It may also be that there’s no system involved, just a qualitative discussion.
  • Automated monitoring. A fan favourite of privacy advocates everywhere ;) – so please inform users what you’re monitoring, let them opt out, etc. This involves logging and monitoring of user actions for later monitoring. Thanks to the magic of Ajax, there are now some very powerful tools that will track all activity in the browser and play it back for later analysis. For example, Scrutinizer, written by the talented Nitobi team. Scrutinizer is cool because it actually blurs everywhere outside the mouse pointer area, so is an okay approximation for a full-blown eye tracking system. I haven’t heard of biological testing – especially PET scans – but with growing inteterest in Neuroeconomics, I’m sure it’s coming.

A related problem of dogfooding is that software developers tend to enjoy writing systems for themselves, i.e. those where they really will be eating their own dogfood. This is why we have: a proliferation of IDEs and clones of the veritable Unix “make” utility; quite an unreasonable number of applications about Klingon and juggling; but far fewer about quilt making and the art of kite surfing.

All this goes a long way to explaining why the greatest usability is often seen in software related products. I rate IntelliJ Idea, Firebug, and the general Unix philosophy among the best exemplars of software usability ever created, in any domain. Mainstream business and scientific applications will never be as good as those applications through the application of dogfooding.