Cross-Domain Communication with IFrames

An update in the era of HTML5 (May 6, 2011)

This post has been heavily commented and linked to over the years, and continues to receive a ton of traffic, so I should make it clear that much of this is no longer relevant for modern browsers. On the one hand, they have adjusted and tightened up their security policies, making some of the techniques here no longer relevant. On the other hand, they have introduced technologies that make it easier to do cross-domain communication in the first place.

With modern browsers, you can and should be using postMessage for this purpose.
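
For readers who haven’t used it, here is a minimal sketch of the postMessage approach. The origins (container.example.com, widget.example.com) and the names makeMessageHandler and onCommand are made up for illustration; the origin check is the part that matters.

```javascript
// Build a "message" event handler that only trusts one origin.
// allowedOrigin and onCommand are illustrative names, not from this article.
function makeMessageHandler(allowedOrigin, onCommand) {
  return function handleMessage(event) {
    // Always check who sent the message before acting on it.
    if (event.origin !== allowedOrigin) {
      return false;
    }
    onCommand(event.data);
    return true;
  };
}

// Browser wiring (skipped when there is no browser, e.g. under Node).
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  // Receiving side, inside the iframe served from the widget's domain.
  window.addEventListener('message', makeMessageHandler(
    'http://container.example.com',
    function (colour) { document.body.style.backgroundColor = colour; }
  ));
  // Sending side, in the parent page, would look something like:
  //   window.frames[0].postMessage('orange', 'http://widget.example.com');
}
```

The second argument to postMessage names the origin the sender expects the receiver to have, so both ends state explicitly who they are willing to talk to.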

Library support is now available too. All of the following provide an interface to postMessage where it’s available, but if it’s not, fall back to the primordial techniques described in this article:

Now back to the original post … (from March 31, 2008)

This article explains iframe-to-iframe communication, when the iframes come from different domains. That you can do this effectively is only now becoming apparent to the community, and is now used in production by Google, Facebook, and others, and has powerful implications for the future of Ajax, mashups, and widgets/gadgets. I’ve been investigating the technique and working on some demos, which are introduced in the article.

Background: Cross-Domain Communication

Ironic that in this world of mashups and Ajax, it’s not very easy to do both of them together. Ajax applications run in the browser and such applications were never intended to talk to anything but the server from whence they came. So it’s not easy to mash content from multiple sources, when everything must be squeezed through the originating web server. A few hacks have arisen over the years to deal with this, such as On-Demand Javascript, and the most recent one is a hack involving iframes, which I’ll explain in this article. As we’ll see later, the iframe technique is arguably more secure than On-Demand Javascript, and it’s also better placed for communication within the browser, i.e. from one iframe to another.

Related to this article is a demo application and a couple of variants.

The first mention I’ve seen of this hack originated on James Burke’s Tagneto blog in June, 2006, though I’m fairly certain it’s been used in some quarters long before that. It’s now used in production by Google in Mapplets. It’s also used in Shindig for widget-container communication. The technique also happens to be the best way to make safe cross-domain calls from the browser directly to a third-party server, which is why it is employed by Facebook’s new Javascript Client Library.

The Demo

First, let’s see what we can do with this hack.

Demo

In this demo, we have a control on the top-level document affecting something in the iframe and vice-versa. This shows you can run communication in both directions with this technique. Typical of the technique, the communication is between two browser-side components from different domains (as opposed to browser-to-server communication, although there is actually server communication involved in making this happen).

The Laws of Physics: What you can do with IFrames

To understand the hack, we need to understand the “laws of physics” as they apply to iframes and domain policies within the browser. Once you appreciate the constraints in place, the pattern itself becomes trivial. This demo was created to explore and illustrate these constraints, and contains some simple code examples.

Definition I: A “window” refers either to an iframe or the top-level window (i.e. the “main” page). In our model, then, we have a tree-like hierarchy of windows.

Law I: Any window in the hierarchy can get a handle to any other window in the hierarchy. It doesn’t matter where they live within the hierarchy or which domain they come from – with the right commands, a window can always refer to any other window. Parent windows are accessed as “parent”, “parent.parent”, etc., or “top” for the top-level. Child windows are accessed as “window.frames[0]” or “window.frames[name]”. Note in this case that the name is not the iframe’s id, but rather the iframe’s name. (This reflects the legacy nature of all this stuff, relating back to ugly late-90s frames and framesets.) Thus, to get a sibling handle, you might use “parent.frames[1]”.

Law II: Windows can only access each others’ internal state if they belong to the same domain. This rather puts a kibosh on the whole cross-domain cross-iframe thing. All this would be so easy if iframe scripts could talk to each other directly, but that would cause all manner of security shenanigans. HTML 5 does define explicit communication between iframes, but until wide adoption, we have to think harder …

Law III: Any window in the hierarchy can set (but not read) any other window’s location/URL, even though (from Law II) browser security policies prevent different-domain iframes from accessing each other’s internal state. (Note: the exact details of this law need further investigation.) Again, it doesn’t matter which domain it comes from or its position in the hierarchy. It can always get a handle on another window and can always set the window’s URL, e.g. “parent.frames[1].location.href”. This establishes window URLs as the one type of information on the page which is shared across all windows, regardless of the domain they come from. It seems sensible that a parent can change its child windows’ URLs, BUT not vice-versa; how strange that a child window is allowed to alter its parent’s (or uncle’s, sibling’s, etc.) URLs! The only justification I know of is the old technique of escaping the frame trap, where a website, upon loading, ensures it’s not inside a frame by simply setting the top-level URL – if it’s different to itself – to its own URL. This would then cause the page to reload to its own URL. However, that’s a special case and hardly seems worth justifying this much leeway. So I don’t really know why you can do this, but lucky for us, you can!

Law IV: When you change the URL’s fragment identifier (the bit on the end starting with a #, e.g. http://example.com/blah#fragmentID), the page doesn’t reload. This will already be familiar to you if you’re familiar with another Ajax hack, Unique URLs to allow for bookmarkability and page history. Normally, changing a document’s “href” property causes it to reload, but if you only change the fragment identifier, it doesn’t. You can use this knowledge to change the URL symbolically – in a manner which allows a script to inspect it and make use of it – without causing any noticeable change to the page content.
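
To make Laws III and IV concrete, here is a small sketch of building a URL that differs only in its fragment identifier. The helper name withFragment and the variable knownUrl are my own inventions. Because Law III only lets us set (not read) another domain’s URL, the sketch assumes the sender already knows the target window’s base URL.

```javascript
// Return `url` with its fragment identifier replaced by `fragment`.
// Everything before the "#" is left untouched, so assigning the result
// to a window already showing `url` should not reload the page (Law IV).
function withFragment(url, fragment) {
  var hashIndex = url.indexOf('#');
  var base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  return base + '#' + encodeURIComponent(fragment);
}

// In a browser, the sender would then do something like this
// (knownUrl is an assumed variable holding the target's current URL):
//   parent.frames[1].location.href = withFragment(knownUrl, 'orange');
```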

Exploiting the Laws of Physics for Cross-Domain Fun and Profit – The Cross-Domain Hack (URL Polling version)

The laws above are all we need to get cross-domain communication happening. The technique is simply this, assuming Window A wants to control Window B:

  • Window A changes Window B’s fragment identifier.
  • Window B is polling the fragment identifier and notices the change, and updates itself according to the fragment identifier.

Ta-da!!! That’s the whole thing, in its glorious entirety. Of course, you had to know the laws of physics in order to understand why all this works. It simply relies on the fact that both Window A and Window B have one common piece of state – the URL – and the fact that we can change the URL unintrusively by manipulating only the fragment identifier. For example, in the demo, the iframe’s URL changes to http://ajaxpatterns.org/crossframe/#orange and once the iframe script notices it, it updates the colour.
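
Window B’s side of the bargain might look roughly like the sketch below. The name createFragmentPoller and the 100 ms interval are arbitrary choices of mine, not taken from the demo; the fragment reader is passed in as a function so the logic can be exercised outside a browser.

```javascript
// readFragment: function returning the current fragment, e.g. "#orange".
// onChange: called with the new fragment (minus the "#") when it changes.
function createFragmentPoller(readFragment, onChange) {
  var last = readFragment();
  return function poll() {
    var current = readFragment();
    if (current === last) {
      return false; // nothing new since the previous poll
    }
    last = current;
    onChange(current.replace(/^#/, ''));
    return true;
  };
}

// Browser wiring (skipped when there is no browser, e.g. under Node).
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  var pollFragment = createFragmentPoller(
    function () { return window.location.hash; },
    function (colour) { document.body.style.backgroundColor = colour; }
  );
  // A shorter interval means less lag but more wasted cycles.
  setInterval(pollFragment, 100);
}
```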

A few observations:

  • This works in either direction. Parent to child, child to parent. As the demo illustrates.
  • It requires co-operation from both parties; it’s not some magic way to bypass browser security mechanisms. Once Window A changes Window B’s fragment identifier, it’s up to Window B to act on the change; and it’s up to Window B to be polling the fragment identifier in the first place.
  • Polling the fragment identifier happens to be exactly the same technique used in the Unique URLs pattern.

There are a few downsides: (a) Polling slows down the whole application; (b) Polling always involves some lag time (and there’s always a trade-off between (a) and (b) – the faster the response, the more cycles your application uses up); (c) The URL visibly changes (assuming you want to manipulate the top-level window). We’ll now consider a second technique that addresses these (albeit in a way that introduces a different downside).

The Cross-Domain Hack (Marathon version)

Here’s a variant which no longer involves polling or changing any URLs. I learned of it from Juliene Le Comte’s blog, and he’s even packaged it as a library.

Looking back at Law II: “Windows can only access each others’ internal state if they belong to the same domain”. At the time, I made this sound like a bad thing, but as David Brent likes to say, “So, you know, every cloud …”. The law is bad if you state it as “cross-domain iframes can’t play with each others’ toys” (paraphrasing the informal version of Demeter’s law). But it’s good if you spin it as “well, at least same-domain iframes can play with each others’ toys”. That’s what we’re going to exploit here.

As for the demo, the functionality is the same, but since this one involves spawning iframes, I’ve left them intact, and made them visible, for your viewing delight. Normally, of course, they’d be invisible, and the application would look exactly the same as the previous demo.

Here’s how this technique works:

  • Every time Window A wants to call Window B, it spawns a child iframe, “Window B2”, in the same domain as Window B. The URL includes the command being issued (as a CGI parameter, fragment identifier, or any other URL pattern which will be recognised by the destination script).
  • When window B2 starts up, its Javascript inspects the URL, gets a handle on Window B, and updates Window B according to the URL (e.g. a CGI parameter).
  • Window B2 destroys itself in a puff of self-gratified logic.
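
The script running inside Window B2 could be sketched as follows. This assumes the command travels in the fragment identifier, and that Window B exposes a function called receiveCommand; both are illustrative choices, as are the names commandFromUrl and deliver. Removing the proxy iframe afterwards is left out of the sketch.

```javascript
// Extract the command from the proxy's own URL,
// e.g. "http://b.example.com/proxy.html#orange" -> "orange".
function commandFromUrl(url) {
  var hashIndex = url.indexOf('#');
  return hashIndex === -1 ? '' : decodeURIComponent(url.slice(hashIndex + 1));
}

// Runs inside Window B2. `proxyWindow` is B2's own window object and
// `findTarget` navigates from it to Window B (Law I).
// `receiveCommand` is a hypothetical function that Window B defines.
function deliver(proxyWindow, findTarget) {
  var command = commandFromUrl(proxyWindow.location.href);
  var target = findTarget(proxyWindow);
  // Allowed by Law II, because B2 and B come from the same domain.
  target.receiveCommand(command);
  return command;
}

// If Window A is itself an iframe inside Window B, B2's script might call:
//   deliver(window, function (w) { return w.parent.parent; });
```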

So in this case, we create a new, short-lived, iframe for every message being passed. Because the iframe comes from the same domain as the window we’re trying to update, it’s allowed to change the window’s internal state. It’s only useful to us on startup, because after that we can no longer communicate with it (apart from by the previous fragment identifier trick, but we could do that directly on the original window).

Window B2 is sometimes called a proxy because it accepts commands from Window A and passes them to Window B. I like to think of it as Pheidippides of Marathon fame; it passes on a message and then undergoes a noble expiration. Its whole mission in life is to deliver that one message.

This technique comes with its own downside too. Quite obviously, the downside is that you must create a new iframe for every call, which requires a trip to the server. However, with caching in place, that could be avoided, since everything that must happen will happen inside the browser. So it would simply be the processing expense of creating and deleting an iframe element. Note that the previous variant never changed the DOM structure or invoked the server.

Also, note that in either version of the hack, there is the awkward matter of having to express the request in string form, since in either pattern, you are required to embed the request on the window URL. There is an inspired extension of this hack that also has some untapped promise in this area. It involves setting up a subdomain and updating its DNS to point to a third-party website. When combined with the old document.domain hack, you end up with a situation where your iframe can communicate with a cross-domain iframe, without relying on iframe. (The technique described in the article is about browser-to-server communication, but I believe this iframe-to-iframe is possible too.)

A Third Hack Emerges: Window Size Monitoring

A newer third hack by Piers Lawson is based around the porous nature of window sizes and the use of window.resize(). Fragment IDs are used like in the first technique here, but instead of polling, window resize events are used to cause a more direct trigger.

Applications

Cross-Domain IFrame-to-IFrame Calls … and Widgets/Gadgets

In the world of mashups, iframes are a straightforward way to syndicate content from one place to another. The problem, though, is limited interaction between iframes; in pure form, you end up with a few mini web browsers on a single page. It gets better when the iframes can communicate with each other. For example, you can imagine having iGoogle open, with a contacts widget and a map widget. Clicking on a contact, the map widget notices and focuses on the contact’s location. This is possible via Gadget-To-Gadget communication, a form of publish-subscribe which works on the iframe hack described here. And speaking of maps, check out Google Mapplets, which are a special form of gadget that work on Google Maps, and also rely on this technique.

In terms of gadgets, another application is communication between a gadget and its container, and this is something I’ve been looking at wrt Shindig. For example, there is a dynamic-height feature gadgets can declare. This gives the gadget developer an API to say “I’ve updated, now please change my height”. Well, an iframe can’t change its own height; it must tell its parent to do that. And since the gadget lives in an iframe, on a different domain from the container (e.g. iGoogle), this requires a cross-domain, cross-iframe, message. And so, it uses this technique (“rpc” – remote procedure call – in shindig terminology) to pass a message to the container.

Cross-Domain Browser-to-Server Calls

The best known technique for calls from the browser to a third-party server is On-Demand Javascript, aka Javascript APIs aka JSON/JSONP. This was obscure in 2005, with Delicious being the best example. Now, it’s big time in Web 2.0 API land, and Yahoo! has exposed almost all of its APIs this way, and Google also provides data such as RSS content via JSON.

It works by spawning a new script element programmatically, an element pointing to an external Javascript URL. Since there’s no restriction on third-party Javascript running, the browser will faithfully fetch and execute the script, and so the script will typically be written to “return” by updating a variable and/or calling an event handler.
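
As a rough sketch of the mechanism, assuming the API takes the callback’s name in a query parameter called “callback” (the parameter name varies between providers, and api.example.com, buildJsonpUrl and callApi are all made up here):

```javascript
// Append the callback name to the API URL. The parameter name "callback"
// is an assumption; real APIs each document their own.
function buildJsonpUrl(baseUrl, callbackName) {
  var separator = baseUrl.indexOf('?') === -1 ? '?' : '&';
  return baseUrl + separator + 'callback=' + encodeURIComponent(callbackName);
}

// Spawn the script element. `doc` is the page's document object.
function callApi(doc, baseUrl, callbackName) {
  var script = doc.createElement('script');
  script.src = buildJsonpUrl(baseUrl, callbackName);
  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}

// Browser usage would be along these lines:
//   window.showResults = function (data) { /* use the data */ };
//   callApi(document, 'http://api.example.com/search?q=ajax', 'showResults');
// The server is expected to respond with script text such as
//   showResults({...});
```

Note that whatever the server sends back is executed as script in your page, which is exactly the trust problem discussed next.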

There are two major security issues with On-Demand Javascript. Firstly, you have to trust the API provider (e.g. Yahoo!) a lot because you are letting them run a script on your own web page. And there’s no way to sanitise it, due to the script tag mechanism involved. If they are malicious or downright stoopid, your users may end up running an evil script which could ask for their password, send their data somewhere, or destroy their data altogether. That’s because whatever your web app can do, so can the third party’s script, even if you’re only trying to get a simple value back. The mechanism forces you to hand over the keys to the Ferrari when all you want is a new bumper sticker. Secondly, what’s to stop other websites also making use of the external Javascript? If your own site can embed the script to call a third-party JS API, so too can a malicious fourth-party. This is fine for public, read-only, data, but what if you’re relying on the user’s cookies to make privileged calls to the third-party? Then the fourth-party’s web app will be just as capable of issuing calls from the browser to the third-party, and they might well be more evil calls than you’re making, e.g. “transferFunds()” instead of “getBankBalance()”. The moral is: Javascript APIs can only be used for serving public data.

Whoa!!! Public data only? That’s a tragic restriction on our cross-domain API! For mashups to be truly useful, it must be personal. We’ll increasingly have OAuth-style APIs where users will tell Site X that Site Y is allowed to read its data. But how can that work in the browser? How can Site Y expose its data so that it’s usable from the browser, but only when Site X is running? It can’t work with On-Demand Javascript. Site Y could try reading the referrer headers to see where the call is coming from, but anyone could write a command-line client with fake headers.

In fact, the answer is to use the iframe hack described in this article. As I mentioned earlier, this is how Facebook gets the job done, with what is essentially the same “power of attorney” delegation model as OAuth (BTW thanks to my colleague Jeremy Ruston for the “power of attorney” OAuth analogy – albeit it was stated in a slightly different context from OAuth).

I haven’t looked too much into the mechanism involved with the Facebook API, but it looks like it’s essentially using a variant of the Marathon technique. From memory, there’s an ever-present invisible facebook.com iframe. Each time your web app makes a Facebook call, the Facebook JS library spawns a new “proxy” iframe, which passes the message on to its same-domain ever-present frame, which makes a bog-standard XHR call to Facebook. So now we’re making an XHR call to another domain, which we can get away with because it’s coming from a separate iframe. Once the XHR call returns, I think the message is returned to your application (this happens via another same-domain iframe you must host on your server, though I think that’s unnecessary) and the proxy iframe disappears.

Note that all Facebook ever exposes is a standard web service that relies on the user being logged into Facebook – there’s no Javascript involved. The user must be logged in and must have given permission for the application to access its Facebook details. Effectively, the user is allowing a particular website URL to make Facebook calls, since the application developer must register the URL. If you look back at the iframe algorithms I described earlier, you’ll see that it’s straightforward for Facebook to ensure that only this application (and any other application the user trusts) can access the data. The Facebook.com iframe (whose behaviour is controlled by Facebook and can’t be tampered with) simply has to inspect the URL of the parent window and pass it to the server as part of the XHR call. The server can then check that the logged-in user has authorised this application, using the URL to identify it.
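
I don’t know how Facebook actually implements that server-side check, so the following is purely a hypothetical sketch of the idea: the server keeps, per user, the list of application origins that user has authorised, and compares the reported URL against it. All names and data shapes here are invented.

```javascript
// Reduce a URL to scheme + host (+ port), e.g.
// "http://app.example.com/page?x=1" -> "http://app.example.com".
function originOf(url) {
  var match = /^(https?:\/\/[^\/?#]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// authorisedOriginsByUser is a hypothetical lookup table:
//   { userId: [origin, origin, ...] }
function isCallAllowed(authorisedOriginsByUser, userId, parentUrl) {
  var origin = originOf(parentUrl);
  var allowed = authorisedOriginsByUser[userId] || [];
  return origin !== null && allowed.indexOf(origin) !== -1;
}
```

Comparing whole origins, rather than testing whether the URL merely starts with a registered string, avoids being fooled by a host such as app.example.com.evil.com.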

As for the first concern of cross-domain Javascript – having to trust the third-party API provider – I believe the iframe technique overcomes this concern too. Facebook.com never gets to run arbitrary Javascript on your server. Of course, you have to trust the Facebook library and the Facebook-provided iframe you’re required to host on your domain, but those could be audited prior to installation. All those things are set up to do is call callback methods inside your top-level application; you could inspect the library and ensure that’s all that will ever happen.

Thus, cross-domain iframe-based communication solves both problems which have plagued On-Demand Javascript. It is slightly more complicated, however.

Conclusions

Yes, this is a somewhat complicated technique. Actually understanding the problem it solves is really the hard part! Once you understand that, and once you understand those laws of physics, the trick is actually quite straightforward (either version of it).

The technique will be critical for gadget containers such as Shindig. As OAuth takes off, we’ll also see the technique used a lot more in mainstream applications and APIs.

With HTML 5, cross-frame messaging will render the hack unnecessary for iframe-to-iframe communication. Indeed, the aforementioned Cross-Domain library uses that technique already for Opera, in a fortuitous twist of fate since Opera doesn’t actually support everything this hack requires. However, the notion of using iframes for cross-domain calls will still be present, no matter how the windows talk to each other.

Shindig Architecture: Java gadget Server 3 – Util

More raw Shindig notes. This time, looking at org.apache.shindig.util. See Shindigging tag. This is just a quickie for completeness sake as it’s a few generic util classes. This post completes the listing of all Java classes in the Shindig architecture at this time.

Check – Runs some standard assertions (empty null etc).

InputStreamConsumer – Input stream -> String

ResourceLoader – Loads some files within a path. Will trawl through path and also open up any JARs.

Shindig Architecture: Java Gadget Server 2 – Servlets

More raw Shindig notes. This time, looking at org.apache.shindig.gadgets.http. See Shindigging tag. I’ll structure them just a little more this time.

Main Servlet

BasicHttpContext.java – data struct for country/language/locale

GadgetRenderingServlet.java – The servlet that accepts gadget spec URL and prefs, and outputs the gadget content (typically in an iframe). Delegates heavily to GadgetServer, in order to get a Gadget, and then serialises the Gadget itself with outputGadget(). outputGadget() will output the gadget as either URL or HTML type, depending on the content type. (I expect those output methods will probably be extracted to a separate class, or to their own strategy classes.)

    gadget = servletState.getGadgetServer().processGadget(gadgetId,
        getPrefsFromRequest(req), context.getLocale(),
        RenderingContext.GADGET, options);
    outputGadget(gadget, view, options, contentFilters, resp);

HttpProcessingOptions.java extends ProcessingOptions – Allows URL params to override default options, e.g. to allow caller to suppress caching

Javascript Servlet

JsServlet.java – Outputs Javascript content

Proxy Servlet

ProxyHandler.java – Provides implementation for ProxyServlet, which is a thin wrapper around this class

ProxyServlet.java- Handles Fetch commands for gadgets, i.e. allowing them to get remote content. Delegates everything to ProxyHandler

RpcServlet

RpcServlet is a “meta” servlet. Initially I thought it was just for debugging/administering the container, but it plays a more important role as it lets the browser-side gadget container issue a query to find out about the gadgets it’s hosting. (The gadgets are of course in an iframe, so due to security restrictions, it can’t directly inspect the gadget content to find out, for example, its name, which it needs to know in order to show the gadget chrome/wrapper).

JsonRpcContext.java – Context for JsonRpc stuff. Used by RpcServlet

JsonRpcGadget.java – Meta-model of a Gadget (ie just its defining features – URL and moduleId – and not fields to populate it as in Gadget). Used by RpcServlet

JsonRpcGadgetJob.java – Used by RpcServlet

JsonRpcProcessingOptions.java – Used by RpcServlet

JsonRpcRequest.java – Used by RpcServlet

RpcException.java – Boring exception class

RpcServlet.java – Provides Gadget meta-info – allows a programmer/tester to get info about the gadget server and list its gadgets. See http://www.mail-archive.com/[email protected]/msg00317.html.

Used by All Servlets

CrossServletState.java – Servlet scoped state (ie instances of the same servlet always get this state object) – Defines accessors for globals such as the GadgetServer, so that each Gadget can get a handle on them.

DefaultCrossServletState.java implements CrossServletState – creates globals such as the GadgetServer (and defines accessors for servlets to access them). Also includes some utility methods for the servlet (which could really go elsewhere).

Misc

CajaContentFilter.java implements ContentFilter – Caja filter – adaptor/bridge to Caja project, which sanitises JS, intended for inlined gadget.

Shindig Architecture: Java Gadget Classes

This is the first of an open series on the architecture of Shindig, the new open-source gadget/widget framework project. As mentioned here earlier, this project is building something similar to iGoogle, i.e. an environment for serving gadgets, a run-time environment for the gadgets to operate in, and a gadget container (as well as OpenSocial support).

I’m currently digging into Shindig’s architecture and will document my progress.

For the record, there’s not much discussion of Shindig’s architecture to date. The most useful summaries I’ve seen are a couple of notes on the mailing list:

  • http://mail-archives.apache.org/mod_mbox/incubator-shindig-dev/200801.mbox/%[email protected]%3E
  • http://www.mail-archive.com/[email protected]/msg00369.html
  • http://trac.hyves-api.nl/hyves-api/wiki/ShindigStarted

Also, be aware that Shindig has server-side implementations in both Java and PHP, and potentially more languages in the future. I’m focusing on Java at this time.

I’ll be tagging each of these articles with “shindigging” (as well as “shindig”, a general tag for anything on this blog about shindig). Thus, you’ll be able to find a full list of articles from http://softwareas.com/tag/shindigging.

Java Gadget Server

I’ve walked through each file in the Java gadget server, in the main package – org.apache.shindig.gadgets and taken a very raw set of notes on each file / public class, as well as sketched a quick summary of the process. I’ll refine all this later.

Java Gadget Server – Tracing from gadget spec to page content

A gadget server takes an XML file on a server somewhere and converts it to some HTML/JS/etc content inside an iframe. After looking at org.apache.shindig.gadgets, the Java gadget server achieves this task as follows.

  • GadgetServer is invoked from the web app to render a gadget whose spec sits at a URL
  • GadgetServer uses CacheLoadTask to load the _Gadget_ object if possible
  • If not found, GadgetServer uses SpecLoadTask, which uses RemoteContentFetcher, to grab the Spec.
  • GadgetSpecParser converts the XML string into a GadgetSpec, which is a Java representation of the XML spec.
  • Gadget constructs itself from a combination of the GadgetSpec and the preferences.
  • GadgetServer passes Gadget to each required GadgetFeature (going by the required features declared in the spec). These GadgetFeature objects perform some kind of transformation on the Gadget – typically they add one or more JS libs to it (a gadget has a list of JS libs).
  • At this point, classes in the http package kick in to render the Gadget object, of which more in a different blog post.

Java Gadget Server – Files / Classes in org.apache.shindig.gadgets (raw notes)

BasicGadgetBlacklist.java – [part of GadgetServerConfig] dumb implementation of GadgetBlacklist – file based

BasicGadgetDataCache.java – dumb implementation of GadgetDataCache – Just a hashmap

BasicGadgetSigner.java – dumb implementation of GadgetSigner “Provides dummmy data to satisfy tests and API calls”

BasicGadgetToken.java – dumb (String) implementation of GadgetToken

BasicRemoteContentFetcher.java – server-side remoting proxy

BidiSubstituter.java implements GadgetFeatureFactory – Bidirectional language support (i18n). Performs “hangman” substitutions (MSG_foo). Builds up a Substitutions and executes it.

Gadget.java – It’s a gadget! This object is created from a GadgetSpec and ultimately serialised to a string representing the HTML/JS/etc content that sits on the page. Prior to serialisation, the object is subject to a set of transformations, one for each GadgetFeature it requires.

GadgetBlacklist.java interface – [part of GadgetServerConfig] persists blacklist and lets you query if a given URL is blacklisted

GadgetContentFilter.java interface – String->String filter interface to transform the HTML/JS/etc widget content for the browser, e.g. for Caja sanitisation

GadgetContext.java – This object is passed to each GadgetFeature in the processing sequence to tell it what’s going on and help modify its behaviour, since it contains info about gadget server options – ProcessingOptions – as well as Locale, RenderingContext and ServerConfig.

GadgetDataCache.java – [part of GadgetServerConfig] Cache interface. Simply a map from string ID -> Type T.

GadgetException.java – Exception base class

GadgetFeature.java – Transforms a Gadget so it will implement a particular feature. prepare() on initial call and void process(Gadget) later on. TODO more

GadgetFeatureFactory.java – Simply an interface to create Gadgets “GadgetFeature create()”

GadgetFeatureRegistry.java – [part of GadgetServerConfig] A map of gadget features in this gadget server. Essentially Gadget ID string -> {feature object, other features it depends on}

GadgetServer.java

  • Includes processGadget(), which is called by gadget servlet. GadgetID [ie gadget URL] -> Gadget object ready for rendering
  • processGadget() adds a sequence of task objects (commands) and executes them:
    • CacheLoadTask – load gadget from cache instead of fetching/constructing it
    • SpecLoadTask – load gadget from remote URL (using low-level class, RemoteContentFetcher)
    • EnqueueFeaturesTask – populate Gadget’s list of required gadget feature objects
  • Uses a workflow process: Works iteratively – each cycle, it works out which tasks need to be performed. Keeps iterating until all tasks are completed or no new tasks can be added. Meanwhile, accumulates all gadget exceptions for all iterations so they can be bundled together in a big exception object that’s thrown if any exceptions occurred. [Note: I'm not sure why this complicated workflow algorithm is required, when afaict only 3 task objects are present. Maybe more will be added later on.]
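
To show the shape of that workflow loop, here is a loose sketch in Javascript (not Shindig’s actual Java code). The task structure, with a name, a dependsOn list and a run() function, is my own assumption about how “working out which tasks need to be performed” could be modelled.

```javascript
// Each task is a hypothetical object: { name, dependsOn: [names], run() }.
function runWorkflow(tasks) {
  var done = {};
  var errors = [];
  var pending = tasks.slice();
  var progressed = true;

  // Keep cycling until everything has run or a cycle makes no progress.
  while (pending.length > 0 && progressed) {
    progressed = false;
    var stillPending = [];
    pending.forEach(function (task) {
      var ready = task.dependsOn.every(function (name) { return done[name]; });
      if (!ready) {
        stillPending.push(task);
        return;
      }
      progressed = true;
      try {
        task.run();
        done[task.name] = true;
      } catch (e) {
        errors.push(e); // accumulate rather than abort
      }
    });
    pending = stillPending;
  }

  // Bundle every accumulated exception into one big one.
  if (errors.length > 0) {
    var bundled = new Error(errors.length + ' task(s) failed');
    bundled.causes = errors;
    throw bundled;
  }
  return Object.keys(done);
}
```

A task whose dependency failed never becomes ready, so the loop ends once a full cycle adds nothing new.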

GadgetServerConfig.java Configuration options for the gadget server. Composed of java.util.concurrent.Executor, FeatureRegistry, GadgetDataCache, MessageBundleCache, RemoteContentFetcher, GadgetBlacklist, SyndicatorConfig

GadgetServerConfigReader.java Nothing much right now. You’d think it parses a config file or something, but it just ~replicates GadgetServerConfig

GadgetSigner.java interface – defines interface for mapping token ID string -> GadgetToken

GadgetSpec.java – Dumb data structure encapsulating the spec (xml) ie user prefs, required features, gadget URI, HTML content data, random info-garbles (author etc.)

GadgetSpecParser.java – String xml -> GadgetSpec.

    GadgetSpecParser specParser = new GadgetSpecParser();
    GadgetSpec spec = specParser.parse(gadgetId, xml.getResponseAsString());
    wc.gadget = new Gadget(gadgetId, spec, prefs);
    // ie xml file becomes spec, spec becomes gadget

GadgetToken.java – Effectively a token string, with a method to sign URLs

GadgetView.java interface – An immutable view of the gadget

JsFeatureLoader.java – Goes into a directory and recursively finds all files matching “feature.xml”. Reads each file into a GadgetFeatureRegistry.Entry and registers it into the registry (e.g. feature.containerJs.add(JsLibrary)). Remember, a GadgetFeature modifies the gadget in some way; in the case of a JsFeature (defined in JsLibraryFeatureFactory), the modification is simply to add some JS libraries.

JsLibrary.java [jsLibraries is part of Gadget] – Represents a JS library – holds its source (e.g. URL/file) and is capable of reading it to get a string of the JS. The source may be a string representing the JS itself, which is useful if the client simply wants to construct the script text programmatically.

JsLibraryFeatureFactory.java implements GadgetFeatureFactory – Provides GadgetFeatures in the case where the gadget feature is simply a JS file (or a list of container JS files and a list of gadget JS files). In this case, the feature’s process() method is simply to add all the libraries to the gadget (gadget.addJsLibrary). JsFeatureLoader uses this after trawling through to find the feature.xml for each gadget, since that file simply identifies a bunch of JS libraries.

MessageBundle.java [part of GadgetServerConfig] String ID -> Message map.

MessageBundleParser.java XML file -> MessageBundle

MessageBundleSubstituter.java implements GadgetFeatureFactory – Provides MessageBundleSubstituterFeature. This feature is a Javascript library that “compiles” the MessageBundle to Javascript, for a particular locale. It sets up language and country preference (String setLangFmt = "gadgets.prefs_.setLanguage(%d, \"%s\");"; String setCountryFmt = "gadgets.prefs_.setCountry(%d, \"%s\");";), and then sets up, for each message, the JS mapping from ID -> Message (String setMsgFmt = "gadgets.prefs_.setMsg(%d, %s);";).

ModuleSubstituter.java – Includes ModuleSubstituterFeature which simply replaces MODULE hangman string with the module ID.

OpenSocialFeatureFactory.java – Provides OpenSocialFeature

ProcessingOptions.java – Tweaks GadgetServer.processGadget algorithm (methinks this seems like a weird pattern – should instead be attributes of GadgetServer).

RemoteContent.java – Encapsulates results of HTTP call – the content as well as status code, size, etc.

RemoteContentFetcher.java [part of GadgetServerConfig] – HTTP client to grab gadget spec (nb IMO too much BDUF abstraction going on here)

RemoteContentRequest.java – Encapsulates request for HTTP call – headers etc.

RenderingContext.java – enum { GADGET | CONTAINER }

SpecParserException.java – boring exception class

Substitutions.java [part of Gadget] – A collection of Substitutions – each Gadget has a Substitutions object, which it uses for get() queries, e.g. “public String getTitle() { return substitutions.substitute(baseSpec.getTitle()); }”.

  • Several substitution types: MSG, BIDI, UP (user-prefs), MODULE
  • A map for each substitution type, mapping substitution key -> substitution string
  • Runs the sequence of substitutions on a given string
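To make the Substitutions idea concrete, here is a rough Javascript sketch of the same shape: one map per substitution type, applied in sequence to a string. It is an illustration only, not Shindig's Java code. The function name createSubstitutions, the fixed ordering, and the __TYPE_key__ token format are all assumptions on my part.

```javascript
// Hypothetical sketch of hangman-style substitution; not Shindig's actual code.
// Assumption: tokens look like __TYPE_key__ and types are applied in this order.
const SUBSTITUTION_ORDER = ["MSG", "BIDI", "UP", "MODULE"];

function createSubstitutions() {
  // One key -> value map per substitution type.
  const maps = {};
  for (const type of SUBSTITUTION_ORDER) {
    maps[type] = new Map();
  }
  return {
    add(type, key, value) {
      if (!maps[type]) throw new Error("Unknown substitution type: " + type);
      maps[type].set(key, String(value));
    },
    // Runs every substitution type, in order, over the input string.
    substitute(input) {
      let output = input;
      for (const type of SUBSTITUTION_ORDER) {
        for (const [key, value] of maps[type]) {
          const token = "__" + type + "_" + key + "__";
          output = output.split(token).join(value);
        }
      }
      return output;
    },
  };
}

const subs = createSubstitutions();
subs.add("MSG", "title", "Digg Roundup");
subs.add("MODULE", "ID", 7);
const title = subs.substitute("__MSG_title__ (module __MODULE_ID__)");
// title is "Digg Roundup (module 7)"
```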

SyndicatorConfig.java [part of GadgetServerConfig] Unclear – related to OpenSocial and JSON.

UserPrefSubstituter.java [part of Gadget] – Builds up JSON object with preference values, using Substitutions to perform any substitutions (???)

UserPrefs.java – preference ID -> string (value of preference)

Dual-Side Templating

Ajax, Ajax Patterns, Javascript, Server-Side Javascript

As server-side Javascript continues to gather momentum, patterns will start to emerge. Dual-side templating, which I’ll explain below, is a pattern I’ve been harping on about for a while because you can kinda sorta use it already with a product like Rails. It will be a lot more powerful with OFL (our favourite language) on both sides of the wire.

The timeline looks like this (with milestone times neatly accelerating towards the singularity :):

  • c. 1995: Server-Side Templating. This is the standard templating used in Java’s JSP, Perl’s Mason, PHP, ASP, etc. ie some html code with <?= “language” ?> code embedded in it.
  • c. 2005: Browser-Side Templating. This is an Ajax pattern where you have a block of HTML that includes some custom syntax (e.g. <% ${foo.bar} %>) which is then processed via Javascript.
  • c. 2010: Dual-Side Templating. A single template is used on both browser and server, to render content wherever it’s appropriate – typically the server as the page loads and the browser as the app progresses. For example, blog comments. You output all existing comments from the server, using your server-side template. Then, when the user makes a new comment, you render a preview of it – and the final version – using browser-side templating.
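The blog comments example might look something like the sketch below: one template function, loaded on both sides of the wire. The function names (renderComment, renderCommentList, renderPreview) and the markup are invented for illustration, and the sketch uses plain string building rather than any particular templating library.

```javascript
// Shared code: the same functions are meant to be loaded on server and browser.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The single template, used on both sides.
function renderComment(comment) {
  return '<li class="comment"><b>' + escapeHtml(comment.author) + "</b>: " +
    escapeHtml(comment.text) + "</li>";
}

// Server side (page load): render every stored comment into the initial HTML.
function renderCommentList(comments) {
  return '<ul id="comments">' + comments.map(renderComment).join("") + "</ul>";
}

// Browser side (after load): render a preview of the comment being written.
// In a real page the result would be inserted into the DOM.
function renderPreview(author, text) {
  return renderComment({ author: author, text: text });
}

const pageHtml = renderCommentList([{ author: "Ann", text: "First!" }]);
const previewHtml = renderPreview("Bob", "1 < 2");
```

The point is that the preview and the server-rendered comment cannot drift apart, because they come from the same template.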

I continue to be bullish on server-side Javascript and am expecting a lot of design patterns to emerge in the next couple years. AppJet and Jaxer are already available, but the real impact will be (a) enterprise-friendly stack, probably Java-based; (b) commodity hosting stack, probably Jaxer based.

BlingText and Banner

Ajax, AjaxPatterns

As foretweeted last week, I created a little Ajax app called BlingText.

As you can see, it takes a message and provides some ASCII renderings. In particular, it includes a port of the old UNIX/C Banner utility.

If I do more work on it, the main improvements will be:

  • Options. Let the user specify, for each transformation, parameters such as the fill character (“*”) and amount of spacing.
  • Better OO (internal change). Each of the transformations is at present a terse strategy object, which is good. However, there’s no inheritance going on, so it could be better.
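As a rough idea of how per-transformation options could sit alongside the strategy objects, here is a sketch. The two transformations shown (spaced and boxed) are made up for the example and are not BlingText's real ones. Each strategy carries its own defaults, which the user's options override.

```javascript
// Hypothetical transformations; BlingText's real ones are not shown in this post.
const transformations = {
  spaced: {
    defaults: { spacing: 1 },
    render(message, options) {
      const gap = " ".repeat(options.spacing);
      return message.split("").join(gap);
    },
  },
  boxed: {
    defaults: { fill: "*" },
    // Assumes fill is a single character so the box edges line up.
    render(message, options) {
      const edge = options.fill.repeat(message.length + 4);
      const middle = options.fill + " " + message + " " + options.fill;
      return [edge, middle, edge].join("\n");
    },
  },
};

// Looks up a strategy by name and merges user options over its defaults.
function transform(name, message, userOptions) {
  const strategy = transformations[name];
  if (!strategy) throw new Error("Unknown transformation: " + name);
  const options = Object.assign({}, strategy.defaults, userOptions);
  return strategy.render(message, options);
}

const boxed = transform("boxed", "hi", { fill: "#" });
const spaced = transform("spaced", "hi", { spacing: 2 });
```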

Where Do Widgets Come From? A Look at Widget/Gadget Content Types

Ajax, AjaxPatterns, Gadgets, Google, Web, Web 2.0, Widgets

Background

A while back, I walked through a Google Gadget I made called Digg Roundup, which simply shows Digg headlines and can be customised on topic and popularity. In my quest for an uber-simple tutorial, one thing I skipped on was content type, the subject of the present muttering. There are several content types possible, each with distinct implications for the page architecture and where the gadget sits within it. Below I’ll explain the options and help you understand how to decide between them.

A gadget is always expressed as some XML sitting at a URL somewhere on the net. When a developer uploads a gadget to Google, all they do is indicate their server location where the gadget is hosted. So a user’s iGoogle homepage is really just a bag of URLs (with layout and preferences for each). Anyways, the gadget is really a mini web page, and inside the gadget XML is a content tag that describes the HTML, CSS, and Javascript that makes up this page. The tag has a type attribute (i.e. <content type="xxx">) and there are three types available…

Three Content Types

html (<content type="html">)

All the page content is included in the XML file. It’s just like your standard web page HTML; it contains the (initial) HTML body from top to tail (sans enclosing <body> </body> tags), and CSS and Javascript, which can be inlined or linked, just like a normal web page. The content will be served in an iframe and Google will wrap its own html content around it. In particular, Google will look at other info in your XML file and output html code at the top, such as pulling in libraries and setting up preference data. Incidentally, Google won’t add any visual wrap around the widget at this stage; that is the responsibility of the container, which could be iGoogle, Ning, or any standalone web page that chooses to render the gadget.

Although the XML is held on your own server, the iframe will ultimately be served from a Google subdomain as its source, e.g. gmodules.com/blahblah. If you look at iGoogle, you’ll see the source is actually something like 50.gmodules.com/blahblah. The point of the “50” is that each gadget on a page comes from a different subdomain, thus isolating gadgets on the same page from each other for security reasons (there are ways around that restriction for consenting gadgets).

The main difference between an html gadget and a normal web page is that your content must be static – the XML file cannot be generated on the fly by your server, because Google will cache it. Think of this kind of gadget as a single file that describes your gadget in isolation, which you could mail around or stick on a USB key. The dynamic behaviour comes from the fact that you can link to external CSS and Javascript (which could be generated on the fly) and, moreover, you will typically be making remote calls once the application is running (either on startup, due to a timer, or in response to system events).

url (<content type="url" href="myserver.example.com/…">)

url. This could alternatively be called “external” or “hosted”. You simply provide a URL on your own web server and deliver up the gadget content from there. Specify “example.com” and the user will see an iframe pointing to “example.com”. The content for url gadgets can be dynamically generated, in contrast to html gadgets, which are always static. This is due to the way these gadgets use

html-inline.

This is no longer supported, but it’s worth knowing because (a) the contrasting model helps you understand the other models; (b) older gadgets, as well as Google’s own gadgets, are still able to use this model; (c) Google says they might allow special cases through in the future; (d) other containers do support this model (NetVibes anyway). In contrast to the previous types, inlined gadgets are not contained in an iframe; they are served as plain old HTML embedded into the fabric of the page. Just some content in a div.

A Note on Preferences

Persistent preferences are a key feature of widgets. Just a quick note that preferences work slightly differently on html vs url widgets (and inline, but I didn’t look into it).

With html widgets, the preferences are set as part of the HTML wrap around your widget’s HTML. ie some Javascript variables are set up.

With url widgets, the iframe source URL includes some CGI parameters expressing the preferences (e.g. up_storyAmount=5).

Either way, though, Google’s preference library abstracts this detail. As long as you declare a dependency on the library, it will be included and will let you access preferences the same way.
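You shouldn't need to do this by hand, since the preference library hides it, but to show what is going on with a url widget, here is a rough sketch of pulling the up_ parameters out of the iframe's query string. The function name parseUserPrefs and the up_topic parameter are invented for the example; only up_storyAmount=5 comes from above.

```javascript
// Sketch: read "up_"-prefixed preferences from an iframe query string.
// In a browser you would pass window.location.search.
function parseUserPrefs(queryString) {
  const prefs = {};
  const query = queryString.replace(/^\?/, "");
  if (query === "") return prefs;
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    const key = decodeURIComponent(rawKey);
    // Anything without the "up_" prefix is not a user preference.
    if (key.indexOf("up_") !== 0) continue;
    // "+" stands for a space in form-encoded query strings.
    prefs[key.slice(3)] = decodeURIComponent(rawValue.replace(/\+/g, " "));
  }
  return prefs;
}

const prefs = parseUserPrefs("?up_storyAmount=5&up_topic=tech+news&lang=en");
// prefs.storyAmount is "5", prefs.topic is "tech news"; "lang" is ignored
```

Note that values arrive as strings, so a numeric preference like storyAmount still needs converting before use.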

Decisions, Decisions: Which content type to use?

So that’s the definitions. Let’s look at the capabilities of each and understand how we might make decisions on which model to use.

Inline or IFrame (content-type=html/url)?

Firstly, inlined (html-inline) versus iframed (html and url), assuming inlined is an option for you. Living in the same DOM means that inlined gadgets can talk to each other; dead simple cross-gadget communication is a key benefit of the inlined model. Furthermore, gadgets and the main portal page can talk to each other. You can change the Google logo, for example, or put up a lightbox effect across the entire page.

Not too surprising that raw inlined gadgets were banned; the security risks are high. e.g. An inlined gadget that asks for your gmail credentials would end up storing username and password in global page space, which means a second gadget could simply read that data and upload it somewhere. You could also screw with iGoogle’s branding, which Google PR probably didn’t appreciate and could lead to phishing attacks. Google could rely on code reviews to minimise the risk of inlined gadgets, but that’s a very manual, unscaleable, approach, and users can in fact add gadgets from any URL (although they made it a lot harder to do that a few months ago; AFAICT you need to use the developer gadget in order to add a gadget that’s not in the catalogue). Maybe inlined gadgets will come back if a product like Caja proves itself worthy of automatically sanitising web apps. It’s a hard problem, but there are some good benefits to be had if widgets can live safely together on the same page and domain.

In summary, choose inlined only if you need the extra functionality available and only if you have assessed the risks of tampering from third-party widgets.

IFrame from widget server (content-type=html) vs IFrame from your own server (content-type=url)

Second, and really the key decision you have to make with Google Gadgets today: html versus url. Reasons to choose “html” content type over “url” content type:

  • Javascript API. You can use the full Gadget Javascript API with no effort – you don’t even have to include a script tag because iGoogle injects it for you. With the “url” type, using the Gadget Javascript API is a hassle – you have to dynamically generate the script tag, which is not only a bit of scripting effort, but forces you to generate your iframe from a script. Furthermore, cross-domain restriction means that even if you do pull in the API, you can’t use the remoting support (the _IG_Fetch* functions for proxying and caching).
  • Cross-gadget communication. You can run cross-gadget communication using the Pub-Sub framework.
  • Resources. Google serves it for you, which saves bandwidth and maintenance costs. In fact, you will have zero bandwidth costs if your widget functionality is either self-contained (e.g. a calculator) or only hits third-party services (as opposed to your own servers’ services).
  • Popularity. Pure speculation, but you’re probably more likely to be featured in Google’s gadget gallery as they can inspect everything about your application’s workings (except for any remote calls back to your server).
  • Inline-like. If inlined widgets do make a comeback, you’ll have an easier migration path.

Reasons to choose “url” content type over “html” content type:

  • Full page refreshes. Because it’s coming from an external iframe, you can do traditional, non-Ajax, programming. ie. User clicks on link to see a new page. User submits a form. Page auto-refreshes. With this type of widget, you could write a perfectly working gadget without knowing a button of Javascript.
  • Personalisation and Security. You can make XHR calls from the widget directly to your own server and make use of any cookies that have been set up. e.g. If the user has already logged into your main web app, then your widgets will be able to present personalised data. With “html” widgets, you would only be able to make use of cookies using cross-domain JSON, and cross-domain JSON is unsafe. You could possibly do it more safely using the cross-domain iframe fragment identifier hack – Facebook’s new JS lib works that way – but I haven’t investigated it.
  • You can easily host the widget as a standalone web page (though it’s largely pointless and will look silly unless you pull some CSS madness).
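For the curious, the fragment identifier hack boils down to stuffing a message after the “#” of a URL that the other side is able to read. The sketch below covers only the encoding and decoding. The function names and the receiver.html page are assumptions, and it says nothing about how Facebook's library actually does it.

```javascript
// Sender side: build the URL to assign to an iframe's src.
// Changing only the fragment does not reload the target page.
function buildMessageUrl(receiverPageUrl, message) {
  const base = receiverPageUrl.split("#")[0];
  return base + "#" + encodeURIComponent(JSON.stringify(message));
}

// Receiver side: decode a fragment such as window.location.hash.
// Returns null when the fragment is empty or is not one of our messages.
function readMessage(hash) {
  const fragment = hash.replace(/^#/, "");
  if (fragment === "") return null;
  try {
    return JSON.parse(decodeURIComponent(fragment));
  } catch (e) {
    return null;
  }
}

// "receiver.html" is a hypothetical page on the receiving domain.
const url = buildMessageUrl("http://myserver.example.com/receiver.html#old", { storyAmount: 5 });
const received = readMessage(url.slice(url.indexOf("#")));
```

A real receiver would typically poll its own location hash on a timer and call something like readMessage whenever it changes.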


Taking Browser Tabs Seriously

I’ve just updated my favicon library, which I first wrote about here. I’ll explain more about the update in a separate post. For now, I want to talk about browser tabs.

Browser tabs were introduced by Opera. Then Firefox adopted them a few years later, as did Safari. Then Microsoft stepped into the ’90s with their own IE tabs. Meanwhile, tabs became teh coolness and Kevin and Alex joked on Diggnation about how you could get brownie points by saying it’s a tabbed interface. And so you get tabbed terminals among other things, and fortunately there is some consistency on keyboard shortcuts (typically ctrl-t to make a new tab and ctrl-w to close it, or cmd-t/w on mac).

We’ve outgrown the rudimentary functionality that is available for managing tabs.

The browser is the new operating system, the tab is the new system process, the tab bar is the new taskbar.

Power users struggle to keep up with 20+ browser tabs and grasp what’s inside them. The Firefox Tab Mix extension is a superb addition and should be part of the core. But there is a lot more that could be done, for instance:

  • Notifications. The whole issue of attention and notifications needs re-thinking in light of the new world of rich web apps. Quintessential example is web chat – how do you inform the user someone has sent a message, in another window? The favicon library helps here, and the update in my next post helps a bit more. Playing a sound is also possible. Still, I would like to see API support for ambient dialogs, like Growl/Snarl and the Windows “sunrise” notifier that emerges from the taskbar (what’s it called officially?). And sound. It’s 2008, why can’t browsers issue a single beep like a good 1970s PC, without requiring flash or unreliable hacks!!! Speaking of sound …
  • Where’s that sound coming from? There’s a sound in my browser, but I don’t know where! Tabs should provide a visual indication if a sound is emerging from them.
  • Default/Custom Favicons. If a site doesn’t have a favicon, browsers show nothing. Bzzzt!!! They should provide more sensible defaults, e.g. at least show the site’s background colour or a thumbnail of the first image. Something to make them all different from each other.
  • Provide a Summary List. Like pressing Ctrl-Alt-Delete in Windows to get a task list or “ps” in Unix. You’d be able to see how long each tab has been open, memory usage, other excitement.
  • Hover info. Similar to the previous point, let web developers provide tooltip info which will be displayed as the user hovers over the tab.
  • Popup menus. Why even open up a web page? Sometimes, you want to do something quickly without having to switch tabs. Let web developers create site-specific popup menus that emerge from the tab. For example, you could use this mechanism to record simple events as they occur. Or start and stop a timer. Or to switch channels on a music website.
  • COLOUR AND STUFF!!! Browser tabs are pretty dull – just an icon and some text. Using cues such as colour and font styles, the browser could say a lot more about what’s happening in the other tabs. Perhaps it could be set by the programmer or perhaps it could be set by the user (e.g. create a heatmap highlighting the least used tabs).
  • Javascript events. Javascript onEntered()/onExited() events to let the application know if it’s active or not. (Similar to what desktop apps receive.) This would be absolutely brilliant for when you are notifying the user about something they need to see (e.g. a new chat message) – once they re-enter the tab, you can switch off the notification.
  • Open Forms. What about when I start writing something in a form, then switch tabs, and forget which tab has the form, or forget that it’s there at all. The browser should indicate when there’s a form open that you’ve been writing to. (Though in some cases auto-backup features may mean that’s not necessary.)
  • Search. No-brainer. Browser search should work across all tabs, not just the one currently open. This would not only help you find some text, but also pinpoint one of the fifty tabs you have open.
  • Virtual Desktop. Maybe it sounds mad, but I’d like something similar to virtual desktop (“Spaces” for Mac-heads). ie Switch from “work” tabset to “social” tabset, etc.
  • Auto-remove. Instead of forcing me to close all windows, or some random subset, or restart the browser altogether, provide some support for removing the tabs that matter least. e.g. the tabs that I haven’t used for the longest and which I appear not to have interacted with (ie started editing a form), and/or the tabs that are taking up the most resources.
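On the Javascript events wish above: the closest thing available without browser changes is the window's focus and blur events, which only approximate entering and leaving a tab. Here is a rough sketch of the chat notification case built on those. The name createTabNotifier and its callbacks are invented for the example.

```javascript
// Sketch: track whether the tab is active and clear a pending notification
// once the user comes back. "target" is anything with addEventListener;
// in a browser you would pass window.
function createTabNotifier(target, showNotification, clearNotification) {
  let active = true;   // assume the tab starts out focused
  let pending = false;

  target.addEventListener("focus", function () {
    active = true;
    if (pending) {
      pending = false;
      clearNotification();
    }
  });
  target.addEventListener("blur", function () {
    active = false;
  });

  return {
    // Returns true if a notification was shown, false if the user is already here.
    notify(text) {
      if (active) return false;
      pending = true;
      showNotification(text);
      return true;
    },
    isActive() {
      return active;
    },
  };
}

// Browser usage might look like this (not run here, since it needs a window):
//   const originalTitle = document.title;
//   const notifier = createTabNotifier(window,
//     function (text) { document.title = text; },
//     function () { document.title = originalTitle; });
//   notifier.notify("New chat message");
```

The show and clear callbacks could just as easily drive the favicon library instead of the page title.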

I’m sure there’s a lot more. The main point is to take inspiration from the way operating systems let users deal with open applications, and then some. The dynamic favicon library is a small part of the solution, but there’s only so much libraries and even browser add-ons can do…it needs to become a core feature of the browsers. Just as Opera and then Firefox owe a big chunk of their initial popularity to the cult of the tab, so too do the manufacturers today have a similar opportunity to take it to the next level.

Firebug Wishlist

Just mailed Joe Hewitt a couple of suggestions for the bug.

Hi Joe,

A couple of firebug suggestions.

(1) Search

On the Ajaxian interview, you mentioned people have trouble locating and knowing about the existence of Search. (Me too.) Redesigning the layout will help, but I'd also suggest retaining the standard alt/control-F shortcut somehow, as I'm sure a lot of your power user audience will be trying to use that and failing.

More generally, any redesign probably ought to somehow integrate Firebug search into Firefox search, e.g. use keyboard focus to decide which of the two is searched.

Bonus points if Firebug search cycles through all tabs :) .

(2) One word: cookies!

Cheers,
Michael

Ajax Patterns Lookalike

So there’s this Japanese R&D blog that focuses on covers and spotted an uncanny resemblance.

Lots of Ajax books have been coming out, but now at last even a book like this has appeared.

Ajax Design Patterns
http://www.oreilly.com/catalog/ajaxdp/index.html

Ajax Design Patterns

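  • Author: Michael Mahemoff
  • Publisher/Manufacturer: Oreilly & Associates Inc
  • Release date: 2006/07
  • Media: Paperback

The cover has a “Bagdad Cafe” flavour. I’m a little drawn to it.

Kinokuniya to release the DVD “Bagdad Cafe: Complete Edition” on April 25
A complete version that adds cut scenes to the version first released in Japan
http://www.watch.impress.co.jp/av/docs/20030204/kino.htm

Maybe I’ll read a bit of it?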

Bagdad Cafe is an arthouse film that was always in the video library but that I never watched. (The film runs 95 minutes in the U.S. and 108 minutes in the German version.) It is a somewhat surreal comedy set in a down-at-heel truck-stop café and motel in the Mojave Desert. An ill-assorted cast of characters are assembled, including a plump German tourist (Sägebrecht as Jasmin) who has left her husband after a row in the middle of the desert, the short-tempered owner of the café (Pounder as Brenda) who has just thrown her husband out, Brenda’s two children and grandchild, a strange ex-Hollywood set-painter (Palance), and a glamorous tattoo artist (Kaufmann). Through a passion for cleaning and for magic tricks, Jasmin transforms the café and all the people in it.