Including HTML Files from Other HTML Files, on the File:// System



I still have a passion for web apps that run on the file system. It’s an extremely easy development model and extremely flexible. You can send a file (or set of files) to anyone and be confident they can open the files and run your web app, regardless of their operating system and without imposing on them the requirement of setting up a server. Furthermore, they can stick it on a share drive and BAM, guerrilla multi-user system. I’d had the habit long before I developed for TiddlyWiki, but my time with TiddlyWiki focused my attention on the benefits and taught me a number of Single-Page App (SPA) hacks which most web developers are still oblivious to.

And let the SPA hacks roll on …

As I start to think about resetting the slideshow framework I’ve been randomly sniping at conferences recently, one thing I’d like to try is the idea of a file per Master Slide, containing all of the HTML, JavaScript, and CSS. This is more or less how TiddlyWiki themes work, and a very neat modularisation tactic.

Unfortunately, HTML – bless it – can include JavaScript (<script src="something.js">) and CSS (<link href="something.css"> etc), but not HTML (which would look something like <div src="something.html"> in my dreams). So what are the options for pulling in one HTML file from another HTML file:

  • Server-side includes: We’ve long had server-side includes. I powered my homepage from this less-than-stellar technique for modularisation around 15 years ago. The problem is none too hard to derive from their name. Server, I don’t want one.
  • XMLHttpRequest: We could make a XHR call and actually this is possible from file to file. Unfortunately, Google Chrome (and maybe others?) sees each file as belonging to a separate domain, making it impossible, and other browsers may issue a warning or confirmation, making it obtrusive.
  • File APIs: Again, we could use the magic of $.twFile to read the other file. But this relies on browser-specific hacks, degrading to a separate Java applet (which requires a proper Java installation) in the case of Chrome, Safari, Opera, and others. Firefox uses a Moz-specific API and IE uses ActiveX, which work but also incur warnings and may be blocked by firewalls. Still, it’s not a bad solution. The extra Java applet is a big downside in TiddlyWiki, because you suddenly need to send around two files instead of one, but here I’m already assuming there’s a bundle of files to be sent around.
  • Outputting HTML inside JavaScript: Since we can load JavaScript, we could just spit out the HTML from JavaScript. The benefit here is it works, and works for the most ancient of browsers. But it would require a lot of string manipulation, which would look minging and be unmaintainable, and I massively value elegant code (or at least, the possibility of it). Many times I have wished JavaScript supported Here Docs, but alas, it doesn’t :(. The best you get is a long sequence of lines ending in `\`. Unacceptable. You can also achieve this kind of thing with E4X, but that’s not widely supported.
  • Hiding HTML in JavaScript or CSS: I’ve considered tricks like embedding the entire HTML inside a JavaScript or CSS comment, but the problem is the same reason we need JSONP; when you source a JS or CSS file, your app feels the effects of it, but your code doesn’t get to see the source. I’m still holding a candle for the possibility of some CSS hack, like based on computed style, which would let you trick the browser into thinking the background colour of a button is an entire HTML document or something…which would be worth doing just for the sake of being insanely ace.
  • Or. iFrames.
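As an aside, the string-manipulation option above is worth seeing to appreciate why it gets unmaintainable. A minimal sketch (the markup is illustrative):

```javascript
// Building an HTML fragment in pre-heredoc JavaScript.
// Option 1: an array join - tolerable, but every line needs quoting.
var html = [
  '<div class="slide">',
  '  <h1>Title</h1>',
  '  <p>Body text</p>',
  '</div>'
].join("\n");

// Option 2: backslash line continuations - the newlines are swallowed,
// and a single trailing space after a backslash is a syntax error.
var html2 = '<div class="slide">\
  <h1>Title</h1>\
</div>';
```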

Thinking it through, I decided iFrames are your friend. You embed the file to be included as a (hidden) child iFrame. This can work in a couple of ways.

The parent could read the DOM directly:

```javascript
var dom = document.querySelector("iframe").contentWindow.document;
document.querySelector("#messageCopy").innerHTML = dom.querySelector("#message").innerHTML;
```
(The child contains a message element; the parent contains a messageCopy element.)

This works on Firefox, but not Chrome, because Chrome sees each file as belonging to a separate domain (as I said above, wrt XHR). So we need to make a cross-domain call. We could be AWESOME and use the under-loved Cross-Origin Resource Sharing (CORS) capability to make cross-domain XHR calls, but in this case, it doesn’t work because it involves HTTP headers, and we’re doing this with pure files.

The solution, then, is another kind of iFrame technique: Cross-domain iFrames. It’s been possible to do cross-domain iFrame communication for a while, but fortunately, modern browsers provide an explicit “HTML5” API for cross-domain iframe communication. I tested it in Chrome, and it works. On Files. Yay.

Under this paradigm, “index.html” contains:

```html
<script>
  window.onload = function() {
    window.addEventListener("message", function(e) {
      document.querySelector("#messageCopy").innerHTML = e.data;
    }, false);
    document.querySelector("iframe").contentWindow.postMessage(null, "*");
  };
</script>
<h1>Test parent</h1>
<div id="messageCopy"></div>
<iframe src="included.html"></iframe>
```

while “included.html” contains:

```html
<script>
  window.addEventListener("message", function(e) {
    e.source.postMessage(document.getElementById("message").innerHTML, "*");
  }, false);
</script>
<div id="message">This is the message</div>
```

Point your spiffy HTML5 browser to index.html and watch in glee as the message gets copied from included to includer. I wasn’t sure it would work, because certain other things – like Geolocation and Workers – simply don’t work in all browsers against the File:// URI, even though they probably should. (Probably because the browsers keep mappings of permissions to each domain, and these systems assume the domain is served with HTTP(s).)

This technique will also degrade to older browsers using those “pre-HTML5” hacks. (As the Romans used to say, Omnis enim API HTML V, aequivalet HTML V pre-furta – for every HTML5 API, there is an equivalent pre-HTML5 hack.)

So I’m glad this technique works and intend to use it in the future, nicely abstracted with a library function or two.
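Here’s what such a library function might look like – a sketch, assuming the included file speaks the same postMessage protocol as included.html above. Passing in the document and window explicitly is a hypothetical design choice, to keep the function testable:

```javascript
// Include the content of another local file via a hidden iframe.
// Assumes the child page answers a postMessage by posting back its content.
function includeHTML(doc, win, src, onContent) {
  var frame = doc.createElement("iframe");
  frame.style.display = "none";
  frame.src = src;
  // Listen for the child's reply before kicking things off.
  win.addEventListener("message", function(e) {
    if (e.source === frame.contentWindow) onContent(e.data);
  }, false);
  doc.body.appendChild(frame);
  // Once the child has loaded, ask it to post its content back.
  frame.addEventListener("load", function() {
    frame.contentWindow.postMessage(null, "*");
  }, false);
}
```

Usage would then be a one-liner in the parent page: `includeHTML(document, window, "included.html", function(html) { /* inject it */ });`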

Embedded Images in TiddlyWiki Under IE6 via MHTML – Proof-of-concept

tiddlywiki mhtml images (by mahemoff)

I only came across the MHTML image hack over the weekend, while listening to @jeresig on the new jQuery podcast (incidentally not the only jQuery podcast to be launched in the past week or two).

The MHTML image hack lets you embed images with text, just like ye olde data: URI hack, but in a way that works for IE6. MHTML is MIME-encapsulated HTML: a page and its resources bundled into a single multipart file.

Of course, I immediately wondered if it could work in a single-page web app like tiddlywiki, and it turns out it can, though my quick exercise still has some problems.

IE6 TiddlyWiki images demo here

As I wrote on the demo itself:

Normally, images are contained in a separate location, pointed at from HTML IMG tags or from CSS background-image properties. However, Tiddlywiki is a style of web app where everything resides in one file. So how to include images?

The usual hack is to embed data: URIs (http://en.wikipedia.org/wiki/Data_URI_scheme). However, no go for IE6 and IE7. Hence, a “newer” technique – newer meaning recently discovered. That is, MHTML (http://www.phpied.com/mhtml-when-you-need-data-uris-in-ie7-and-under/).

I was curious if MHTML worked in single-file HTML pages, reading off a file URI, and to my surprise it does. That said, it’s not perfect at all. Firstly, I had to hard-code the location, because I don’t know how to refer to “the current file” within the MHTML link. (I suppose a workaround would be to output the image file and refer to that with a relative URL, but we lose the benefit of everything being in one file.) Secondly, I played around with various base-64 images and this arrow one (from the phpied.com demo) was the only one that worked :(.
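For reference, the shape of the trick (per the phpied.com article) is an MHTML multipart document hidden inside a CSS comment, which IE6/7 can then address with the mhtml: protocol. A sketch – the part name, URL, and truncated base64 are illustrative:

```css
/*
Content-Type: multipart/related; boundary="_BOUNDARY"

--_BOUNDARY
Content-Location: arrow
Content-Transfer-Encoding: base64

R0lGODlh...truncated base64 GIF data...
--_BOUNDARY--
*/
.arrow {
  /* IE6/7 reads the MIME part out of this very file; the absolute URL
     must point at the file's own location - the bit I had to hard-code */
  background: url(mhtml:http://example.com/mywiki.html!arrow);
}
```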

So it’s a proof-of-concept with many gaps left for the reader to fill!

Hopefully people play with this further. At 3002 days old and counting, your grandpa’s browser isn’t going anywhere fast.

The URL Shortener as a Cloud Database

On URL Shorteners

URL shorteners are enjoying their 15 minutes of fame right now. They’ve been around since 2002, but became flavour of the month as soon as half of the planet decided to compress their messages into pithy 140-character microblogs. And there is money in it, which is driving a massive number of new players into the market – and will ultimately lead to a massive amount of URL-shortener-induced linkrot. [Update Dec 2011 – I note that the URL shortener I used for a while, 3.ly, is now indeed linkrot :(.]

In passing, I will note the irony that long domain names were the flavour of the month a year ago. Although, maybe it’s not so ironic, since they enjoy a symbiotic relationship with the URL shorteners when you think about it.

Now, I recently realised that URL shorteners could be used as a form of cloud database. The URL is a form of data. And the interesting thing is that they form a cloud database that can be accessed from any Ajax app, because they (a) can be created anonymously; (b) offer JSONP APIs, in some cases (and with third-party bootleg APIs available in others); (c) allow you to store relatively long strings. Before you cry “violation of Terms and Conditions” – I will get to that later on.

Character Limits

On (c), just how long can these URLs be? I did a little digging – gave them some huge URLs to convert using just the homepage of each service. I chose the top services from Tweetmeme’s recent study, minus friendfeed’s internal shortener, to come up with the four most popular services – tinyurl.com, bit.ly (my candidate for the first URL shortener to appear on the cover of Rolling Stone magazine, in case you ever doubted a URL shortener could be the in thing), is.gd (the one I’ve been using since it was a wee thing spouting three-character shortcuts), and Tweetburner aka twurl.nl.

I was expecting them all to truncate at around 2083 characters, the traditional limit for IE. Boy, was I wrong!

I started playing around adding really long URLs, and playing a “Price Is Right” higher, higher, lower, higher game until I found out roughly the capacity of each.
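That higher/lower game is just a binary search, and could be automated. A sketch, with `accepts` stubbed – a real version would submit a URL of length n to the shortener and check whether it comes back intact:

```javascript
// Find the largest n in [0, upperBound] for which accepts(n) is true,
// assuming accepts is monotonic: true up to some limit, false beyond it.
function findCapacity(accepts, upperBound) {
  var lo = 0, hi = upperBound;
  while (lo < hi) {
    var mid = Math.floor((lo + hi + 1) / 2);
    if (accepts(mid)) lo = mid;   // mid fits: the limit is at least mid
    else hi = mid - 1;            // mid was truncated/rejected: limit is below
  }
  return lo;
}

// Stub standing in for a real shortener call (hypothetical limit of 2000).
function acceptsStub(n) { return n <= 2000; }
```

With an upper bound of 100,000 this needs only ~17 probes per service, rather than a manual guessing game.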

Note that Bit.ly and Twurl.nl both give the impression they are storing more than their limits, i.e. they don’t show an error message, but instead they silently truncate the URL. Is.Gd does the right thing by telling you what it’s done. Although, the limits are weird – you would think they’d go for IE’s 2083 character limit, or be all binary and go for 2048, rather than cutting off at 2000. I guess 2000 is a simpler number to tell humans about.

So the most interesting one here is TinyURL. The actual underlying URL doesn’t work beyond a point, for some reason – the most characters I found that would work was 8192 – but the entire URL is stored, as you can see at the preview page.

A Legitimate, Related, Use: Shortening an Ajax Unique URL (with Fragment ID Reflecting App State)

The thought of using URL shorteners might sound crazy, useless, and a violation of terms, but it came to me for an entirely legitimate application, which is well within the T’s and C’s I believe. I’m creating a web app right now (very incomplete) where the entire state is captured in the URL (see Unique URL). This saves me from having to set up any storage and (in some respects) makes life easier for users, who don’t have to manage yet another account, log in, etc etc. It certainly lowers the barriers for new users if they don’t have to register in order to save things.

Saving the entire state in a URL can lead to a long URL. So with all the hype around URL shorteners, I figured why don’t I just let the user save it to a short URL, if they do prefer a short URL for mailing or writing down, or memorising (since some of these services let you specify the key). And so I might choose to build into the app a little “get short URL for bookmarking and tweeting” button. (Funnily enough, I would have previously called it “bookmark this”, but that would mislead users into thinking that the long URL on top isn’t actually a valid bookmark. Now that everyone understands URL shorteners, I can be more explicit about the button’s purpose.)

The short URL is effectively a holder of the entire state of this user’s application. In fact, this seems like an entirely valid reason to use a URL shortener, so I doubt it’s a violation of anyone’s terms. Worth noting incidentally that there are plenty of free image hosts where you can anonymously upload 100K or more, so I doubt a 10K URL is a big deal; and given that the service receives link love and some useful tracking data, it’s probably just as valuable financially as an image sent to an image host.

A Pure Cloud Database

You can see where this is going. An extension to this thought is to simply treat the URL shorteners as cloud databases. As long as it looks like a valid URL, you could store whatever you like there. Turns out you can even store an image or a short MP3 as a data: URI. I have no plans to do this, and I suppose it actually would be a violation of terms in this case, but it’s an interesting idea.

And if the URL was too long, you could always use a linked list structure – break it up into several short URLs, with the last few characters of each source URL pointing to the previous short URL. (It’s backwards so that you can be sure which URL was allocated, and you would distribute the last URL in the series.)

  • http://tinyurl.com/mark3 start of the message mark2 (this is the URL you distribute)
  • http://tinyurl.com/mark2 middle of the message mark1
  • http://tinyurl.com/mark1 end of the message
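A sketch of that scheme, with `shorten` faked by an in-memory store – a real implementation would call a shortener API with a custom key, and would need to make the chunks URL-safe:

```javascript
// Stand-in for a URL shortener: maps generated keys to payloads.
var store = {};
var nextKey = 1;
function shorten(payload) {
  var key = "mark" + (nextKey++);
  store[key] = payload;
  return key;
}

function saveMessage(message, chunkSize) {
  var chunks = [];
  for (var i = 0; i < message.length; i += chunkSize)
    chunks.push(message.slice(i, i + chunkSize));
  var prevKey = "";
  // Store from the end of the message backwards, so the final (distributed)
  // key holds the start of the message and chains toward the end.
  for (var j = chunks.length - 1; j >= 0; j--)
    prevKey = shorten(chunks[j] + "|" + prevKey);
  return prevKey;
}

function loadMessage(key) {
  var parts = [];
  while (key) {
    var payload = store[key];
    var sep = payload.lastIndexOf("|");  // keys never contain "|"
    parts.push(payload.slice(0, sep));
    key = payload.slice(sep + 1);
  }
  return parts.join("");
}
```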

There is actually prior art on this concept, I discovered – some anon poster recently created a proof-of-concept cloud DB, with encryption to boot. There were no replies to that post and it seems to have gone unnoticed, which is unfortunate. So allow me to dig it out:

In almost obvious violation of their terms of service (maybe not entirely – they technically are URLs, just with random data tacked onto them), I’ve created a way to securely store arbitrary-length data on URL shortening services like tr.im, bit.ly, tinyurl, etc.

You have to pass both the message and a key. The key is SHA-1’d and then the message is encrypted with the key by AES-256. The message is split into 200-byte chunks and it loops through them. For each one, a special salt variable exists for no particular reason, is mixed with the key and a packet identifier number (part 0 = 0, part 1 = 1, so amazingly complex) and all of that is again SHA-1’d. It’s truncated to 14 digits. The part of the data is prepended with a pseudo-url, and the url is passed to the url shortener API and the 14-digit string is used as the custom short URL. The last packet is appended with a special last-packet marker.

http://jsbin.com/ixuda

As We May Think

All this makes me think about what kind of JSON-based cloud services there should be on the web – services explicitly designed for this kind of purpose and better suited to it. I bet you could build something nice along those lines with TiddlyWeb server.

The biggest restriction with all this is that the services are write-once. e.g. if you make a pretty poster, tinyurl it, and send the link, you can never change the poster someone will see when they visit that link (because the link directly represents the composition of your poster, rather than being a pointer to the composition on the server). So this heavily limits applicability of the concept anyway, but if users are willing to live with that restriction, it’s a big benefit in terms of simplicity. You could also overcome the restriction using some of the newer URL shortening services that let you log in and maintain your shortcuts. But that would (a) defeat the purpose of simplicity; (b) defeat the purpose of working in Ajax apps, since it would require privileged JSON calls, and privileged JSON calls are wildly insecure.

SVG and VML in One Chameleon File

Why a Chameleon File?

While most browsers support SVG, IE’s unique brand of interoperability does not extend that far; even the latest and greatest incarnation, v8 of IE, has no sign of SVG. And so, we citizens of the innernets are left with two vector graphics formats: VML for IE, SVG for standards-compliant browsers, which I will simply refer to as “Firefox” for concreteness.

There are tools like Project Draw capable of rendering both SVG and VML, and there are converters like Vector Convertor as well. You can easily end up in a situation where you have two separate files for the same image. In one’s endless quest for a simple life of zen-like existence, this is troublesome. You will have to mail the two files around or host the two files somewhere. And if someone changes one, they probably won’t bother changing the other one.

One solution would be to convert to a raster/bitmap file, i.e. GIF or PNG, which will then render fine in IE and standards-compliant browsers as well as many other places too. However, this isn’t always the best option: (a) if you want to support further editing, you will need the original vector data; (b) it won’t scale as nicely; (c) in some cases, the bitmap size is bigger by orders of magnitude.

So a colleague asked me how one could share this type of data, and I got thinking about a recent experiment. I need to blog it separately, but the premise is that a single file can be different things to different people. In this case, we did some work yesterday, which I’ll describe here – seeing how a single file can look like VML to something that works with VML (IE) and look like SVG to something that works with SVG (Firefox et al). In the general case, I call this pattern “Chameleon File”, and the particular kind of chameleon described here I call a “Vector Graphics Chameleon File”.

Demo

The demo link is below – it shows an ellipse in any browser, using either VML or SVG, whichever is supported:

http://project.mahemoff.com/vector/ellipse.xhtml

IE users are better off using http://project.mahemoff.com/vector/ellipse.html – the .xhtml version won’t launch automatically in IE, since Windows doesn’t recognise the .xhtml extension, though it will still launch once you tell Windows to open it with IE. The content is the same; only the URL is different. I’ll explain more below in “Limitations”.

(The example is taken from Wikipedia’s VML page.)

How?

How does it work? Let’s look at the source:

```html
<html xmlns:v="VML">
<!--[if IE]>
<style>v:*{behavior:url(#default#VML);position:absolute}</style>
<![endif]-->
<body>
 <v:oval style="left:0;top:0;width:100;height:50" fillcolor="blue" stroked="f"/>
  <svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
    <ellipse cx="50" cy="25" rx="50" ry="25" fill="blue" stroke="none" />
  </svg>
</body>
</html>
```

From Firefox’s perspective, the file is valid XHTML thanks to the .xhtml suffix, meaning that the svg tag is fair game. We use the old “if IE” comment trick to get Firefox to ignore the style rule; otherwise it will still work, but it will render the style tag’s content (this is XML, not HTML, which would have it un-rendered). It ignores the body and VML v:oval tag, and faithfully renders the SVG. In short, it does what the pure SVG does:

```html
<?xml version="1.0"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
 "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">
  <ellipse cx="50" cy="25" rx="50" ry="25" fill="blue" stroke="none" />
</svg>
```

From IE’s perspective, this is just a normal VML file with an svg tag snuck in, which, thankfully for our purposes, it ignores. So IE sees the file as just regular old VML:

```html
<html xmlns:v="VML">
 <style>v:*{behavior:url(#default#VML);position:absolute}</style>
<body>
 <v:oval style="left:0;top:0;width:100;height:50" fillcolor="blue" stroked="f"/>
</body>
</html>
```

Limitations

Limitations, I have a couple of those.

File Extensions

The extension thing is really annoying (see also here). To get SVG working in Firefox, you need Firefox to see the file as XHTML. But to get VML working in IE, IE must see it as HTML. How do you make the browser “see the file” as something? You either set the MIME type in the HTTP header, or you set the file’s extension. In this case, we’re more interested in the file extension because we want to be able to just mail the file around – in which case there is no MIME type because there’s no HTTP header because there’s no HTTP because they’re viewing it off a file:/// URL – or we want to quickly stick it on a server and not bother faffing with .htaccess-fu and MIME types.

Now that being the case, what file extensions do we need? As far as I can tell, IE must have .html or .htm for the vanilla Windows operating system to open it.

As for Firefox, Firefox needs .svg or .xml or .xhtml as far as I can tell.

The problem here is that there is no overlap – IE needs .html or .htm; Firefox needs .svg, .xml, or .xhtml.


I spent a while trying to coerce Firefox into seeing a .html file as XHTML using a doctype and the like, but I can’t do it – any help here would be appreciated.

The consequence is that you have several possibilities: (a) call it .xhtml/.svg/.xml – it will run on Firefox, and IE users will have to “open”, “choose application”, and set IE (and they can save that setting); (b) call it .html (or .htm, but that’s just silly) and tell Firefox users to rename the file; (c) distribute two copies of the same file – this defeats the purpose of simplicity to some extent, but since it’s still the same file, it’s not such a big deal; you can keep working with the one file until you’re ready to distribute it. Of (a) and (b), I prefer (a), because asking someone to “open with application X” is less onerous than asking someone to rename a file, which sounds a bit odd. On the other hand, in many enterprise situations, you have to optimise around IE users, in which case (b) may well be preferable. You could probably also ship the file with a little instruction to rename it, targeted at non-IE users via CSS/JavaScript hacks, which they will see on opening the file in its HTML guise.
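That last trick could be as simple as a downlevel-revealed conditional comment, which IE skips while other browsers render it when the file is opened in its .html guise – a sketch, with wording and filenames purely illustrative:

```html
<!--[if !IE]><!-->
<p class="rename-note">
  Not using Internet Explorer? Rename this file from .html to .xhtml
  and reopen it to see the vector image.
</p>
<!--<![endif]-->
```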

SVG and VML Feature Sets

I haven’t experimented much with these, but I did find a larger SVG file that didn’t work properly. I’m hoping Project Draw will introduce an easy way to render both SVG and VML for the same file, which would let me script creation of SVG-VML chameleon files for experimentation; or I might just play around with a converter. Any data on this would be welcome.

The Chameleon File Pattern

… is rather interesting and useful. Some interesting analogies came up in discussion – one of them was the “old woman, young woman” optical illusion. I discovered it’s called a “Boring figure” after the experimental psychologist Edwin Boring. I thought “chameleon file” sounded more appropriate than “Boring file”!

Another analogy from Paul was the Rosetta Stone – same content in a different format. I think this is a good analogy, but at the same time, I guess the pattern could be used to contain different content too. That’s more the case with the JOSH Javascript-HTML Chameleon I’ll discuss later on.

It also reminds me of Zelig, an old Woody Allen flick about a “human chameleon” who tries to be all things to all people, although I think chameleon files have more of their own unique identity.

Chameleon File is a hammer; let’s go find some nails.

Thanks to my colleagues for ideas and inspiration.

Looks Good, Tastes Good

The Search Engine Experiment – a blind test where users rate relevance of results – reveals that Google is better, but not that much better. The methodology is reasonable – the only serious flaw might be if people are assuming Google is always relevant, then trying to pick the Google results. Or if people go for Google because they’re used to it, so the results are the most comfortable. For example, when I tried the test, I jumped straight for the results that included wikipedia, partly because it just felt more pure and Googlish. It turned out to be a Yahoo! result.

Anyway, taking the results at face value, how to explain MSN and Yahoo! being more relevant than the grand-daddy of search 60% of the time? Seth has a good theory:

Google is better because it feels better and quicker and leaner and easier to use. The story we tell ourselves about Google is very different, and we use it differently as a result … Music sounds better through an iPod because we think it does.

cf. Nicholas Negroponte, who in “Being Digital” explains he always puts on his glasses to eat steak – it tastes better that way. (BTW “Being Digital” is the greatest tech book never to make it onto Joel’s MBA reading list. A real mind-opener, like Philip and Alex’s Guide to Web Design.)

So Google is a cognitive dissonance machine that actually has no clothes on? Hard to believe, but bring on more of these mashup experiments.