As We May Think: URL Trails

As We May Think is Vannevar Bush’s legendary paper presaging hypertext, wikis, and the web, penned in 1945. Many features of his fictitious “memex” device have come to fruition on the web, but one thing I always find lacking is his concept of trails:

When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined, projected onto adjacent viewing positions. At the bottom of each there are a number of blank code spaces, and a pointer is set to indicate one of these on each item. The user taps a single key, and the items are permanently joined. In each code space appears the code word. Out of view, but also in the code space, is inserted a set of dots for photocell viewing; and on each item these dots by their positions designate the index number of the other item.

Thereafter, at any time, when one of these items is in view, the other can be instantly recalled merely by tapping a button below the corresponding code space. Moreover, when numerous items have been thus joined together to form a trail, they can be reviewed in turn, rapidly or slowly, by deflecting a lever like that used for turning the pages of a book. It is exactly as though the physical items had been gathered together from widely separated sources and bound together to form a new book. It is more than this, for any item can be joined into numerous trails.

The owner of the memex, let us say, is interested in the origin and properties of the bow and arrow. Specifically he is studying why the short Turkish bow was apparently superior to the English long bow in the skirmishes of the Crusades. He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items. Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item. When it becomes evident that the elastic properties of available materials had a great deal to do with the bow, he branches off on a side trail which takes him through textbooks on elasticity and tables of physical constants. He inserts a page of longhand analysis of his own. Thus he builds a trail of his interest through the maze of materials available to him.

And his trails do not fade. Several years later, his talk with a friend turns to the queer ways in which a people resist innovations, even of vital interest. He has an example, in the fact that the outraged Europeans still failed to adopt the Turkish bow. In fact he has a trail on it. A touch brings up the code book. Tapping a few keys projects the head of the trail. A lever runs through it at will, stopping at interesting items, going off on side excursions. It is an interesting trail, pertinent to the discussion. So he sets a reproducer in action, photographs the whole trail out, and passes it to his friend for insertion in his own memex, there to be linked into the more general trail.

Now at one level, any web page is a trail. Thanks to the magic of hypertext, any website is effectively a view onto the rest of the web, with the author being able to list links and comment on them. However, I don’t find much in the way of explicit support for this concept of trails.

Basically, I see a trail as a linear sequence through a bunch of resources, with the potential to annotate at each step. On the web, a resource is represented as a URL, so assuming everyone plays ball, building a trail is as simple as building a list of URLs, with some form of annotation for each. That’s the nice thing about URLs – it doesn’t matter if it’s a written article, a photo, or a video; they are all URLs. Furthermore, it doesn’t even have to be served via http; it could be a file on your hard drive (file:), an email address (mailto:), or an IRC channel (irc:). They can all live together in harmony in the same trail.
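To make "a list of URLs, with some form of annotation for each" concrete, here is a minimal sketch in Python. The Step and Trail names, and all the example URLs, are made up for illustration; nothing here is an existing format.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlsplit


@dataclass
class Step:
    """One stop on a trail: a URL plus a free-text annotation."""
    url: str
    note: str = ""

    @property
    def scheme(self) -> str:
        # The scheme is the only part that differs between a web page,
        # a local file, an email address, and an IRC channel.
        return urlsplit(self.url).scheme


@dataclass
class Trail:
    """A named, ordered list of annotated URLs (hypothetical structure)."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def add(self, url: str, note: str = "") -> "Trail":
        self.steps.append(Step(url, note))
        return self


trail = (
    Trail("bow-and-arrow")
    .add("http://example.org/encyclopedia/bow", "Interesting but sketchy")
    .add("file:///home/me/notes/elasticity.txt", "My own longhand analysis")
    .add("mailto:friend@example.org", "Ask about the Turkish bow")
    .add("irc://irc.example.org/archery", "Where the discussion happens")
)

for step in trail.steps:
    print(f"[{step.scheme}] {step.url} - {step.note}")
```

The trail code never needs to know what kind of thing each URL points at, which is the whole appeal.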

The closest popular thing I’ve seen to this is Amazon’s ListMania. My favourite ones are those beginning with “So you …” as in So you want to be a geologist or So you like to read about the Trojan War. These capture the spirit of Vannevar B’s tome; as a budding geologist, I might find a book or two on geology and see that list linked from the side, where I can get the (hopefully) expert opinion on the trail I can follow towards mastery.

There is also some interest around outlining, which is basically the trail concept taken in a web context, although you don’t see it being a big part of the web explicitly.

I was chatting to Jon last night; he refers to these as tour guides, and he pointed me to The Nethernet, an unusual “game” that appears on websites you surf, based on Firefox plugins. It includes a concept of a “quest” where you can jump between websites, following a trail laid out by someone else.

All of this has been converging lately with several activities happening at Osmosoft. As a project in my spare time, I’ve been working on a tool to caption images. A trail of captioned images would be good as a slideshow. As a project in his spare time, Simon has been working on an excellent video player. The player runs through a playlist, which can be populated in various ways. Each of the videos is effectively a URL, so the playlist is effectively a trail. And in TiddlyDocs, one of our main projects at Osmosoft, a document is a hierarchy of sections, each being a tiddler sitting on a unique (TiddlyWeb-backed) URL. So the document’s table of contents is also a trail. Jeremy realised the implications of this early on, and we’ve begun talking with him about a richer format for the document spec, so it can include annotations and transitions between items. Note that it’s also a hierarchy rather than a list, but that’s fine because (a) a list is a special case of a hierarchy, so the scheme we arrive at will be able to degenerate into a list representation; (b) a hierarchy can still be traversed in a deterministic, linear fashion – as you do when you read a book which is composed of sections, subsections, and so on.
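The point about reading a hierarchy in a linear order is just a depth-first, parent-before-children walk. A rough sketch, assuming a section is a plain dict with "url", optional "note", and optional "children" keys (those key names and the example URLs are my own invention, not the TiddlyDocs format):

```python
from typing import Iterator, Tuple


def walk(section: dict, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, url, note) in reading order: a section first,
    then each of its subsections in turn."""
    yield depth, section["url"], section.get("note", "")
    for child in section.get("children", []):
        yield from walk(child, depth + 1)


# Hypothetical document: a hierarchy of sections, each on its own URL.
document = {
    "url": "http://example.org/doc",
    "note": "Title page",
    "children": [
        {"url": "http://example.org/doc/intro", "note": "Start here"},
        {
            "url": "http://example.org/doc/method",
            "children": [{"url": "http://example.org/doc/method/setup"}],
        },
    ],
}

# Flattening the hierarchy gives an ordinary linear trail.
linear = [url for _, url, _ in walk(document)]

for depth, url, note in walk(document):
    print("  " * depth + url + (f"  ({note})" if note else ""))
```

A flat trail is then simply a root with children that have no children of their own.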

So where we’re at now is deciding on the format for a trail like this. It will probably be JSON-based and be inspired by OPML at some level. Being a file format which will be baked into people’s content, it needs a little more upfront thinking than a coding exercise.

The URL Shortener as a Cloud Database

On URL Shorteners

URL shorteners are enjoying their 15 minutes of fame right now. They’ve been around since 2002, but became flavour of the month as soon as half of the planet decided to compress their messages into pithy 140-character microblogs, and there is money in it, driving a massive amount of new players into the market, which will ultimately lead to a massive amount of URL shortener induced linkrot. [Update Dec 2011 – I note that the URL shortener I used for a while, 3.ly, is now indeed linkrot :(.]

In passing, I will note the irony that long domain names were the flavour of the month a year ago. Although, maybe it’s not so ironic, since they enjoy a symbiotic relationship with the URL shorteners when you think about it.

Now, I recently realised that URL shorteners could be used as a form of cloud database. The URL is a form of data. And the interesting thing about this is that they form a cloud database that can be accessed from any Ajax app, because they (a) can be created anonymously; (b) offer JSONP APIs, in some cases (and with third-party bootleg APIs available in others); (c) allow you to store relatively long strings. Before you can say, “violation of Terms and Conditions”, I will get to that later on.

Character Limits

On (c), just how long can these URLs be? I did a little digging – gave them some huge URLs to convert using just the homepage of each service. I chose the top services from Tweetmeme’s recent study, minus friendfeed’s internal shortener, to come up with the four most popular services – tinyurl.com, bit.ly (my candidate for the first URL shortener to appear on the cover of Rolling Stone magazine, in case you ever doubted a URL shortener could be the in thing), is.gd (the one I’ve been using since it was a wee thing spouting three-character shortcuts), and tweetburner aka twurl.nl.

I was expecting them all to truncate at around 2083 characters, the traditional limit for IE. Boy, was I wrong!

I started playing around adding really long URLs, and playing a “Price Is Right” higher, higher, lower, higher game until I found out roughly the capacity of each.
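The higher/lower game is a binary search by hand. A small sketch of the same procedure, assuming a yes/no check that a URL of a given length survives intact; the accepts function here is a local stand-in with a pretend limit, not a call to any real service:

```python
from typing import Callable


def find_capacity(accepts: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest length in [low, high] that is accepted.

    Assumes every length up to some limit is accepted and every longer
    one is rejected, and that accepts(low) is True.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the range always shrinks
        if accepts(mid):
            low = mid        # "higher": mid fits, the limit is at least mid
        else:
            high = mid - 1   # "lower": mid is too long
    return low


def fake_service(limit: int) -> Callable[[int], bool]:
    """Stand-in for submitting a URL of n characters and checking that it
    comes back whole (hypothetical; no network involved)."""
    return lambda n: n <= limit


capacity = find_capacity(fake_service(2000), low=1, high=100_000)
print(capacity)
```

For a service that silently truncates, the check would have to compare the stored URL with the submitted one rather than look for an error message.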

Note that Bit.ly and Twurl.nl both give the impression they are storing more than their limits, i.e. they don’t show an error message, but instead they silently truncate the URL. Is.Gd does the right thing by telling you what it’s done. Although, the limits are weird – you would think they’d go for IE’s 2083 character limit, or be all binary and go for 2048, rather than cutting off at 2000. I guess 2000 is a simpler number to tell humans about.

So the most interesting one here is TinyURL. For some reason the shortened link itself stops working beyond a certain length – the longest URL I found that would work was 8192 characters. However, the entire URL is still stored, as you can see at the preview page.

A Legitimate, Related, Use: Shortening an Ajax Unique URL (with Fragment ID Reflecting App State)

The thought of using URL shorteners might sound crazy, useless, and a violation of terms, but it came to me for an entirely legitimate application, which is well within the T’s and C’s I believe. I’m creating a web app right now (very incomplete) where the entire state is captured in the URL (see Unique URL). This saves me from having to set up any storage and (in some respects) makes life easier for users, who don’t have to manage yet another account, log in, etc. It certainly lowers the barriers for new users if they don’t have to register in order to save things.
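As a rough illustration of capturing the entire state in the URL, here is one way to pack a state object into the fragment and read it back. This is a sketch in Python rather than the app's own code; the JSON-plus-percent-encoding choice, the function names, and the example URL are all assumptions.

```python
import json
from urllib.parse import quote, unquote, urlsplit


def state_to_url(base: str, state: dict) -> str:
    """Serialise the state compactly and place it after the '#'."""
    payload = json.dumps(state, separators=(",", ":"), sort_keys=True)
    # safe="" percent-encodes everything reserved, including '#' and '/'.
    return f"{base}#{quote(payload, safe='')}"


def url_to_state(url: str) -> dict:
    """Recover the state from the fragment; no fragment means empty state."""
    fragment = urlsplit(url).fragment
    return json.loads(unquote(fragment)) if fragment else {}


state = {"title": "My poster #1", "items": [1, 2, 3], "zoom": 1.5}
long_url = state_to_url("http://example.org/app", state)
print(len(long_url), long_url)
```

Since browsers don't send the fragment to the server, the app can be served as a static page while every saved composition is just a (long) bookmark.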

Saving the entire state in a URL can lead to a long URL. So with all the hype around URL shorteners, I figured why don’t I just let the user save it to a short URL, if they do prefer a short URL for mailing or writing down, or memorising (since some of these services let you specify the key). And so I might choose to build into the app a little “get short URL for bookmarking and tweeting” button. (Funnily enough, I would have previously called it “bookmark this”, but that would mislead users into thinking that the long URL on top isn’t actually a valid bookmark. Now that everyone understands URL shorteners, I can be more explicit about the button’s purpose.)

The short URL is effectively a holder of the entire state of this user’s application. In fact, this seems like an entirely valid reason to use a URL shortener, so I doubt it’s a violation of anyone’s terms. Worth noting incidentally that there are plenty of free image hosts where you can anonymously upload 100K or more, so I doubt a 10K URL is a big deal; and given that the service receives link love and some useful tracking data, it’s probably just as valuable financially as an image sent to an image host.

A Pure Cloud Database

You can see where this is going. An extension to this thought is to simply treat the URL shorteners as cloud databases. As long as it looks like a valid URL, you could store whatever you like there. Turns out you can even store an image or a short MP3 as a data: URI. I have no plans to do this, and I suppose it actually would be a violation of terms in this case, but it’s an interesting idea.
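For reference, a data: URI is just a MIME type plus the (usually base64-encoded) bytes, so turning a small file into something URL-shaped takes a couple of lines. A sketch, with helper names of my own choosing:

```python
import base64


def to_data_uri(data: bytes, mime: str) -> str:
    """Wrap raw bytes as a base64 data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def from_data_uri(uri: str) -> bytes:
    """Inverse of to_data_uri; only handles the base64 form."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URI")
    return base64.b64decode(payload)


uri = to_data_uri(b"hi", "text/plain")
print(uri)
```

Base64 turns every 3 bytes into 4 characters, so the payload grows by about a third before it even meets the shortener's character limit.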

And if the URL was too long, you could always use a linked list structure: break it up into several short URLs, with the last few characters of each source URL pointing to the previous short URL. (It’s backwards since you can then be sure what URL was allocated, and you would distribute the last URL in the series.)

  • http://tinyurl.com/mark3 end of the message mark2 (this is the URL you distribute)
  • http://tinyurl.com/mark2 middle of the message mark1
  • http://tinyurl.com/mark1 start of the message
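The backwards linked list can be sketched without touching a real service. Below, FakeShortener is an in-memory stand-in for a write-once shortener, and the fixed-width key, the NIL pointer, and the PREFIX pseudo-URL are all assumptions made for the example:

```python
import itertools

KEY_LEN = 6
NIL = "0" * KEY_LEN              # reserved pointer: "no previous chunk"
PREFIX = "http://example.org/#"  # makes each stored payload look like a URL


class FakeShortener:
    """In-memory stand-in for a write-once URL shortener (hypothetical)."""

    def __init__(self) -> None:
        self._store = {}
        self._ids = itertools.count(1)  # starts at 1, so NIL is never issued

    def shorten(self, long_url: str) -> str:
        key = f"{next(self._ids):0{KEY_LEN}d}"
        self._store[key] = long_url
        return key

    def expand(self, key: str) -> str:
        return self._store[key]


def save(service: FakeShortener, message: str, chunk_size: int) -> str:
    """Store chunks first to last; each long URL ends with the key of the
    chunk before it. Returns the key of the last chunk, the one to share."""
    prev = NIL
    for i in range(0, len(message), chunk_size):
        prev = service.shorten(PREFIX + message[i:i + chunk_size] + prev)
    return prev


def load(service: FakeShortener, key: str) -> str:
    """Follow the pointers back to the first chunk, then restore the order."""
    chunks = []
    while key != NIL:
        body = service.expand(key)[len(PREFIX):]
        chunks.append(body[:-KEY_LEN])
        key = body[-KEY_LEN:]
    return "".join(reversed(chunks))


service = FakeShortener()
message = "start of the message, middle of the message, end of the message"
head = save(service, message, chunk_size=20)
print(head, load(service, head))
```

The pointer has to go backwards because a chunk can only refer to a key that has already been allocated. A real payload would also need URL-encoding before being stored.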

There is actually prior art on this concept, I discovered – some anon poster recently created a proof-of-concept cloud DB, with encryption to boot. There were no replies to that post and it seems to have gone unnoticed, which is unfortunate. So allow me to dig it out:

In almost obvious violation of their terms of service (maybe not entirely, they technically are urls, just with random data tacked onto it.) I’ve created a way to securely store arbitrarily length data on URL shortening services like tr.im, bit.ly, tinyurl, etc.

You have to pass both the message and a key. The key is SHA-1’d and then the message is encryped with the key by AES-256. The message is split to 200 byte chunks and it loops through them. For each one, a special salt variable exists for no particular reason, is mixed with the key and a packet identifier number (part 0 = 0, part 1 = 1, so amazingly complex) and all of that is again SHA-1’d. It’s trunkated to 14 digits. The part of the data is prepended with a pseudourl. and the url is passed to the url shortener API and the 14 digit string is used as the custom short URL. The last packet is appended with a special last-packet marker.

http://jsbin.com/ixuda
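The neat part of that scheme is that the custom short URLs are derived from the key, so anyone holding the key can recompute where each chunk lives without any pointers. Here is a sketch of just the chunking and naming steps. The AES-256 step is left out, and the salt value, the way salt, key, and packet number are "mixed" (plain concatenation here), and reading "14 digits" as 14 hex digits are all my guesses, not details from the post.

```python
import hashlib
from typing import List

CHUNK_SIZE = 200       # bytes per packet, as described in the post
SALT = "example-salt"  # hypothetical; the post does not give its value


def chunk_slug(key: str, index: int) -> str:
    """Custom short-URL name for packet `index`: SHA-1 over salt + key +
    packet number, truncated to 14 hex digits (assumed mixing order)."""
    digest = hashlib.sha1(f"{SALT}{key}{index}".encode("utf-8")).hexdigest()
    return digest[:14]


def split_packets(data: bytes) -> List[bytes]:
    """Cut the (already encrypted) message into 200-byte packets."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


packets = split_packets(b"x" * 450)
slugs = [chunk_slug("my secret key", i) for i in range(len(packets))]
print(slugs)
```

A reader would request slug 0, 1, 2 and so on until reaching the packet carrying the last-packet marker.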

As We May Think

All this makes me wonder what kind of JSON-based cloud services there should be on the web that would be explicitly designed for, and better suited to, this kind of purpose. I bet you could build something nice along those lines with TiddlyWeb server.

The biggest restriction with all this is that the services are write-once. For example, if you make a pretty poster, tinyurl it, and send the link, you can never change the poster someone will see when they visit that link (because the link directly represents the composition of your poster, rather than being a pointer to the composition on the server). So this heavily limits applicability of the concept anyway, but if users are willing to live with that restriction, it’s a big benefit in terms of simplicity. You could also overcome the restriction using some of the newer URL shortening services that let you log in and maintain your shortcuts. But that would (a) defeat the purpose of simplicity; (b) defeat the purpose of working in Ajax apps, since it would require privileged JSONP calls, and privileged JSONP calls are wildly insecure.