Printing and the Web

There’s lots of talk about “web 2.0”, and indeed many good things are coming along. But there are also many anticipated web enhancements that are conspicuous by their absence. I’m talking about things I found bothersome in the mid-1990s, things I had assumed would be fixed just around the corner. Yet they just don’t seem to have emerged, and it looks like web developers are stuck with them for a long time to come. Examples include HTML itself, the stateless nature of HTTP, and the shrine of inconsistent obfuscation that is JavaScript. Pretty fundamental stuff, huh? Anyway, my point here is printing.

We’ve gone from visions of the paperless office, to printing like never before, to the current situation: a decline in printing. At least, that’s my guess. Presumably it’s thanks to email, PDAs, SMS messaging, bigger screens, and better collaborative tools such as document annotation and wikis. Yet the decline will be gradual and long. Paper still has many benefits: a superior reading experience for many people, legal relevance, and a safeguard against the loss of electronic records.

So, since we can expect printing to be around for a while, let’s make printing the web a worthwhile experience. At present, there’s HTML, which is difficult to print. And there are PDFs, which are designed for printing but are difficult to read on screen. I’m not going to be Mr. SaveTheWorld and propose an uber-format … I’m just going to suggest a few incremental improvements that would make life easier.

Browsers should print HTML more gracefully:

* Print hyperlink URLs. Since reading occurs offline, print the linked URL beside each hyperlink, as Slashdot does for comments (although by default it shows only the domain).
* Print what I can see. Too often, the browser is incapable of printing what is on the screen. Thanks to quirky JavaScript or attempts to reload dynamic pages, it prints a blank page or just a few lines. Not good enough. If you can display it on the screen, you can send it to the printer.
* Handle frames properly. Yes, they’re evil, but they’re a fact of HTML life, and a site like Bloglines shows how they can actually be very useful. All browsers should let me right-click inside a frame and print just that frame. Furthermore, when I print from the menu or toolbar, they should show a thumbnail of the whole page and let me visually select one or more frames to print. (Frames are another of those things that just won’t go away. The tech elite had already buried them in the late 90s; who’d have thought a frame-based site like Bloglines would be a hot acquisition for 2005?)
* Provide better previews. Come on, you’re about to print the thing! It can’t be that hard to tell me how it will look or how many pages it will take.
* Don’t crash when I’m trying to print a moderately sized file. This one’s Firefox-specific.

Browsers should render PDF more gracefully. Since PDF achieves (or sidesteps) all of the above, it could be a useful distribution format. However, browsers just don’t handle it very well, even when Acrobat is embedded in the browser.

I have just one big suggestion here: stop handling PDFs with plugins, and instead render them as HTML. Google has been converting PDFs to HTML quite effectively for about five years now, and many other tools do too. I’m sure I’m not alone in clicking the Google HTML version rather than the PDF version after a search. If Google can convert every PDF in the universe, the browser should be able to do it for a single document.

PDFs, with their discrete pages, are very difficult to scroll up and down. The font size rarely has anything to do with the browser’s normal HTML size. All the browser tools you’ve come to know and love are either gone or mutated. Want to find some text in a PDF document? You’ll have to do it the Acrobat way, not the browser way. And you can forget about your browser-specific plugins, like language translation and bookmarklets. They’d be just as useful on PDF content, but it’s not happening.

So the solution is simple: browsers should be able to treat PDFs as HTML. The Acrobat plugin can still be used for printing, so the PDF document could actually provide the best of both worlds. But for reading in a browser, HTML wins every time. And as the FAQ for iText (a Java PDF library) notes, it’s perfectly within Adobe’s conditions to create PDF tools. In any event, if Google can put converted PDFs on the web, what’s to stop a browser from doing likewise?

Composition Is Testable, Inheritance Is Intuitive?

Ivan Moore illustrates how to replace inheritance with composition. He explains why the composition-based solution is more testable, but I would argue there is a flip side: the resulting design is less intuitive. A compromise must be made between testability and intuitiveness, which suggests a deficiency in our technology.

Ivan’s Emphasiser example is a textbook case of two competing architectural styles. The inheritance style encapsulates word emphasis in a template method, whereas the composition style encapsulates it in a separate class. Even before mock objects, design authorities such as the Gang of Four (GoF) urged developers to consider delegation instead of inheritance. In many cases inheritance is abused, and delegation is the more logical relationship.
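To make the two styles concrete, here is a minimal Java sketch. It is a hypothetical reconstruction, not Ivan’s actual code: the names InheritanceEmphasiser, ShoutingEmphasiser and emphasiseWord, and the choice of upper-casing as the emphasis rule, are assumptions made purely for illustration.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical reconstruction: Ivan Moore's actual Emphasiser code may differ.
public class EmphasisStyles {

    // Inheritance style: the template method emphasise() walks the words,
    // and subclasses vary only the hook emphasiseWord().
    static abstract class InheritanceEmphasiser {
        public String emphasise(String text) {
            StringJoiner joined = new StringJoiner(" ");
            for (String word : text.split(" ")) {
                joined.add(emphasiseWord(word));
            }
            return joined.toString();
        }

        protected abstract String emphasiseWord(String word);
    }

    // One subclass per emphasis rule; this assumed rule upper-cases every word.
    static class ShoutingEmphasiser extends InheritanceEmphasiser {
        @Override
        protected String emphasiseWord(String word) {
            return word.toUpperCase(Locale.ROOT);
        }
    }

    // Composition style: word emphasis lives in its own type and is passed in.
    interface WordEmphasiser {
        String emphasiseWord(String word);
    }

    static class Emphasiser {
        private final WordEmphasiser wordEmphasiser;

        Emphasiser(WordEmphasiser wordEmphasiser) {
            this.wordEmphasiser = wordEmphasiser;
        }

        public String emphasise(String text) {
            StringJoiner joined = new StringJoiner(" ");
            for (String word : text.split(" ")) {
                joined.add(wordEmphasiser.emphasiseWord(word));
            }
            return joined.toString();
        }
    }

    public static void main(String[] args) {
        // prints "HELLO WORLD"
        System.out.println(new ShoutingEmphasiser().emphasise("hello world"));
        // Same behaviour, but the rule is an object (here a lambda) rather than a subclass.
        System.out.println(new Emphasiser(word -> word.toUpperCase(Locale.ROOT)).emphasise("hello world"));
    }
}
```

Both versions produce the same output; the difference is only where the word-emphasis rule lives.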

However, in this example, inheritance is not abused at all. It is reasonable to have subclasses that vary in how they emphasise words. Admittedly, the composition model is more flexible: if you wanted to emphasise sentences and numbers as well, composition would let you combine emphasis strategies freely. But for the design as it stands, inheritance offers a tidy unit of “close classes”.

In single-inheritance languages, an inheritance hierarchy makes a convenient “module”: the superclass and its descendants are partitioned off. When the “is-a” relationship really is present, it’s worth exploiting it to produce such a module, and that’s the case for the inheritance version here. The composition version contains twice as many such modules (Emphasiser and WordEmphasiser). Here, a further refactoring would help, using the GoF Composite pattern: rename Emphasiser to “DocumentEmphasiser” and create a new Emphasiser interface that both DocumentEmphasiser and WordEmphasiser implement. Now we’re back to one inheritance hierarchy, and one that expresses the DocumentEmphasiser-WordEmphasiser relationship more explicitly. Even so, this is still more complicated than a simple inheritance hierarchy.
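A rough Java sketch of that Composite-style refactoring might look like this. It is hypothetical: the single emphasise(String) method on the new Emphasiser interface and the upper-casing rule in WordEmphasiser are assumptions, not taken from Ivan’s code.

```java
import java.util.Locale;
import java.util.StringJoiner;

// Hypothetical sketch of the Composite refactoring: one interface, one hierarchy.
public class CompositeEmphasis {

    // The single hierarchy: anything that can emphasise a piece of text.
    interface Emphasiser {
        String emphasise(String text);
    }

    // Leaf: emphasises a single word (assumed rule: upper-case it).
    static class WordEmphasiser implements Emphasiser {
        @Override
        public String emphasise(String word) {
            return word.toUpperCase(Locale.ROOT);
        }
    }

    // Composite: splits a document into words and delegates each to a child Emphasiser.
    static class DocumentEmphasiser implements Emphasiser {
        private final Emphasiser child;

        DocumentEmphasiser(Emphasiser child) {
            this.child = child;
        }

        @Override
        public String emphasise(String text) {
            StringJoiner joined = new StringJoiner(" ");
            for (String word : text.split(" ")) {
                joined.add(child.emphasise(word));
            }
            return joined.toString();
        }
    }

    public static void main(String[] args) {
        Emphasiser emphasiser = new DocumentEmphasiser(new WordEmphasiser());
        // prints "COMPOSITION IS TESTABLE"
        System.out.println(emphasiser.emphasise("composition is testable"));
    }
}
```

Because DocumentEmphasiser depends only on the Emphasiser interface, its child can still be swapped for a stub in a test, so the refactoring keeps the testability that motivated composition.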

And for all that, the composition architecture is the one I’d likely use in this case. Testability, as the blog entry says, is the key. Many developers think that designs shouldn’t be influenced by testability. Utter nonsense! Programs are not oil paintings … they are there to solve a problem, not to be admired for their elegance. Hopefully, elegant solutions will also solve problems, but in some cases a trade-off must be made. One of the use cases for software is testing it, so this use case must be considered in evaluating design choices.

In this case, composition is a bit less elegant, but a lot more testable, than inheritance. So it’s the optimal choice in most real-world situations. (Though I’d consider taking it further with Composite.)
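To illustrate what that testability buys, here is a hypothetical Java sketch of a test against the composition style. Rather than pull in a mocking library, it hands the Emphasiser a hand-written stub (RecordingWordEmphasiser, a name made up for this example) that records the words it receives, so the document-level splitting logic can be checked without any real emphasis rule. The class shapes are assumptions, not Ivan’s code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: why the composition design is easy to unit-test.
public class EmphasiserTest {

    interface WordEmphasiser {
        String emphasiseWord(String word);
    }

    static class Emphasiser {
        private final WordEmphasiser wordEmphasiser;

        Emphasiser(WordEmphasiser wordEmphasiser) {
            this.wordEmphasiser = wordEmphasiser;
        }

        String emphasise(String text) {
            StringJoiner joined = new StringJoiner(" ");
            for (String word : text.split(" ")) {
                joined.add(wordEmphasiser.emphasiseWord(word));
            }
            return joined.toString();
        }
    }

    // Hand-rolled stub standing in for a mock object: records each word it sees
    // and returns a recognisable marker instead of real emphasis.
    static class RecordingWordEmphasiser implements WordEmphasiser {
        final List<String> seen = new ArrayList<>();

        @Override
        public String emphasiseWord(String word) {
            seen.add(word);
            return "<" + word + ">";
        }
    }

    public static void main(String[] args) {
        RecordingWordEmphasiser stub = new RecordingWordEmphasiser();
        String result = new Emphasiser(stub).emphasise("print the web");

        // The document-level logic is verified without any real word-emphasis code.
        if (!stub.seen.equals(List.of("print", "the", "web"))) throw new AssertionError(stub.seen);
        if (!result.equals("<print> <the> <web>")) throw new AssertionError(result);
        System.out.println("ok");
    }
}
```

With the inheritance version, the equivalent test would need a throwaway subclass just to observe emphasiseWord calls; composition lets the test supply the collaborator directly.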

So here’s the point: do we have to make the trade-off between testability and intuitiveness? All this shows there is an opportunity to create better testing and language facilities. In an ideal environment, there would be no trade-off to make; the best design would also be sufficiently testable.