Rails quirk: Adding to DateTime

Just came across a weird one with DateTime. Adding an integer value increments the date by a day, not by a second as you might expect:

  $ x=1
  1
  $ x.class
  Fixnum
  $ DateTime.new(2000, 1, 1, 0, 0, 0)+x
  Sun, 02 Jan 2000 00:00:00 +0000

But adding a second is possible by using 1.second explicitly. The strange thing is that both report their class as Fixnum and essentially act as the same number. So if you want the value to mean seconds, one way to achieve it is to use “x.seconds”.

  $ x=1.second
  1 second
  $ x.class
  Fixnum
  $ DateTime.new(2000, 1, 1, 0, 0, 0)+x
  Sat, 01 Jan 2000 00:00:01 +0000
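
The underlying reason is that plain Ruby’s DateTime#+ interprets a numeric argument as a number of days, while 1.second is an ActiveSupport duration that knows its own unit (even though it reports its class as Fixnum, as above). So even without ActiveSupport, you can add a second by adding a fractional day. A quick sketch:

  require 'date'

  d = DateTime.new(2000, 1, 1, 0, 0, 0)
  d + 1                   # one day later:    2000-01-02T00:00:00+00:00
  d + Rational(1, 86400)  # one second later: 2000-01-01T00:00:01+00:00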

How HTTP client libraries should support redirects

I don’t think I’ve ever come across an HTTP library (and I’m including XMLHttpRequest here) that covers all bases on redirects. In the hope of pushing the field forward, here is my wishlist. I’m limiting this to GETs, as things get even more complicated with mutating actions. (Several browsers get XMLHttpRequest wrong in the case of POST redirects.)

  • Follow redirects by default (it’s the most typical need after all), but provide an option to turn that off.
  • Allow a limit on how many redirects will be followed, to prevent excessive redirecting [1].
  • Prevent infinite redirect cycles, which arise from misconfiguration or malicious denial-of-service attempts [1].
  • Report back the chain of redirects, with corresponding status codes (ie was each one permanent?).

It’s really the last point that’s universally missing. For services like Player FM, which need to retain a record of permanent redirects, it’s necessary. The lack of this feature means having to write custom code to follow redirects, which isn’t a giant burden, but it’s error-prone, a pain to test, and has to be done for every kind of resource.
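
As a sketch of what that custom code might look like, here’s a minimal GET-only redirect follower in Ruby using Net::HTTP. The method name and the chain format are my own invention, but it illustrates the wishlist behaviours: following by default, capping the number of hops, detecting cycles, and reporting the chain with status codes.

  require 'net/http'
  require 'uri'

  # Hypothetical sketch, not a real library: follows GET redirects,
  # caps the number of hops, detects cycles, and reports the chain.
  def get_with_redirect_chain(url, limit: 5)
    chain = []
    seen = {}
    uri = URI(url)
    loop do
      raise "Redirect cycle detected at #{uri}" if seen[uri.to_s]
      seen[uri.to_s] = true
      response = Net::HTTP.get_response(uri)
      chain << { url: uri.to_s, status: response.code.to_i }
      return { response: response, chain: chain } unless response.is_a?(Net::HTTPRedirection)
      raise "Too many redirects (limit #{limit})" if chain.size > limit
      uri = URI.join(uri.to_s, response['Location'])
    end
  end

  # The chain might come back as:
  # [{ url: "http://example.com/old", status: 301 },
  #  { url: "http://example.com/new", status: 200 }]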

  1. The recursion limit requirement has the side benefit of preventing infinite redirect cycles, so that alone may be adequate for a library. But it’s worth treating as a separate requirement because: (a) it can be handled more efficiently on its own, since a cycle can be detected and terminated before the limit is reached; (b) if the library does rely on a limit to prevent infinite recursion, it should nevertheless enforce some maximum, e.g. 100, and also set some sane default, e.g. 5.

Key-based cache expiry: A developer’s primer

Key-based cache expiry is a powerful pattern for efficient and reliable caching. I’ve been using it for some time now, after reading DHH’s original post, and it’s worked well. This post walks through the more conventional approaches to show how key-based caching arises as a fruitful alternative.

So then, how does caching normally work, and what’s so bad about that anyway?

Clear after some time. Sure, you can say “this stock price table expires in 5 minutes” and then re-render it every 5 minutes (or longer if no-one immediately requests it). The problem is, you’re often making a big compromise on both ends. On the one hand, you’ll end up with stale results when the stock price changes during this cache window, e.g. if it changes 2 minutes after you serve it, you’re sending wrong data for another 3 minutes. And on the other hand, what if it only changes once a day? Most of the time you’re needlessly re-retrieving data, re-calculating and re-rendering it. Wasteful.
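
In Rails terms, time-based expiry is the familiar expires_in option. A minimal sketch, where the “stock-prices” key and the fetch_latest_prices helper are illustrative names:

  # Time-based expiry: re-rendered at most every 5 minutes, but possibly
  # serving stale prices for up to 5 minutes in between.
  prices = Rails.cache.fetch("stock-prices", expires_in: 5.minutes) do
    fetch_latest_prices  # hypothetical helper that hits the data source
  end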

Clear manually. Seeing the problems of time-based expiry, you could be tempted to just keep the cache up to date manually. So when you’re notified of a price change, you explicitly update the value in the cache. This will certainly be more efficient, but the logic gets pretty complex as you scale up. You end up with N×M code complexity, as all N bits of code need to be aware of which M cache items could be affected by changes.
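
A sketch of manual expiry, again with illustrative names; note how the write path has to know about every cache entry the price appears in:

  # Manual expiry: every code path that changes a price must remember
  # to clear each affected cache entry.
  def update_price(stock, new_price)
    stock.update!(price: new_price)
    Rails.cache.delete("stock-prices")       # the summary table
    Rails.cache.delete("stock-#{stock.id}")  # the individual stock page
    # ...and every other cached view that embeds this price
  end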

So one technique is easy but inefficient and inaccurate; the other is efficient and accurate, but hard. Let’s see if we can find a sweet spot which is easy AND efficient AND accurate.

Don’t clear. With key-based cache expiry, everything’s put there forever and never cleared. How is that possible? Because it takes advantage of the cache’s built-in automatic expiry mechanism. We must use a cache, such as Memcached or Redis, which supports some kind of expiry based on least-recently-used (LRU) or similar selection. In that sense, we have reached our application’s sweet spot by offloading complexity to the cache framework.

How this works is that the keys must reflect a version or timestamp of the object being cached, e.g. a key might be “article-123-201404070123401”, generalised as “type-id-timestamp”. Normally clients won’t request the object by version, so you’ll need to do a quick lookup to find the object’s latest version or timestamp [1]. Then you retrieve it from the cache, or write through to the cache if it’s not already present. The important thing is that you write it with infinite expiry, i.e. no TTL.
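
A minimal sketch in Rails terms. Active Record models provide cache_key, which embeds the id and updated_at timestamp; render_article is an illustrative stand-in for the expensive work:

  # Key-based expiry: the timestamp is baked into the key, so a stale
  # entry is simply never requested again and eventually falls out of
  # the cache via LRU. Note there's no expires_in: it lives "forever".
  def cached_article(article)
    # article.cache_key => e.g. "articles/123-20140407012340"
    Rails.cache.fetch(article.cache_key) do
      render_article(article)
    end
  end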

The technique can be used at many levels – HTTP caching, Memcached, persistent databases. I first asked about it here and I’ve since used it effectively in production on Player FM’s website and API. It’s certainly how various frameworks handle asset serving (ie compiling CSS with a timestamp and so on), it’s an official part of Rails 4, and I expect other frameworks to follow. So it’s a pattern programmers should be familiar with in an era where performance is held in high esteem, and rightly so.

  1. Looking up the timestamp or version is work you don’t have to do with manual expiry, so it’s again a trade-off that makes this approach slightly less efficient, but a lot easier. Furthermore, if you arrange things right, you can have clients request resources by version/timestamp directly for all but the original resource (when they are requesting several resources in succession).

Load-balancing Rails with Nginx

Well this was some fine undocumented black magic. I’ve got Player FM behind a load balancer now, using the following Nginx config. I’ll explain some more about the overall upgrade later.

# App load balancer

# Round-robin (the Nginx default) across the two app servers
upstream playerhost {
  server 192.168.1.1;
  server 192.168.1.2;
}

server {

  server_name playerhost;

  location / {

    proxy_set_header Authorization "Basic blahblahblah==";
    proxy_next_upstream http_500 http_502 http_503 http_504 timeout error;

    # http://stackoverflow.com/questions/16159998/omniauth-nginx-unicorn-callback-to-wrong-host-url
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Client-IP $remote_addr;
    proxy_set_header X-Forwarded-For $remote_addr;

    proxy_redirect http://playerhost http://player.fm;
    proxy_redirect https://playerhost https://player.fm;
    proxy_pass http://playerhost;

  }
}

Notes

  • I recommend using a distinct name for the backend (I’ve used “playerhost”). Most tutorials use the nondescript “backend”, but a distinct name is a useful diagnostic: if something’s going wrong, you’ll see this ID pop up in URLs and HTML content.
  • You don’t have to use basic auth for the backends. You could just firewall them off the public internet, or accept them being public. Leaving them public is not clean and will cause the site to be hit by search bots and so on unnecessarily; but closing them off altogether is not ideal either, because it’s useful for diagnostics to go into the backend servers directly. So I expose them via basic auth. The “blahblahblah” is the base64 of your basic auth username:password (see the snippet after these notes).
  • The site mostly worked without the proxy_set_header lines, but I had some weird OAuth redirect problems and occasional HTML problems. In both cases, I’d be seeing that “playerhost” URL instead of the actual domain. Setting the headers fixed it.
  • The proxy_redirect directives were an earlier attempt to fix HTTPS redirects. They worked, but still left the problem mentioned in the previous point, and they may not be necessary at all now that the proxy_set_header lines are in place. I haven’t tested that yet.
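
For reference, one way to generate that base64 value, in Ruby (substitute your own backend credentials):

  require 'base64'

  # Produces the value that goes after "Basic" in the Authorization header.
  Base64.strict_encode64('user:password')  # => "dXNlcjpwYXNzd29yZA=="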
