How to use different favicons for development, staging, and production

It’s useful to have different favicons for each environment during development, as Pamela recently pointed out. Here’s how I do it.

First, generate the images. Most graphics editors have some kind of Colorize tool, and if you do this often, use ImageMagick to change colors programmatically. If you’re lucky enough to have a green brand logo, a cool combo would be traffic lights. Just saying.
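
For example, a rough one-off script, assuming ImageMagick’s convert is on the path; the hue values and filenames here are my own (the third argument to -modulate rotates the hue):

  # Hypothetical script: tint the production favicon for each environment
  # by rotating its hue with ImageMagick's -modulate.
  { 'development' => 33, 'staging' => 66 }.each do |env, hue|
    system('convert', 'favicon.ico', '-modulate', "100,100,#{hue}",
           "favicon-#{env}.ico")
  end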

Now, set the favicons according to your environment. Should you just serve the default /favicon.ico location, or point to the icon from the head? The answer is both. /favicon.ico is useful for third-party plugins whose HTML you don’t control, as well as quick hack pages you can’t be bothered configuring with proper metadata. A <link> tag in the HTML head is useful for revving the icon when it changes, to bust any cached version. So use both.

To set the favicon in the head, ensure the path is dynamically generated instead of hard-coded:

  %link(rel="icon" href="#{favicon_path}" type="image/x-icon")

The path is generated like so:

  def env_suffix
    Rails.env.production? ? '' : "-#{Rails.env}"
  end

  def favicon_path
    asset_path "favicon#{env_suffix}.ico"
  end

It’s arguably bad form for “production” strings to show up in production, so the env_suffix nonsense above hides it. If you don’t care about that, just call your production icon favicon-production.ico and save yourself a little hassle.

As mentioned earlier, we also want a plain /favicon.ico to exist. A quick way to do that is to copy the file when the app starts. Alternatively, define /favicon.ico as a route and serve the file identified by favicon_path, defined above, with a sufficient cache expiry time (e.g. 1 day):

  def favicon
    path = ActionController::Base.helpers.asset_path "favicon#{env_suffix}.ico"
    # Cache for a day or so, per the above (tune to taste)
    expires_in 1.day, public: true
    send_file path, type: "image/x-icon", disposition: "inline"
  end
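
To wire that up, a minimal sketch of the route; the controller name is my assumption:

  # config/routes.rb
  get '/favicon.ico' => 'static#favicon'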

For bonus points, you might want to use similar techniques to provide overriding stylesheets for development and staging. Then you can introduce a text label or change background color, etc.
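
For instance, a minimal helper along the same lines, reusing env_suffix from above; the helper and stylesheet names are mine:

  # app/helpers/application_helper.rb
  # Include an extra stylesheet outside production, e.g. env-development.css
  def env_stylesheet_tag
    stylesheet_link_tag "env#{env_suffix}" unless Rails.env.production?
  end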

Load-balancing Rails with Nginx

Well this was some fine undocumented black magic. I’ve got Player FM behind a load balancer now, using the following Nginx config. I’ll explain some more about the overall upgrade later.

# App load balancer

upstream playerhost {
  server 192.168.1.1;
  server 192.168.1.2;
}

server {

  server_name playerhost;

  location / {

    proxy_set_header Authorization "Basic blahblahblah==";
    proxy_next_upstream http_500 http_502 http_503 http_504 timeout error;

    # http://stackoverflow.com/questions/16159998/omniauth-nginx-unicorn-callback-to-wrong-host-url
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Client-IP $remote_addr;
    proxy_set_header X-Forwarded-For $remote_addr;

    proxy_redirect http://playerhost http://player.fm;
    proxy_redirect https://playerhost https://player.fm;
    proxy_pass http://playerhost;

  }
}

Notes

  • I recommend using a distinct name for the backend (I’ve used “playerhost”). Most tutorials use the nondescript “backend”, but a distinct name is a useful indicator if something’s going wrong, as you’ll see this ID pop up in URLs and HTML content.
  • You don’t have to use basic auth for the backends. You could just firewall them off the public internet, or deal with them being public. Being public is not clean and will cause the site to be hit by search bots and so on unnecessarily. But closing them off altogether is not ideal either, because it’s useful for diagnostics to go into the backend servers. So I expose them via basic auth. The “blahblahblah” is the base64 of your basic auth username:password (see the snippet after this list).
  • The site mostly worked without the proxy_set_header lines, but I had some weird OAuth redirect problems and occasional HTML problems. In both cases, I’d be seeing that “playerhost” URL instead of the actual domain. This fixed it.
  • The proxy_redirect commands were an earlier attempt to fix https redirections, and worked, but still left the problem mentioned in the previous point. They may not be necessary at all after adding the proxy_set_header lines. I haven’t tested that yet.
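
Generating that base64 value is a one-liner; the credentials here are placeholders:

  require 'base64'

  # The value that follows "Basic" in the Authorization header
  puts Base64.strict_encode64('myuser:mypassword')
  # => bXl1c2VyOm15cGFzc3dvcmQ=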


Defer and Recur with Rails, Redis, and Resque

I’ve put off some scaling-related issues about as long as possible, and am now proceeding to introduce a deferred-job stack. I’ll explain what I’ve learned so far, with the caveat: this isn’t in production yet. I’m still learning.

What it’s all about

Tools like Resque let you perform work asynchronously. That way, you can turn requests around quickly, so the user gets something back immediately, even if it’s just “Thanks, we got your request”, which is nicer than the user waiting around 5 minutes, and ensures your server doesn’t lock up in the process. A typical example is sending an email: you don’t want the user’s browser to wait while your server connects elsewhere and fires off the email. Other examples would be fetching a user’s profile or avatar after they provide their social profile info, or generating a report they asked for.

So you set up an async job and respond telling the user their message is on the way. If you need to show the user the result of the delayed job, have the client poll the server and render the result when it’s ready. More power to XHR!
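
A minimal sketch of such a polling endpoint; the Report model and its ready? flag are my assumptions:

  # The client hits this repeatedly (e.g. via XHR every few seconds) until
  # the deferred job has produced a result.
  def report_status
    report = Report.find(params[:id])
    if report.ready?
      render json: { status: 'done', report_url: report_url(report) }
    else
      render json: { status: 'pending' }
    end
  end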

The simple way to do this

The simple way, which worked just fine for me for a long time and which I’d recommend for anyone starting out, is a simple daemon process. Basically:

  while true
    do_something if check_database_for_condition
    sleep 10
  end

The fancy way

The problem with the simple way is it can be hard to parallelise and monitor; you’ll end up reinventing the wheel. So to stand on the shoulders of giants, go install Redis, Resque, and Resque-Scheduler. I’ll explain each.

Redis

Redis, as you probably know, is a NoSQL database. It’s been described as a “data structure server” as it stores lists, sets, and hashes; and assuming Knuth is your homeboy, that’s a mighty fine concept. And it’s super-fast, because everything is kept in memory, with (depending on config) frequent persistence to disk for durability.

Resque

Resque is nothing to sneeze at either, being a tool made and used by GitHub, no less.

Resque uses Redis to store the actual jobs. It’s worth explaining the main components of Resque, because I’ve found they’re often not defined very clearly and if you don’t understand this, everything else will trip you up.

Job. A job is a task you want to perform. For example Job 1234 might be “Send welcome email to [email protected]”. In Resque, a job is defined as a simple class having a “perform” method, which is what does the work [1].

Queue. Jobs live in queues. There’s a one-liner config item in the Job class to say which queue it belongs to. In a simple app, you could just push all jobs to a single queue, whereas in a bigger app, you might want separate queues for each job type. e.g. you’d end up with separate queues for “Send welcome email”, “Fetch user’s avatar”, and “Generate Report”. The main advantage of separate queues is you can give certain queues priority. In addition to these queues, you also have a special “failed” queue. Tasks that throw exceptions are moved to “failed”; otherwise the task disappears.
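
To make those two concrete, here’s a minimal sketch; the class, queue, and mailer names are all hypothetical:

  # A job is a class with a perform method; @queue is the one-liner config
  # naming the queue it belongs to.
  class SendWelcomeEmail
    @queue = :mailers

    def self.perform(address)
      UserMailer.welcome(address).deliver
    end
  end

  # Enqueueing it from application code:
  Resque.enqueue(SendWelcomeEmail, 'user@example.com')

  # Running it (shell): QUEUE=mailers rake resque:work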

Worker. A worker is a process that runs the jobs. So a worker polls the queues, picks the oldest jobs off them, and runs them. You start workers via Resque’s Rake task, and in doing so, you tell it which queues to run. There’s a wildcard option to run all queues, but for fine-grained optimisations, you could set up more workers to run higher-priority queues and so on.

An important note about the environment: Rails’ environment can take a long time to start, e.g. 30 seconds. You clearly don’t want a 30-second delay just to send an email. So workers fork themselves before starting each job. This way, each job gets a fresh environment to run in, but you don’t have the overhead of starting up each time. (This is the same principle as Unicorn’s management of Rails servers.) So starting the worker does incur the initial Rails startup overhead, but starting each job doesn’t. In practice, jobs can begin in a fraction of a second. You can further optimise this by making a custom environment for the workers, e.g. don’t use all of Rails, just ActiveRecord, and so on. But it’s probably not worth the effort initially, as the fork() mechanism gets you 80% there.

Resque-Scheduler

For many people, Resque alone will fit the bill. But certain situations also call for an explicit delay, e.g. “send this email reminder in 5 days”; or repeat a task, e.g. “generate a fresh report at 8am each day”. That’s where Resque-Scheduler comes in [2].

Resque-Scheduler was originally part of Resque, and it basically extends the Resque API. The “schedule”, i.e. the repeated tasks, is represented as a cron-like hash structure, which can be conveniently kept in a YAML file.

Delayed jobs are created by your application code. It’s basically the same call as when you add the job directly to Resque, but with an additional delay or time argument.
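
A sketch of both forms, using Resque-Scheduler’s enqueue_in and enqueue_at; the job classes and arguments are made up:

  # Relative delay: run in 5 days
  Resque.enqueue_in(5.days, SendReminderEmail, user.id)

  # Absolute time: run at a specific moment
  Resque.enqueue_at(Time.now + 1.hour, GenerateReport)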

The cool thing is jobs are persisted into Redis, so they will survive if the system, or any component (Redis/Resque/Resque-Scheduler), goes down. I was confused at first as I thought they were added to some special Resque queue. But no, they are actually in the Redis database. I found this by entering keys * into Redis’s command-line tool (redis-cli), which yielded some structures including “resque:delayed:1372936216”. When I then entered dump resque:delayed:1372936216, I got back a data structure which was basically my job spec, i.e. {class: 'FeedHandler', args: ['http://example.com']}.

So Resque-Scheduler basically wakes up every second or so, and does two things: (a) polls Redis to see if any delayed jobs should now be executed; (b) inspects its “schedule” data structure to see if any repeated jobs should now be executed. If any jobs should now be executed, it pushes them to the appropriate Resque queue.

Notes

  1. Conceptually a job definition is little more than a function definition, rather than a full-blown class. But being a class is the more Rubyesque way to do it and also makes it easy to perform complex tasks as you can use attributes to hold intermediate results, since each job execution will instantiate a new job object.

  2. I evaluated other tools, e.g. Rufus and Clockwork, but what appeals about Resque-Scheduler is it persists delayed jobs and handles both one-off and repeated jobs.

Testing HTTPS Locally

As I’m migrating the player over to HTTPS, one challenge is mixed content, which leads to an incomplete padlock and a struck-through domain warning.

And the harsh but fair warning, upon inspection: “However, this page includes other resources which are not secure. These resources can be viewed by others while in transit, and can be modified by an attacker to change the look of the page.”

So to fix this locally, a nice setup for Ruby/Rails devs is Pow + Tunnels. Both are super-simple to set up.

Pow is a local server, so if you usually run Rails on http://localhost:3000, you can one-click install Pow, and all you need is to symlink your Rails folder into ~/.pow. Then you have a local server, sans port, like http://player.dev. Then, just install Tunnels and it will simply pipe https://player.dev into http://player.dev.

Now you can open Chrome devtools’ resource tab and fish out any connections which are still plain http. Ideally host them locally, or at least change the links to https ones, at a possible loss of cache performance. Still, did you see the various posts recently about ISPs injecting crapware script tags into people’s pages? OMG I know right! Seriously, https-everywhere is where the web is heading. Even public sites aren’t immune.

Private resources with ElasticSearch and Tire

I’m adding private channels to Player FM, and one consideration is search results. Tire’s ActiveRecord integration does a great job at making index updates transparent, but in this case some manual overriding is required.
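
Here’s a minimal sketch of the kind of override involved, assuming a Channel model with a public? flag; the tire proxy, update_index, and index.remove come with Tire’s model integration:

  class Channel < ActiveRecord::Base
    include Tire::Model::Search

    # Instead of Tire::Model::Callbacks, hook the index update manually so
    # privacy is respected.
    after_save do |channel|
      if channel.public?
        channel.tire.update_index
      else
        channel.tire.index.remove(channel)
      end
    end
  end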

Importantly, this allows the user to switch privacy on and off, and the index entry will automatically be created and deleted. I initially considered using a “_changed?” check, but realised it’s unnecessary, as ElasticSearch’s remove operation is idempotent. In other words, it’s safe to remove an already-removed item. Yes, the call could be avoided by checking if the resource is already private, but the call is cheap; a fraction of the cost incurred if the channel was public anyway (i.e. it would have to be re-indexed).

There was some talk of a “should_be_indexed?” method which any record could override. I think it would be perfect for this use case – it’d just be a case of a one-word return value (public?) but alas, it wasn’t added. As the code above shows, though, it’s pretty simple to DIY.

image by Zebble

Rails Cache Sweeper Gotchas

As you’ll see here, Rails cache sweepers are a tricky subject. Here are some general things I’ve learned.

  • Sweepers are dual creatures. “Here’s the scoop: sweepers observe both your models and your controllers. They’re not half-this and half-that, they’re both. You can define model callbacks (after_save, after_update, etc.), and you can also define before/after callbacks for any controller action (e.g. after_list_create).”

  • Notwithstanding the above, most references and despairing workarounds focus on their controller nature. Dandy for a web forms app, but in my case, cached content is being invalidated by daemon processes operating directly on models.

  • There’s no standard home for Sweepers. So much for convention-over-configuration :). So I opted for a new app/sweepers directory and added it in application.rb: config.autoload_paths += %W(#{config.root}/app/sweepers).

  • Now to the crux of the matter: The Sweeper still does nothing, even if it’s in the path. I don’t know why, but it’s a common problem! I had to explicitly add it as an observer: config.active_record.observers = :episode_sweeper. This is the model equivalent of people explicitly adding it to their controller with an after_update hook.

  • Now to the crux of the matter, redux: okay, so now the sweeper is being called when models change (specifically, the models it declares it’s observing). Great. But it still doesn’t work: expire_fragment apparently does nothing, because I’m still seeing the old fragment appear in the web app. WAT? The answer turned out to be: don’t just call expire_fragment(). Instead, call ActionController::Base.new.expire_fragment(). It seems the fragment used when outputting a view is not the same as that expired by a bare expire_fragment(). I’m only telling you what worked here, I can’t tell you why! (See the sketch after this list.)

  • You can also expire the cache in the Rails console for testing purposes; just call ActionController::Base.new.expire_fragment(). (I think you need to restart Rails (and the console) if you update the sweeper code, given that it’s set up as an ActiveRecord observer in the config line above. But I haven’t fully tested that.)
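
Putting those pieces together, a minimal sketch; the Episode model and the fragment key are my assumptions:

  # app/sweepers/episode_sweeper.rb
  class EpisodeSweeper < ActionController::Caching::Sweeper
    observe Episode

    # Model callback: fires whenever an Episode is updated, even by daemons
    def after_update(episode)
      # A bare expire_fragment did nothing; going through a controller
      # instance is what worked.
      ActionController::Base.new.expire_fragment("episode-#{episode.id}")
    end
  end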

This is just a basic implementation for now. A better implementation is probably to use DHH’s key-based caching approach, which has the neat principle of generating a new key every time the fragment changes.

Firefighting an RSS Aggregator’s Performance

[Chart: database server CPU, before and after]

That’s a before-and-after shot of the database server’s CPU! I was watching it slowly creep up, planning to inspect it after some other work, before receiving mails from Linode that the virtual server was running at over 102% capacity, then 110, 120, …

Three things made the difference in fixing this:

Feed Item Keys Must Be Unique

The most important thing was to nail down keys, which I noticed from looking at logs and the oddly cyclic nature of the graph above. I later ran a query to see how many items were being stored for each feed, and sure enough, certain feeds had thousands of items and counting.

The RSS 2.0 spec (as official a spec as there is) says of individual items: “All elements of an item are optional, however at least one of title or description must be present.”. What’s missing there is a primary key! Fortunately, most feeds do have a unique <link>, <guid>, or both. But if you’re trying to be robust and handle unanticipated feeds, it gets tricky. There were also some boundary cases involving feeds which had changed their strategy (fortunately, improved it by adding guids) at some point, but never updated the old items. So the feed was a hybrid.

The net effect was a gigantic number of items accumulating for certain feeds. Every hour, when the server checked for updates, it decided that yes, these key-less feeds had totally changed, and pulled all the posts in again, saving a record of each. That’s why you see the hourly cycles in the “before” picture. I still need to go and cleanse the database of those duplicate items.

By taking a step back and looking at what makes the items truly unique, and with the help of Rails’ handy collection methods, it was possible to make feed items unique again and smooth out crawling.
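
As a sketch of the idea, a fallback chain from the most to the least reliable key; the accessor names are assumptions:

  require 'digest'

  # Prefer guid, then link; failing both, hash the content itself
  def item_key(item)
    item.guid.presence || item.link.presence ||
      Digest::MD5.hexdigest("#{item.title}|#{item.description}")
  end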

Indexing

Inspecting a handful of anomalous feeds once an hour, due to the problem mentioned above, is not the worst thing in the world. What made the server veer towards FUBAR was a certain query that was being performed each time in the absence of indexes. I was able to see the heaviest queries in the Rails log using the grep/sed command posted here yesterday. I added those indexes and the queries went from ~1200ms to 20ms, with the overall processing time for a feed dropping to about 20% of its former value.
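
For illustration, a hypothetical migration of the sort involved; the table and column names are assumptions:

  class AddFeedItemIndexes < ActiveRecord::Migration
    def change
      # Covers the lookup performed on every crawl
      add_index :items, [:feed_id, :guid]
    end
  end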

Validation

A third issue was keeping the database spinning the whole time. This wasn’t a major hourly thrashing like the above, but a few feeds were being polled every few minutes.

I got a sniff of this problem when I noticed the same set of feeds would keep appearing when I looked at the logs. After grepping, I realised they were not obeying the rule of waiting an hour to re-check, but were in fact taking their turn to poll the external feed, then jumping right back in line for another go.

 

This really wasn’t having much performance impact, because these feeds weren’t adding new items with each check (as the item keys were sound). But with more feeds like this, it could have an impact, and more to the point, being polled every few minutes is not good for my bandwidth or the people on the receiving end!

The cause turned out to be some trivial problems with feed items, which were being blocked by Rails’ validation when trying to save. Because scheduling info is, for convenience, tied to the items’ records, the scheduling info was being lost too. It would be overkill to isolate the scheduling info at this stage, so I switched the validation to a before_save which does some cleansing to ensure the format is right.
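
A sketch of that switch; the field names and cleansing rules are assumptions:

  class Item < ActiveRecord::Base
    # Previously a validation could abort the save (losing scheduling info);
    # now we coerce the data into shape instead.
    before_save :cleanse

    private

    def cleanse
      self.title = title.to_s.strip.truncate(255)
      self.link  = link.to_s.strip
    end
  end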

Update: IO Rate

[Chart: IO rate]

Another issue I still had to fix was the IO rate. You can see it above: not in the spikes, which reflect me making the fixes above, but in the small wave pattern on the left. Those are actually very high in absolute terms, at around 1K blocks per second being transferred between disk and memory. This was due to swap thrashing and required updates to my.cnf. In particular, decreasing key_buffer, and also decreasing max_connections such that (with the key_buffer change) https://github.com/rackerhacker/MySQLTuner-perl was content with the memory required, as well as increasing innodb_buffer_pool_size. I haven’t measured the effect of that yet; I need to let it run for a while to find out.

I’m sure plenty of other optimisations are possible, but the good news is that IO Rate has gone right down to near-zero and swap rate likewise. So no more thrashing.

Sorting a Rails log file by SQL duration

Rails’ ActiveRecord logger writes log files like:

Post Load (735.8ms) SELECT posts.* FROM posts WHERE posts.title = 'foo'

You may want to know the longest SQL queries, for performance optimisation purposes and general troubleshooting. To list recent queries in order of duration, with the longest queries shown last, use this:

head -10000 development.log | grep '([0-9.]\+ms)' | sed 's/.*(\([[:digit:].]\+\)ms.*/\1ms &/g' | sort -n

(The sed expression was a little more work than I’d bargained for, as sed regular expressions are always greedy; even with GNU/POSIX extensions, non-greedy just doesn’t exist.)