Black Mirror Bandersnatch Spoilers

Here are just a few random thoughts on Black Mirror Bandersnatch, which came out yesterday. I’d have posted these as a few lazy tweets, but didn’t want to post spoilers there.

Haven’t watched it yet? Congratulations, you have a life. But go watch it anyway. Binge the first four seasons beforehand if you haven’t seen them yet. Also worth it.

Thoughts with some very mild spoilers ahead:

  • “Choose Your Own Adventure” previously moved from book form to interactive game form, as depicted in Bandersnatch. Netflix’s experiment takes it another step to streaming form, which is interesting because there’s no way to look at the code and figure out every path. It also means they could change the path dynamically, e.g. introduce an easter egg for just one day.
  • But what about narrative, in a world where viewers decide the story? This is a popular trope in interactive storytelling: the tension between the producer guiding the story and the viewer feeling free to make decisions. The entire story of Bandersnatch addresses that tension better than anyone could express in a linear sentence. Free will in this medium is an illusion.
  • Netflix’s long-term goal likely involves VR and personalization, i.e. personalized videos similar to the recent bubble of books where $child_name is the star. It’s not hard to see how gaming and movies eventually converge. A degree of interaction is an important step towards this future.
  • In the short term, Netflix likely sees this as a popular move for some kids’ content. I’ve noticed Netflix also has Jeopardy in its catalogue, and it’s a no-brainer for the platform to be used for interactive TV game shows, as has long been done with cable remotes, but better.

A few technical observations:

  • The number input (for the safe) shows this is not just going to be a simple 2-way choice mechanism. I’m guessing they have set up a protocol to support input of any Unicode string.
  • I noticed the episode doesn’t work on my Shield running a recent (probably the latest?) Android TV OS. (Update: it works now, a day later; it must have required an app update.) Netflix has a lot of clients, and it will be a big effort to implement, and evolve, this protocol ubiquitously. They’ll also have to think about different input constraints, e.g. text entry on a TV might need to support voice input.
  • I also noticed the episode can’t be downloaded, and at one point I saw it buffering after making a decision (on a bad network). The app probably pre-emptively downloads a minute or two of both decision paths so it can continue seamlessly after each decision. It would also probably hang on to the unchosen path in case it’s needed later. Overall, there are some very interesting computer science and network topology problems for anyone wanting to optimise a system like this.


Horizontally scaling databases: MySQL/Postgres Sharding

At some point, a single database instance starts to creak as more objects are added to it, even with read-only replication. A battle-proven strategy here is to scale horizontally via sharding; however, here be dragons. Here are some general design principles for sharding relational databases such as MySQL and Postgres.

These are some good case studies on MySQL sharding:

  • Sharding Pinterest: How we scaled our MySQL fleet (+ Hacker News thread on this).
  • Sharding & IDs at Instagram

  • Only shard when you have to. Premature optimisation is, after all, the root of all evil. Sharding adds more servers to build, maintain, failover, and backup; and it makes apps more complex.

  • Each object/record has its own GUID to uniquely identify it across all servers. The GUID also indicates the shard the object lives in, so when a request comes in specifying a GUID, the server can map it straight to a particular shard. Instagram Engineering has a good overview of GUID generation; there are various options, each with pros and cons.
  • Use many virtual shards and distribute objects evenly between them. It’s best to assign each object to one of thousands of virtual shards, and then map those to physical shards (i.e. database instances running on a particular host). For example, you might assign an object to shard 1331 out of a possible 10,000 shards, using a simple random number or modulo function to ensure each shard gets approximately the same quantity. This virtual shard is the object’s permanent home and will never change. You then map 1331 to “database server 3”. The reason for this indirection is that you can easily split up data as the system grows: moving a virtual shard to a new server only changes the mapping, never the object’s ID.
  • Related content lives in the same shard. Typically, content owned by a single user/team/company should live together on the same (virtual) shard. For many applications, the main queries that need a quick user response are all within the same object graph, i.e. some kind of join between a company, its workers, and their content. It makes sense to store all of this in the same shard, so if you have a natural hierarchy, ensure each class’s shard is initialised with that of the root class (e.g. each “sale” is assigned to the same shard as the “salesperson” who made it, and each “salesperson” is assigned to the same shard as the “company” they work for).
  • Slave replication is only for backup/failover. This is advocated by the Pinterest article. Replication can cause weird “time travel” bugs, where an application reads stale data from a slave and then uses it to update the master. Sharding is sufficient to replace the performance benefits of reading from multiple slaves, so replication should only be used for backup and failover purposes. Each shard (and any central database) gets its own slave.
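The first three principles can be sketched in a few lines of Ruby. This is a minimal illustration, not Pinterest’s or Instagram’s actual scheme: it assumes a made-up 64-bit ID layout (milliseconds since a custom epoch, then a 14-bit virtual shard number, then a 10-bit sequence), and `EPOCH_MS`, the bit widths and the `db1` to `db4` host table are all hypothetical.

```ruby
# Hypothetical 64-bit ID layout: [ms since EPOCH_MS | 14-bit virtual shard | 10-bit sequence].
EPOCH_MS = 1_300_000_000_000 # hypothetical custom epoch, in Unix milliseconds
SHARD_BITS = 14              # enough for 10,000 virtual shards (max 16,383)
SEQUENCE_BITS = 10           # up to 1,024 IDs per shard per millisecond
VIRTUAL_SHARDS = 10_000

# Virtual -> physical mapping. Only this table changes when a server is split.
PHYSICAL_HOSTS = {
  (0...2_500) => 'db1',
  (2_500...5_000) => 'db2',
  (5_000...7_500) => 'db3',
  (7_500...10_000) => 'db4'
}.freeze

# Pick a permanent virtual shard for a new root object (e.g. a company).
def random_virtual_shard
  rand(VIRTUAL_SHARDS)
end

# Build a GUID that embeds its virtual shard.
def make_id(virtual_shard, sequence, now_ms)
  raise ArgumentError, "bad shard #{virtual_shard}" unless (0...VIRTUAL_SHARDS).cover?(virtual_shard)

  ((now_ms - EPOCH_MS) << (SHARD_BITS + SEQUENCE_BITS)) |
    (virtual_shard << SEQUENCE_BITS) |
    (sequence % (1 << SEQUENCE_BITS))
end

# Recover the virtual shard from any GUID, with no lookup needed.
def virtual_shard_of(id)
  (id >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)
end

# Route a virtual shard to the database host that currently holds it.
def physical_host(virtual_shard)
  PHYSICAL_HOSTS.find { |range, _host| range.cover?(virtual_shard) }&.last
end

# Usage: a child object ("sale") inherits the virtual shard of its root ("company").
company_id = make_id(1331, 0, EPOCH_MS + 123_456)
sale_id = make_id(virtual_shard_of(company_id), 1, EPOCH_MS + 123_789)
```

Splitting `db1` later would mean editing `PHYSICAL_HOSTS` (say, moving `1_250...2_500` to a new host) and copying those shards’ rows; no stored ID ever has to change.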

Explaining to the CDN why “vary” header matters

A CDN asked me why they should respect the HTTP “vary” header. My reply was this.

“vary” is needed because the app uses the “PJAX” architecture, which is used by many dynamic web apps and embraced by frameworks such as Rails’ Turbolinks. Simply put, all links are “hijacked” so that when the user clicks on them, the server responds with a slimmed-down version of the full page, without headers or footers.

It still comes from the same URL, so “full pages” and “partial pages” are both served from the same URL. If the user enters a URL in their web browser and the CDN responds with a “partial page”, it will look like rubbish and fail to function, as there won’t be any stylesheets or scripts (and the page headers and footers will be missing too). Similarly, if the hijacked request ends up receiving a full page with headers and footers, it causes the initial page scripts to be re-executed, leading to bugs and memory leaks.

With the CDN ignoring “vary”, I’ve had to hack the client PJAX library to force the URL to change when a link is hijacked (appending ?xhr=true). However, it still fails on redirects, which I can’t do anything about on the client side. I added another hack to change the URL on the server side during redirects, but people have now reported the issue is still happening on the homepage, which I can’t even track down. So I’ve had to turn off caching on that page, and I’m still unsure what other pages are affected. Hopefully, that makes it clear why I’m screening for CDNs that respect the “vary” standard.
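To make the point concrete, here is a toy in-memory cache in Ruby showing what respecting “vary” means: the cache key is the URL plus the request’s values for every header the response lists in Vary. It assumes the PJAX library marks hijacked requests with an `X-PJAX` header and the server responds with `Vary: X-PJAX` (jquery-pjax sends that header; other libraries may use a different one). It is a simplification of real HTTP caching, which also handles `Vary: *` and per-variant Vary values.

```ruby
# Toy Vary-aware cache: key = URL + the request's values for each header named in Vary.
class VaryAwareCache
  def initialize
    @vary_by_url = {} # url => lower-cased header names from the cached response's Vary
    @bodies = {}      # [url, header values...] => cached body
  end

  def store(url, request_headers, response_headers, body)
    names = response_headers.fetch('Vary', '').split(',').map { |h| h.strip.downcase }.reject(&:empty?)
    @vary_by_url[url] = names
    @bodies[cache_key(url, names, request_headers)] = body
  end

  # Returns nil on a miss, so the request would go to the origin server.
  def lookup(url, request_headers)
    names = @vary_by_url[url]
    return nil unless names

    @bodies[cache_key(url, names, request_headers)]
  end

  private

  def cache_key(url, names, request_headers)
    normalized = request_headers.transform_keys(&:downcase)
    [url, *names.map { |name| normalized[name] }]
  end
end

cache = VaryAwareCache.new
cache.store('/articles', {}, { 'Vary' => 'X-PJAX' }, '<html>full page</html>')
cache.store('/articles', { 'X-PJAX' => 'true' }, { 'Vary' => 'X-PJAX' }, '<div>partial</div>')
```

A CDN that ignores Vary effectively keys on the URL alone, so the second `store` would overwrite the first and a plain browser request would get the partial page.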

Taming Ansible with a control script

Ansible is very useful for managing multiple servers, but one of its weak points is lack of control over sequencing tasks.

The basic assumption is that you can always execute every task, because Ansible embraces the principle of idempotency. If you add a new task, you should be able to just run the whole playbook again with no side effects, because all the other tasks are idempotent.

While that’s true in theory, there are many reasons why you’d want to run only a subset of a playbook:

  • Most importantly, performance. Some tasks can be slow. Even just checking state, without making any changes, can be slower than you’d like. A one-second delay per task in a playbook of 60 tasks means waiting a minute every time you want to do something.
  • While developing playbooks, it’s helpful for debugging purposes to execute only a subset of tasks. (Also, see previous point on performance. You don’t want to wait a minute to find out you had a typo.)
  • Some tasks should not be run by default. e.g. you may need to restart servers periodically to work around a memory leak. This would have to be “forced”, because normally you’d just have a “service” rule indicating the server is “started”, so no action would take place while the server is already running. What you need instead is a separate rule indicating the server is “restarted”.

Ansible’s primary answer to this is tags: you can specify which tasks to run by calling ansible-playbook with a --tags argument, and you can skip over other tasks using --skip-tags. These are useful but limited, mainly because you can’t say “this task should only be executed if it’s tagged”. Yes, you can skip over it with --skip-tags, but it will be run by default. e.g. if you have a “restart server” task, it will always run as part of the standard playbook execution, unless you remember to skip it. Cumbersome.

The common workaround for this is to introduce another concept: variables. Instead of tagging such tasks, you make them happen only when some variable is true. And then default the variable to false. This works well, but things have suddenly got quite confusing to follow. The playbook is now an obstacle course where, for any given task, we have to figure out how to delicately step around some tasks while executing others.

For this reason, I’ve concluded the best approach is to make a front-end control script. It can’t be a “top-level” playbook, because another limitation is that playbooks can’t execute tagged or skip-tagged tasks. So it’s a plain-old bash script. I’m building up all typical invocations of Ansible in this control script. Every time I want to invoke Ansible, I ensure there’s a specific function present. Over time, I’ll be able to consolidate these and adjust the underlying playbooks as necessary.

It’s invoked like: ./ staging load_balancers
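The author’s control script is plain bash; the same idea is sketched here in Ruby for consistency with the rest of this page. Every routine invocation gets a named entry, so opt-in tasks like a restart only happen when asked for by name. The playbook name `site.yml`, the `inventories/` paths, the tag names and the `restart_app` variable are all hypothetical; `-i`, `--tags`, `--skip-tags` and `--extra-vars` are standard ansible-playbook options.

```ruby
#!/usr/bin/env ruby
# Hypothetical control script: one named entry per routine Ansible invocation.
# Playbook, inventory paths, tags and the restart_app variable are made up for illustration.
COMMANDS = {
  'load_balancers' => ['site.yml', '--tags', 'load_balancers'],
  'app_servers'    => ['site.yml', '--tags', 'app', '--skip-tags', 'restart'],
  # Opt-in task: the playbook only restarts when this variable is set to true.
  'restart_app'    => ['site.yml', '--tags', 'restart', '--extra-vars', 'restart_app=true']
}.freeze

# Build the ansible-playbook argument list for an environment and a named task.
def ansible_command(environment, task)
  args = COMMANDS.fetch(task) { raise ArgumentError, "unknown task: #{task}" }
  ['ansible-playbook', '-i', "inventories/#{environment}", *args]
end

# When run directly with ENVIRONMENT and TASK arguments, replace this process with ansible-playbook.
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  exec(*ansible_command(*ARGV))
end
```

Keeping the argument lists in one table also makes it easy to spot duplicated invocations when consolidating the underlying playbooks later.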

How to show dates to humans

First, how not to show them:

“Hey come to our amazing concert — 5/6!”

Now, how to show them:

“Hey come to our amazing concert — Wednesday May 18, 2016!”

I admit the former is more concise, bt cncs dsnt lwys mn bttr even if you can parse it.

The rules are simple, please do this when you mention a date:

  1. Include the year. There are 80 trillion web pages and most of them were written more than a few months ago, so if I see a date without context, I have no evidence it refers to a time in the future. It could be any time in the last two decades.
  2. Name the month. Let’s not get involved in a big debate about MMDD versus DDMM versus YYYYMMDDAAAA🙏🙏🙏🙏ZZZZzzzz. When we’re displaying dates to regular users, keep it simple and use a format everyone immediately understands – the month name. Or an abbreviation thereof. I realise that’s not international-friendly, but the date presumably appears with surrounding text, so use the same language for the month and use one of many i18n frameworks to localise it if you have multiple languages. [1]
  3. Name the weekday. Come on, would it kill you to tell me what day this is on as well? That’s a big deciding factor for many people and helps to plan for the event and remember exactly when it happens.
  4. Count it down. Here’s where digital formats can better traditional printed formats. The date display can be dynamic, so you can show a countdown when it’s appropriate. Again, it helps to make the date more meaningful and can also create some excitement around the event.
  5. Add to calendar. In some cases, you might provide support for adding the date to users’ calendars. There are unfortunately no great standards for this, but there are tools.
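Rules 1 to 4 fit in a couple of small Ruby helpers. This is a sketch assuming English output via `strftime`; as rule 2 says, a multilingual app would localise the weekday and month names with an i18n library instead. The function names are illustrative.

```ruby
require 'date'

# Weekday, month name and year (rules 1 to 3), e.g. "Wednesday May 18, 2016". English only.
def human_date(date)
  date.strftime('%A %B %-d, %Y')
end

# Relative countdown text (rule 4), measured from today unless told otherwise.
def countdown(event_date, today = Date.today)
  days = (event_date - today).to_i
  case days
  when 0 then 'today'
  when 1 then 'tomorrow'
  when -1 then 'yesterday'
  else days > 1 ? "in #{days} days" : "#{-days} days ago"
  end
end

concert = Date.new(2016, 5, 18)
puts "Hey come to our amazing concert, #{human_date(concert)} (#{countdown(concert, Date.new(2016, 5, 11))})!"
```

Passing `today` explicitly keeps the countdown testable; in a page you would let it default, or compute it client-side so it stays current.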

Any others?

  1. Credit Daniel for the reminder.

Developer Relations: A Five-Level Maturity Model

Having worked on both sides of developer relations, here are some thoughts about different levels of maturity for developer relations.

LEVEL 0: No developer relations

No internal effort is made to promote the platform, support developers, or capture their feedback.

LEVEL 1: Informal

No official developer relations staff or programme, but some developer relations handled by other functions. PR may be promoting the platform, business development may be partnering with and supporting developers.

LEVEL 2: High-touch

High-touch, often stealthy, relations with prized partners (i.e. large, established companies or those with sufficient resources to build showcases for new features). This is a “don’t call us, we’ll call you” outreach which may entail the platform providing funding or direct technical capability to build out the integration, and often working with as-yet unannounced technology so it can be launched with a set of poster-child applications.

LEVEL 3: Evangelism

Promoting, explaining, and supporting the platform at scale via conferences, partnerships, and online media. Proactive efforts to recruit large numbers of developers to use the platform.

LEVEL 4: Advocacy

A 2-way relationship in which the platform’s own staff see themselves as not just advocating for the platform, but as advocating for developers using the platform. With this mindset, developer relations plays an active role in feeding back real-world bugs and feature requests, and building supporting tools to improve the developer experience.

LEVEL 5: Quantified

Metrics-driven approach in which the return on investment for developer relations is understood and outreach efforts can be quantified, both with high-touch partners and at scale.

Now some caveats about this.

First, how not to use this model. Any maturity model immediately makes you think companies should be ascending to the top level, but that is not the case and not the intention here. Ascending comes at a cost that may not be justified; clearly, a pure platform company (e.g. Twilio, Stripe) has a lot more incentive to get to the top than a product company with an experimental “labs” API, for example. There is financial cost, additional risks, and distraction to the rest of the organisation; all that needs to be weighed up. The purpose of this model, then, is to provide useful labels to facilitate these kinds of decisions. Not to imply one is intrinsically better than another.

So the way to actually use this model is simply to be true to yourself. Where are you now and where do you want to be? If you’re happy at level zero, scale any devrel back. If you want to shoot for level 5, start ramping up. Companies often differ widely between official and actual practices. A company may have no official developer relations programme, but instead have a technical marketing team or a super-engaged developer team who perform the same function. Likewise, no amount of fancy business cards will compensate for a developer relations programme that doesn’t develop and rarely relates. Hopefully, this model helps people to understand where they’re at.

Final caveat: Turns out you can’t pigeonhole a complex organisation into a simple number rating. The lines will blur when applying these definitions to $YOUR_FAVORITE_EXAMPLE. You may apply these definitions to a whole company, a single division, or a single product.

(Updated same day: moved maturity levels to top of article)

Bloom Filters and Recommendation Engines

I’ll explain here what Bloom filters are and how you might find them useful for recommendation engines. I haven’t used them yet in production — just sharing what I’ve been learning.

Why Bloom Filters?

I was thinking about recommendation system algorithms. Not the main algorithm (how do you generate good recommendations?), but instead an important “side algorithm”: how do you keep track of recommendations users have previously dismissed? All the genius recommendations in the world aren’t going to matter if you keep showing the same results.

The most obvious solution here would be to track everything. Simply store a new record for every “dismissal” the user makes. But that’s a lot to store in a high-scale system, e.g. if 10 million users dismissed 20 items each, you have 200 million records to store and index.

So this is where Bloom filters come in as a highly compressed way to store a set of values. The catch is: it’s fuzzy. It’s not really storing the set; instead, it’s letting you ask the question you want to ask, which is: “Is X in this set?” and coming back with a probabilistic answer. But that’s okay for something like a recommendation system.

Here’s an example. A user Jane has dismissed three articles, identified with their IDs: 123, 456, 789.

Under the traditional model, we perform a standard set inclusion check (e.g. check if a database row exists) and come out with a definite answer:


Q: Is article 888 in the "Jane" set?
Algorithm: Check if 888 is in [123, 456, 789]
A: No. I'm sure about that.

Under the fuzzy Bloom filter model, we end up with some funny value as a fuzzy representation of the whole set, and then we can get a probabilistic answer about set inclusion.

  01101001 (this is the Bloom filter)

Q: Is article 888 in the "Jane" set?
Algorithm: Check against the Bloom filter (details below)
A: Probably not. But maybe. About 5% likelihood it's in the set.

Deriving the Bloom filter

So in the previous example, how did we end up with that representation of the set (what I playfully refer to as 01101001)? And what did we do with it?

It’s fairly simple. Remember, this is the only thing we store and the set builds up over time. So the Bloom filter starts out as empty and each new set member adds something to it.

The real representation is a bitwise vector, let’s go with 8 bits: 00000000

So when user Jane is created, her Bloom filter is 00000000.

Jane dismisses article 123. Now we compute some hashes of 123 using different algorithms. Since we have decided to make our Bloom filter 8 bits, each hash algorithm should give a number between 0 and 255 (i.e. an 8-bit value), so we can store the result. Let’s assume we use two hash algorithms to hash 123. One ends up with 64 (01000000) and the other with 33 (00100001). So now our Bloom filter is:

  01100001

Wherever a hash has a 1, we set that bit of the filter to 1. Wherever it has a 0, we do nothing (in other words, a bitwise OR). So yes, over time, this will fill with 1s. That’s why we have to choose a big enough Bloom filter size.

Going on, the next dismissal is 456. And maybe we end up with hash values 01001001 and 01100000. The first of these adds a new “1” to our previous value of 01100001:

  01101001

And finally, we might end up with 01001000 and 00100000 for ID 789, neither of which lights up any new bits. So we still have the same Bloom filter as before.


Is X in the set?

Now we have Jane’s Bloom filter, 01101001. This is a fuzzy representation of [123, 456, 789]. We can then ask, for any given value, is it in the set?

For example, if our recommendation algorithm comes up with 888, should we show it to Jane? We don’t want to show it if it is in her set of previous dismissals. We compute using the same hash algorithms as before and perhaps we end up with 00101100. That lights up a bit which isn’t set in the filter (the 6th one), so we can say categorically that it’s not in the set: if it were, all of its bits would already be on. So we can confidently recommend it to Jane.

Take another recommendation we might end up with – 456. Do we show it to Jane? Well, is it in the set of previous dismissals? We compute and get 01101001. It fits within our Bloom filter, so there’s a good chance it was in the list of values that was used to build up the filter. But no guarantee. We might end up with a value of 00001000 for another ID, e.g. 555. This would also fit the Bloom filter and we can be no more certain that it was in the original set than we can be for the 456 value. So, it’s probabilistic. You can be certain some things aren’t in the set, but you can’t be certain something was in the set. For a recommendation of 456 or 555, we can’t be sure. So in either case, we will not show Jane the recommendation and look deeper for more certain values.
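The walkthrough’s arithmetic can be replayed in Ruby. This sketch uses the made-up 8-bit hash outputs from the text rather than real hash functions, and follows the article’s toy model in which each hash is an 8-bit pattern OR-ed into the filter.

```ruby
# The made-up 8-bit hash outputs from the walkthrough (not real hash functions).
# 888 and 555 are given as one combined pattern each, as in the text.
WALKTHROUGH_HASHES = {
  123 => [0b01000000, 0b00100001],
  456 => [0b01001001, 0b01100000],
  789 => [0b01001000, 0b00100000],
  888 => [0b00101100],
  555 => [0b00001000]
}.freeze

# Adding an item ORs each of its patterns into the filter: 1s get set, 0s change nothing.
def add_to_filter(filter, patterns)
  patterns.reduce(filter) { |acc, pattern| acc | pattern }
end

# An item can only be in the set if every one of its 1 bits is already set in the filter.
def might_contain?(filter, patterns)
  patterns.all? { |pattern| (pattern & filter) == pattern }
end

jane = 0b00000000
[123, 456, 789].each { |id| jane = add_to_filter(jane, WALKTHROUGH_HASHES.fetch(id)) }

puts format('%08b', jane)                                # prints 01101001
puts might_contain?(jane, WALKTHROUGH_HASHES.fetch(888)) # prints false: definitely not dismissed
puts might_contain?(jane, WALKTHROUGH_HASHES.fetch(555)) # prints true: maybe dismissed
```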

Fine tuning

The example above just magically decided to use an 8-bit Bloom filter and hand-waved around the hash algorithms. In practice, you will need to decide on those things, and the filter will probably be hundreds or thousands of bits; otherwise, every bit will quickly fill up to become 1. A cool thing is that there are precise calculations that can help you estimate exactly how big the Bloom filter should be, based on the expected number of items in it, combined with your tolerance for error. (If your algorithm can easily generate lots of good recommendations, you could have quite a high tolerance, because it would be easy to skip over any potential matches.)
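Here is a minimal Ruby sketch of a standard Bloom filter, including those sizing calculations. Note it differs from the walkthrough: in the standard technique each of the k hash functions picks a single bit position in an m-bit array. The k positions are derived from one SHA-256 digest via double hashing, which is one common choice among several; the class and method names are illustrative.

```ruby
require 'digest'

# Standard Bloom filter: m bits, k bit positions per item.
class BloomFilter
  # Standard sizing formulas, assuming well-mixed hash functions:
  #   m = -n * ln(p) / (ln 2)^2   and   k = (m / n) * ln 2
  def self.parameters(expected_items, false_positive_rate)
    n = expected_items.to_f
    m = (-n * Math.log(false_positive_rate) / Math.log(2)**2).ceil
    k = [(m / n * Math.log(2)).round, 1].max
    [m, k]
  end

  def self.sized_for(expected_items, false_positive_rate)
    new(*parameters(expected_items, false_positive_rate))
  end

  def initialize(size_in_bits, hash_count)
    @m = size_in_bits
    @k = hash_count
    @bits = 0 # an Integer used as an arbitrary-length bit vector
  end

  # Record an item, e.g. a dismissed article ID.
  def add(item)
    positions(item).each { |i| @bits |= (1 << i) }
    self
  end

  # false => definitely never added; true => probably added (false positives possible).
  def include?(item)
    positions(item).all? { |i| @bits[i] == 1 }
  end

  private

  # Double hashing: position_i = (h1 + i * h2) mod m, with h1 and h2 taken from one SHA-256 digest.
  def positions(item)
    h1, h2 = Digest::SHA256.digest(item.to_s).unpack('Q>Q>')
    (0...@k).map { |i| (h1 + i * h2) % @m }
  end
end

jane = BloomFilter.sized_for(20, 0.01) # room for ~20 dismissals at a 1% false positive rate
[123, 456, 789].each { |id| jane.add(id) }
jane.include?(456) # => true (items that were added are always found)
```

For the earlier example of 20 dismissals per user at a 1% error rate, `BloomFilter.parameters(20, 0.01)` gives 192 bits and 7 hash functions, i.e. 24 bytes per user, or roughly 240 MB for 10 million users instead of 200 million indexed records.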


Considering the recommendation problem made me recall this article about how Medium uses Bloom filters and also led me to a useful tutorial on the topic.

Sidekiq 4’s performance boost

Mike Perham managed to turbo-boost Sidekiq for v4, making it six times faster. This in itself is good news for those of us who use it and his write-up is also of interest. #perfmatters

The perf tricks that made this possible:

  1. Redis -> worker communication (dispatching new jobs to work on): Instead of a single, global, thread on the client taking requests from Redis and locally dispatching them, every worker now gets its own direct line to Redis.
  2. Worker -> Redis communication (reporting when a job is complete): Instead of workers constantly updating the server, there’s now a client-side proxy that updates it in batches every few seconds, ie it buffers up the pending updates and periodically sends them in a multiplexed message.
  3. Refactored to do direct thread manipulation instead of relying on Celluloid.

Very interesting that (1) and (2) are almost the inverse of each other. Redis → worker job assignment has been switched from a global model to a per-worker model, while Worker → Redis job completion reporting has been switched from a per-worker model to a global model. So that’s the time-honoured pendulum swing between centralisation and decentralisation, in a nutshell.
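Trick (2) is a general batching pattern. A minimal Ruby sketch, assuming a caller-supplied block that sends a whole batch upstream in one round trip; this is an illustration of the idea, not Sidekiq’s actual code.

```ruby
# Batching sketch: workers record completions locally; a timer thread sends them in one batch.
class BatchedReporter
  def initialize(&sender)
    @sender = sender # e.g. a block that writes the whole batch to Redis in one round trip
    @pending = []
    @lock = Mutex.new
  end

  # Called by worker threads: no network I/O, just an append under a lock.
  def job_done(job_id)
    @lock.synchronize { @pending << job_id }
  end

  # Swap the buffer out under the lock, then send outside it so workers never wait on I/O.
  def flush
    batch = @lock.synchronize do
      taken = @pending
      @pending = []
      taken
    end
    @sender.call(batch) unless batch.empty?
  end

  # Flush every few seconds from a background thread.
  def start(interval_seconds = 5)
    Thread.new do
      loop do
        sleep interval_seconds
        flush
      end
    end
  end
end
```

In use, something like `BatchedReporter.new { |ids| puts "reporting #{ids.size} jobs" }.start` would replace one network call per job with one call per interval. The trade-off is that completion status upstream lags by up to one interval, and anything still buffered is lost if the process dies before the next flush.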

Also, as a commenter notes, it’s not obvious how much has been gained by the withdrawal of Celluloid. Removing a library can not only increase complexity, but can be counter-productive to performance if the library captures years of performance boosts you’ll otherwise have to learn yourself. Nevertheless, in the case of Celluloid, it was really there to simplify the multithreading programming effort, and given how important this is to Sidekiq, it’s the kind of thing that often makes sense to take full control of. (The dubious refactorings are those where some peripheral feature like logging just had to be home-made. In the case of mission-critical functionality, there’s often a lot to be said for DIY.)

When your app composes tweets: Dealing with metadata

For those who don’t know, Twitter converts every URL to its own “” shortener URL. So no matter how short or long your original URL is, the URL will end up as a fixed character length, and that character length does count towards the 140 limit.

Any sane Twitter client will hide this complexity from end-users. Its character count algorithm will be smart enough to take this into account and show the remaining characters.

But as a coder, you need to incorporate that logic yourself.

You should also know that Twitter’s API won’t automatically truncate a tweet, so if your app tries to send a long one, Twitter will return an error. So your tweet-posting app will need to truncate the tweet to 140 characters.

So I was coding up an auto-tweet setting, which requires you to estimate the length of a tweet. The code looks like:

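    TWEET_LENGTH = 140
    TWITTER_URL_LENGTH = 19 # !!Danger - read on!!
    HASHTAG = '#nowplaying'

    # Shorten text to at most max_length characters, marking the cut with "..."
    def truncate(text, max_length)
      text.length <= max_length ? text : "#{text[0, max_length - 3]}..."
    end

    # Twitter counts every URL as TWITTER_URL_LENGTH characters, whatever its real length,
    # so reserve room for " <url> #nowplaying" and give the rest to the title.
    def compose_message(episode)
      max_title_length = TWEET_LENGTH - (1 + TWITTER_URL_LENGTH + 1 + HASHTAG.length)
      "#{truncate(episode.title, max_title_length)} #{episode.url} #{HASHTAG}"
    end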

And then, with a long title, it failed. Can you guess why?

The answer is because I apparently went to sleep for three years, and when I woke up, the world had composed hundreds of billions of tweets. Many of them include URLs, which means the length has crept up to 22 characters – 23 for SSL URLs – rising at about 1 character a year. Yes, if your tweet has a link in it, you now have to be 2.5% more concise in describing the link (that’s 3/(140 – 19)).

Thankfully, there’s an API for this:

So your code could periodically crawl the config API and aggressively cache the result. Or alternatively, have your build script download it into your code base at compile-time, if it hasn’t seen an update for a while.
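The caching approach can be sketched in Ruby as a small time-limited cache around whatever fetches the current short URL length. The fetch itself is left to the caller (shown here as a stand-in block returning 23), since the exact API call isn’t covered above; `CachedValue` and `max_title_length` are hypothetical names.

```ruby
# Hypothetical helper: cache a slowly changing remote value (such as the short URL length)
# and re-fetch it at most once per ttl_seconds.
class CachedValue
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &fetch)
    @ttl = ttl_seconds
    @clock = clock
    @fetch = fetch
    @value = nil
    @fetched_at = nil
  end

  def value
    now = @clock.call
    if @fetched_at.nil? || now - @fetched_at >= @ttl
      @value = @fetch.call
      @fetched_at = now
    end
    @value
  end
end

# Title budget for "<title> <url> <hashtag>", given the current short URL length.
def max_title_length(url_length, hashtag = '#nowplaying', tweet_length = 140)
  tweet_length - (1 + url_length + 1 + hashtag.length)
end

url_length = CachedValue.new(24 * 60 * 60) { 23 } # stand-in for a real call to Twitter's config API
puts max_title_length(url_length.value) # prints 104
```

The injectable `clock` is only there so the expiry logic can be tested without waiting a day.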

I haven’t checked in detail, but there are probably some open-source Twitter packages (gems, NPM modules etc) that include this config data and keep it up to date.

Note this also affects images and video – the above config URL also provides the length of a media item.

A simple way to speed up Vim Ctrl-P plugin: Delegate to Ag

Ctrl-P is “Intellisense for Vim”, allowing you to quickly jump to a file by searching for a few letters, or even fancy initials-style searches (e.g. find article_editor.rb by searching for “ae”).

However, doing all this requires it to maintain a search index, aka cache. That can be very frustrating with a big project, as it takes 5-10 seconds to update, which is not a good thing when you’re desperately trying to jump around files. This delay would be fine if Ctrl-P updated in the background, but due to Vim limitations it can’t, so you have to refresh it frequently and wait for the update.

Or do you?

No, you don’t. Here is a trick that means you never wait for Ctrl-P again! Just add this to your vimrc:

let g:ctrlp_user_command = 'ag %s -i --nocolor --nogroup --hidden
      \ --ignore .git
      \ --ignore .svn
      \ --ignore .hg
      \ --ignore .DS_Store
      \ --ignore "**/*.pyc"
      \ -g ""'

It’s taken straight from here. The cool thing about this trick is that it doesn’t just speed up indexing, it completely removes the need for it. This is achieved by relying on the command-line tool Ag, aka The Silver Searcher. It’s a brilliant grep replacement I would recommend to anyone, being dramatically faster than grep (as in, you can happily search a whole hard drive in real time).

I’ve used Ag for years but never realised it could be piped into Ctrl-P!

That page also includes some matching optimisation, but seriously the Ag trick was all I needed. Searching is now completely instantaneous and I never need to worry about the index going stale again.

The update has been pushed to my dotfiles.