Taming Ansible with a control script

Ansible is very useful for managing multiple servers, but one of its weak points is lack of control over sequencing tasks.

The basic assumption is that you can execute all tasks because it embraces the principle of idempotency. If you add a new task, you should just be able to run the whole playbook again and there will be no side effects because all the other tasks are idempotent.

While that’s true in theory, there are many reasons why you’d want to run only a subset of a playbook:

  • Most importantly, performance. Some stuff can be slow. Even checking state, to prevent actually making any changes, can be slower than you’d like. Just a second delay for a playbook of 60 tasks would mean waiting a minute every time you want to do something.
  • While developing playbooks, it’s helpful for debugging purposes to execute only a subset of tasks. (Also, see previous point on performance. You don’t want to wait a minute to find out you had a typo.)
  • Some tasks should not be done by default. e.g. you may need to restart servers periodically to detect a memory leak. This would have to be “forced” because normally you’d just have a “service” rule indicating the server is “started”. So no action would take place because the server is already running, whereas what you need is a separate rule indicated the server is “restarted”.

Ansible’s primary answer to this is tags – you can specify a list of commands to be run by calling ansible-playbooks with a —-tags argument, and you can skip over other tasks using –skip-tags. These are useful, but limited, mainly because you can’t say “this task should only be executed if it’s tagged”. Yes, you can skip over it with skip-tags, but it will be run by default. e.g. if you have a “restart server” task, it will always be run as part of the standard playbook execution, unless you remember to skip it. Cumbersome.

The common workaround for this is to introduce another concept: variables. Instead of tagging such tasks, you make them happen only when some variable is true. And then default the variable to false. This works well, but things have suddenly got quite confusing to follow. The playbook is now an obstacle course where, for any given task, we have to figure out how to delicately step around some tasks while executing others.

For this reason, I’ve concluded the best approach is to make a front-end control script. It can’t be a “top-level” playbook, because another limitation is that playbooks can’t execute tagged or skip-tagged tasks. So it’s a plain-old bash script. I’m building up all typical invocations of Ansible in this control script. Every time I want to invoke Ansible, I ensure there’s a specific function present. Over time, I’ll be able to consolidate these and adjust the underlying playbooks as necessary.

It’s invoked like: ./control.sh staging load_balancers

How to show dates to humans

First, how not to show them:

“Hey come to our amazing concert — 5/6!”

Now, how to show them:

“Hey come to our amazing concert — Wednesday May 18, 2016!”

I admit the former is more concise, bt cncs dsnt lwys mn bttr even if you can parse it.

The rules are simple, please do this when you mention a date:

  1. Include the year. There are 80 trillion web pages and most of them were written before a few months ago, so if I see a date without context, I have no evidence it refers to a time in the future. It could be any time in the last 2 decades.
  2. Name the month. Let’s not get involved in a big debate about MMDD versus DDMM versus YYYYMMDDAAAA🙏🙏🙏🙏ZZZZzzzz. When we’re displaying dates to regular users, keep it simple and use a format everyone immediately understands – the month name. Or an abbreviation thereof. I realise that’s not international-friendly, but the date presumably appears with surrounding text, so use the same language for the month and use one of many i18n frameworks to localise it if you have multiple languages. [1]
  3. Name the weekday. Come on, would it kill you to tell me what day this is on as well? That’s a big deciding factor for many people and helps to plan for the event and remember exactly when it happens.
  4. Count it down. Here’s where digital formats can better traditional printed formats. The date display can be dynamic, so you can show a countdown when it’s appropriate. Again, it helps to make the date more meaningful and can also create some excitement around the event.
  5. Add to calendar. In some cases, you might provide support for adding the date to users’ calendars. There’s unfortunately no great standards for this, but there are tools.

Any others?

  1. Credit Daniel for the reminder.

Developer Relations: A Five-Level Maturity Model

Having worked on both sides of developer relations, here are some thoughts about different levels of maturity for developer relations.

LEVEL 0: No developer relations

No internal effort is made to promote the platform, support developers, or capture their feedback.

LEVEL 1: Informal

No official developer relations staff or programme, but some developer relations handled by other functions. PR may be promoting the platform, business development may be partnering with and supporting developers.

LEVEL 2: High-touch

High-touch, often stealthy, relations with prized partners (i.e. large, established, companies or those with sufficient resources to build showcases for new features). This is a “don’t call us, we’ll call you” outreach which may entail the platform providing funding or direct technical capability to build out the integration, and often working with as-yet unannounced technology so it can be launched with a set of poster-child applications.

LEVEL 3: Evangelism

Promoting, explaining, and supporting the platform at scale via conferences, partnerships, and online media. Proactive efforts to recruit large amounts of developers to use the platform.

LEVEL 4: Advocacy

A 2-way relationship in which the platform’s own staff sees themselves as not just advocating for the platform, but as advocating for developers using the platform. With this mindset, developer relations plays an active role in feeding back real-world bugs and feature requests, and building supporting tools to improve the developer experience.

LEVEL 5: Quantified

Metrics-driven approach in which the return-on-investment for developer relations is understood and outreach efforts are able to quantified, both with high-touch partners and at scale.

Now some caveats about this.

First, how not to use this model. Any maturity model immediately makes you think companies should be ascending to the top level, but that is not the case and not the intention here. Ascending comes at a cost that may not be justified; clearly, a pure platform company (e.g. Twilio, Stripe) has a lot more incentive to get to the top than a product company with an experimental “labs” API, for example. There is financial cost, additional risks, and distraction to the rest of the organisation; all that needs to be weighed up. The purpose of this model, then, is to provide useful labels to facilitate these kinds of decisions. Not to imply one is intrinsically better than another.

So the way to actually use this model is simply to be true to yourself. Where are you now and where do you want to be? If you’re happy at level zero, scale any devrel back. If you want to shoot for level 5, start ramping up. Companies often differ widely between official and actual practices. A company may have no official developer relations programme, but instead have a technical marketing team or a super-engaged developer team who perform the same function. Likewise, no amount of fancy business cards will compensate for a developer relations programme that doesn’t develop and rarely relates. Hopefully, this model helps people to understand where they’re at.

Final caveat: Turns out you can’t pigeonhole a complex organisation into a simple number rating. The lines will blur when applying these definitions to $YOUR_FAVORITE_EXAMPLE. You may apply these definitions to a whole company, a single division, or a single product.

(Updated same day: moved maturity levels to top of article)

Bloom Filters and Recommendation Engines

I’ll explain here what Bloom filters are and how you might find them useful for recommendation engines. I haven’t used them yet in production — just sharing what I’ve been learning.

Why Bloom Filters?

I was thinking about recommendation system algorithm. Not the main algorithm: how do you generate good recommendations. But instead, an important “side algorithm”: how do you keep track of recommendations users have previously dismissed? All the genius recommendations in the world aren’t going to matter if you keep showing the same results.

The most obvious solution here would be to track everything. Simply store a new record for every “dismissal” the user makes. But that’s a lot to store in a high-scale system, e.g. if 10 million users dismissed 20 items each, you have 200 million records to store and index.

So this is where Bloom filters come in as a highly compressed way to store a set of values. The catch is: it’s fuzzy. It’s not really storing the set; instead, it’s letting you ask the question you want to ask, which is: “Is X in this set?” and coming back with a probabilistic answer. But that’s okay for something like a recommendation system.

Here’s an example. A user Jane has dismissed three articles, identified with their IDs: 123, 456, 789.

Under the traditional model, we perform a standard set inclusion check (e.g. check if a database row exists) and come out with a definite answer:


Q: Is article 888 in the "Jane" set? Algorithm: Check if 888 is in [123, 456, 789] A: No. I'm sure about that.

Under the fuzzy Bloom filter model, we end up with some funny value as a fuzzy representation of the whole set, and then we can get a probabilistic answer about set inclusion.

  01101001 (this is the Bloom filter)

Q: Is article 888 in the "Jane" set? Algorithm: Check against the Bloom filter (details below) A: Probably not. But maybe. About 5% likelihood it's in the set.

Deriving the Bloom filter

So in the previous example, how did we end up with that representation of the set (what I playfully refer to as 01101001). And what did we do with it?

It’s fairly simple. Remember, this is the only thing we store and the set builds up over time. So the Bloom filter starts out as empty and each new set member adds something to it.

The real representation is a bitwise vector, let’s go with 8 bits: 00000000

So when user Jane is created, her Bloom filter is 0000000.

Jane dismisses article 123. Now what we do is, we compute some hashes of 123 using different algorithms. Since we have decided to make our Bloom filter 8 bits, each hash algorithm should give a number between 0 and 256, so we can store the result. Let’s assume we use two hash algorithms to hash 123. One ends up with 64 (01000000) and the other with 33 (00100001). So now our Bloom filter is:


When we get a 1, we set the bit to 1. When we get a zero, we do nothing. So yes, over time, this will fill with 1s. That’s why we have to choose a big enough bloom filter size.

Going on, the next dismissal is 456. And maybe we end up with hash values 01001001 and 0110000. So the first of these has added a new “1” to our previous value of 011000001:


And finally, we might end up with 01001000 and 00100000 for ID 789, neither of which light up any new bits. So we still have the same Bloom filter as before.


Is X in the set?

Now we have Jane’s Bloom filter, 01101001. This is a fuzzy representation of [123, 456, 789]. We can then ask, for any given value, is it in the set?

e.g. if our recommendation algorithm comes up with 888, should we show it to Jane. We don’t want to show it if it is in that set of previous dismissals. We compute using the same hash algorithms as before and perhaps we end up with 00101100. It lit up a different bit (the 6th one), so we can say categorically, it’s not in the set. We know that for sure because if it was in the set, all those bits would be on. We know for sure it’s not in the set of dismissals, so we can confidently recommend it to Jane.

Take another recommendation we might end up with – 456. Do we show it to Jane? Well, is it in the set of previous dismissals? We compute and get 01101001. It fits within our Bloom filter, so there’s a good chance it was in the list of values that was used to build up the filter. But no guarantee. We might end up with a value of 00001000 for another ID, e.g. 555. This would also fit the Bloom filter and we can be no more certain that it was in the original set than we can be for the 456 value. So, it’s probabilistic. You can be certain some things aren’t in the set, but you can’t be certain something was in the set. For a recommendation of 456 or 555, we can’t be sure. So in either case, we will not show Jane the recommendation and look deeper for more certain values.

Fine tuning

The example above just magically decided to use a Bloom filter of 8 and hand-waved around the algorithms. In practice, you will need to decide on those things and in practice it will probably be hundreds or thousands of bits; otherwise, every bit will quickly fill up to become 1. A cool thing is that there are precise calculations that can help you estimate exactly how big the Bloom filter should be, based on the expected number of items in it, combined with your tolerance for error. (If your algorithm can easily generate lots of good recommendations, you could have quite a high tolerance because it would be easy to skip over any potential matches.)


Considering the recommendation problem made me recall this article about how Medium uses Bloom filters and also led me to a useful tutorial on the topic.

Sidekiq 4’s performance boost

Mike Perham managed to turbo-boost Sidekiq for v4, making it six times faster. This in itself is good news for those of us who use it and his write-up is also of interest. #perfmatters

The perf tricks that made this possible:

  1. Redis -> worker communication (dispatching new jobs to work on): Instead of a single, global, thread on the client taking requests from Redis and locally dispatching them, every worker now gets its own direct line to Redis.
  2. Worker -> Redis communication (reporting when a job is complete): Instead of workers constantly updating the server, there’s now a client-side proxy that updates it in batches every few seconds, ie it buffers up the pending updates and periodically sends them in a multiplexed message.
  3. Refactored to do direct thread manipulation instead of relying on Celluloid.

Very interesting that (1) and (2) are almost the inverse of each other. Redis → worker job assignment has been switched from a global model to a per-worker model, while Worker → Redis job completion reporting has been switched from a per-worker model to a global model. So that’s the time-honoured pendulum swing between centralisation and decentralisation, in a nutshell.

Also, as a commenter notes, it’s not obvious how much has been gained by the withdrawal of Celluloid. Removing a library can not only increase complexity, but can be counter-productive to performance if the library captures years of performance boosts you’ll otherwise have to learn yourself. Nevertheless, in the case of Celluloid, it was really there to simplify the multithreading programming effort, and given how important this is to Sidekiq, it’s the kind of thing that often makes sense to take full control of. (The dubious refactorings are those where some peripheral feature like logging just had to be home-made. In the case of mission-critical functionality, there’s often a lot to be said for DYI.)

When your app composes tweets: Dealing with metadata

For those who don’t know, Twitter converts every URL to its own “t.co” shortener URL. So no matter how short or long your original URL is, the t.co URL will end up as a fixed character length, and that character length does count towards the 140 limit.

Any sane Twitter client will hide this complexity from end-users. The word count algorithm will be smart enough to take this into account show the remaining characters.

But as a coder, you need to incorporate that logic yourself.

You should also know that Twitter’s API won’t automatically truncate a tweet, so if your app tries to send a long one, Twitter will return an error. So your tweet-posting app will need to truncate the tweet to 140 characters.

So I was coding up an auto-tweet setting, which requires you to estimate the length of a tweet. The code looks like:

  1. TWEET_LENGTH = 140
  2. TWITTER_URL_LENGTH = 19 // !!Danger - read on!!
  4. def compose_message(episode)
  5.    hashtag = '#nowplaying'
  6.    url_and_hashtag_suffix = " #{episode.url} #nowplaying"
  7.    max_title_length = TWEET_LENGTH - (1 + TWITTER_URL_LENGTH + 1 + hashtag.length)
  8.    "#{truncate episode.title, max_title_length}#{url_and_hashtag_suffix}"
  9. end

And then, with a long title, it failed. Can you guess why?

The answer is because I apparently went to sleep for three years, and when I woke up, the world had composed hundreds of billions of tweets. Many of them include URLs, which means the t.co length has crept up to 22 characters – 23 for SSL URLs – rising at about 1 character a year. Yes, if your tweet has a link in it, you now have to be 2.5% more concise in describing the link (that’s 3/(140 – 19)).

Thankfully, there’s an API for this:

So your code could periodically crawl the config API and aggressively-cache the result. Or alternatively, have your build script download it to your code base at compile-time, if it hasn’t seen an update for a while.

I haven’t checked in detail, but there are probably some open-source Twitter packages (gems, NPM modules etc) that include this config data and keep it up to date.

Note this also affects images and video – the above config URL also provides the length of a media item.

A simple way to speed up Vim Ctrl-P plugin: Delegate to Ag

Ctrl-p is “Intellisense for Vim”, allowing you to quickly jump to a file by searching for a few letters or even fancy camel-case type searches. (e.g. find article_editor.rb by searching for “ae”).

However, doing all this requires it to maintain a search index, aka cache, to be maintained. That can be very frustrating with a big project as it takes 5-10 seconds to update, which is not a good thing when you’re desperately trying to jump around files. This delay would be fine if Ctrl-P worked in the background, but due to Vim limitations, it can’t, so you have to frequently run it on the command-line and wait for the update.

Or do you?

No you don’t. Here is a trick that lets you never wait for ctrl-p again! Just add this to your vimrc:

let g:ctrlp_user_command = 'ag %s -i --nocolor --nogroup --hidden
      \ --ignore .git
      \ --ignore .svn
      \ --ignore .hg
      \ --ignore .DS_Store
      \ --ignore "**/*.pyc"
      \ -g ""'

It’s taken straight from here. The cool thing about this trick is it doesn’t just speed up indexing, it completely removes the need for it. This is achieved by relying on the command-line tool Ag, aka Silver Searcher. It’s a brilliant grep replacement I would recommend to anyone, being exponentially faster than grep (as in, you can happily search a whole hard drive in real time).

I’ve used Ag for years but never realised it could be piped into Ctrl-P!

That page also includes some matching optimisation, but seriously the Ag trick was all I needed. Searching is now completely instantaneous and I never need to worry about the index going stale again.

The update has been pushed to my dotfiles.

Speeding up Rails tests with Spring

I found Rails tests were running slow, these things helped.

Instrumenting application.rb

First, I added some logging to application.rb

  1. def logg(m)
  2.   puts "#{DateTime.now} #{m}"
  3. end
  5. logg 'require boot'
  6. require File.expand_path('../boot', <strong>FILE</strong>)
  7. logg 'done'</code>

The main bottleneck was bundling (Bundler.require(:default, Rails.env)), which took around 12 seconds on OSX.

Goodbye Bundle?

10+ seconds delay for running a test is unacceptable, leading me to contemplate ditching Bundler. However, that’s a lot of manual overhead, requiring us to manually import stuff, and we’re using Ruby here because it’s supposed to cut out tedium like that.

Spring forward

The real answer is to avoid running application.rb every time using the (defaultly-installed) Spring. Spring should keep a master process around after the app has booted, so we don’t need to run application.rb every time. I thought Spring was already running, but with this kind of delay, I had to check it.

Opened a new terminal, started tailing /tmp/spring.log, and then made spring start using it in the main Ruby test running terminal:

  1. spring stop
  2. export SPRING_LOG=/tmp/spring.log
  3. tail -f /tmp/spring.log

This is perfect for diagnosis: I have the aforementioned application.rb logs in the main terminal and the other window shows me what Spring is up to.

Ran a test:

  1. truby test/models/user_test -n /username/ # truby is my alias for RAILS_ENV=test bundle exec ruby ...

And yes, I saw application.rb running through all the motions, 10 second bundle, etc. And no spring action.

Ran it again.

Same thing. We’re booting the whole app for every one of these tiny tests!

The thing is Spring only works for a select few commands – any rake command and just a few Rails commands (rails console, rails generate, rails runner). This caught me out as I was trying to run rails server and Spring again wasn’t running. I guess the thinking is you shouldn’t need to restart web server often in dev anyway – stuff is already auto-reloaded according to default config.

So the answer was to run tests through Spring. I’d previously stopped doing that due to difficulty running a single test method, but now it’s easy.

  1. rake test test/models/user_test /username/

In the above case, the second argument is an optional test method or regular expression. Unlike the direct invocation, no “-n” is needed if a regex is provided.

And, problem solved. I see Spring is cloning the booted process and the test runs in a fraction of a second.

Chrome’s new user switcher is a major design regression

I rely on profile switching completely for day-to-day use. I have my personal Google account, Google apps account, a sandbox account for testing in clean-slate, a company account, support account, account for Google Play, account with read-only access to some servers which can be authorised to analytics tools, etc. Consultants often have one or more accounts per client.

The old “switching between different browsers” doesn’t cut it, Chrome knew it when they introduced this feature, and with the latest update, it’s become barely usable.

The new user navigation switcher:

  • Shows text labels instead of the instantly recognisable, customisable, graphical ones. Welcome to 1990, have a nice day.
  • Requires an extra click to switch (“Switch profile”) after hitting on the current profile name.
  • Takes several seconds to open the switcher menu in its own dialog.
  • After clicking twice to open the switcher and waiting around for it to paint, it only shows 8 profiles (what?!), only the first 8 in alphabetical order, and won’t scroll for more. Turns out you have to manually expand it to see all profiles! Hidden core features ftw!
  • The big compromise was to show the old menu if you click with the second mouse button. Well guess what, many of us don’t have a second mouse button!
  • The old menu can also be launched with … a 2-finger tap on the trackpad. Yes, the only way to access this functionality is with an awkward 2-finger tap. It works 35% of the time for me, I’m trying to get some practice in so my average can hopefully get up to 50% (dream big). Keep in mind, this 2-finger tap is *the only way* I can access some of my profiles (those beyond the first 8).

Bottom line: Chrome had a great feature for years, which no other browser supports to this day. The architects of this feature have dropped the ball completely, given no explanation for these counter-intuitive design decisions, and hopefully Firefox or Edge will be there to pick up the slack.

All of the above issues are articulated in the bug tracker in very much the same way as the new design has no-one explaining the benefits. It’s hard to see anything good about the new design.

It’s not a case of wanting to stick to the old thing for the sake of not learning a new interface. It’s a case of a serious design regression in every dimension, one that hasn’t seen any rational justification from the designers.

Sometimes you can just tell when the designer doesn’t use their own feature.

Time spent in mobile apps considered harmful

TechCrunch reports on time spent in mobile apps. Saying users spend X time on these kinds of apps sounds like the best way to gauge engagement, but caution advised. This is a useful figure for apps like Facebook, whose revenue is closely tied to how many eyeball-seconds are spent in the app. But as others have pointed out, it’s flawed.

Consider the following app categories:

  • E-Commerce – as the above tweet mentions, a staggering amount can be spent with a few taps. Uber’s whole magic is precisely how effectively it gets out of the way.
  • Notifiers – how about getting recommendations from FourSquare when you walk past it, or seeing when your friend is nearby. Apps can be valuable even if they are mostly about poking you when something happens.
  • Audio – Few people outside of stock photo models specifically sit down to listen to music and podcasts. Audio apps are fundamentally about multitasking, which means users will do some basic management, hit play, and get out of the way. Whether they leave the app in the foreground or background is fairly arbitrary, so as long as these metrics are measuring eyes and not ears, it’s skewed.