Migrating user accounts from Google OpenID to Google OAuth to Google Plus

Background

Over here, Ade has asked me to permalink a comment I made about moving Player FM accounts from Google Open ID to OAuth. The reason I did it was because Android sign-in is really based on OAuth, but what I didn’t know was Google at the time was preparing to launch “Sign in with Google Plus”, also based on OAuth. Bottom line: Google Open ID, afaict, is going the way of the dodo, so any services using it probably want to migrate their users to G+. (Googlers may please correct me if I’m wrong about Open ID’s demise.)

There were many permutations to be considered for this kind of migration, each with trade-offs to developer complexity and user experience. What follows was the optimum balance for me after a repetition of thought experiments, in between wishing I had a pony (ie that I’d opted for OAuth in the first place, the only reason I didn’t was availability of an Open ID Rails plugin). This is all web-based as we (thankfully) hadn’t launched on Android yet.

The problem

The first concern here is largely about user perceptions. To us developers, Google OAuth and Google Open ID are distinct login mechanisms, as similar to each other as they are to Facebook or Twitter. But to the user, they’re the same thing – Googles all the way down. So you can’t present the user with separate login buttons for Google OAuth and Google Open ID…you only get to present one Big G button.

The other concern is that sites who present “Existing users – log in here” versus “New users – sign up here” buttons … are doing it wrong. A major benefit of third-party sign-in is you can just present a “Connect with X” button and the user doesn’t have to care or remember if they previously connected with X or not. Don’t make me think!

Put concern A together with concern B and Houston, we have a problem. You present that one big G button with Google OAuth and what happens if the user is unrecognised? Is this a new user or someone who had previously logged in using Open ID. (It’s fine if the user is recognised, that means they’re one of the post-OAuth-era people.)

The solution

The solution depends if you’re willing to ask for email permissions on the new oAuth flow.

If you are willing to ask for email, that will make it easy to link the two accounts, because Open ID already relies on email, so you have their email. You can just switch right now to oAuth and once the user authenticates, link the account with the account having the same o8 ID. (Note: this scenario is purely speculative and not based on my experience.)

Since I chose not to ask for email, I had to do the uncool thing and temporarily divided Login from Signup.

In the Login area, the app prompted users for their email or login, and immediately made an XHR call to detect whether that account is using o8 or oAuth, then showed the corresponding button (the button is just a Google button, looks the same either way but the link will be different for o8 vs oAuth). (In addition, the Twitter and classic login form were shown.)

For people who logged in with Open ID, I built an !IMPORTANT! big red notification when the user logged in via Open ID, telling them we’ve updated Google login procedure, and when they click to set it up, taking them through the oAuth flow with a special callback for the migration. At this point, the server recognises the two accounts are linked (they’d already logged in with Open ID, now they’ve just logged in with OAuth), so we can save the user’s OAuth credentials. This user now has two third-party accounts – Google Open ID and Google OAuth. Just as they might also have a Facebook account and a Twitter account.

The Signup area of course only contained an OAuth button (as well as Twitter, which was exactly the same as Twitter in the login area, and classic signup form).

I published advance notice on the blog I would be shutting down the old Google IDs, kept the migration alive for two months, and then deleted all the Open ID accounts at that time. Anyone who didn’t log in at the time lost their accounts, but of course a few people mailed me about it (mainly when we launched the Android app and they tried to log in again), so I helped them migrate manually.

I did this for a couple months before deprecating o8 and returning to the nicer single Google button setup. And just manually merged the accounts when a few users asked me why the Google button is not getting them back to their old account.?

Epilogue

It was well worth the pain, as the vast majority of Android users now choose to log in with Google, even though we also support classic login and guest mode. The G+ API was a big bonus to come out of it. I’ve done some experiments on the social side and expect to do much more with G+ accounts in the future, to help people discover new shows to listen to.

Defer and Recur with Rails, Redis, and Resque

I’ve put off some scaling related issues about as long as possible, and am now proceeding to introduce a deferred-job stack. I’ll explain what I’ve learned so far, and with the caveat: this isn’t in production yet. I’m still learning.

What it’s all about

Tools like Resque let you perform work asynchronously. That way, you can turn requests around quickly, so the user gets something back immediately, even if it’s just “Thanks, we got your request”, which is nicer than the user waiting around 5 minutes, and ensures your server doesn’t lock up in the process. Typical example being sending an email – you don’t want the user’s browser to wait while your server connects elsewhere and fires off the email. Other examples would be fetching a user’s profile or avatar after they provide their social profile info; or generating a report they asked for.

So you set up an async job and respond telling the user their message is on the way. If you need to show the user the result of the delayed job, make the clien polls the server and render the result when it’s ready. More power XHR!

The simple way to do this

The simple way, which worked just fine for me for a long time and I’d recommend for anyone starting, is a simple daemon process. Basically:

  1. while true
  2.     if (check_database_for_condition)
  3.       do_something
  4.     sleep 10
  5.   end

The fancy way

The problem with the simple way is it can be hard to parallelise and monitor; you’ll end up reinventing the wheel. So to stand on the shoulders of giants, go install Redis, Resque, and Resque-Scheduler. I’ll explain each.

Redis

Redis, as you probably know, is a NOSQL database. It’s been described as a “data structure server” as it stores lists, trees, and hashes; and assuming Knuth is your homeboy, that’s a mighty fine concept. And it’s super-fast because everything is kept in memory, with (depending on config) frequent persistence to disk for durability.

Resque

Resque is no sneezing matter either, being a tool made and used by GitHub, no less.

Resque uses Redis to store the actual jobs. It’s worth explaining the main components of Resque, because I’ve found they’re often not defined very clearly and if you don’t understand this, everything else will trip you up.

Job. A job is a task you want to perform. For example Job 1234 might be “Send welcome email to [email protected]”. In Resque, a job is defined as a simple class having a “perform” method, which is what does the work [1].

Queue. Jobs live in queues. There’s a one-liner config item in the Job class to say which queue it belongs to. In a simple app, you could just push all jobs to a single queue, whereas in a bigger app, you might want separate queues for each job type. e.g. you’d end up with separate queues for “Send welcome email”, “Fetch user’s avatar”, and “Generate Report”. The main advantage of separate queues is you can give certain queues priority. In addition to these queues, you also have a special “failed” queue. Tasks that throw exceptions are moved to “failed”; otherwise the task disappears.

Worker. A worker is a process that runs the jobs. So a worker polls the queues, picks the oldest jobs off them, and runs them. You start workers via Resque’s Rake task, and in doing so, you tell it which queues to run. There’s a wildcard option to run all queues, but for fine-grained optimisations, you could set up more workers to run higher-priority queues and so on.

An important note about the environment. Rails’ environment can take a long time to start, e.g. 30 seconds. You clearly don’t want a 30-second delay just to send an email. So workers will fork themselves before starting the job. This way, each job gets a fresh environment to run off, but you don’t have the overhead of starting up each time. (This is the same principle as Unicorn’s management of Rails’ servers.) So starting the worker does incur the initial Rails startup overhead, but starting each job doesn’t. In practice, jobs can begin in a fraction of a second. You can further optimise this by making a custom environment for the workers, e.g. don’t use all of Rails, but just use ActiveRecord, and so on. But it’s probably not worth the effort initially as the fork() mechanism gets you 80% there.

Resque-Scheduler

For many people, Resque alone will fit the bill. But certain situations also call for an explicit delay, e.g. “send this email reminder in 5 days”; or repeat a task, e.g. “generate a fresh report at 8am each day”. That’s where Resque-Scheduler comes in [2].

Resque-Scheduler was originally part of Resque, so it basically extends the Resque API. The “scheduling”, i.e. repeated tasks, are represented as a Cronjob-like hash structure and can be conveniently represented in a YML file.

Delayed jobs are created by your application code. It’s basically the same call as when you add the job directly to Resque, but you need to specify an additional delay or time argument.

The cool thing is jobs are persisted into Redis, so they will survive if the system — or any components (Redis/Resque/Resque-Scheduler) — goes down. I was confused at first as I thought they were added to some special Resque queue. But no, they are actually in the Redis database. I found this by entering keys * into Redis’s command-line tool (redis-cli), which yielded some structures including “resque:delayed:1372936216″. When I then entered dump resque:delayed:1372936216, I got back a data structure which was basically my job spec, ie. {class: 'FeedHandler', arg: ['http://example.com'].

So Resque-Scheduler basically wakes up every second or so, and does two things: (a) polls Redis to see if any delayed jobs should now be executed; (b) inspects its “schedule” data structure to see if any repeated jobs should now be executed. If any jobs should now be executed, it pushes them to the appropriate Resque queue.

Notes

  1. Conceptually a job definition is little more than a function definition, rather than a full-blown class. But being a class is the more Rubyesque way to do it and also makes it easy to perform complex tasks as you can use attributes to hold intermediate results, since each job execution will instantiate a new job object.

  2. I evaluated other tools, e.g. Rufus and Clockwork, but what appeals about Resque-Scheduler is it persists delayed jobs and handles both one-off and repeated jobs.