Internationalizing Bike Index

by Jake Romer


Topics: Tech

Hallo, vrienden!

Great news for our friends in the Netherlands – Bike Index is now available in Dutch 🎉

As part of its ongoing mission to eliminate bicycle theft worldwide, Bike Index has partnered with BikeFair, a Dutch bike marketplace dedicated to bringing safety and transparency to second-hand bike sales. Making Bike Index accessible to Dutch users has been a critical component of that partnership.

Together with our recent integration with Dutch stolen goods registries, the internationalization project will enable Dutch bicyclists to register and search for their bikes using Bike Index, and to use Bike Index's new Promoted Alerts service, which uses targeted Facebook ads to recover lost and stolen bikes more effectively.

Bike Index in Dutch

Technical notes

As a resource for other open-source projects that may need to undertake a similar effort (or for contributors to Bike Index – we are open-source!), what follows is a brief outline of the considerations involved in internationalizing a Rails app, our particular constraints and desiderata, and the decisions we made in our implementation.

Unless your Rails app has been internationalized since its inception, internationalizing it minimally entails three broad efforts:

  1. Adding the ability to detect and set the locale for a given web request.
  2. If using the default Rails i18n framework, externalizing user-facing strings, moving them from views (mainly but not exclusively) to YAML.
  3. Translating your now-externalized strings to other languages.

For Bike Index, we did some research into the approaches taken by other internationalized open-source Rails projects – in particular, Discourse and GitLab. This research was useful in developing a mental model of the work to be done, although naturally we deviated from them where different needs or constraints demanded it.

Locale detection

There are a number of ways to detect a user's locale:

  1. explicitly from a locale query param (settable via a UI element),
  2. explicitly from a database value tied to the user's account (settable via a user preferences UI),
  3. explicitly from the Accept-Language header set on a request (settable via the user's browser preferences), and
  4. implicitly, by inferring a locale from the user's geocoded location.

To minimize complexity, we implemented only (1) through (3).
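
As a rough illustration of how sources (1) through (3) can be combined, here is a minimal sketch in plain Ruby. It is not Bike Index's actual implementation: the method names, the `AVAILABLE_LOCALES` list, and the priority order (query param, then stored preference, then header) are all assumptions made for the example.

```ruby
# Hypothetical sketch: resolve a locale by checking sources in priority order.
# AVAILABLE_LOCALES, DEFAULT_LOCALE and both method names are illustrative.
AVAILABLE_LOCALES = %w[en nl].freeze
DEFAULT_LOCALE = "en"

# Parse an Accept-Language value such as "nl-NL,nl;q=0.9,en;q=0.8" into
# bare language codes, ordered by descending quality value (ties keep
# their original order).
def languages_from_header(header)
  return [] if header.nil? || header.strip.empty?

  header.split(",").map.with_index { |entry, index|
    tag, *params = entry.strip.split(";")
    q_param = params.map(&:strip).find { |param| param.start_with?("q=") }
    quality = q_param ? q_param.delete_prefix("q=").to_f : 1.0
    [tag.to_s.split("-").first.to_s.downcase, quality, index]
  }.sort_by { |_, quality, index| [-quality, index] }.map(&:first).uniq
end

# Assumed priority: (1) query param, (2) stored user preference,
# (3) Accept-Language header, then the default locale.
def resolve_locale(param: nil, user_preference: nil, accept_language: nil)
  candidates = [param, user_preference, *languages_from_header(accept_language)]
  candidates.compact
            .map { |candidate| candidate.to_s.downcase }
            .find { |candidate| AVAILABLE_LOCALES.include?(candidate) } || DEFAULT_LOCALE
end
```

In a Rails controller, the result of something like `resolve_locale` would typically be passed to `I18n.with_locale` in an `around_action`, so the locale is scoped to the current request.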

Translation management

Translation management is a "buy vs. build" decision point. The central questions to engage with here are:

  1. Do we want developers to be the gatekeepers to updating translations? (In our case, no.)
  2. Do we want to accept translation contributions from users via the site UI? (In our case, a nice-to-have but infeasible for a v1.)
  3. Are we able to invest the resources into building our own translation management solution? (In our case, probably not.)

That left us pricing a variety of translation management services we'd seen used elsewhere and researching their feature sets, including Transifex, LingoHub, and Phrase.

All involved committing to a subscription that ranged from $19 to $180 per month, in addition to the cost of translation itself, which we estimated at $8,000-$10,000 for an initial Dutch translation.

Some more digging surfaced one more service, which is lightweight, focused on Rails (and Laravel) projects, and pushed all the right buttons for us:

  1. It's free for open-source projects
  2. It automatically integrates translations by Google Translate (imperfect but a cost-effective 80-90% solution), further reducing the costs involved

As a non-profit, we're relatively price-sensitive and don't want to use funds inefficiently, so the potential savings gave it a big leg up in our deliberations.

Its most significant feature gap relative to its alternatives – automated GitHub PRs to sync translations – could be closed with some shell scripting integrated into our build pipeline, so we had a clear winner.

String externalization

The key decision for this stage is what format to use for translation files, the choices being YAML (the Rails default) and GetText (broadly popular beyond the Rails ecosystem).

GetText has several advantages over the Rails default, especially for large projects – the most compelling arguably being that strings don't need to be externalized from templates to a translation file. Instead, the source string lives in the template but is merely wrapped in a special method.
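
To make the contrast concrete, here is a small sketch of the two styles in plain Ruby. The `t` and `_` helpers below are simplified stand-ins written for this example, not the real Rails or gettext methods (which do not take a `locale:` argument in this way), and the Dutch copy is illustrative only.

```ruby
require "yaml"

# Rails-default style: the template references a key, and the copy lives
# in a YAML file (inlined here so the sketch is self-contained).
YAML_TRANSLATIONS = YAML.safe_load(<<~YAML)
  en:
    bikes:
      register: "Register your bike"
  nl:
    bikes:
      register: "Registreer je fiets"
YAML

# Stand-in for a key-based lookup: "bikes.register" walks the nested hash.
def t(key, locale:)
  YAML_TRANSLATIONS.dig(locale.to_s, *key.split(".")) ||
    "translation missing: #{locale}.#{key}"
end

# GetText style: the template keeps the source string itself, and that
# string doubles as the lookup key in a per-locale catalog.
GETTEXT_CATALOG = {
  "nl" => { "Register your bike" => "Registreer je fiets" }
}.freeze

# Stand-in for a gettext-style wrapper: falls back to the source string.
def _(source, locale:)
  GETTEXT_CATALOG.dig(locale.to_s, source) || source
end

puts t("bikes.register", locale: :nl)
puts _("Register your bike", locale: :nl)
```

Both calls print the same Dutch string. The difference is where the English source lives: in the YAML file for the key-based style, and in the template for the GetText style.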

But, as is often the case in a Rails context, the defaults are collectively better optimized for the needs of a moderately-scaled project like Bike Index than the alternatives, even if those alternatives are in one sense or another individually better.

There is pre-existing tooling that both mitigates the disadvantages of the Rails default i18n framework and amplifies its benefits, so we chose not to stray too far from Rails conventions in order to leverage as much open-source prior art as possible. Additionally, the YAML approach allows non-developers (marketing?) to edit source copy without diving into the source code.

String externalization is by far the most time-consuming and labor-intensive part of a translation project.

We automated as much as possible using a variety of code-gen and text wrangling tools:

  • i18n-js: Lightweight generator of client-side translation file(s)
  • rails-i18n: Generated translations of model attributes, etc.
  • i18n-country-translations: Pre-fab translations of country names
  • money: Currency localization
  • haml-i18n-extractor: Externalize strings from Haml to YAML
  • vim-i18n: Externalize strings from ERB to YAML
  • i18n-tasks: Rake tasks for maintaining translation files (normalizing translation files, detecting missing keys, etc.)
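
To give a sense of what "detecting missing keys" involves, here is a minimal sketch in plain Ruby. It is not how i18n-tasks is implemented; the helper names and the sample translations are hypothetical, and a real tool also handles pluralization, interpolation, and keys referenced in code.

```ruby
require "yaml"

# Flatten a nested translation hash into dotted keys, e.g.
# { "bikes" => { "register" => "..." } } becomes ["bikes.register"].
def flatten_keys(tree, prefix = nil)
  tree.flat_map do |key, value|
    path = [prefix, key].compact.join(".")
    value.is_a?(Hash) ? flatten_keys(value, path) : [path]
  end
end

# Return keys present in the base locale but absent from the target locale.
def missing_keys(translations, base:, target:)
  flatten_keys(translations.fetch(base, {})) -
    flatten_keys(translations.fetch(target, {}))
end

# Illustrative sample data; in a Rails app this would be loaded from the
# files under config/locales.
SAMPLE_TRANSLATIONS = YAML.safe_load(<<~YAML)
  en:
    bikes:
      register: "Register your bike"
      search: "Search for a bike"
  nl:
    bikes:
      register: "Registreer je fiets"
YAML

puts missing_keys(SAMPLE_TRANSLATIONS, base: "en", target: "nl")
```

Run against the sample data, this reports `bikes.search` as the one key still untranslated in Dutch.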

Pretty Good Practices

Some learnings emerged over the course of scanning through and extracting strings from ~15,000 lines of template, controller, and React code. Check out our internationalization docs if you'd like to read more about them!

An expanded version of this post - and others by Jake - can be found on his blog.