Departure.rb

Databases are usually the lifeblood of applications and of dynamic web sites.
They manage the information that applications depend on, and a large portion of
application development revolves around the care and feeding of the data that
goes into a database or comes out of the database, as well as the structure of
the database.

When building applications with Rails, the common paradigm is to use migrations
via ActiveRecord. When prototyping an application, or when working on
applications that do not have particularly large databases, this works great,
but the pattern may start to suffer when the database starts to get large and when
developers want to make changes to the database schema.

Various database engines have different capabilities for making structural changes
without locking a table, so one has to approach the issue of migrations in the
context of the database engine that one is using. By understanding these semantics,
it is possible to do development without creating long running table locks, even
on large databases, but doing so requires fine knowledge of which operations are
blocking, and which are not, and it requires avoidance of and circumnavigation around
operations which are unavoidably blocking.

For example, one can, indeed of renaming a column or changing the data type on a column:

  • add a new column
  • setup a trigger to copy data inserted into the old column to the new one
  • run a job to copy any old values from the old column to the new one
  • change your code to use the new column instead of the old
  • remove the data copy trigger
  • delete the values from the old column

You will have succeeded in functionally making a column structural change
without locking the table while it was performed. The downside, though, should be
obvious — it is a lot of steps to perform.

Percona wrote a tool that helps with this situation. It is called pt-online-schema-change.
It is a command line tool that, given the right arguments, will run table alterations
in a safe and non-blocking way. The problem with it, though, is that the tool itself can
be complex to run,
and the use of it is well outside the normal Rails development approach of allowing
developers to control how they want the database to look and operate through the use of
the familiar ActiveRecord migrations DSL.

In many organizations, use of a tool like pt-online-schema-change may even require
SRE or devops staff involvement, which adds latency and time expense to performing
migrations. It can slow your development cycles.

Departure.rb seeks to simplify this for MySQL and MySQL family database engines. It
will work with anything that pt-online-schema-change works on. As examples,
a self hosted MySQL database, an instance of Percona Server for MySQL, or an AWS Aurora instance will all work with it.

Background

Departure, which saw version 6.2.0 released on June 20th, 2020 is an adapter for Rails that, when included into a project, causes ActiveRecord
migrations to automatically utilize the Percona Migrator, aka pt-online-schema-change,
aka pt-osc, to run the migrations.

The pt-osc tool runs migrations by applying them to a copy of the original
table, with a trigger on the original table so that any changes to table data
that occur while the migration is taking place are mirrored in the copied
table. Once the original table is fully migrated, the tool does an atomic swap
of the now-migrated copy of the original table with the original, and then it cleans
everything up for you.

The end result is that your long, large, slow migration can run on the production database
without requiring downtime on the database or locking of tables that may be service-impacting for your customers.

It is also worth noting that pt-osc, and thus Departure, tries to do this in a safe
way. If there is an error during the migration, it will clean up after itself, leaving
the original table unchanged.

Usage

If your Gemfile includes the Departure gem, it will be used for all migrations.

”’
gem “departure”, “~> 6.2”
”’

There are no special considerations when writing migrations. Departure is
transparent, so migrations should be written without any special consideration
for how they will be ran.

class ChangeLowercaseEmailLimitAndCollation < ActiveRecord::Migration[6.2]
  def up
    change_column :credentials, :lowercase_email, :string, limit: 255, collation: "utf8mb4_bin"
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end

To run the migration, the process is the same as with any normal, small app with
small migrations:

rake db:migrate #RAILS_ENV-....

All of the ALTER TABLE statements that ActiveRecord would run on your behalf will
be translated into pt-online-schema-change invocations, and ran for you that way,
with additional output from the execution of the migration to allow one to monitor
and validate that it is running as expected.

Customization

The pt-online-schema-change tool has a wide variety of options that can influence
exactly how it operates. Given this, one will often want to apply a set of configuration
options that are tuned to the specific requirements of your application or database.

Departure supports this through a couple different mechanisms.

The most direct is through the use of an environment variable when running the migrations:

PERCONA_ARGS='--chunk-time=1 --critical-load Threads_running=55' bundle exec rake db:migrate:up VERSION=xxx

The contents of the PERCONA_ARGS variable will be used as additional arguments to pt-online-schema-change.

This example tells pt-osc ia few things:

1) --chunk-time — to use no more than one second per data copy operation, and to dynamically adjust the number of rows that it copies in each chunk to fit that limit.

2) --critical-load Threads_running=55 — this examines the value of MySQL’s SHOW GLOBAL STATUS after every chunk, and aborts if the provided value is exceeded. So, in this case, if MySQL has more than 55 running threads, the tool will decide that it is putting too high a load on the server, and it will abort the operation. There is no one value for this parameter that will be right for all databases and all situations. In some cases, the database and the application may require a lower number, and in some cases it is fine to go with a much higher number. The pt-osc tool defaults to a value of 50 for this threshold, but you will have to tune it to your own situation if that default seems to deliver poor results.

The other mechanism for controlling pt-osc behavior is to provide the pt-osc options via DepartureJ’s global_percona_args
support during its configuration.

Departure.configure do |config|
  config.global_percona_args = '--chunk-time=1 --critical-load Threads_running=55'
end

This will apply those arguments to every pt-osc invocation, without requiring that the
rake command be passed an environment variable.

A useful technique to take advantage of this for a project is to write a simple gem that
is included into your project’s dependencies that encapsulates the defaults for your project.

The gem itself can be very simple. This is a variation of a real world gem that I wrote in the past
for a project:

require "departure"

module DepartureDefaults

  def self.pt_plugin_path
    File.join(__dir__,
              "departure_defaults",
              "pt-online-schema-change-fast-rebuild-constraints.pl")
  end

  def self.global_percona_args
    %W( --charset=utf8mb4
        --recurse=0
        --recursion-method=none
        --max-load\ Threads_running=80
        --critical-load\ Threads_running=120
        --chunk-time=1.0
        --plugin\ #{pt_plugin_path}
        --sleep\ 0.1
        --alter-foreign-keys-method\ rebuild_constraints).join(" ")
  end

  Departure.configure {}
  Departure.configuration.global_percona_args = global_percona_args

end

The --max-load, --critical-load, --chunk-time and --sleep parameters in this
configuration example all serve to control how the tool manages how heavily it uses the database. Being too aggressive with data transfer can DDOS your database if you let the tool drive your MySQL installation beyond it’s own capacity.

You will want to do some testing to figure out what your configuration can handle. The
pt-online-schema-change tool has a huge number of knobs that can tweak its behavior
and resource usage that are well beyond the scope of this article. You can read more about
all of the options here:

https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html

It is also worth noting that this gem utilizes this wonderful little utility to reduce
the time that the migrations take:

https://github.com/winebarrel/pt-online-schema-change-fast-rebuild-constraints

It is a simple little Perl script that temporarily disables the checking of
foreign key constraints. This improves the speed of the migration, as once the
data is copied, all of the foreign key constraints can be checked at the same time.

Normally constraints are checked as data is inserted, but the repeated checking of constraints in this way slows down the overall data copy operation. Because pt-osc and Departure will abort any migration that fails at any point, this utility is generally safe to use, though, as a failure at the end of the operation will still result in no change to the original table. The only downside to the use of this utility is that if there is a constraint violation hiding in your data, and your migration is a very long, very slow one, you won’t find out about the failure until the end of it.

Selective Use of Departure

It may be that the vast majority of your tables and your migrations are already
safe, and you want the discretion to choose when to push your migrations through
pt-online-schema-change and when not to. Departure supports this mindset as well.
There is a configuration option, enabled_by_default, that can be unset in your global
configuration block:

Departure.configure do |config|
  config.enabled_by_default = false
end

If this is done, then only migrations that are specifically flagged with a
uses_departure! line will go through Departure. Everything else will go through
the normal ActiveRecord process.

class ChangeAReallyLongSlowUglyThing < ActiveRecord::Migration[5.2]
  uses_departure!

  def up
    # Here's your long, slow, ugly migration.
  end

  # Other stuff...
end

Final Thoughts

As with any major change to your application, migrations, whether they are ran directly
through ActiveRecord or through Departure, should always be tested first. However, the
use of Departure can put the power to manage Migrations back into the hands of your
developers, and can take your DevOps/SRE people out of the critical path of operations.
It enables developers to safely plan and execute critical migrations without worries that
the application’s tables will be locked, causing an unwanted outage.

The project is very open to pull requests, so take a look, try it in your project, and
if you see something that can make it better, by all means let us know.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.