In my earlier post on DelayedJob setup I mentioned how to set up DelayedJob and create new tasks in your existing code. This post provides details of how to move from backgrounDrb to delayed_job. First of all, it's important to know why:
– BackgrounDrb takes up a lot of memory. It spawns workers at start-up. You can use MiddleMan for dynamically spawning backgrounDrb tasks, but in any case it slows things down a little. I have 4 workers and overall the parent process consumes over 120MB, which is huge considering my linode.
– Monitoring these jobs is a bit of a pain. Moreover, running in development / production mode requires a Drb server, which adds more memory overhead.
As we speak, github has introduced Resque, which is inspired by DelayedJob, but I plan to continue with DelayedJob for now because I don't use Redis (yet). The blog post for Resque has a LOT of details about issues with current background job schedulers. Worth reading!
OK – so you're now convinced that you should use DelayedJob instead of backgrounDrb, but you have a lot of tasks already configured. These are the steps to follow:
1. Convert your backgrounDrb workers to standard ruby classes:
```ruby
class BulkUploadWorker < BackgrounDRb::MetaWorker
  set_worker_name :bulk_upload_worker

  def upload(args)
  end
end
```
to
```ruby
class BulkUploadWorker
  def perform
  end
end
```
2. If you earlier passed arguments to your backgrounDrb jobs, you need to tweak the code a little.
Suppose I have an upload method which takes 'arg' as a Hash parameter. It would be invoked in the controller for backgrounDrb like this:

```ruby
MiddleMan.worker(:bulk_upload_worker).async_upload(
  :arg => { 'correction' => correction, 'file' => in_file, 'user' => current_user.id })
```
The change for DelayedJob is simple:

```ruby
Delayed::Job.enqueue BulkUploadWorker.new(correction, infile, current_user.id)
```
And change the worker to have a perform method (which is the one that gets called on the job):

```ruby
BulkUploadWorker = Struct.new(:correction, :infile, :user_id) do
  def perform
    file = File.open(infile)
    user = User.find(user_id)
    ...
  end
end
```
If you look closely at the code above, even for an experienced Ruby coder it's no piece of cake. Now, I tried the original approach that was on github:

```ruby
class BulkUploadWorker < Struct.new (:correction, :infile, :user)
```

but this gives me a type mismatch error. After some searching on the net, I found the answer, quite understandably, from one of the Ruby greats, JEG2. Here James clearly explains how Struct.new returns a Class and accepts a block of code for all the methods. Note the use of the Symbol :infile in the declaration but the data member infile in the perform method.
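To make that point concrete, here is a minimal, self-contained sketch; plain Ruby, no Rails or DelayedJob required, and the class name and strings are just for illustration:

```ruby
# Struct.new returns a Class; the block supplies its instance methods.
# Each symbol in the declaration (:infile) becomes an accessor (infile)
# that those methods can use directly.
BulkUploadJob = Struct.new(:correction, :infile, :user_id) do
  def perform
    "uploading #{infile} for user #{user_id} (correction=#{correction})"
  end
end

job = BulkUploadJob.new(false, 'test_file', 3)
puts job.perform             # => uploading test_file for user 3 (correction=false)
puts BulkUploadJob < Struct  # => true: Struct.new really did return a Class
```

Because the Struct members are positional, the order of arguments to new must match the order of symbols in the declaration.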
Since my file was in lib/workers/bulk_upload_worker.rb, we need to explicitly require this file for DelayedJob. I did this in config/initializers/delayed_job.rb. Now, before I get down to brass tacks and incorporate it, I really need to know if this works. First, ensure that the task works directly from the console:
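The initializer itself is just an explicit require. A sketch, assuming Rails 2.x (hence the RAILS_ROOT constant) and the lib/workers path mentioned above:

```ruby
# config/initializers/delayed_job.rb
# Classes under lib/workers are not autoloaded, so without this require the
# delayed_job daemon cannot deserialize the job's handler.
require File.join(RAILS_ROOT, 'lib', 'workers', 'bulk_upload_worker')
```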
```
RAILS_ENV=production ./script/console
>> task = BulkUploadWorker.new(false, 'test_file', 3)
>> task.perform
```
Once the task performs as expected, start up the delayed_job server and test it out from your web app. If there are errors or exceptions, delayed_job stores them in the database. So it's a good idea for command-line users with access to the server to keep an eye out for errors and exceptions in the database.
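For example, assuming the stock delayed_jobs schema (failure details land in the last_error column), something like this from the console shows what went wrong:

```
RAILS_ENV=production ./script/console
>> Delayed::Job.find(:all, :conditions => 'last_error IS NOT NULL').each { |j| puts j.last_error }
```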
Enjoy!
A weird error I get when trying to work with ferret via DelayedJob:
```
closed stream
/usr/lib/ruby/1.8/drb/drb.rb:961:in `select'
/usr/lib/ruby/1.8/drb/drb.rb:961:in `alive?'
/usr/lib/ruby/1.8/drb/drb.rb:1211:in `alive?'
/usr/lib/ruby/1.8/drb/drb.rb:1168:in `open'
/usr/lib/ruby/1.8/drb/drb.rb:1166:in `each'
/usr/lib/ruby/1.8/drb/drb.rb:1166:in `open'
/usr/lib/ruby/1.8/drb/drb.rb:1163:in `synchronize'
/usr/lib/ruby/1.8/drb/drb.rb:1163:in `open'
/usr/lib/ruby/1.8/drb/drb.rb:1092:in `method_missing'
/usr/lib/ruby/1.8/drb/drb.rb:1110:in `with_friend'
/usr/lib/ruby/1.8/drb/drb.rb:1091:in `method_missing'
/usr/lib/ruby/gems/1.8/gems/acts_as_ferret-0.4.3/lib/remote_index.rb:31:in `<<'
/usr/lib/ruby/gems/1.8/gems/acts_as_ferret-0.4.3/lib/instance_methods.rb:90:in `ferret_update'
```
…
The task works fine when I run it off the console, so it has got to be something to do with environment loading! Have posted on stackoverflow ( http://stackoverflow.com/questions/1679043/delayedjob-with-actsasferret-in-production-mode ) but no answers yet! 😦
I was wondering, what's the simplest way to put a delay in each task in a job? For example, let's say I'm fetching information for 5 entries in a file. The file is the job, and each entry is the task. How can I put a delay after the first entry is parsed and committed to the db?
Thanks!
btw, great tutorial!
@Ashish Using Shopify's extract of delayed_job: http://github.com/theneubeck/delayed_job you have a world of different possibilities. If your file is parsed in the proper way, you can do any of the following:
– Schedule a job later:
```ruby
Delayed::Job.enqueue CustomJob.new('job name', ...), 0, 5.minutes.from_now
```
– Schedule a job at an interval:
```ruby
Delayed::Job.schedule CustomJob.new('job name', ...), :every => 3.hours
```
– Schedule a job at a particular time:

```ruby
Delayed::Job.schedule CustomJob.new('job name', ...), :every => :first_of_month
Delayed::Job.schedule CustomJob.new('job name', ...), :every => :last_of_month
```
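For the per-entry delay in your file example, one option is to enqueue one job per entry with staggered run_at times. A sketch; EntryJob is hypothetical, the third argument to enqueue is the run_at time in the classic three-argument form, and plain Time arithmetic stands in for ActiveSupport's (i * 30).seconds.from_now:

```ruby
# One job per file entry, each scheduled 30 seconds after the previous one.
entries = ['entry1', 'entry2', 'entry3']
base = Time.now
entries.each_with_index do |entry, i|
  run_at = base + i * 30  # Time + seconds
  # Delayed::Job.enqueue EntryJob.new(entry), 0, run_at
  puts "#{entry} -> #{run_at}"
end
```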
Does this help you out?