Amazon EC2 and MongoDB configuration for great performance

Sometimes we prefer to use Amazon EC2 directly for our Rails stack. No offense to Heroku, but we need a more controlled environment; and no offense to EngineYard, but they don’t support MongoDB on their platform yet.

We faced several problems that we wanted to solve:

  • Control our environment without MongoDB hogging all the memory.
  • Choose the right instance.
  • Choose the right filesystem for optimal performance.

Choosing an EC2 instance is always an interesting exercise: when the money comes out of your own pocket, you want neither to overspend nor to underutilize the instance. We found that an m1.medium instance (3.75 GB of RAM) gave us enough leeway to manage our MongoDB instances. If you plan on running more than just MongoDB (say, a Rails app alongside MongoDB plus other services such as a search indexing engine), I would recommend at least an m1.large.

Over the course of changing instances and tuning them for performance, we learned these important lessons:

MongoDB runs well on ext4 filesystems and really badly on ext3

When you attach an EBS volume to the instance, you need to format it with a filesystem. Unfortunately, we mistakenly used ext3 and had huge problems. After we switched to ext4 (and had to repair the database), it ran brilliantly. ext3 is slow at allocating files and at working with large files; remember that MongoDB preallocates data files of up to 2 GB each!
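To confirm which filesystem your MongoDB data directory actually lives on, you can ask `df` for the mount's type. Below is a minimal bash sketch. It assumes GNU coreutils `df` with `--output=fstype` support, which recent Ubuntu releases have. It defaults to `/data/db`, the mongod default dbpath; Ubuntu packages often use `/var/lib/mongodb` instead. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: warn if the MongoDB data directory is not on ext4.
# Function names and the default path are illustrative assumptions.

# Return 0 if the given filesystem type string is ext4.
is_ext4() {
  [ "$1" = "ext4" ]
}

# Print the filesystem type of the mount that contains PATH
# (requires GNU coreutils df with --output support).
fs_type_of() {
  df --output=fstype "$1" | tail -n 1 | tr -d ' '
}

# Check a MongoDB dbpath (default: /data/db) and report the result.
# Exit status: 0 = ext4, 1 = other filesystem, 2 = directory missing.
check_dbpath_fs() {
  local dbpath="${1:-/data/db}" fstype
  [ -d "$dbpath" ] || { echo "no such directory: $dbpath" >&2; return 2; }
  fstype=$(fs_type_of "$dbpath")
  if is_ext4 "$fstype"; then
    echo "OK: $dbpath is on ext4"
  else
    echo "WARNING: $dbpath is on $fstype, not ext4" >&2
    return 1
  fi
}
```

For example, run `check_dbpath_fs /var/lib/mongodb` before loading any data. A non-zero status means the volume should be formatted differently while it is still empty.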

After trying these stunts, we found that this is indeed well documented on the MongoDB site! Better late than never.

Freeing page caches

Very often, we see MongoDB “hog” a lot of the memory on a system. It does actually clean up, but it holds on to allocated memory until it absolutely has to release it. So if you really want to free memory on your system, you can periodically free the page caches. MongoDB commits its journal to disk every 100 ms by default and flushes its memory-mapped data files every 60 seconds, so it’s safe to say that all data ‘should’ be synced. In any case, dropping the page cache only discards clean pages, never data still waiting to be written. (I would still recommend executing the following commands carefully.)

Note: we were using Ubuntu and ran these commands there, although they should work on most Linux distributions.

Suppose the memory usage on our EC2 m1.medium instance looks like this:

# free
             total     used      free  shared   buffers    cached
Mem:       3840472  2599256   1241216       0    321628   1025976
-/+ buffers/cache:  1251652   2588820
Swap:            0        0         0

We can flush pending writes to disk and then free the page cache with the following commands (as root):

# sync
# echo 1 > /proc/sys/vm/drop_caches

Now, if you run free again, you can see that most of the buffers and cached memory have been freed.

# free
             total     used      free  shared   buffers     cached
Mem:       3840472  1335260   2505212       0       488      98620
-/+ buffers/cache:  1236152   2604320
Swap:            0        0         0

You can read more about freeing page cache, inodes and dentries here.
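The before/after comparison above can also be scripted. Below is a minimal bash sketch. It assumes a Linux `/proc/meminfo` that reports `Buffers:` and `Cached:` in kB, and it must run as root. It calls `sync` first because dropping caches never frees dirty pages. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: drop the page cache and report how much buffer/cache memory was released.
# Function names are illustrative; drop_page_cache must run as root.

# Sum the Buffers and Cached values (in kB) from a meminfo-style file.
buffers_cached_kb() {
  awk '/^(Buffers|Cached):/ { sum += $2 } END { print sum + 0 }' "${1:-/proc/meminfo}"
}

drop_page_cache() {
  local before after
  before=$(buffers_cached_kb)
  sync                                # flush dirty pages so more of the cache is clean
  echo 1 > /proc/sys/vm/drop_caches   # 1 = page cache only
  after=$(buffers_cached_kb)
  echo "buffers+cached: ${before} kB -> ${after} kB (released $(( before - after )) kB)"
}
```

Writing `2` instead of `1` drops dentries and inodes, and `3` drops both the page cache and dentries/inodes. The sketch sticks to `1` to match the command used above.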

Add swap space!

By default, EC2 instances do not have any swap space when they are set up. Allocating some, even if you have enough memory, is immensely helpful. MongoDB can really go bonkers (to be fair, that happens when it is under full load and needs all the memory it can get hold of). When it does, swap space helps keep the Linux kernel’s out-of-memory (OOM) killer from terminating MongoDB. Adding swap space on EC2 is trivial, and we found this good reference:

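# Create a 1 GB swap file (1024 blocks of 1 MB each)
dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile    # swap files should not be readable by other users
mkswap /swapfile
swapon /swapfile

# Add to /etc/fstab to enable the swap file on boot
/swapfile swap swap defaults 0 0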
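If you provision instances repeatedly, it helps to make the swap setup safe to re-run. The bash sketch below wraps the same commands in a function and must run as root. It skips all work when the swap file is already listed in `/proc/swaps`, and it appends the `/etc/fstab` entry only once. The function names and the 1024 MB default are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: idempotent swap-file setup (run as root). Names and defaults are illustrative.

# Append LINE to FILE only if an identical line is not already present.
ensure_line() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Create, enable and persist a swap file: setup_swap [PATH] [SIZE_MB]
setup_swap() {
  local swapfile="${1:-/swapfile}" size_mb="${2:-1024}"
  # /proc/swaps lists active swap areas, with the path in the first column
  if awk -v f="$swapfile" '$1 == f { found = 1 } END { exit !found }' /proc/swaps; then
    echo "$swapfile is already active"
    return 0
  fi
  dd if=/dev/zero of="$swapfile" bs=1M count="$size_mb" || return 1
  chmod 600 "$swapfile"
  mkswap "$swapfile" || return 1
  swapon "$swapfile" || return 1
  ensure_line /etc/fstab "$swapfile swap swap defaults 0 0"
}
```

For example, `setup_swap /swapfile 2048` creates and enables a 2 GB swap file. Running it a second time leaves both the swap file and `/etc/fstab` untouched.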

Needless to say, the basic principles of setting up MongoDB should not be forgotten:

  • Ensure you use replica sets as early as possible in your environment.
  • Ensure you have enough disk space to run a database repair.
  • Ensure the fields you query on are indexed for faster queries.
  • Ensure journaling is always enabled.
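For the repair-space item, a rough pre-flight check compares the disk usage of the data directory with the free space on its filesystem. This is a minimal bash sketch, and the 2 GB safety margin is an assumed rule of thumb rather than a MongoDB-defined value. Check the repair requirements for your MongoDB version and adjust it. The function names and the `/data/db` default are also assumptions.

```shell
#!/usr/bin/env bash
# Sketch: check whether a MongoDB dbpath has room for a repair.
# The 2 GB margin and function names are assumptions, not MongoDB-defined values.

# Succeed if FREE_KB >= DATA_KB + MARGIN_KB (margin defaults to 2 GB = 2097152 kB).
has_repair_headroom() {
  local data_kb="$1" free_kb="$2" margin_kb="${3:-2097152}"
  [ "$free_kb" -ge $(( data_kb + margin_kb )) ]
}

# Exit status: 0 = enough space, 1 = not enough, 2 = directory missing.
check_repair_space() {
  local dbpath="${1:-/data/db}" data_kb free_kb
  [ -d "$dbpath" ] || { echo "no such directory: $dbpath" >&2; return 2; }
  data_kb=$(du -sk "$dbpath" | awk '{ print $1 }')          # disk usage of the data directory
  free_kb=$(df -Pk "$dbpath" | awk 'NR == 2 { print $4 }')  # available space on that filesystem
  if has_repair_headroom "$data_kb" "$free_kb"; then
    echo "OK: ${free_kb} kB free for ${data_kb} kB of data in $dbpath"
  else
    echo "WARNING: only ${free_kb} kB free for ${data_kb} kB of data in $dbpath" >&2
    return 1
  fi
}
```

Keeping the comparison in its own function makes the threshold easy to test and tune separately from the `du`/`df` plumbing.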

Last but not least, remember to read this article about EC2 and MongoDB.

