Sometimes, we prefer using Amazon EC2 directly for our Rails stack. No offense to Heroku, but we need a more controlled environment; and no offense to EngineYard, which doesn't yet support MongoDB.
We were faced with several problems that we wanted to solve:
- Control our environment without MongoDB hogging all the memory.
- Choose the right instance.
- Choose the right filesystem for optimal performance.
The choice of EC2 instance is always an interesting one – when you have to shell out the money from your own pocket, you want neither to overspend nor to underutilize the instance. We found that an m1.medium instance (3.75 GB of RAM) gave us enough leeway to manage our MongoDB instances. If you plan on running more than just MongoDB (say, a Rails app plus other services such as a search indexing engine), I would recommend at least an m1.large.
Over the course of changing instances and tuning them for performance, we learned these important pointers:
MongoDB runs well on ext4 filesystems and really badly on ext3
When we attached an EBS volume to the instance, we needed to format it. By mistake (or misfortune) we formatted it as ext3 and had huge problems. Once we moved to ext4 (and had to repair the database), it ran brilliantly. ext3 is slow at allocating and working with large files; remember that MongoDB pre-allocates data files of up to 2 GB by default!
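For reference, formatting and mounting a fresh EBS volume as ext4 looks roughly like this. The device name /dev/xvdf and the mount point /data/db are assumptions – check where your volume actually attaches and where your dbpath points.

```shell
# Assumes the EBS volume appears as /dev/xvdf and MongoDB's dbpath is /data/db
# (both are assumptions -- adjust for your own setup). Run as root.
mkfs.ext4 /dev/xvdf
mkdir -p /data/db
mount -o noatime /dev/xvdf /data/db

# Add to /etc/fstab so the mount survives a reboot:
# /dev/xvdf /data/db ext4 defaults,noatime 0 0
```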
After we tried these stunts, we found that this is in fact well documented on the MongoDB site! Better late than never.
Freeing page caches
Very often, we see that MongoDB can “hog” a lot of the memory on your system. It does release memory, but it holds onto what it has allocated until it absolutely needs to give it up. So if you really want to free memory on your system, you can periodically free the page caches. Since MongoDB regularly syncs its memory-mapped files to disk (journal commits happen roughly every 100 ms), it’s safe to say that all data ‘should’ be synced. (I would recommend executing the following commands carefully.)
Note: We were using Ubuntu and these are specific Ubuntu commands.
Suppose we have a memory map on this EC2 m1.medium instance as follows:
# free
             total       used       free     shared    buffers     cached
Mem:       3840472    2599256    1241216          0     321628    1025976
-/+ buffers/cache:    1251652    2588820
Swap:            0          0          0
We can free the page caches with the following command.
# echo 1 > /proc/sys/vm/drop_caches
Now, if you issue the free command again, you can see that much of the buffer and cache memory has been freed up.
# free
             total       used       free     shared    buffers     cached
Mem:       3840472    1335260    2505212          0        488      98620
-/+ buffers/cache:    1236152    2604320
Swap:            0          0          0
You can read more about freeing page cache, inodes and dentries here.
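If you decide to drop the caches periodically, one sketch is a root cron entry like the following. The file name and hourly schedule are assumptions, and note that dropping the cache causes a temporary dip in read performance while the cache rewarms.

```shell
# /etc/cron.d/drop-caches -- hypothetical file name, hourly schedule assumed.
# sync first so dirty pages reach disk, then drop the (clean) page cache.
0 * * * * root sync && echo 1 > /proc/sys/vm/drop_caches
```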
Add Swap space!
By default, EC2 instances are set up without any swap space. Allocating some, even if you have enough memory, is immensely helpful. When MongoDB really goes bonkers (to be fair, when it is under full load and needs all the memory it can get hold of), the swap space helps ensure that the Linux kernel does not kill MongoDB. Adding swap space on EC2 is trivial, and we found this good reference.
dd if=/dev/zero of=/swapfile bs=1M count=1024
mkswap /swapfile
swapon /swapfile

# Add to /etc/fstab to mount on boot
/swapfile swap swap defaults 0 0
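A couple of sanity checks after the above are worth running (the exact output will vary; restricting the swap file to root with chmod 600, ideally before mkswap, is also good practice):

```shell
chmod 600 /swapfile   # swap files should not be world-readable
swapon -s             # the 1 GB /swapfile should appear in the list
free -m               # the Swap: row should now show ~1024 MB total
```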
Needless to say, the basic premises of setting up MongoDB should not be forgotten:
- Ensure you use replica sets as early as possible in your environment.
- Ensure you have enough space to run a database repair.
- Ensure you have indexed fields for faster queries.
- Ensure journaling is always enabled.
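Putting those premises together, a minimal mongod config sketch might look like this (pre-2.6 INI-style options; the replica set name rs0 and the dbpath are assumed values, not taken from the article):

```shell
# /etc/mongodb.conf -- a minimal sketch, not a full production config
dbpath = /data/db     # on the ext4-formatted EBS volume
journal = true        # journaling on (the default on 64-bit builds)
replSet = rs0         # replica set name -- an assumed value
```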
Last but not least, remember to read this article about EC2 and MongoDB.