Having worked with MongoDB for some time and completed the M101 and M102 courses from 10gen (the first for developers, the second for DBAs), I’ve decided to write about why you might consider using MongoDB in your application. In this post I’m also going to cover the reasons for not choosing MongoDB for your app (there are quite a few of those). Hopefully, after reading this article you will understand the main pros and cons of building your application on top of MongoDB.
In the modern world of software development you no longer have to choose only among RDBMSes when starting a new project. A number of products, generally referred to as NoSQL, were created to offer new approaches to data persistence. Some of them offer near-linear horizontal scalability, some offer better read/write performance than classical relational storage, and some focus on a more convenient data representation (more convenient for a certain data access pattern or business domain). MongoDB is one such NoSQL store: it supports replication, sharding and document-oriented, schema-less persistence.
Reasons to choose Mongo
Document-Oriented and Schemaless. Unlike relational DBs, MongoDB stores all your data in collections of BSON documents and does not enforce a schema. This tremendously simplifies the mapping between domain objects and the DB: embedded/nested objects and arrays inside your domain objects are stored transparently. This makes MongoDB a perfect choice for domains with polymorphic data and for rapid software development, where you can’t afford to spend much time on schema design.
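To illustrate the mapping point, here is a minimal Python sketch (no database involved) of how a domain object with a nested object and an array turns into a single document. The `Customer` and `Address` classes are hypothetical; with a driver such as PyMongo, a dict like this is what you would hand to something like `collection.insert_one()`.

```python
from dataclasses import dataclass, field, asdict
from typing import List


# Hypothetical domain classes, used only for illustration.
@dataclass
class Address:
    city: str
    street: str


@dataclass
class Customer:
    name: str
    address: Address                                 # nested object
    tags: List[str] = field(default_factory=list)    # array


def to_document(customer: Customer) -> dict:
    # asdict() recursively converts nested dataclasses and lists into plain
    # dicts/lists, i.e. roughly the shape a driver would store as BSON.
    return asdict(customer)


# Documents with different shapes can live in the same collection: nothing
# forces the second one to have an "address" or "tags" field.
customers = [
    to_document(Customer("Alice", Address("Kyiv", "Main St 1"), ["vip"])),
    {"name": "Bob", "company": {"name": "Acme", "vat_id": "XX123"}},
]

for doc in customers:
    print(doc)
```

No mapping layer or migration is needed for Bob’s different shape; the flip side is that nothing in the DB checks it either.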
Horizontal Scalability and High Availability. This is what many people associate with Cloud Architecture. MongoDB allows you to build a clustered topology with replication and sharding, where the former provides failover and eventually consistent read scaling (when you read from secondaries) and the latter facilitates scaling out writes (and reads as well).
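With range-based sharding, a collection is split into chunks by ranges of a shard key, the chunks are spread across shards, and a router (`mongos`) sends each operation to the shard that owns the matching chunk. The Python sketch below only models that routing decision in memory; the shard key, split points and shard names are made up, and in a real cluster the chunk metadata is maintained by the cluster itself.

```python
import bisect
from typing import List

# Hypothetical chunk layout for a collection sharded on "user_id".
# Each chunk covers a half-open range [lower, upper) of shard-key values;
# split_points holds the lower bound of every chunk except the first one.
split_points: List[int] = [1000, 5000, 20000]
chunk_owner: List[str] = ["shard-a", "shard-b", "shard-a", "shard-c"]


def route(user_id: int) -> str:
    """Return the shard that owns the chunk containing user_id."""
    # bisect_right counts split points <= user_id, which is the chunk index.
    chunk_index = bisect.bisect_right(split_points, user_id)
    return chunk_owner[chunk_index]


for uid in (42, 1000, 7500, 25000):
    print(uid, "->", route(uid))
```

Because writes for different key ranges land on different shards, adding shards (and moving chunks to them) is what lets write throughput scale out.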
Fast writes in fire-and-forget mode. The client can send a write without waiting for the server to acknowledge it, which is quite useful for collecting various statistics, where losing some data is acceptable in exchange for shorter response times.
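In MongoDB this is controlled by the write concern: `w: 0` means the client does not wait for an acknowledgement (in PyMongo, for example, via `WriteConcern(w=0)` from `pymongo.write_concern`). To show the trade-off without a running server, here is a pure-Python analogy in which a thread pool stands in for the database; none of the functions below are MongoDB API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-in for the database: every write is slow to confirm.
store = []


def server_write(doc: dict) -> dict:
    time.sleep(0.05)            # pretend network round trip + disk
    store.append(doc)
    return {"ok": 1, "n": 1}    # the "acknowledgement"


executor = ThreadPoolExecutor(max_workers=2)


def write_acknowledged(doc: dict) -> dict:
    # Like w=1: block until the server confirms; failures surface here.
    return executor.submit(server_write, doc).result()


def write_fire_and_forget(doc: dict) -> None:
    # Like w=0: hand the write off and return at once; the caller never
    # learns whether it succeeded.
    executor.submit(server_write, doc)


start = time.perf_counter()
write_fire_and_forget({"event": "page_view", "url": "/home"})
fire_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
ack = write_acknowledged({"event": "page_view", "url": "/about"})
ack_ms = (time.perf_counter() - start) * 1000

executor.shutdown(wait=True)    # let the unacknowledged write finish
print(f"fire-and-forget: {fire_ms:.2f} ms, acknowledged: {ack_ms:.2f} ms, {ack}")
```

The fire-and-forget call returns almost immediately, but if `server_write` raised an exception the caller would never know, which is exactly why this mode suits statistics and not, say, payments.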
Comprehensive Querying and Aggregation Framework. MongoDB lets you query your collections with a powerful query language which, by the way, takes advantage of suitable indexes if you have created any, and which can query nested/embedded objects and arrays. For queries that require things like MAX, AVG or GROUP BY in SQL, there is a comparatively new mechanism called the Aggregation Framework, which lets you run ad-hoc aggregation queries without having to write cumbersome Map-Reduce scripts.
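As a sketch of how a GROUP BY query translates, here is an aggregation pipeline (a list of stages, which is the form PyMongo’s `collection.aggregate()` accepts) next to a small in-memory function that computes the same result, so the semantics of `$group` are visible without a server. The `products` collection and its fields are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# SQL: SELECT category, AVG(price), MAX(price) FROM products GROUP BY category
pipeline = [
    {"$group": {
        "_id": "$category",                 # the GROUP BY key
        "avgPrice": {"$avg": "$price"},
        "maxPrice": {"$max": "$price"},
    }},
    {"$sort": {"_id": 1}},                  # $group output order is not defined
]


def group_in_memory(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Compute what the pipeline above returns, on a plain list of dicts."""
    prices: Dict[str, List[float]] = defaultdict(list)
    for doc in docs:
        prices[doc["category"]].append(doc["price"])
    return [
        {"_id": cat, "avgPrice": sum(p) / len(p), "maxPrice": max(p)}
        for cat, p in sorted(prices.items())
    ]


products = [
    {"category": "books", "price": 10},
    {"category": "books", "price": 30},
    {"category": "games", "price": 50},
]
print(group_in_memory(products))
```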
It’s Free and Open Source. Yep, and besides that it’s stable, has frequent releases (for example, an upcoming release will add support for full-text search), good documentation and a fast-growing, vibrant community.
Comparatively Intuitive Architecture. Because MongoDB has only a single master (the primary) per replica set, things are definitely simpler compared to peer-to-peer architectures, where you can have concurrent writes and write conflicts.
Reasons not to choose Mongo
Having described the major advantages of adopting MongoDB, I would like to cover the other side of the coin and talk about reasons for not choosing MongoDB for your project.
No SQL = No Joins. It should be obvious that with a NoSQL DB you won’t be able to use SQL. As a result, on those (hopefully rare) occasions where you need to pick up related/referenced data from several collections, you will have to do it manually in your application, with no guarantees in terms of consistency. If you find yourself making mission-critical decisions inside your application based on data from multiple documents/collections, you should think twice before using MongoDB.
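In practice a "join" here means two queries plus some stitching code in the application. The sketch below uses plain lists in place of two collections; the field names are hypothetical, and the comment shows the kind of `$in` filter you would send for the second query.

```python
# Two hypothetical "collections", as they might come back from two queries.
orders = [
    {"_id": 1, "customer_id": 10, "total": 99},
    {"_id": 2, "customer_id": 11, "total": 15},
    {"_id": 3, "customer_id": 12, "total": 42},
]
customers = [
    {"_id": 10, "name": "Alice"},
    {"_id": 11, "name": "Bob"},
]


def join_orders_with_customers(orders: list, customers: list) -> list:
    # Query 2 would fetch only the referenced customers, e.g. with a filter
    # like {"_id": {"$in": sorted(ids)}}; here we just filter the list.
    ids = {o["customer_id"] for o in orders}
    by_id = {c["_id"]: c for c in customers if c["_id"] in ids}
    # Stitch the results together in the application. The two reads are
    # independent, so nothing guarantees a consistent state: order 3 points
    # at a customer that is missing (e.g. deleted between the two queries).
    return [
        {**o, "customer_name": by_id.get(o["customer_id"], {}).get("name")}
        for o in orders
    ]


joined = join_orders_with_customers(orders, customers)
for row in joined:
    print(row)
```

Handling the dangling reference (order 3) is entirely up to your code; a relational DB would have enforced it with a foreign key or returned both sides from one consistent read.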
No ACID Transactions. Coming from the SQL world, you are going to be surprised how much you lose when ACID transactions aren’t there anymore. When working with multiple documents (MongoDB guarantees atomic operations only on a single document) you get no automatic rollback, the possibility of inconsistent reads, etc. Occasionally you can work around these limitations using two-phase commit, entity versions and in-app locks, but generally, if you find yourself doing these things for more than 1% of operations, you have probably chosen the wrong DB.
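Of the workarounds above, entity versions are the easiest to sketch: every document carries a version counter, and an update only succeeds if the version is still the one you read, relying on MongoDB’s single-document atomicity. Below, an in-memory dict stands in for the collection and the hypothetical `compare_and_set` mimics a conditional `update_one`; with PyMongo you would check whether the result’s `matched_count` is 1. Note that this only detects lost updates on one document; it does not give you multi-document atomicity.

```python
class VersionConflict(Exception):
    """Raised when a document changed between our read and our write."""


# In-memory stand-in for a collection, keyed by _id.
accounts = {1: {"_id": 1, "balance": 100, "version": 0}}


def compare_and_set(doc_id: int, expected_version: int, changes: dict) -> bool:
    # Mimics a single-document conditional update along the lines of
    #   update_one({"_id": doc_id, "version": expected_version},
    #              {"$set": changes, "$inc": {"version": 1}})
    # which MongoDB applies atomically to that one document.
    doc = accounts.get(doc_id)
    if doc is None or doc["version"] != expected_version:
        return False
    doc.update(changes)
    doc["version"] += 1
    return True


def withdraw(doc_id: int, amount: int) -> None:
    snapshot = dict(accounts[doc_id])               # 1. read
    new_balance = snapshot["balance"] - amount      # 2. compute in the app
    if not compare_and_set(doc_id, snapshot["version"], {"balance": new_balance}):
        raise VersionConflict("document changed since it was read; retry")


withdraw(1, 30)                                    # succeeds: version 0 -> 1
stale_ok = compare_and_set(1, 0, {"balance": 0})   # a writer still holding version 0
print(accounts[1], stale_ok)
```

The stale writer is rejected instead of silently overwriting the balance; what to do next (retry, merge, give up) is again application logic.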
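Can’t Be Used as an Integration DB. It’s generally a bad idea to use a NoSQL store as an integration DB that several apps access simultaneously. With no schema enforced by the DB, every app has to agree on and validate the document structure on its own, and eventual consistency (e.g. when reading from secondaries) means that different apps can observe different versions of the same data.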
Your Indexes Should Fit into Memory. MongoDB performs well only if your indexes fit into RAM, and ideally you should have SSDs on your production servers. MongoDB is simply not as optimized for HDDs as many RDBMSes are, so you can run into trouble in certain usage scenarios where you would be just fine with an RDBMS.
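A back-of-the-envelope check helps here. MongoDB’s `collStats` command reports `totalIndexSize` (in bytes with the default scale) for a collection, e.g. via `db.command("collStats", "orders")` in PyMongo. The sketch below just sums those figures and compares them with the available RAM; the sample numbers and the 50% budget are made-up assumptions for illustration, not a MongoDB recommendation.

```python
from typing import Iterable, Mapping

GIB = 1024 ** 3


def total_index_bytes(stats: Iterable[Mapping]) -> int:
    # Each item is assumed to look like a collStats reply, whose
    # "totalIndexSize" field is the size of all the collection's indexes.
    return sum(s["totalIndexSize"] for s in stats)


def indexes_fit(stats: Iterable[Mapping], ram_bytes: int,
                budget_fraction: float = 0.5) -> bool:
    # budget_fraction is an arbitrary assumption: leave room for the OS,
    # connections and the frequently accessed part of the data itself.
    return total_index_bytes(stats) <= ram_bytes * budget_fraction


# Made-up numbers for two collections on a server with 16 GiB of RAM.
stats = [
    {"ns": "shop.orders", "totalIndexSize": 6 * GIB},
    {"ns": "shop.events", "totalIndexSize": 5 * GIB},
]
print(total_index_bytes(stats) / GIB, indexes_fit(stats, 16 * GIB))
```

With 11 GiB of indexes against an 8 GiB budget this check fails, which is the point at which you would expect index lookups to start hitting disk.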
In this article I haven’t tried to get into the nitty-gritty details of how MongoDB works, but instead covered only the essentials. If you are new to the NoSQL world, I suggest reading NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence, which provides a very good introduction to NoSQL (e.g. it contains good explanations of what replication and sharding are).