What is Apache Storm?
What is MongoDB?
Used MongoDB as the primary database. It holds trip data for NYC taxis for the year 2013. It is a huge dataset, and its primary feature is geo coordinates for pickup and drop-off locations. Also used MongoDB's map-reduce to aggregate this large dataset. The aggregated result was then used to build visualizations.
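As a rough illustration of the map-reduce style of aggregation described above, here is a minimal pure-Python sketch. It does not call MongoDB. The record fields (`pickup`, `fare`), the grid-cell grouping key, and the sample values are all assumptions made up for the example, not the actual schema or data of the taxi dataset.

```python
from collections import defaultdict

# Hypothetical trip records; field names and values are illustrative only.
trips = [
    {"pickup": (-73.991, 40.752), "fare": 8.5},
    {"pickup": (-73.989, 40.748), "fare": 11.5},
    {"pickup": (-73.871, 40.774), "fare": 30.0},
]


def map_trip(trip):
    """Map step: emit (grid cell, partial aggregate) for one trip."""
    lon, lat = trip["pickup"]
    cell = (round(lon, 2), round(lat, 2))  # coarse grid cell as the key
    return cell, {"count": 1, "fare": trip["fare"]}


def reduce_values(values):
    """Reduce step: combine all partial aggregates emitted for one key."""
    return {
        "count": sum(v["count"] for v in values),
        "fare": sum(v["fare"] for v in values),
    }


def map_reduce(records):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        key, value = map_trip(record)
        grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}


result = map_reduce(trips)
```

Here `result` maps each pickup grid cell to a trip count and fare total, which is the kind of aggregated output a visualization layer could consume.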
MongoDB fills our more traditional database needs. We knew we wanted Trello to be blisteringly fast. One of the coolest and most performance-obsessed teams we know is our next-door neighbor and sister company StackExchange. Talking to their dev lead David at lunch one day, I learned that even though they use SQL Server for data storage, they actually primarily store a lot of their data in a denormalized format for performance, and normalize only when they need to.
In addition to batch processing, we also wanted to achieve real-time data processing. For example, to improve the success rate of experiments, we needed to figure out experiment group allocations in real-time once the experiment configuration was pushed out to production. We used Storm to tail Kafka and compute aggregated metrics in real-time to provide crucial stats.
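The idea of computing aggregated metrics as events arrive, rather than in a later batch, can be sketched in plain Python. This is not Storm's or Kafka's API: the list of events stands in for messages read from a Kafka topic, and the event fields (`experiment`, `group`) are hypothetical names chosen for the example. In a real Storm topology, a spout would typically read the stream and a bolt would hold the running counts.

```python
from collections import Counter


def aggregate_allocations(events):
    """Yield a snapshot of per-(experiment, group) counts after each event."""
    counts = Counter()
    for event in events:
        counts[(event["experiment"], event["group"])] += 1
        # Copy the counter so each snapshot is independent of later updates.
        yield dict(counts)


# Hypothetical allocation events standing in for a live stream.
events = [
    {"experiment": "exp_a", "group": "control"},
    {"experiment": "exp_a", "group": "treatment"},
    {"experiment": "exp_a", "group": "control"},
]

snapshots = list(aggregate_allocations(events))
latest = snapshots[-1]
```

Each snapshot is available as soon as its event is processed, so group allocations can be checked right after a configuration goes live instead of waiting for the next batch run.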
Nearly all of our backend storage is on MongoDB. This has also worked out pretty well. It has enabled us to scale up faster and more easily than if we had rolled our own solution on top of PostgreSQL (which we were using previously). There have been a few road bumps along the way, but the team at 10gen has been a big help with things.
We are testing out MongoDB at the moment. Currently we are only using a small EC2 setup for a delayed job queue backed by agenda. If it works out well we might look to see where it could become a primary document storage engine for us.
Used for proofs of concept and personal projects with a document data model, especially when strong geographic queries are needed. Often not chosen for long-term apps because the data model may end up relational as needs develop.
Real-time analytics are much better than periodically run batch jobs, so we recently open-sourced Pyleus, which allows anyone to write Storm topologies using Python.