What is Apache Storm?
What is Kafka?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
What are the cons of using Apache Storm?
Sign up to get full access to all the companiesMake informed product decisions
What tools integrate with Apache Storm?
Sign up to get full access to all the tool integrationsMake informed product decisions
Front-end messages are logged to Kafka by our API and application servers. We have batch processing (on the middle-left) and real-time processing (on the middle-right) pipelines to process the experiment data. For batch processing, after daily raw log get to s3, we start our nightly experiment workflow to figure out experiment users groups and experiment metrics. We use our in-house workflow management system Pinball to manage the dependencies of all these MapReduce jobs.
In addition to batch processing, we also wanted to achieve real-time data processing. For example, to improve the success rate of experiments, we needed to figure out experiment group allocations in real-time once the experiment configuration was pushed out to production. We used Storm to tail Kafka and compute aggregated metrics in real-time to provide crucial stats.
Real-time analytics are much better than periodically run batch jobs, so recently we open sourced Pyleus which allows anyone to write Storm topologies using Python.
Building out real-time streaming server to present data insights to Coolfront Mobile customers and internal sales and marketing teams.