Real-time analytics are much better than periodically run batch jobs, so recently we open sourced Pyleus which allows anyone to write Storm topologies using Python. Storm
in 2009 we open sourced mrjob, which allows any engineer to write a MapReduce job without contending for resources. We’re only limited by the amount of machines in an Amazon data center (which is an issue we’ve rarely encountered). Hadoop
We’ve also been able to leverage Amazon EC2 using AWS Direct Connect, which allows our engineering teams to bring up hardware whenever they need. It’s been awesome removing the hardware barrier for getting to production. AWS Direct Connect
In October 2008 we moved to using scribe (now a custom branch), which has served us very well over the past 5+ years that we’ve been using it. We take the logs scribe aggregates and move them into Amazon S3 for storage, which makes using EMR on AWS seamless. Amazon S3