Dec 9, 2016
Distributing storage and improving search with Nebula
In 2015, Airbnb grew to the point where some applications, especially search, needed a scalable, distributed storage system. Two goals drove the new architecture: supporting low-latency personalized search, and no longer having to index directly against the main Rails and MySQL application.
For this purpose, Airbnb created Nebula, which supports both real-time and batch access. The real-time part is powered by DynamoDB, and the batch part by HFileService, a system developed in-house at Airbnb that serves bulk data stored in the HFile format.
Spark merges the historical data with the batch updates, and the resulting snapshots are stored on S3. Nebula can also stream updates through Kinesis and Kafka to keep other applications aware of the latest changes.
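
To make the merge step concrete, here is a minimal sketch in plain Python of folding a batch of updates into a historical snapshot. It is not Airbnb's implementation: the `Record` type, its `timestamp` field, the `merge_snapshot` function, and the "newest timestamp wins" rule are all assumptions made for illustration. In Nebula the equivalent work runs as a Spark job over data on S3.

```python
from itertools import chain
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    """Hypothetical row shape; the real Nebula schema is not described here."""
    key: str
    value: str
    timestamp: int  # assumed version marker used to pick the newest write


def merge_snapshot(snapshot: Iterable[Record],
                   updates: Iterable[Record]) -> Dict[str, Record]:
    """Combine a historical snapshot with newer updates.

    Assumption: for each key, the record with the highest timestamp wins,
    and an update wins a tie against the snapshot.
    """
    merged: Dict[str, Record] = {}
    # Snapshot rows are visited first so that updates can override them.
    for record in chain(snapshot, updates):
        current = merged.get(record.key)
        if current is None or record.timestamp >= current.timestamp:
            merged[record.key] = record
    return merged


snapshot = [
    Record("listing:1", "price=100", 1),
    Record("listing:2", "price=80", 1),
]
updates = [
    Record("listing:1", "price=120", 2),  # newer value for an existing key
    Record("listing:3", "price=60", 2),   # brand-new key
]
new_snapshot = merge_snapshot(snapshot, updates)
```

After the merge, `new_snapshot` holds three keys: `listing:1` carries the updated value, `listing:2` is unchanged, and `listing:3` is new. A stale update with an older timestamp than the snapshot row would be ignored under this rule.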
