StackShare Editors
Feb 5, 2015
Data infrastructure at Airbnb
The data-informed culture of Airbnb has been central to its success, and they have developed a sophisticated infrastructure to provide a reliable and scalable system for its users.
The basic flow sends source data into the system from two sources: a Kafka event stream and MySQL dumps from delivered through Sqoop. ETL and quality checks then happen in two separate Hive clusters, which are separated to isolate compute and storage resources and to provide “disaster recovery assurances if there ever was to be an outage.”
Partitioned Hive tables help ensure the data is immutable and reproducible, and Presto is used for almost all ad hoc queries on Hive managed tables.
0 views0

