Amazon Redshift vs HBase: What are the differences?
Amazon Redshift: Fast, fully managed, petabyte-scale data warehouse service. Redshift makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.
HBase: The Hadoop database, a distributed, scalable, big data store. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
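To make "versioned, column-oriented" more concrete, the following is a minimal in-memory sketch of a Bigtable-style data model: a sorted map from row key to column ("family:qualifier") to timestamped values. It is a toy illustration only, not the HBase client API; the class name `VersionedColumnStore`, its methods, and the `max_versions` parameter are all hypothetical.

```python
from collections import defaultdict


class VersionedColumnStore:
    """Toy stand-in for a Bigtable-style table (NOT the HBase API)."""

    def __init__(self, max_versions=3):
        # Illustrative limit on how many versions of a cell are retained.
        self.max_versions = max_versions
        # row key -> "family:qualifier" -> [(timestamp, value), ...], newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts):
        """Store a new version of a cell and drop versions beyond the limit."""
        cells = self._rows[row][column]
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)
        del cells[self.max_versions:]

    def get(self, row, column, ts=None):
        """Return the newest value at or before ts (the latest if ts is None)."""
        for cell_ts, value in self._rows.get(row, {}).get(column, []):
            if ts is None or cell_ts <= ts:
                return value
        return None

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row


store = VersionedColumnStore()
store.put("user#42", "profile:email", "old@example.com", ts=1)
store.put("user#42", "profile:email", "new@example.com", ts=2)
store.put("user#7", "profile:email", "seven@example.com", ts=1)

latest = store.get("user#42", "profile:email")          # newest version
as_of_1 = store.get("user#42", "profile:email", ts=1)   # older version
user_rows = list(store.scan("user#", "user$"))          # sorted range scan
```

Reads are addressed by row key and column rather than by SQL query, which is why this kind of store suits key-based serving, while a warehouse such as Redshift is aimed at analytical SQL over large datasets.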
Amazon Redshift belongs to the "Big Data as a Service" category of the tech stack, while HBase is primarily classified under "Databases".
"Data Warehousing" is the top reason why over 27 developers like Amazon Redshift, while over 7 developers mention "Performance" as the leading reason for choosing HBase.
HBase is an open-source tool with 2.87K GitHub stars and 1.98K GitHub forks.
Lyft, PedidosYa, and Zapier are some of the popular companies that use Amazon Redshift, whereas HBase is used by SendGrid, HubSpot, and hike. Amazon Redshift has broader adoption: it is mentioned in 267 company stacks and 63 developer stacks, compared to HBase, which is listed in 54 company stacks and 18 developer stacks.
The final output is inserted into HBase to serve the experiment dashboard. We also load the output data into Redshift for ad-hoc analysis. For real-time experiment data processing, we use Storm to tail Kafka, process the data as it arrives, and insert metrics into MySQL, so that we can identify group allocation problems and send out real-time alerts and metrics.
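The split described above (one copy of the output for key-based dashboard serving, another for ad-hoc SQL analysis) can be sketched roughly as follows. This is an assumption-laden illustration, not the quoted team's pipeline: a plain dict stands in for HBase, an in-memory SQLite database stands in for Redshift, and the row layout, key format, and function names are hypothetical.

```python
import sqlite3

# Hypothetical experiment output rows: (experiment, group, metric, value)
rows = [
    ("exp_a", "control", "clicks", 120),
    ("exp_a", "treatment", "clicks", 150),
    ("exp_b", "control", "clicks", 80),
    ("exp_b", "treatment", "clicks", 70),
]

# Stand-in for the serving store (the role HBase plays): lookups by key.
serving = {}

# Stand-in for the warehouse (the role Redshift plays): SQL over all rows.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE results (experiment TEXT, grp TEXT, metric TEXT, value INTEGER)"
)


def publish(output_rows):
    """Write the same batch output to both stores."""
    for experiment, group, metric, value in output_rows:
        serving[f"{experiment}#{group}#{metric}"] = value
    warehouse.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", output_rows)
    warehouse.commit()


def dashboard_lookup(experiment, group, metric):
    """Dashboard-style read: fetch one cell by its composite key."""
    return serving.get(f"{experiment}#{group}#{metric}")


def adhoc_totals():
    """Analyst-style read: aggregate across all rows with SQL."""
    cursor = warehouse.execute(
        "SELECT experiment, SUM(value) FROM results "
        "GROUP BY experiment ORDER BY experiment"
    )
    return cursor.fetchall()


publish(rows)
print(dashboard_lookup("exp_a", "treatment", "clicks"))
print(adhoc_totals())
```

The point of the sketch is the access pattern: the dashboard asks for a known key and needs a fast answer, while ad-hoc analysis asks questions that are not known in advance and scans many rows.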
We aggressively archive historical data to keep the production database as small as possible, using our in-house, soon-to-be-open-sourced ETL library, SharpShifter.