Amazon Redshift Spectrum vs Apache Impala: What are the differences?
What is Amazon Redshift Spectrum? Exabyte-scale, in-place queries of S3 data. With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond the data stored on local disks in your data warehouse. You can query vast amounts of structured and semistructured data in your Amazon S3 "data lake" without having to load or transform any of it.
What is Apache Impala? Real-time queries for Hadoop. Impala is a modern, open-source, massively parallel processing (MPP) SQL query engine for Apache Hadoop. Impala is shipped by Cloudera, MapR, and Amazon. With Impala, you can query data stored in HDFS or Apache HBase in real time, using SELECT statements, JOINs, and aggregate functions.
Amazon Redshift Spectrum and Apache Impala both belong to the "Big Data Tools" category of the tech stack.
Apache Impala is an open-source tool whose GitHub repository has 2.19K stars and 825 forks.
Stripe, Expedia.com, and Hammer Lab are some of the popular companies that use Apache Impala, whereas Amazon Redshift Spectrum is used by VSCO, CommonBond, and intermix.io. Apache Impala has broader approval: it is mentioned in 17 company stacks and 37 developer stacks, compared with 8 company stacks and 22 developer stacks for Amazon Redshift Spectrum.