Impala vs. Pig


Description

What is Impala?

Impala is a modern, open-source, massively parallel processing (MPP) SQL query engine for Apache Hadoop. It is shipped by Cloudera, MapR, and Amazon. With Impala, you can run real-time SQL queries, including SELECT, JOIN, and aggregate functions, against data stored in HDFS or Apache HBase.
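
To make the query shape concrete, here is a minimal sketch of a SELECT with a JOIN and aggregate functions. It runs against Python's built-in sqlite3 module purely as a stand-in, not against Impala, and the `users` and `orders` tables and their rows are hypothetical. Only the general form of the SQL is meant to carry over; Impala's own dialect and connection setup are not shown.

```python
import sqlite3

# ASSUMPTION: in-memory SQLite is a stand-in for illustration only.
# It is NOT Impala; only the SELECT / JOIN / aggregate shape is the point.
conn = sqlite3.connect(":memory:")

# Hypothetical tables and rows, invented for this example.
conn.executescript("""
    CREATE TABLE users (id INTEGER, country TEXT);
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'US'), (2, 'DE'), (3, 'US');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 2.5);
""")

# SELECT with a JOIN and two aggregate functions (COUNT, SUM).
QUERY = """
    SELECT u.country, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM orders o
    JOIN users u ON u.id = o.user_id
    GROUP BY u.country
    ORDER BY u.country
"""

rows = conn.execute(QUERY).fetchall()
for country, n_orders, total in rows:
    print(country, n_orders, total)
```

In an MPP engine the same kind of query would be split across cluster nodes, which is what distinguishes it from this single-process stand-in.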

What is Pig?

Pig is a dataflow programming environment for processing very large files. Its language is called Pig Latin. A Pig Latin program is a directed acyclic graph in which each node represents an operation that transforms data. Operations come in two flavors: (1) relational-algebra-style operations such as join, filter, and project; (2) functional-programming-style operators such as map and reduce.
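
The dataflow idea can be sketched in plain Python. This is not Pig Latin and does not use any Pig API; the helper names (`filter_rel`, `join`, `project`) and the sample relations are hypothetical, chosen only to show how each step consumes the output of earlier steps, mixing relational-style operations with a functional-style reduce.

```python
from functools import reduce

# Hypothetical input relations, invented for this sketch.
visits = [("alice", "/home", 3), ("bob", "/home", 1),
          ("alice", "/cart", 7), ("carol", "/cart", 2)]  # (user, page, seconds)
users = [("alice", "US"), ("bob", "DE")]                 # (user, country)

def filter_rel(rows, pred):
    """Relational-style filter: keep rows that satisfy the predicate."""
    return [r for r in rows if pred(r)]

def join(left, right, lkey, rkey):
    """Relational-style inner join on one column from each side."""
    index = {}
    for r in right:
        index.setdefault(r[rkey], []).append(r)
    return [l + r for l in left for r in index.get(l[lkey], [])]

def project(rows, *cols):
    """Relational-style projection: keep only the listed column positions."""
    return [tuple(r[c] for c in cols) for r in rows]

# Each assignment is one node of the dataflow graph; its inputs are its edges.
long_visits = filter_rel(visits, lambda r: r[2] >= 2)
joined = join(long_visits, users, 0, 0)
by_country = project(joined, 4, 2)  # (country, seconds)

# Functional-style step: fold the seconds column into a single total.
total_seconds = reduce(lambda acc, r: acc + r[1], by_country, 0)

print(by_country, total_seconds)
```

Because every step only reads the results of earlier steps, the chain forms a directed acyclic graph, which is the structure the description attributes to Pig Latin programs.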

Companies

12 Companies Using Impala
7 Companies Using Pig

What are some alternatives to Impala and Pig?

  • Apache Spark - Fast and general engine for large-scale data processing
  • Apache Flink - Fast and reliable large-scale data processing engine
  • Druid - Fast column-oriented distributed data store
  • Presto - Distributed SQL Query Engine for Big Data (by Facebook)
