Apache Flink vs Pig vs Apache Spark

What is Apache Flink?

Apache Flink is an open-source system for fast, versatile data analytics on clusters. It supports both batch and streaming analytics in a single system. Analytical programs are written against concise Java and Scala APIs.
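To illustrate what "batch and streaming in one system" can mean, here is a minimal plain-Python sketch. It does not use Flink or its API; the function and variable names (`running_word_counts`, `endless_lines`) are hypothetical. The only point is that one transformation can be applied to a bounded input (batch) and to an unbounded one (streaming).

```python
from itertools import count, islice

def running_word_counts(lines):
    """Yield (word, count_so_far) for each word as lines arrive."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
            yield word, counts[word]

# Batch: a bounded input that is fully consumed.
# dict() keeps the last (that is, final) count seen for each word.
batch_input = ["a b a", "b a"]
batch_result = dict(running_word_counts(batch_input))

# Streaming: an unbounded generator (hypothetical source).
def endless_lines():
    for i in count():
        yield "tick" if i % 2 == 0 else "tock"

# Results are consumed incrementally; here only the first five updates.
first_five = list(islice(running_word_counts(endless_lines()), 5))
```

A real engine adds distribution, fault tolerance, and windowing on top of this idea; none of that is modeled here.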

What is Pig?

Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. A Pig Latin program consists of a directed acyclic graph where each node represents an operation that transforms data. Operations are of two flavors: (1) relational-algebra style operations such as join, filter, project; (2) functional-programming style operators such as map, reduce.
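The two flavors of operation can be sketched in plain Python. This is not Pig Latin and does not use any Pig API; the sample relations and helper names (`filter_rows`, `project`, `join`, `count_by`) are assumptions made for illustration. Each step takes a relation and produces a new one, so the program forms a small dataflow graph.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample relations, modeled as lists of dicts.
users = [
    {"id": 1, "name": "ana", "age": 34},
    {"id": 2, "name": "bo", "age": 17},
    {"id": 3, "name": "cy", "age": 52},
]
visits = [
    {"user_id": 1, "url": "/home"},
    {"user_id": 1, "url": "/docs"},
    {"user_id": 3, "url": "/home"},
    {"user_id": 2, "url": "/home"},
]

# (1) Relational-algebra style operations.
def filter_rows(rows, predicate):
    return [r for r in rows if predicate(r)]

def project(rows, fields):
    return [{f: r[f] for f in fields} for r in rows]

def join(left, right, left_key, right_key):
    # Index the right relation by key, then pair up matching rows.
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [{**l, **r} for l in left for r in index[l[left_key]]]

# (2) Functional-programming style operations.
def count_by(rows, key):
    # Map each row to (key, 1), then reduce by summing per key.
    pairs = map(lambda r: (r[key], 1), rows)

    def step(acc, pair):
        k, n = pair
        acc[k] = acc.get(k, 0) + n
        return acc

    return reduce(step, pairs, {})

# The dataflow: filter -> project -> join -> map/reduce.
adults = filter_rows(users, lambda r: r["age"] >= 18)
adult_names = project(adults, ["id", "name"])
joined = join(adult_names, visits, "id", "user_id")
visits_per_user = count_by(joined, "name")
```

In this sketch each intermediate result is materialized eagerly; a dataflow engine would instead plan and execute the whole graph across a cluster.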

What is Apache Spark?

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

What are some alternatives to Apache Flink, Pig, and Apache Spark?

Storm
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

Beam
A distributed knowledge graph store. Knowledge graphs are suitable for modeling data that is highly interconnected by many types of relationships, like encyclopedic information about the world.

Apache Flume
It is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

Kafka
Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
How developers use Apache Flink, Pig, and Apache Spark

Wei Chen uses Apache Spark

Spark is good at managing parallel data processing. We wrote a neat program to handle the terabytes of data we receive every day.

Ralic Lo uses Apache Spark

Used the Spark DataFrame API in SparkR for big data analysis.

Kalibrr uses Apache Spark

We use Apache Spark to compute our recommendations.

BrainFinance uses Apache Spark

As part of a big data machine learning stack (SMACK).

Dotmetrics uses Apache Spark

Big data analytics and nightly transformation jobs.

Coolfront Technologies uses Apache Flink

Used for analytics on streaming data.

rmetzger uses Apache Flink

Flink for stream data analytics.
