Apache Flink vs Apache Spark: What are the differences?

Apache Flink is a fast and reliable large-scale data processing engine. Apache Spark is a fast and general engine for large-scale data processing.

Apache Flink and Apache Spark can be primarily classified as "Big Data" tools.

Some of the features offered by Apache Flink are:

  • Hybrid batch/streaming runtime that supports batch processing and data streaming programs.
  • Custom memory management to guarantee efficient, adaptive, and highly robust switching between in-memory and out-of-core data processing algorithms.
  • Flexible and expressive windowing semantics for data stream programs.
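
To make "windowing semantics" concrete, here is a minimal plain-Python sketch of one common window type, the tumbling (fixed-size, non-overlapping) window. It does not use the Flink API; the function name `tumbling_window_counts` and the sample events are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per key in fixed, non-overlapping (tumbling) windows.

    `events` is an iterable of (timestamp, key) pairs; `window_size` is the
    window length in the same time unit as the timestamps.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window, identified by its start.
        window_start = (timestamp // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sample stream: (timestamp in seconds, event type).
events = [(1, "click"), (4, "click"), (7, "view"), (11, "click"), (12, "view")]
result = tumbling_window_counts(events, window_size=10)
```

A stream processor applies the same idea incrementally to unbounded input and typically also offers sliding and session windows; this sketch only shows how events are assigned to windows.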

On the other hand, Apache Spark provides the following key features:

  • Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk (figures as claimed by the Spark project; actual speedups depend on the workload)
  • Write applications quickly in Java, Scala or Python
  • Combine SQL, streaming, and complex analytics

"Unified batch and stream processing" is the top reason given by more than 6 developers for liking Apache Flink, while more than 45 developers cite "Open-source" as the leading reason for choosing Apache Spark.

Apache Flink and Apache Spark are both open source tools. With 22.5K GitHub stars and 19.4K forks, Apache Spark appears to be more popular than Apache Flink, which has 9.35K stars and 5K forks.

According to the StackShare community, Apache Spark has broader approval: it is mentioned in 266 company stacks and 112 developer stacks, compared to Apache Flink, which is listed in 20 company stacks and 22 developer stacks.

What is Apache Flink?

Apache Flink is an open source system for fast and versatile data analytics in clusters. Flink supports batch and streaming analytics in one system. Analytical programs can be written using concise APIs in Java and Scala.

What is Apache Spark?

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
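
As a rough illustration of the MapReduce-style batch processing mentioned above, the following plain-Python sketch counts words with a map step and a reduce step. It is not Spark code and uses no Spark API; the names `map_phase` and `reduce_phase` and the sample lines are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map step: turn one line of text into partial word counts."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


# Hypothetical input: each string stands for one record of a batch data set.
lines = ["spark and flink", "flink and storm"]
word_counts = reduce(reduce_phase, map(map_phase, lines), Counter())
```

In a cluster engine the map step runs in parallel across partitions of the data and the reduce step merges the partial results; here both run in a single process.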

What are some alternatives to Apache Flink and Apache Spark?

  • Storm: Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
  • Beam: A distributed knowledge graph store. Knowledge graphs are suitable for modeling data that is highly interconnected by many types of relationships, like encyclopedic information about the world.
  • Apache Flume: It is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.
  • Kafka: Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
  • Amazon Athena: Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
How developers use Apache Flink and Apache Spark

  • Wei Chen (Apache Spark): Spark is good at managing parallel data processing. We wrote a neat program to handle the terabytes of data we get every day.
  • Ralic Lo (Apache Spark): Used the Spark DataFrame API on Spark-R for big data analysis.
  • Kalibrr (Apache Spark): We use Apache Spark in computing our recommendations.
  • BrainFinance (Apache Spark): As part of a big data machine learning stack (SMACK).
  • Dotmetrics (Apache Spark): Big data analytics and nightly transformation jobs.
  • Coolfront Technologies (Apache Flink): Used for analytics on streaming data.
  • rmetzger (Apache Flink): Flink for stream data analytics.
