Druid vs Apache Spark vs Talend

Druid: 382 stacks, 867 followers, 32 votes
Apache Spark: 3K stacks, 3.5K followers, 140 votes
Talend: 153 stacks, 249 followers, 0 votes
Pros of Druid
  • Real Time Aggregations (15 votes)
  • Batch and Real-Time Ingestion (6 votes)
  • OLAP (5 votes)
  • OLAP + OLTP (3 votes)
  • Combining stream and historical analytics (2 votes)
  • OLTP (1 vote)

Pros of Apache Spark
  • Open-source (61 votes)
  • Fast and Flexible (48 votes)
  • One platform for every big data problem (8 votes)
  • Great for distributed SQL-like applications (8 votes)
  • Easy to install and to use (6 votes)
  • Works well for most data science use cases (3 votes)
  • Interactive Query (2 votes)
  • Machine learning library, Streaming in real time (2 votes)
  • In-memory Computation (2 votes)

Pros of Talend
  No pros have been submitted yet.

    Cons of Druid
    • Limited SQL support (3 votes)
    • Joins are not supported well (2 votes)
    • Complexity (1 vote)

    Cons of Apache Spark
    • Speed (4 votes)

    Cons of Talend
      No cons have been submitted yet.



      What is Druid?

      Druid is a distributed, column-oriented, real-time analytics data store that is commonly used to power exploratory dashboards in multi-tenant environments. Druid excels as a data warehousing solution for fast aggregate queries on petabyte-sized data sets. It supports a variety of flexible filters, exact calculations, approximate algorithms, and other useful calculations.
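
To make "fast aggregate queries" over column-oriented data a little more concrete, here is a minimal pure-Python sketch of a filtered, grouped sum. It does not use Druid or its API; the column names, the sample values, and the `filtered_sum` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical event data stored column by column (one list per column),
# which is the basic layout of a column-oriented store.
columns = {
    "hour":    [0, 0, 0, 1, 1, 1],
    "country": ["US", "DE", "US", "US", "DE", "DE"],
    "clicks":  [3, 1, 2, 5, 4, 1],
}


def filtered_sum(columns, filter_col, filter_value, group_col, metric_col):
    """Sum metric_col per group_col value for rows where filter_col == filter_value."""
    totals = defaultdict(int)
    # Only the three columns named in the query are read.
    rows = zip(columns[filter_col], columns[group_col], columns[metric_col])
    for flt, group, metric in rows:
        if flt == filter_value:
            totals[group] += metric
    return dict(totals)


# Total clicks per hour, restricted to events from the US.
result = filtered_sum(columns, "country", "US", "hour", "clicks")
print(result)
```

The point of the layout is that a query touching three columns never has to read the rest of each row, which is one reason column stores suit aggregate-heavy dashboards.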

      What is Apache Spark?

      Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
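
As a rough illustration of the MapReduce-style batch processing mentioned above, the following pure-Python sketch counts words across partitions of input. It is not Spark code and uses no Spark API; the `partitions` data and the `map_partition` and `merge` helpers are hypothetical names chosen for this example.

```python
from collections import Counter
from functools import reduce

# Hypothetical input, already split into partitions as a cluster would hold it.
partitions = [
    ["spark runs on yarn", "spark reads hdfs"],
    ["hive and hbase", "spark streaming"],
]


def map_partition(lines):
    """Map step: count the words within a single partition."""
    return Counter(word for line in lines for word in line.split())


def merge(left, right):
    """Reduce step: combine two partial word counts into one."""
    return left + right


# Each partition is counted independently, then the partial results are merged.
word_counts = reduce(merge, map(map_partition, partitions), Counter())
print(word_counts.most_common(3))
```

In a real engine the map step would run in parallel on the machines holding each partition, and only the much smaller partial counts would be moved across the network.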

      What is Talend?

      Talend is an open source software integration platform that helps you turn data into business insights. It uses native code generation that lets you run your data pipelines across all cloud providers and get optimized performance on all platforms.


      What are some alternatives to Druid, Apache Spark, and Talend?
      HBase
      Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
      MongoDB
      MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.
      Cassandra
      Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
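
The repartitioning behavior described above is commonly achieved with consistent hashing. The sketch below is a simplified pure-Python token ring, assuming one token per node and MD5 as the hash; it is not Cassandra's implementation, and the `Ring` class, `token` function, and node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """A toy consistent-hash ring with a single token per node."""

    def __init__(self, nodes=()):
        self._tokens = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def remove(self, node: str) -> None:
        t = token(node)
        self._tokens.remove(t)
        del self._owners[t]

    def owner(self, key: str) -> str:
        # The owner is the first node at or after the key's position, wrapping around.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owners[self._tokens[i]]


ring = Ring(["node-a", "node-b", "node-c"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.add("node-d")  # a machine joins the cluster
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Because a key's owner is decided only by ring position, adding a machine moves just the keys that fall into the new machine's range; every other key stays where it was, and the application never needs to know which machine holds what.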
      Prometheus
      Prometheus is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.
      Elasticsearch
      Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats, and Logstash together make up the Elastic Stack (sometimes called the ELK Stack).