Cassandra vs Elasticsearch vs Apache Spark

Need advice about which tool to choose?Ask the StackShare community!

Cassandra

3.5K
3.5K
+ 1
507
Elasticsearch

34K
26.5K
+ 1
1.6K
Apache Spark

2.9K
3.5K
+ 1
140
Get Advice from developers at your company using StackShare Enterprise. Sign up for StackShare Enterprise.
Learn More
Pros of Cassandra
Pros of Elasticsearch
Pros of Apache Spark
  • 119
    Distributed
  • 98
    High performance
  • 81
    High availability
  • 74
    Easy scalability
  • 53
    Replication
  • 26
    Reliable
  • 26
    Multi datacenter deployments
  • 10
    Schema optional
  • 9
    OLTP
  • 8
    Open source
  • 2
    Workload separation (via MDC)
  • 1
    Fast
  • 327
    Powerful api
  • 315
    Great search engine
  • 230
    Open source
  • 214
    Restful
  • 199
    Near real-time search
  • 97
    Free
  • 84
    Search everything
  • 54
    Easy to get started
  • 45
    Analytics
  • 26
    Distributed
  • 6
    Fast search
  • 5
    More than a search engine
  • 3
    Highly Available
  • 3
    Awesome, great tool
  • 3
    Great docs
  • 3
    Easy to scale
  • 2
    Fast
  • 2
    Easy setup
  • 2
    Great customer support
  • 2
    Intuitive API
  • 2
    Great piece of software
  • 2
    Reliable
  • 2
    Potato
  • 2
    Nosql DB
  • 2
    Document Store
  • 1
    Not stable
  • 1
    Scalability
  • 1
    Open
  • 1
    Github
  • 1
    Elaticsearch
  • 1
    Actively developing
  • 1
    Responsive maintainers on GitHub
  • 1
    Ecosystem
  • 1
    Easy to get hot data
  • 0
    Community
  • 61
    Open-source
  • 48
    Fast and Flexible
  • 8
    One platform for every big data problem
  • 8
    Great for distributed SQL like applications
  • 6
    Easy to install and to use
  • 3
    Works well for most Datascience usecases
  • 2
    Interactive Query
  • 2
    Machine learning libratimery, Streaming in real
  • 2
    In memory Computation

Sign up to add or upvote prosMake informed product decisions

Cons of Cassandra
Cons of Elasticsearch
Cons of Apache Spark
  • 3
    Reliability of replication
  • 1
    Size
  • 1
    Updates
  • 7
    Resource hungry
  • 6
    Diffecult to get started
  • 5
    Expensive
  • 4
    Hard to keep stable at large scale
  • 4
    Speed

Sign up to add or upvote consMake informed product decisions

- No public GitHub repository available -

What is Cassandra?

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent matter. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

What is Elasticsearch?

Elasticsearch is a distributed, RESTful search and analytics engine capable of storing data and searching it in near real time. Elasticsearch, Kibana, Beats and Logstash are the Elastic Stack (sometimes called the ELK Stack).

What is Apache Spark?

Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

Need advice about which tool to choose?Ask the StackShare community!

Jobs that mention Cassandra, Elasticsearch, and Apache Spark as a desired skillset
What companies use Cassandra?
What companies use Elasticsearch?
What companies use Apache Spark?

Sign up to get full access to all the companiesMake informed product decisions

What tools integrate with Cassandra?
What tools integrate with Elasticsearch?
What tools integrate with Apache Spark?

Sign up to get full access to all the tool integrationsMake informed product decisions

Blog Posts

What are some alternatives to Cassandra, Elasticsearch, and Apache Spark?
HBase
Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google' Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.
Google Cloud Bigtable
Google Cloud Bigtable offers you a fast, fully managed, massively scalable NoSQL database service that's ideal for web, mobile, and Internet of Things applications requiring terabytes to petabytes of data. Unlike comparable market offerings, Cloud Bigtable doesn't require you to sacrifice speed, scale, or cost efficiency when your applications grow. Cloud Bigtable has been battle-tested at Google for more than 10 years—it's the database driving major applications such as Google Analytics and Gmail.
Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Redis
Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. Redis provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams.
Couchbase
Developed as an alternative to traditionally inflexible SQL databases, the Couchbase NoSQL database is built on an open source foundation and architected to help developers solve real-world problems and meet high scalability demands.
See all alternatives