Why developers like Cassandra

What is Cassandra?

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.
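To make the partitioning idea concrete, here is a minimal sketch of a token ring in Python. It is a simplified illustration, not Cassandra's implementation: the `Ring` class, the `token` helper, the node names, and the use of MD5 are all assumptions made for this example (Cassandra's default partitioner is Murmur3-based, and the real system also handles replication and data streaming). The sketch only shows why adding or removing a machine moves a fraction of the keys rather than all of them.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 here, purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: a key belongs to the node that owns the first
    token at or after the key's own token, wrapping around at the end."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes   # tokens per node (hypothetical default)
        self._tokens = []      # sorted token positions
        self._owner = {}       # token position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Give the node several positions so load spreads more evenly.
        for i in range(self.vnodes):
            t = token(f"{node}#{i}")
            bisect.insort(self._tokens, t)
            self._owner[t] = node

    def remove(self, node):
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def node_for(self, key):
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[i]]


keys = [f"user:{n}" for n in range(10_000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys moved after adding a fourth node")
```

The application only ever calls `node_for`; which machine holds a key is decided by the ring, which is what "application-transparent" means here. Every key that moves after `ring.add("node-d")` lands on the new node, and the other keys stay where they were.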

Cassandra is a tool in the Databases category of a tech stack.

Cassandra is an open source tool with 9.1K GitHub stars and 3.7K GitHub forks.

Who uses Cassandra?

Companies

529 companies reportedly use Cassandra in their tech stacks, including Uber, Facebook, and Netflix.

Uber

Facebook

Netflix

Instagram

Spotify

Instacart

Accenture

eBay

Developers

2861 developers on StackShare have stated that they use Cassandra.

Scaleway

paruckerr

Virtual API Cloud

Stream's Stack

2015

Personal Tech Stack

Packages Used

Gluru

CloudBoost.io

Cassandra Integrations

Datadog, Kong, Liquibase, DataGrip, and Redash are some of the popular tools that integrate with Cassandra; 55 tools in total are listed as integrating with it.

Datadog

Kong

Liquibase

DataGrip

Redash

Jaeger

Buddy

Apache Zeppelin

Retool

Pros of Cassandra

Distributed

High performance

High availability

Easy scalability

Replication

Reliable

Multi datacenter deployments

Schema optional

OLTP

Open source

Workload separation (via MDC)

Fast

Decisions about Cassandra

Here are some stack decisions, common use cases and reviews by companies and developers who chose Cassandra in their tech stack.

kew44

Nov 10, 2022 | 6 upvotes · 116.6K views

Needs advice on Amazon S3, Dremio, and Snowflake

Trying to establish a data lake (or maybe puddle) for my org's Data Sharing project. The idea is that outside partners would send cuts of their PHI data, regardless of format/variables/systems, to our Data Team, who would then harmonize the data, create data marts, and eventually use it for something. End-to-end, I'm envisioning:

1. Ingestion -> Secure, role-based, self-service portal for users to upload data (1a. bonus points if it can perform basic validations/masking)
2. Storage -> Amazon S3 seems like the cheapest. We probably won't need anything very big, even at full capacity. Our current storage is a secure Box folder that has ~4GB with several batches of test data, code, presentations, and planning docs.
3. Data Catalog -> AWS Glue? Azure Data Factory? Snowplow? Is the main difference basically the vendor? We will also have Data Dictionaries/Codebooks from submitters. Where would they fit in?
4. Partitions -> I've seen Cassandra and YARN mentioned, but have no experience with either.
5. Processing -> We want to use SAS if at all possible. What will work with SAS code?
6. Pipeline/Automation -> The check-in and verification processes that have been outlined are rather involved. Some sort of automated messaging or approval workflow would be nice.

I have very little guidance on what a "Data Mart" should look like, so I'm going with the idea that it would be another "experimental" partition. Unless there's an actual mart-building paradigm I've missed?

An end user might use the catalog to pull certain de-identified data sets from the marts. Again, role-based access and a self-service GUI would be preferable.

I'm the only full-time tech person on this project, but I'm mostly an OOP, HTML, JavaScript, and some SQL programmer. Most of this is out of my repertoire. I've done a lot of research, but I can't be an effective evangelist without hands-on experience. Since we're starting a new year of our grant, they've finally decided to let me try some stuff out. Any pointers would be appreciated!

Bhanu Sai Pavan Kumar Pothuri

Developer at SAP Labs India Private Ltd · Mar 3, 2022 | 2 upvotes · 34.1K views

Needs advice on Cassandra, MongoDB, and PostgreSQL

I was reading an Instagram system design question.

He said he will be using Cassandra for the tables storing user data, photo metadata, and user follows. I didn't understand why he is using Cassandra and not a regular RDBMS like PostgreSQL or a NoSQL database like MongoDB.

Jasper Lalbuatsaih

Aug 12, 2021 | 2 upvotes · 40.7K views

Needs advice on Cassandra and DataStax Enterprise

Why should I consider DataStax Enterprise instead of vanilla Cassandra?

Umair Iftikhar

Technical Architect at ERP Studio · Feb 12, 2021 | 3 upvotes · 455.3K views

Needs advice on Cassandra, Druid, and TimescaleDB

I'm developing a solution that collects telemetry data from different devices: a minimum of about 1,000 devices and a maximum of 12,000. Each device sends 2 packets per second. This is time-series data; the data definitions and various reports (building information, maintenance records, etc.) are stored in PostgreSQL. I want to know the best solution. The data is needed for math and ML, to run different algorithms. The data itself is raw, without the definitions and information stored in PostgreSQL. Initially, I went with TimescaleDB because of its PostgreSQL support, but as the number of sites increased, I started facing many issues with TimescaleDB in terms of flexibility in storing data.

Another major requirement is replication of the database for reporting and other purposes. Feel free to suggest options other than Druid and Cassandra, but an open source solution would be appreciated.
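
For a sense of scale, the write rate implied by the question above can be worked out from the two figures it gives (1,000 to 12,000 devices, 2 packets per second each). The short Python sketch below does only that arithmetic; the function and constant names are made up for this example, and because the post does not state a packet size, no storage volume is estimated.

```python
PACKETS_PER_DEVICE_PER_SECOND = 2   # from the question
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rate(devices: int) -> tuple:
    """Return (packets per second, packets per day) for a given device count."""
    per_second = devices * PACKETS_PER_DEVICE_PER_SECOND
    return per_second, per_second * SECONDS_PER_DAY


for devices in (1_000, 12_000):     # minimum and maximum from the question
    per_second, per_day = ingest_rate(devices)
    print(f"{devices:>6} devices: {per_second:,} packets/s, {per_day:,} packets/day")
```

That works out to 2,000 writes per second at the low end and 24,000 at the high end, or roughly 173 million to 2.07 billion packets per day.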

Blog Posts

3 Innovations While Unifying Pinterest’s Key-Value Storage

Mar 9 2022 at 6:41AM

How Sqreen handles 50,000 requests every minute in a write-hea...

Sep 17 2019 at 9:38PM

Sqreen

Stream & Go: News Feeds for Over 300 Million End Users

Jan 18 2018 at 7:43AM

Stream

How Mashape Manages Over 15,000 APIs & Microservices

Sep 25 2015 at 8:45AM

Kong

Cassandra Alternatives & Comparisons

What are some alternatives to Cassandra?

HBase

Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, as described in "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop.

Google Cloud Bigtable

Google Cloud Bigtable offers you a fast, fully managed, massively scalable NoSQL database service that's ideal for web, mobile, and Internet of Things applications requiring terabytes to petabytes of data. Unlike comparable market offerings, Cloud Bigtable doesn't require you to sacrifice speed, scale, or cost efficiency when your applications grow. Cloud Bigtable has been battle-tested at Google for more than 10 years—it's the database driving major applications such as Google Analytics and Gmail.

Hadoop

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Redis

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. Redis provides data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes, and streams.

Couchbase

Developed as an alternative to traditionally inflexible SQL databases, the Couchbase NoSQL database is built on an open source foundation and architected to help developers solve real-world problems and meet high scalability demands.

Related Comparisons