Citus vs Scylla

Overview

Citus: 60 stacks, 124 followers, 11 votes, 12.0K GitHub stars, 736 forks

ScyllaDB: 143 stacks, 197 followers, 8 votes

Citus vs Scylla: What are the differences?

Citus and Scylla are both popular distributed database management systems, but they differ in several key aspects.

Scalability: Citus scales PostgreSQL horizontally by splitting tables into shards spread across worker nodes, so data and query work are distributed over several machines. Scylla is a NoSQL database that reimplements Apache Cassandra in C++ and stays compatible with Cassandra's query language and drivers. It also scales horizontally as nodes are added to the cluster, and vertically by assigning one shard to each CPU core.
Data Model: Citus is a PostgreSQL extension, so a distributed Citus database keeps the relational model and SQL, which suits applications that need joins, complex queries, and transactions. Scylla is a wide-column NoSQL database accessed through the Cassandra Query Language (CQL), with an optional DynamoDB-compatible API. Its log-structured storage engine favors write-heavy workloads, and it is built for high throughput and low latency on large data sets.
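Consistency and Availability: Citus keeps PostgreSQL's transactional guarantees; a transaction that touches several worker nodes is committed with two-phase commit, so it takes effect on all of them or on none. Scylla follows the Dynamo-style design it shares with Cassandra and offers tunable consistency: each read or write names a consistency level (for example ONE, QUORUM, or ALL) that sets how many replicas must respond. Low levels favor availability and latency and give eventual consistency, while read and write levels whose replica counts together exceed the replication factor (such as QUORUM for both) make every read overlap a replica that acknowledged the latest successful write. Lightweight transactions add linearizable conditional updates when needed.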
Architecture: A Citus cluster consists of a coordinator and several worker nodes, each a regular PostgreSQL server. Shards of distributed tables live on the workers; the coordinator (or, in recent Citus versions, any node) plans each query, routes it to the relevant shards, runs multi-shard queries in parallel, and merges the results. Scylla uses a peer-to-peer, shared-nothing design: there is no dedicated coordinator, whichever node receives a request coordinates it, cluster state is spread through a gossip protocol, and within each node every CPU core owns its own slice of the data (shard-per-core).
Secondary Indexing: In Citus, an index created on a distributed table is created on every shard, so queries on non-distribution columns can use it. Such a query is sent to all shards unless it also filters on the distribution column, and unique indexes must include the distribution column. Scylla also supports secondary indexes: global secondary indexes, which are backed by materialized views partitioned by the indexed column, and local secondary indexes, which index rows within a single partition. Materialized views and denormalized, query-specific tables remain common ways to serve additional access patterns.
Data Replication: Citus copies reference tables to every node, and for high availability each coordinator and worker is typically paired with a PostgreSQL streaming-replication standby that can take over on failure. Scylla replicates data according to a replication factor set per keyspace, which is the number of nodes that store a copy of each partition; data stays available as long as enough replicas to satisfy the requested consistency level are reachable.
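
To make the Scalability and Architecture points concrete, here is a minimal Python sketch of hash-based sharding under a simplified scheme: each key is hashed into a signed 32-bit space, that space is cut into equal ranges (one per shard), and shards are spread over workers round-robin. Citus hash distribution follows this range-of-hash-values idea, and ScyllaDB applies a similar principle with 64-bit tokens on a ring. The helpers `hash32`, `shard_starts`, `shard_for`, and `worker_for` are hypothetical, and the MD5-based hash is only a stand-in for the hash functions the real systems use.

```python
import bisect
import hashlib
from collections import Counter

HASH_MIN, HASH_MAX = -2**31, 2**31 - 1

def hash32(key: str) -> int:
    # Stand-in hash: first 4 bytes of MD5 read as a signed 32-bit integer.
    # Citus and ScyllaDB each use their own hash function; illustrative only.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big", signed=True)

def shard_starts(shard_count: int) -> list[int]:
    # Split the signed 32-bit hash space into equal, contiguous ranges and
    # return each range's lower bound (the last range absorbs any remainder).
    width = (HASH_MAX - HASH_MIN + 1) // shard_count
    return [HASH_MIN + i * width for i in range(shard_count)]

def shard_for(key: str, starts: list[int]) -> int:
    # The owning shard is the last range whose lower bound is <= hash32(key).
    return bisect.bisect_right(starts, hash32(key)) - 1

def worker_for(shard: int, workers: list[str]) -> str:
    # Hypothetical round-robin placement of shards onto worker nodes.
    return workers[shard % len(workers)]

if __name__ == "__main__":
    starts = shard_starts(8)
    workers = ["worker-1", "worker-2", "worker-3"]
    keys = [f"tenant-{i}" for i in range(10_000)]
    # Count how many keys end up on each worker.
    print(Counter(worker_for(shard_for(k, starts), workers) for k in keys))
```

Because the shard is a pure function of the key, every row with the same distribution key lands on the same shard, which is what lets a query filtered on that key be routed to a single node.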
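
The quorum rule in the Consistency and Availability point is simple arithmetic. The following minimal Python sketch assumes a single data center and ignores failed or concurrent writes. ONE, TWO, THREE, QUORUM, and ALL are real CQL consistency level names, while `acks_required` and `read_overlaps_write` are hypothetical helpers.

```python
def acks_required(level: str, rf: int) -> int:
    # Replica responses needed for a few common consistency levels,
    # assuming a single data center (multi-DC levels are left out).
    table = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    needed = table[level]
    if needed > rf:
        raise ValueError(f"{level} needs {needed} replicas but RF is {rf}")
    return needed

def read_overlaps_write(read_level: str, write_level: str, rf: int) -> bool:
    # If R + W > RF, any set of replicas answering a read shares at least one
    # replica with any set that acknowledged a write (pigeonhole principle).
    return acks_required(read_level, rf) + acks_required(write_level, rf) > rf

if __name__ == "__main__":
    for r, w in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(f"RF=3 read={r} write={w} overlap={read_overlaps_write(r, w, 3)}")
```

With a replication factor of 3, ONE/ONE gives no overlap (eventual consistency), while QUORUM/QUORUM (2 + 2 > 3) and ONE-read/ALL-write (1 + 3 > 3) both do; the latter makes writes fail whenever any replica is down, which is the availability side of the trade-off.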
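
The Data Replication point can be sketched as replica placement on a token ring, in the spirit of Cassandra/ScyllaDB's SimpleStrategy: the first replica is the node owning the partition's token, and further replicas are the next distinct nodes clockwise. This is a simplified model that ignores racks, data centers, and newer placement schemes; `replicas_for` and the example ring are hypothetical.

```python
import bisect

def replicas_for(token: int, ring: list[tuple[int, str]], rf: int) -> list[str]:
    # ring holds (token, node) pairs; a node may own several tokens (vnodes).
    ring = sorted(ring)
    if rf > len({node for _, node in ring}):
        raise ValueError("replication factor exceeds the number of nodes")
    tokens = [t for t, _ in ring]
    # The primary replica owns the first ring token >= the partition token,
    # wrapping around to the start of the ring past the largest token.
    i = bisect.bisect_left(tokens, token) % len(ring)
    chosen: list[str] = []
    while len(chosen) < rf:
        node = ring[i][1]
        if node not in chosen:  # skip further vnodes of an already chosen node
            chosen.append(node)
        i = (i + 1) % len(ring)
    return chosen

if __name__ == "__main__":
    ring = [(-300, "A"), (-100, "B"), (50, "A"), (200, "C"), (400, "B")]
    for token in (60, 500, 50):
        print(token, replicas_for(token, ring, rf=2))
```

Here token 60 maps to `['C', 'B']`, token 500 wraps around to `['A', 'B']`, and token 50 maps to `['A', 'C']`. Skipping repeated vnodes is what guarantees that the copies end up on different machines.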
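
The Secondary Indexing difference shows up in how many shards a lookup on a non-key column touches. The following toy Python model makes several simplifying assumptions. Rows are placed by `user_id`. A per-shard index mirrors an index on a Citus distributed table, and an index partitioned by the indexed value mirrors a ScyllaDB global secondary index. All names are hypothetical, and CRC32 stands in for the real partitioners.

```python
import zlib
from collections import defaultdict

SHARDS = 4

def shard_of(key) -> int:
    # Stand-in placement: CRC32 of the key's text form modulo the shard count.
    return zlib.crc32(str(key).encode("utf-8")) % SHARDS

def build_per_shard_index(rows):
    # Each shard indexes only the rows it stores, which are placed by user_id.
    index = [defaultdict(list) for _ in range(SHARDS)]
    for user_id, email in rows:
        index[shard_of(user_id)][email].append(user_id)
    return index

def build_global_index(rows):
    # Global index: entries are partitioned by the indexed value (email).
    index = [defaultdict(list) for _ in range(SHARDS)]
    for user_id, email in rows:
        index[shard_of(email)][email].append(user_id)
    return index

def lookup_per_shard(index, email):
    # The email says nothing about where matching rows live, so ask every shard.
    contacted = set(range(SHARDS))
    hits = sorted(uid for s in contacted for uid in index[s].get(email, []))
    return hits, contacted

def lookup_global(index, email):
    # Read one index partition, then fetch each matching row from its home shard.
    home = shard_of(email)
    hits = sorted(index[home].get(email, []))
    contacted = {home} | {shard_of(uid) for uid in hits}
    return hits, contacted

if __name__ == "__main__":
    rows = [(1, "a@example.com"), (2, "b@example.com"),
            (3, "a@example.com"), (4, "c@example.com")]
    print(lookup_per_shard(build_per_shard_index(rows), "a@example.com"))
    print(lookup_global(build_global_index(rows), "a@example.com"))
```

The per-shard lookup always contacts every shard, while the global lookup contacts the index partition plus the shards holding matches. The trade-off is that each write to the base table must also update an index entry that may live on a different node.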

In summary, Citus adds horizontal scalability to PostgreSQL while keeping SQL, joins, and transactions, whereas Scylla is a Cassandra-compatible NoSQL database built for write-heavy workloads that need high throughput and low latency. Citus provides transactional consistency and relies on PostgreSQL replication for high availability. Scylla replicates data across equal peers and lets each query choose its consistency level, trading consistency for availability when lower levels are used. Each is suited to different use cases.

Advice on Citus, ScyllaDB

Tom

CEO at Gentlent

Jun 9, 2020

Decided

The Gentlent Tech Team made lots of updates within the past year. The biggest one being our database:

We decided to migrate our #PostgreSQL-based database systems to a custom implementation of #Cassandra. This allows us to integrate our product data perfectly in a system that just makes sense. High availability and scalability are supported out of the box.

Vinay

Head of Engineering

Sep 19, 2019

Needs advice

The problem I have is that we need to process and change (update/insert) 55M records every 2 minutes, and the updated data must be available to a REST API for filtering/selection. Response time for the REST API should be less than 1 second.

The most important factor for me is the 2-minute window for processing and storing the data. There need to be two views of the data: one for selection and one for the changed data.
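
For a sense of scale, the requirement in this question can be turned into a throughput figure. This back-of-the-envelope Python sketch assumes that "55M" means 55 million rows updated per 2-minute cycle. For the replicated case it also assumes a hypothetical replication factor of 3. The helper names are made up for illustration.

```python
def required_rate(rows: int, window_s: float) -> float:
    # Rows per second needed to finish `rows` updates within `window_s` seconds.
    return rows / window_s

def replica_writes_per_second(rows: int, window_s: float, replication_factor: int) -> float:
    # With replication, each logical write is applied on every replica.
    return required_rate(rows, window_s) * replication_factor

if __name__ == "__main__":
    rows, window = 55_000_000, 120
    print(f"{required_rate(rows, window):,.0f} rows/s")
    print(f"{replica_writes_per_second(rows, window, 3):,.0f} replica writes/s at RF=3")
```

That works out to roughly 458,000 row writes per second sustained, or about 1.4 million replica writes per second at RF=3, before counting any read traffic from the REST API.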

Detailed Comparison

Citus	ScyllaDB
It's an extension to Postgres that distributes data and queries in a cluster of multiple machines. Its query engine parallelizes incoming SQL queries across these servers to enable human real-time (less than a second) responses on large datasets.	ScyllaDB is the database for data-intensive apps that require high performance and low latency. It enables teams to harness the ever-increasing computing power of modern infrastructures – eliminating barriers to scale as data grows.
Multi-Node Scalable PostgreSQL;Built-in Replication and High Availability;Real-time Reads/Writes On Multiple Nodes;Multi-core Parallel Processing of Queries;Tenant isolation	High availability; horizontal scalability; vertical scalability; Cassandra compatible; DynamoDB compatible; wide column; NoSQL; lightweight transactions; change data capture; workload prioritization; shard-per-core; IO scheduler; self-tuning
Statistics
GitHub Stars 12.0K	GitHub Stars -
GitHub Forks 736	GitHub Forks -
Stacks 60	Stacks 143
Followers 124	Followers 197
Votes 11	Votes 8
Pros & Cons
Pros: Multi-core Parallel Processing (6); Drop-in PostgreSQL replacement (3); Distributed with Auto-Sharding (2)	Pros: Replication (2); Scale up (1); Distributed (1); Fewer nodes (1); High performance (1)
Integrations
.NET Apache Spark Loggly Java Rails Datadog Logentries Heroku Papertrail PostgreSQL	KairosDB Wireshark JanusGraph Grafana Hackolade Prometheus Kubernetes Datadog Kafka Apache Spark

What are some alternatives to Citus, ScyllaDB?

MongoDB

MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding.

MySQL

The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.

PostgreSQL

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions.

Microsoft SQL Server

Microsoft® SQL Server is a database management and analysis system for e-commerce, line-of-business, and data warehousing solutions.

SQLite

SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file.

Cassandra

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added to and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL.

Memcached

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

MariaDB

Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry. MariaDB is designed as a drop-in replacement of MySQL(R) with more features, new storage engines, fewer bugs, and better performance.

RethinkDB

RethinkDB is built to store JSON documents, and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and is easy to setup and learn.

ArangoDB

A distributed free and open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient SQL-like query language or JavaScript extensions.
