Amazon Redshift vs Cassandra: What are the differences?

Introduction:

Amazon Redshift and Cassandra are both database management systems that are widely used in the industry. However, there are key differences between the two that make them suitable for different use cases. In the following paragraphs, we will explore the major differences between Amazon Redshift and Cassandra.

  1. Data Model: Amazon Redshift is a columnar database that stores data by column rather than by row. It is optimized for analytics and performs well on complex queries over large amounts of data. Cassandra, on the other hand, is a distributed NoSQL database that uses a wide-column (partitioned row store) data model, in which rows are grouped into partitions by a partition key. It is designed for scalability and is well-suited for applications that require high write throughput and low latency.

  2. ACID Compliance: Amazon Redshift is ACID-compliant, meaning it supports transactions that are Atomic, Consistent, Isolated, and Durable. This ensures data integrity and reliability, making it suitable for applications that require strong consistency guarantees. Cassandra, on the other hand, trades full ACID transactions for scalability and high availability. It uses a tunable consistency model in which each read or write can specify how many replicas must respond, trading consistency against availability and latency.

  3. Scalability: Both Amazon Redshift and Cassandra are designed to scale, but they do so in different ways. Amazon Redshift scales through its massively parallel processing (MPP) architecture, where data is distributed across multiple nodes for parallel query execution. It can handle petabyte-scale datasets and provide high query performance. Cassandra, on the other hand, uses a peer-to-peer model with no master node: data is partitioned across the nodes of a cluster by a hash of the partition key and replicated to multiple nodes, which provides high availability and fault tolerance, and capacity grows by adding nodes.

  4. Data Replication: In Amazon Redshift, data replication is handled automatically and transparently. The data is replicated within the cluster for fault tolerance, ensuring high availability even in case of node failures. Cassandra, on the other hand, allows users to control how data is replicated across the cluster. Users can define replication factors and strategies to achieve the desired level of fault tolerance and data consistency.

  5. Data Consistency: Amazon Redshift provides strong data consistency guarantees. When a transaction completes successfully, the changes become visible to all subsequent queries in a consistent manner. Cassandra, on the other hand, provides eventual consistency by default. Updates to data propagate asynchronously across nodes and may take some time to converge. This trade-off allows for high write throughput and low latency but sacrifices strong data consistency.

  6. Data Types: Amazon Redshift supports a wide range of SQL data types, including character, numeric, boolean, date, and time types. It also supports semi-structured data such as arrays and JSON through its SUPER data type. Cassandra supports text, boolean, integer, floating-point, and timestamp types, along with collection types (list, set, and map), tuples, and user-defined types. It can also read and write rows as JSON through the CQL INSERT JSON and SELECT JSON statements, although it has no dedicated JSON column type.
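
The tunable consistency described in items 2 and 5 can be made concrete with a rule of thumb commonly used for Cassandra: a read is guaranteed to overlap the latest acknowledged write when the number of replicas consulted on the read plus the number that acknowledged the write exceeds the replication factor. The sketch below only does that arithmetic. The function names are illustrative assumptions and are not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when the read set and the write set must share at least one replica."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed replication factor for this example

print(quorum(RF))                                          # 2
print(reads_see_latest_write(quorum(RF), quorum(RF), RF))  # True: 2 + 2 > 3
print(reads_see_latest_write(1, 1, RF))                    # False: 1 + 1 <= 3
```

With a replication factor of 3, reading and writing at quorum gives overlapping replica sets, while reading and writing against a single replica does not, which is the eventually consistent case described in item 5.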

In summary, Amazon Redshift is a columnar database optimized for analytics with ACID compliance and strong data consistency guarantees. It is suitable for applications that require high performance, scalability, and data integrity. Cassandra, on the other hand, is a distributed NoSQL database designed for scalability, high availability, and low latency. It sacrifices ACID compliance and strong data consistency for these advantages, making it suitable for applications with high write throughput and flexible data models.
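
As a rough illustration of why a columnar layout suits analytics, the sketch below stores the same three records row by row and column by column, then sums one field. The table and field names are made up for the example, and plain Python lists stand in for on-disk storage; this is not how Redshift is implemented internally.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "US", "amount": 35.5},
    {"order_id": 3, "region": "EU", "amount": 12.5},
]

# Column-oriented layout: each column is stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregating from rows walks every record, including fields it does not need.
total_from_rows = sum(row["amount"] for row in rows)

# Aggregating from columns reads only the one column the query asks for.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)  # 68.0 68.0
```

Both layouts give the same answer; the difference is how much data has to be read to get it, which matters when a table has many columns and billions of rows.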

Advice on Amazon Redshift and Cassandra

We need to perform ETL from several databases into a data warehouse or data lake. We want to

  • keep raw and transformed data available to users to draft their own queries efficiently
  • let users set custom permissions and sign in with SSO
  • move between open-source on-premises development and cloud-based production environments

We want to use only inexpensive Amazon EC2 instances on a medium-sized data set of 16GB to 32GB, feeding into Tableau Server or PowerBI for reporting and data analysis.

Replies (3)
John Nguyen
Recommends Airflow and AWS Lambda

You could also use AWS Lambda with a CloudWatch Events schedule if you know when the function should be triggered. The benefit is that you can use any language, together with its respective database client.

But if you orchestrate ETLs then it makes sense to use Apache Airflow. This requires Python knowledge.
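
A minimal sketch of what the scheduled Lambda option might look like in Python. The `handler(event, context)` signature and the `time` field on scheduled events are standard; the `extract`, `transform`, and `load` helpers and their sample rows are hypothetical placeholders for whatever database clients the job would actually use.

```python
from datetime import datetime, timezone


def extract(run_time: str) -> list:
    """Hypothetical extract step; a real job would query the source database."""
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3"}]


def transform(rows: list) -> list:
    """Convert string amounts to floats."""
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]


def load(rows: list) -> int:
    """Hypothetical load step; a real job would write to the warehouse."""
    return len(rows)


def handler(event, context):
    """Entry point invoked by the schedule; the event carries an ISO-8601 "time"."""
    run_time = event.get("time") or datetime.now(timezone.utc).isoformat()
    rows_loaded = load(transform(extract(run_time)))
    return {"run_time": run_time, "rows_loaded": rows_loaded}
```

This fits a single job on a known schedule. Once several jobs depend on each other, an orchestrator such as Airflow handles ordering and retries that this sketch leaves out.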

Recommends Airflow

Though we have always built something custom, Apache Airflow (https://airflow.apache.org/) stood out as a key contender/alternative among open-source options. On the commercial side, Amazon Redshift combined with Amazon Kinesis (for complex manipulations) is great for BI, though Redshift as such is expensive.

Recommends

You may want to look into a data virtualization product called Conduit. It connects to disparate data sources in AWS, on-prem, Azure, and GCP, and exposes them as a single unified Spark SQL view to PowerBI (direct query) or Tableau. It allows automatic query and caching policies to improve query speed and experience, has a GPU query engine with optimized Spark as a fallback, and can be deployed on your AWS VM or on-prem, scaling up and out. It sounds like the ideal solution to your needs.

Vinay Mehta
Needs advice on Cassandra and ScyllaDB

The problem I have is this: we need to process and change (update/insert) 55M data items every 2 minutes, and the updated data must be available to a REST API for filtering and selection. The REST API response time should be less than 1 second.

The most important factor for me is completing the processing and storage within the 2-minute window. There need to be two views of the data: one for selection and one for changed data.

Replies (4)
Recommends ScyllaDB

Scylla can handle 1M events per second with a simple data model quite easily. The API for queries is CQL; we have a REST API, but that's for control and monitoring.

Alex Peake
Recommends Cassandra

Cassandra is quite capable of the task, in a highly available way, given appropriate scaling of the system. Remember that updates are only inserts, and that efficient retrieval is only by key (which can be a complex key). Talking of keys, make sure that the keys are well distributed.
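
To see why well-distributed keys matter, the sketch below hashes partition keys onto a small number of nodes and counts how many land on each. It is a simplification under stated assumptions: MD5 and a modulo stand in for Cassandra's actual partitioner and token ring, and the key formats are invented for the example.

```python
import hashlib
from collections import Counter


def node_for_key(partition_key: str, node_count: int) -> int:
    """Map a key to a node index (MD5 + modulo as a stand-in for the real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribution(keys, node_count: int) -> Counter:
    """Count how many keys land on each node."""
    return Counter(node_for_key(key, node_count) for key in keys)


NODES = 4

# High-cardinality key: one partition per user, spread across all nodes.
spread = distribution([f"user-{i}" for i in range(10_000)], NODES)

# Low-cardinality key: every row shares one partition, so one node takes all the load.
skewed = distribution(["country-US"] * 10_000, NODES)

print(sorted(spread.items()))
print(sorted(skewed.items()))
```

The first distribution is close to even across the four nodes, while the second puts all 10,000 rows on a single node, which is the hot-spot pattern to avoid when choosing a partition key.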

Recommends ScyllaDB

By 55M do you mean 55 million entity changes per 2 minutes? That is relatively high: almost 460k per second. If I had to choose between Scylla and Cassandra, I would opt for Scylla, as it promises better performance for simple operations. However, it may be worth considering yet another alternative technology. Take into consideration the required consistency, reliability, and high availability, and you may realize that there are more suitable ones. The REST API should not be the main driver, because you can always develop the API yourself if it is not supported by the given technology.
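
The "almost 460k per second" figure follows directly from the numbers in the question, assuming 55M means 55 million changes per 2-minute window:

```python
changes_per_window = 55_000_000  # assumed meaning of "55M"
window_seconds = 2 * 60          # 2 minutes

changes_per_second = changes_per_window / window_seconds

print(f"{changes_per_second:,.0f} changes per second")  # 458,333 changes per second
```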

Pankaj Soni
Chief Technical Officer at Software Joint
Recommends Cassandra

I love Scylla for pet projects; however, its license, which is based on a server model, is an issue. Thus I recommend Cassandra.

Decisions about Amazon Redshift and Cassandra
Julien Lafont

The cloud data warehouse is the centerpiece of a modern data platform. Choosing the most suitable solution is therefore fundamental.

Our benchmark covered BigQuery and Snowflake. Both solutions seem to match our goals, but they take very different approaches.

BigQuery is notably the only 100% serverless cloud data warehouse, and it requires absolutely NO maintenance: no re-clustering, no compression, no index optimization, no storage management, no performance management. Snowflake requires setting up (paid) reclustering processes, managing the performance allocated to each profile, and so on. We can also mention Redshift, which we eliminated because it requires even more operational work.

BigQuery can therefore be set up with almost no human-resource cost. Its on-demand pricing is particularly well suited to small workloads: there is no cost when the solution is not used, and you pay only for the queries you run. As usage grows, however, slots (with a monthly or per-minute commitment) drastically reduce the cost. We cut the cost of our nightly batches by a factor of 10 by using flex slots.

Finally, a major advantage of BigQuery is its almost perfect integration with Google Cloud Platform services: Cloud Functions, Dataflow, Data Studio, etc.

BigQuery is still evolving very quickly. The next milestone, BigQuery Omni, will allow running queries over data stored in an external cloud platform (Amazon S3, for example). It will be a major breakthrough in the history of cloud data warehouses. Omni will compensate for a weakness of BigQuery: transferring data in near real time from S3 to BigQuery is not easy today. It was even simpler to implement with Snowflake's Snowpipe.

We also plan to use the machine learning features built into BigQuery to accelerate the deployment of our data-science projects, an opportunity offered only by BigQuery.

Pros of Amazon Redshift
  • Data Warehousing (41 upvotes)
  • Scalable (27 upvotes)
  • SQL (17 upvotes)
  • Backed by Amazon (14 upvotes)
  • Encryption (5 upvotes)
  • Cheap and reliable (1 upvote)
  • Isolation (1 upvote)
  • Best Cloud DW Performance (1 upvote)
  • Fast columnar storage (1 upvote)

Pros of Cassandra
  • Distributed (119 upvotes)
  • High performance (98 upvotes)
  • High availability (81 upvotes)
  • Easy scalability (74 upvotes)
  • Replication (53 upvotes)
  • Reliable (26 upvotes)
  • Multi datacenter deployments (26 upvotes)
  • Schema optional (10 upvotes)
  • OLTP (9 upvotes)
  • Open source (8 upvotes)
  • Workload separation (via MDC) (2 upvotes)
  • Fast (1 upvote)

Cons of Amazon Redshift
  • None listed

Cons of Cassandra
  • Reliability of replication (3 upvotes)
  • Size (1 upvote)
  • Updates (1 upvote)