Alternatives to Amazon Redshift

Google BigQuery, Amazon Athena, Amazon DynamoDB, Amazon Redshift Spectrum, and Hadoop are the most popular alternatives and competitors to Amazon Redshift.

Stacks: 1.5K · Followers: 1.4K · Votes: 108

What is Amazon Redshift and what are its top alternatives?

It is optimized for data sets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.

Amazon Redshift is a tool in the Big Data as a Service category of a tech stack.

Top Alternatives to Amazon Redshift

Google BigQuery
Run super-fast, SQL-like queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Load data with ease. Bulk load your data using Google Cloud Storage or stream it in. Easy access. Access BigQuery by using a browser tool, a command-line tool, or by making calls to the BigQuery REST API with client libraries for languages such as Java, PHP, or Python. ...
Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. ...
Amazon DynamoDB
With it, you can offload the administrative burden of operating and scaling a highly available distributed database cluster, while paying a low price for only what you use. ...
Amazon Redshift Spectrum
With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond data stored on local disks in your data warehouse to query vast amounts of unstructured data in your Amazon S3 “data lake” -- without having to load or transform any data. ...
Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. ...
Microsoft Azure
Azure is an open and flexible cloud platform that enables you to quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters. You can build applications using any language, tool or framework. And you can integrate your public cloud applications with your existing IT environment. ...
Snowflake
Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS)—no infrastructure to manage and no knobs to turn. ...
Apache Aurora
Apache Aurora is a service scheduler that runs on top of Mesos, enabling you to run long-running services that take advantage of Mesos' scalability, fault-tolerance, and resource isolation. ...

Amazon Redshift alternatives & related posts

Google BigQuery

Analyze terabytes of data in seconds

Stacks: 1.7K · Votes: 152

PROS OF GOOGLE BIGQUERY

High Performance (28)
Easy to use (25)
Fully managed service (22)
Cheap Pricing (19)
Process hundreds of GB in seconds (16)
Big Data (12)
Full table scans in seconds, no indexes needed (11)
Always on, no per-hour costs (8)
Good combination with fluentd (6)
Machine learning (4)
Easy to manage (1)
Easy to learn (0)

CONS OF GOOGLE BIGQUERY

You can't unit test changes in BQ data (1)

related Google BigQuery posts

Sung Won Chung

Nov 4, 2019 | 33 upvotes · 2.1M views

Shared insights on Google Cloud IoT Core, Terraform, Python, Google Cloud Deployment Manager, Google Cloud Build, +6 more

Context: I wanted to create an end-to-end IoT data pipeline simulation in Google Cloud IoT Core and other GCP services. I never touched Terraform meaningfully until working on this project, and it's one of the best explorations in my development career. The documentation and syntax are incredibly human-readable and friendly. I'm used to building infrastructure through the Google APIs via Python, but I'm so glad past Sung did not make that decision. I was tempted to use Google Cloud Deployment Manager, but the templates were a bit convoluted at first impression. I'm glad past Sung did not make this decision either.

Solution: Leveraging Google Cloud Build, Google Cloud Run, Google Cloud Bigtable, Google BigQuery, Google Cloud Storage, and Google Compute Engine, along with some other fun tools, I can deploy over 40 GCP resources using Terraform!

Check out the GitHub repo attached

GitHub - sungchun12/iot-python-webapp: Live, real-time dashboard in a serverless docker web app, and deployed via terraform with a built-in CICD trigger-See Mock Website

Tim Specht

Co-Founder and CTO at Dubsmash · Sep 13, 2018 | 14 upvotes · 983K views

Shared insights on Google Analytics, Amazon Kinesis, AWS Lambda, Amazon SQS, Google BigQuery

In order to accurately measure & track user behaviour on our platform we moved over quickly from the initial solution using Google Analytics to a custom-built one due to resource & pricing concerns we had.

While this does sound complicated, it’s as easy as clients sending JSON blobs of events to Amazon Kinesis from where we use AWS Lambda & Amazon SQS to batch and process incoming events and then ingest them into Google BigQuery. Once events are stored in BigQuery (which usually only takes a second from the time the client sends the data until it’s available), we can use almost-standard-SQL to simply query for data while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours. Before ingesting their data into the pipeline, our mobile clients are aggregating events internally and, once a certain threshold is reached or the app is going to the background, sending the events as a JSON blob into the stream.

In the past we had workers running that continuously read from the stream and would validate and post-process the data and then enqueue them for other workers to write them to BigQuery. We went ahead and implemented the Lambda-based approach in such a way that Lambda functions would automatically be triggered for incoming records, pre-aggregate events, and write them back to SQS, from which we then read them, and persist the events to BigQuery. While this approach had a couple of bumps on the road, like re-triggering functions asynchronously to keep up with the stream and proper batch sizes, we finally managed to get it running in a reliable way and are very happy with this solution today.
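The client-side buffering described above (aggregate events locally, then send them as one JSON blob once a threshold is reached or the app goes to the background) can be sketched roughly as follows. This is only an illustration, not Dubsmash's code: the `EventBuffer` class, the `send` callback, and the threshold of 3 are hypothetical, and the callback stands in for whatever actually writes to the stream.

```python
import json
from typing import Callable, Dict, List


class EventBuffer:
    """Collects events and flushes them as a single JSON blob.

    Hypothetical helper for illustration; `send` stands in for the call
    that would put the blob onto the stream.
    """

    def __init__(self, send: Callable[[str], None], threshold: int = 3) -> None:
        self._send = send
        self._threshold = threshold
        self._events: List[Dict] = []

    def track(self, event: Dict) -> None:
        """Buffer one event; flush automatically once the threshold is hit."""
        self._events.append(event)
        if len(self._events) >= self._threshold:
            self.flush()

    def flush(self) -> None:
        """Send all buffered events as one blob; does nothing when empty."""
        if not self._events:
            return
        blob = json.dumps({"events": self._events})
        self._events = []
        self._send(blob)


# Usage: a plain list plays the role of the stream.
sent: List[str] = []
buffer = EventBuffer(send=sent.append, threshold=3)
for name in ["open", "play", "like", "share"]:
    buffer.track({"name": name})
buffer.flush()  # simulates the app going to the background
```

Here the first three events go out together when the threshold is hit, and the remaining one is sent by the explicit `flush()` call.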

#ServerlessTaskProcessing #GeneralAnalytics #RealTimeDataProcessing #BigDataAsAService

Dubsmash: Scaling To 200 Million Users With 3 Engineers - Dubsmash Tech Stack | StackShare

Amazon Athena

Query S3 Using SQL

Stacks: 501 · Votes: 49

PROS OF AMAZON ATHENA

Use SQL to analyze CSV files (16)
Glue crawlers give an easy data catalogue (8)
Cheap (7)
Query all my data without running servers 24x7 (6)
No database servers, yay (4)
Easy integration with QuickSight (3)
Query and analyse CSV, Parquet, and JSON files in SQL (2)
Glue and Athena also use the same data catalog (2)
No configuration required (1)
Ad hoc checks on data made easy (0)

related Amazon Athena posts

Raunak Dave

Apr 15, 2022 | 5 upvotes · 48.9K views

Shared insights on Amazon Athena, Amazon DynamoDB, Amazon S3

So, I have data in Amazon S3 as Parquet files, and I have it available in the Glue data catalog too. I want to build an AppSync API on top of this data. The two options I am considering are:

1. Bring the data to Amazon DynamoDB and then build my API on top of this database.
2. Add a Lambda function that resolves Amazon Athena queries made by AppSync.

Which of the two approaches will be more cost-effective?

I would really appreciate some back of the envelope estimates too.

Note: I only expect to make read queries. Thanks.
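A back-of-the-envelope comparison along these lines could be sketched as below. Athena bills by the amount of data scanned and DynamoDB on-demand bills per read request plus storage, but every number here is an assumption: the three price constants are placeholders to be replaced with current regional pricing, the `Workload` fields are hypothetical, and each API call is treated as a single small DynamoDB read.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    reads_per_month: int          # API read queries per month
    gb_scanned_per_query: float   # data Athena scans for one query
    dataset_gb: float             # size of the data if copied into DynamoDB


# ASSUMED placeholder prices in USD; look up current regional pricing.
ATHENA_USD_PER_TB_SCANNED = 5.00
DYNAMODB_USD_PER_MILLION_READS = 0.25
DYNAMODB_USD_PER_GB_MONTH = 0.25


def athena_monthly_cost(w: Workload) -> float:
    """Cost of serving every read as an Athena query over S3."""
    tb_scanned = w.reads_per_month * w.gb_scanned_per_query / 1024
    return tb_scanned * ATHENA_USD_PER_TB_SCANNED


def dynamodb_monthly_cost(w: Workload) -> float:
    """Cost of on-demand reads plus storage for a DynamoDB copy."""
    reads = w.reads_per_month / 1_000_000 * DYNAMODB_USD_PER_MILLION_READS
    storage = w.dataset_gb * DYNAMODB_USD_PER_GB_MONTH
    return reads + storage


# Hypothetical workload: one million reads a month, 100 MB scanned per query.
workload = Workload(reads_per_month=1_000_000, gb_scanned_per_query=0.1, dataset_gb=50)
athena = athena_monthly_cost(workload)
dynamo = dynamodb_monthly_cost(workload)
```

With these placeholder numbers the Athena path comes out at roughly $488 per month and the DynamoDB path at roughly $13, mostly because every Athena call pays for the data it scans. The sketch leaves out Lambda and AppSync charges, the one-off cost of loading data into DynamoDB, and query latency, so treat it as a starting point only.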

Sung Won Chung

Jun 5, 2019 | 4 upvotes · 284.7K views

Shared insights on Amazon Athena, Google BigQuery

I use Amazon Athena because, similar to Google BigQuery, you can store and query data easily. In particular, since you can define data schemas in the Glue data catalog, there's a central way to define data models.

However, I would not recommend it for batch jobs. I typically use it to check intermediary datasets in data engineering workloads. It's good for getting a look and feel of the data along its ETL journey.

Amazon DynamoDB

Fully managed NoSQL database service

Stacks: 3.7K · Votes: 195

PROS OF AMAZON DYNAMODB

Predictable performance and cost (62)
Scalable (56)
Native JSON Support (35)
AWS Free Tier (21)
Fast (7)
NoSQL (3)
To store data (3)
Serverless (2)
No Stored procedures is GOOD (2)
ORM with DynamoDBMapper (1)
Elastic Scalability using on-demand mode (1)
Elastic Scalability using autoscaling (1)
DynamoDB Stream (1)

CONS OF AMAZON DYNAMODB

Only sequential access for paginated data (4)
Scaling (1)
Document Limit Size (1)

related Amazon DynamoDB posts

Praveen Mooli

Engineering Manager at Taylor and Francis · Jul 29, 2019 | 19 upvotes · 4.1M views

Shared insights on Flask, MongoDB Atlas, Java, Spring Boot, Node.js, +17 more

We are in the process of building a modern content platform to deliver our content through various channels. We decided to go with a microservices architecture because we wanted to scale. The microservice architectural style is an approach to developing an application as a suite of small, independently deployable services built around specific business capabilities. You can gain modularity, extensive parallelism and cost-effective scaling by deploying services across many distributed servers. Microservices' modularity facilitates independent updates and deployments and helps avoid a single point of failure, which can help prevent large-scale outages. We also decided to use the event-driven architecture pattern, a popular distributed asynchronous architecture pattern used to produce highly scalable applications. An event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.

To build our #Backend capabilities we decided to use the following:

1. #Microservices - Java with Spring Boot, Node.js with ExpressJS and Python with Flask
2. #Eventsourcingframework - Amazon Kinesis, Amazon Kinesis Firehose, Amazon SNS, Amazon SQS, AWS Lambda
3. #Data - Amazon RDS, Amazon DynamoDB, Amazon S3, MongoDB Atlas

To build #Webapps we decided to use Angular 2 with RxJS

#Devops - GitHub, Travis CI, Terraform, Docker, Serverless

Julien DeFrance

Principal Software Engineer at Tophatter · Feb 24, 2019 | 16 upvotes · 3.2M views

Shared insights on Rails, Rails API, AWS Elastic Beanstalk, Capistrano, Docker, +20 more

Back in 2014, I was given an opportunity to re-architect the SmartZip Analytics platform and its flagship product, SmartTargeting. This is SaaS software that helps real estate professionals keep up with their prospects and leads in a given neighborhood/territory, find out (thanks to predictive analytics) who is most likely to list/sell their home, and run cross-channel marketing automation against them: direct mail, online ads, email... The company also provides Data APIs to Enterprise customers.

I had inherited years and years of technical debt and I knew things had to change radically. The first enabler to this was to make use of the cloud and go with AWS, so we would stop re-inventing the wheel, and build around managed/scalable services.

For the SaaS product, we kept on working with Rails as this was what my team had the most knowledge in. We've however broken up the monolith and decoupled the front-end application from the backend thanks to the use of Rails API so we'd get independently scalable micro-services from now on.

Our various applications could now be deployed using AWS Elastic Beanstalk, so we wouldn't waste any more effort writing time-consuming Capistrano deployment scripts, for instance. We combined this with Docker so each application would run within its own container, independently of the underlying host configuration.

Storage-wise, we went with Amazon S3 and ditched any pre-existing local or network storage people used to deal with in our legacy systems. On the database side: Amazon RDS / MySQL initially. Ultimately migrated to Amazon RDS for Aurora / MySQL when it got released. Once again, here you need a managed service your cloud provider handles for you.

Future improvements / technology decisions included:

Caching: Amazon ElastiCache / Memcached
CDN: Amazon CloudFront
Systems Integration: Segment / Zapier
Data-warehousing: Amazon Redshift
BI: Amazon Quicksight / Superset
Search: Elasticsearch / Amazon Elasticsearch Service / Algolia
Monitoring: New Relic

As our usage grew, patterns changed, and/or our business needs evolved, my role as Engineering Manager and then Director of Engineering was also to ensure my team kept on learning and innovating while delivering business value.

One of these innovations was to get ourselves into Serverless: adopting AWS Lambda was a big step forward. At the time, it was only available for Node.js (not Ruby), but it was a great way to handle cost efficiency, unpredictable traffic, sudden bursts of traffic... Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we started leveraging Amazon DynamoDB on these projects so they'd be fully scalable.

Amazon Redshift Spectrum

Exabyte-Scale In-Place Queries of S3 Data

Stacks: 101 · Votes: 3

PROS OF AMAZON REDSHIFT SPECTRUM

Good Performance (1)
Great Documentation (1)
Economical (1)

Hadoop

Open-source software for reliable, scalable, distributed computing

Stacks: 2.5K · Votes: 56

PROS OF HADOOP

Great ecosystem (39)
One stack to rule them all (11)
Great load balancer (4)
Amazon AWS (1)
Java syntax (1)

related Hadoop posts

StackShare Editors

May 10, 2014 | 11 upvotes · 621.3K views

Shared insights on Kafka, Hadoop

The early data ingestion pipeline at Pinterest used Kafka as the central message transporter, with the app servers writing messages directly to Kafka, which then uploaded log files to S3.

For databases, a custom Hadoop streamer pulled database data and wrote it to S3.

Challenges cited for this infrastructure included high operational overhead, as well as potential data loss occurring when Kafka broker outages led to an overflow of in-memory message buffering.

Scalable and reliable data ingestion at Pinterest - Pinterest Engineering - Medium

Conor Myhrvold

Tech Brand Mgr, Office of CTO at Uber · Dec 4, 2018 | 7 upvotes · 3M views

Shared insights on Kafka, Kafka Manager, Hadoop, Apache Spark, GitHub at Uber Technologies

Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop:

Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink, leveraging Apache Spark. The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:

https://eng.uber.com/marmaray-hadoop-ingestion-open-source/

(Direct GitHub repo: https://github.com/uber/marmaray)
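
The "any source to any sink" idea can be illustrated with a small plug-in interface. The sketch below is not Marmaray's API (Marmaray itself is built on Hadoop and Spark); `Source`, `Sink`, `ListSource`, `ListSink`, and `run_pipeline` are hypothetical names in plain Python, shown only to make the plug-in pattern concrete.

```python
from typing import Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, object]


class Source(Protocol):
    """Anything that can produce records (hypothetical interface)."""

    def read(self) -> Iterator[Record]: ...


class Sink(Protocol):
    """Anything that can accept records (hypothetical interface)."""

    def write(self, records: Iterable[Record]) -> int: ...


class ListSource:
    """Toy source plug-in backed by an in-memory list."""

    def __init__(self, rows: List[Record]) -> None:
        self._rows = rows

    def read(self) -> Iterator[Record]:
        return iter(self._rows)


class ListSink:
    """Toy sink plug-in that appends records to a list."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before


def run_pipeline(source: Source, sink: Sink) -> int:
    """Move every record from the source to the sink; return the count."""
    return sink.write(source.read())


# Usage: any object with a matching read() or write() can be plugged in.
sink = ListSink()
moved = run_pipeline(ListSource([{"trip_id": 1}, {"trip_id": 2}]), sink)
```

Because `run_pipeline` depends only on the two small interfaces, supporting a new system means writing one new class rather than changing the pipeline.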

Marmaray: An Open Source Generic Data Ingestion and Dispersal Framework and Library for Apache Hadoop | Uber Engineering Blog

Microsoft Azure

Integrated cloud services and infrastructure to support computing, database, analytics, mobile, and web scenarios.

Stacks: 25K · Votes: 768

PROS OF MICROSOFT AZURE

CONS OF MICROSOFT AZURE

Confusing UI (7)
Expensive Plesk on Azure (2)

related Microsoft Azure posts

Ganesa Vijayakumar

Full Stack Coder | Technical Architect · May 13, 2019 | 19 upvotes · 5.7M views

Shared insights on Codacy, SonarQube, React, React Router, React Native, +20 more

I'm planning to create a web application and a mobile application to give end customers a very good shopping experience. In short, my application will aggregate product details from different sources and give the user a clear picture of when and where to buy a product at the best quality and cost.

I plan to develop this in several milestones, adding features along the way, and I have picked the core part (aggregating the product details from different sources) as my first milestone.

Based on my work experience and knowledge, I have chosen the following stacks for this mission.

UI: I would like to develop this application using React, React Router and React Native, since I'm a little bit familiar with them and, most importantly, they will help with developing both web and mobile apps. In addition, I'm going to use JavaScript, jQuery, jQuery UI, jQuery Mobile and Bootstrap wherever required.

Service: I plan to use Java as the main business layer language, as I have 7+ years of experience with it and I believe I can do better work in Java than in other languages. In addition, I'm thinking of using Node.js.

Database and ORM: I'm going to pick MySQL as the DB and Hibernate as the ORM, since I have good knowledge of and work experience with this combination.

Search Engine: I need to deal with a large amount of product data and its detailed info to provide enough details to the end user, while also focusing on performance. So I have decided to use Solr as the search engine for product search and suggestions. In addition, I'm thinking of replacing Solr with Elasticsearch once I have explored and reviewed Elasticsearch enough.

Host: For now, my plan is to complete the application with decent features first and deploy it in a free hosting environment like Docker and Heroku. Once it is stable, I plan to use the AWS products Amazon S3, EC2, Amazon RDS and Amazon Route 53. I'm not sure what Microsoft Azure's specialty is compared with Heroku and Amazon EC2 Container Service. Anyhow, I will explore these once again and pick the best-suited one for my requirements once I reach this level.

Build and Repositories: I have decided to choose Apache Maven and Git, as these are my favorites and are also very popular for builds and repositories, respectively.

Additional Utilities :) - I would like to choose Codacy for code review, as their Startup plan will be very helpful for this application. I'm already experienced with Google CheckStyle and SonarQube, but I'm still looking into Codacy.

Happy Coding! Suggestions are welcome! :)

Thanks, Ganesa

Omar Mehilba

Co-Founder and COO at Magalix · Dec 4, 2018 | 19 upvotes · 438.6K views

Shared insights on Kubernetes, Microsoft Azure, Google Kubernetes Engine, Amazon EC2, Golang

We are hardcore Kubernetes users and contributors. We loved the automation it provides. However, as our team grew and added more clusters and microservices, capacity and resource management became a massive pain for us. We started suffering from a lot of outages and unexpected behavior as we promoted our code from dev to production environments. Luckily, we were working on our AI-powered tools to understand different dependencies, predict usage, and calculate the right resources and configurations that should be applied to our infrastructure and microservices. We dogfooded our agent (http://github.com/magalixcorp/magalix-agent) and were able to stabilize, as the #autopilot continuously recovered from any miscalculations we made and from unexpected changes in workloads. We are open sourcing our agent in a few days. Check it out and let us know what you think! We run workloads on Microsoft Azure, Google Kubernetes Engine, and Amazon EC2, and we're all about Go and Python!

Our experience with an autopilot controlled infrastructure!

Snowflake

The data warehouse built for the cloud

Stacks: 1.1K · Votes: 27

PROS OF SNOWFLAKE

Public and Private Data Sharing (7)
Multicloud (4)
Good Performance (4)
User Friendly (4)
Great Documentation (3)
Serverless (2)
Economical (1)
Usage based billing (1)
Innovative (1)

related Snowflake posts

Jeffrey Richman

Sep 21, 2022 | 5 upvotes · 125.3K views

Shared insights on Snowflake, Google BigQuery, Cloud Firestore

I'm wondering if any Cloud Firestore users might be open to sharing some input and challenges encountered when trying to create a low-cost, low-latency data pipeline to their Analytics warehouse (e.g. Google BigQuery, Snowflake, etc...)

I'm working with an ETL/ELT platform called Estuary.dev, and we are researching the pain points here to see whether there are drawbacks to the Firestore->BQ extension and/or whether users are looking for easy ways to turn NoSQL data into fine-grained tabular data.

Please feel free to drop some knowledge/wish list stuff on me for a better pipeline here!

Sung Won Chung

Jun 5, 2019 | 4 upvotes · 314.6K views

Shared insights on Google BigQuery, Snowflake

I use Google BigQuery because it makes it super easy to query and store data for analytics workloads. If you're using GCP, you're likely using BigQuery. However, data viz tools connected directly to BigQuery will run pretty slowly. They recently announced BI Engine, which will hopefully compete well against big players like Snowflake when it comes to concurrency.

What's nice too is that it has SQL-based ML tools, and it has great GIS support!

Apache Aurora

An Apache Mesos framework for scheduling jobs, originally developed by Twitter

Stacks: 70 · Votes: 0

related Apache Aurora posts

StackShare Editors

Mar 16, 2014 | 1 upvote · 292.9K views

Shared insights on Apache Mesos, Docker, Apache Aurora at Uber Technologies

At Uber, Docker containers on Mesos run microservices with consistent configurations at scale, with Aurora used for long-running services and cron jobs.

The Uber Engineering Tech Stack, Part I: The Foundation | Uber Engineering Blog