Alternatives to Amazon Redshift

Google BigQuery, Amazon Athena, Amazon DynamoDB, Amazon Redshift Spectrum, and Hadoop are the most popular alternatives and competitors to Amazon Redshift.

What is Amazon Redshift and what are its top alternatives?

Amazon Redshift is a data warehouse optimized for data sets ranging from a few hundred gigabytes to a petabyte or more. According to AWS, it costs less than $1,000 per terabyte per year, about a tenth the cost of most traditional data warehousing solutions.
Amazon Redshift is a tool in the Big Data as a Service category of a tech stack.

Top Alternatives to Amazon Redshift

  • Google BigQuery

    Run super-fast, SQL-like queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Bulk load your data from Google Cloud Storage or stream it in. Access BigQuery through a browser tool, a command-line tool, or calls to the BigQuery REST API using client libraries for languages such as Java, PHP, or Python. ...

  • Amazon Athena

    Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. ...

  • Amazon DynamoDB

    With Amazon DynamoDB, you can offload the administrative burden of operating and scaling a highly available distributed database cluster, while paying a low price for only what you use. ...

  • Amazon Redshift Spectrum

    With Redshift Spectrum, you can extend the analytic power of Amazon Redshift beyond data stored on local disks in your data warehouse to query vast amounts of unstructured data in your Amazon S3 “data lake” -- without having to load or transform any data. ...

  • Hadoop

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. ...

  • Microsoft Azure

    Azure is an open and flexible cloud platform that enables you to quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters. You can build applications using any language, tool or framework. And you can integrate your public cloud applications with your existing IT environment. ...

  • Snowflake

    Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS)—no infrastructure to manage and no knobs to turn. ...

  • Apache Aurora

    Apache Aurora is a service scheduler that runs on top of Mesos, enabling you to run long-running services that take advantage of Mesos' scalability, fault-tolerance, and resource isolation. ...
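The Hadoop description above mentions "simple programming models". The best known of these is MapReduce: a map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. The sketch below simulates the three phases in a single Python process purely to show the data flow; it is not Hadoop's API, which distributes these phases across a cluster. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on a small in-memory input."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["big data", "Big compute"])` returns `{"big": 2, "data": 1, "compute": 1}`. On a real cluster, the mappers and reducers run on different machines, close to the data blocks they process.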

Amazon Redshift alternatives & related posts

Google BigQuery

Analyze terabytes of data in seconds
PROS OF GOOGLE BIGQUERY
  • High performance (27 votes)
  • Easy to use (23 votes)
  • Fully managed service (21 votes)
  • Cheap pricing (19 votes)
  • Process hundreds of GB in seconds (16 votes)
  • Full table scans in seconds, no indexes needed (11 votes)
  • Big Data (10 votes)
  • Always on, no per-hour costs (8 votes)
  • Good combination with fluentd (5 votes)
  • Machine learning (4 votes)
CONS OF GOOGLE BIGQUERY
  • You can't unit test changes in BQ data (1 vote)

related Google BigQuery posts

Context: I wanted to create an end-to-end IoT data pipeline simulation in Google Cloud IoT Core and other GCP services. I had never touched Terraform meaningfully until working on this project, and it's been one of the best explorations in my development career. The documentation and syntax are incredibly human-readable and friendly. I'm used to building infrastructure through the Google APIs via Python, but I'm so glad past Sung did not make that decision. I was tempted to use Google Cloud Deployment Manager, but the templates seemed a bit convoluted at first impression. I'm glad past Sung did not make this decision either.

Solution: Leveraging Google Cloud Build, Google Cloud Run, Google Cloud Bigtable, Google BigQuery, Google Cloud Storage, and Google Compute Engine, along with some other fun tools, I can deploy over 40 GCP resources using Terraform!

Tim Specht
Co-Founder and CTO at Dubsmash

To accurately measure and track user behaviour on our platform, we quickly moved from our initial solution, Google Analytics, to a custom-built one because of resource and pricing concerns.

While this does sound complicated, it’s as easy as clients sending JSON blobs of events to Amazon Kinesis from where we use AWS Lambda & Amazon SQS to batch and process incoming events and then ingest them into Google BigQuery. Once events are stored in BigQuery (which usually only takes a second from the time the client sends the data until it’s available), we can use almost-standard-SQL to simply query for data while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours. Before ingesting their data into the pipeline, our mobile clients are aggregating events internally and, once a certain threshold is reached or the app is going to the background, sending the events as a JSON blob into the stream.
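The client-side half of this pipeline (buffer events, then send them as one JSON blob once a threshold is reached or the app goes to the background) can be sketched as follows. This is a minimal illustration, not Dubsmash's client code: the class, the default threshold of 50, and the `send` callback are hypothetical, and `send` stands in for whatever call actually writes the blob to Kinesis.

```python
import json
from typing import Callable, Dict, List


class EventBuffer:
    """Aggregates events on the client and flushes them as a single JSON blob."""

    def __init__(self, send: Callable[[str], None], threshold: int = 50) -> None:
        self._send = send            # hypothetical: puts one blob on the stream
        self._threshold = threshold  # hypothetical flush threshold (events)
        self._events: List[Dict] = []

    def track(self, event: Dict) -> None:
        """Buffer one event and flush once the threshold is reached."""
        self._events.append(event)
        if len(self._events) >= self._threshold:
            self.flush()

    def on_background(self) -> None:
        """The app is about to be suspended: send whatever is buffered."""
        self.flush()

    def flush(self) -> None:
        if not self._events:
            return  # nothing buffered, avoid sending empty blobs
        self._send(json.dumps(self._events))
        self._events = []
```

Batching on the device trades a little latency for far fewer stream writes, which is the point of aggregating before ingestion.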

In the past, we had workers running that continuously read from the stream, validated and post-processed the data, and then enqueued it for other workers to write to BigQuery. We implemented the Lambda-based approach so that Lambda functions are automatically triggered for incoming records, pre-aggregate the events, and write them back to SQS, from which we then read them and persist the events to BigQuery. While this approach had a couple of bumps along the road, like re-triggering functions asynchronously to keep up with the stream and finding proper batch sizes, we finally managed to get it running reliably and are very happy with this solution today.
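A minimal sketch of the Lambda side of this design, assuming each Kinesis record carries one client JSON blob (an array of events) as described above. In a Kinesis-triggered Lambda, each record's payload arrives base64-encoded under `Records[i].kinesis.data`. Here "pre-aggregation" is simplified to packing events into fewer, larger SQS messages; the function names and the `events_per_message` value are hypothetical, and the actual send is left as a comment. The batch size of 10 reflects the SQS `SendMessageBatch` limit on entries per call.

```python
import base64
import json
from typing import Any, Dict, List

SQS_MAX_BATCH_ENTRIES = 10  # SendMessageBatch accepts at most 10 entries per call


def decode_kinesis_events(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the client JSON blobs carried by a Kinesis-triggered Lambda event."""
    events: List[Dict[str, Any]] = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])  # Kinesis data is base64-encoded
        events.extend(json.loads(payload))  # assumption: each blob is a JSON array of events
    return events


def to_sqs_batches(events: List[Dict[str, Any]], events_per_message: int = 100) -> List[List[str]]:
    """Pack events into message bodies, then group the bodies into SendMessageBatch-sized lists."""
    bodies = [
        json.dumps(events[i : i + events_per_message])
        for i in range(0, len(events), events_per_message)
    ]
    return [
        bodies[i : i + SQS_MAX_BATCH_ENTRIES]
        for i in range(0, len(bodies), SQS_MAX_BATCH_ENTRIES)
    ]


def handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    batches = to_sqs_batches(decode_kinesis_events(event))
    # A deployed function would send each batch here, e.g. with boto3:
    # sqs.send_message_batch(QueueUrl=QUEUE_URL,  # QUEUE_URL is hypothetical
    #     Entries=[{"Id": str(i), "MessageBody": body} for i, body in enumerate(batch)])
    return {"batches": len(batches), "messages": sum(len(b) for b in batches)}
```

Tuning `events_per_message` (and the Lambda trigger's batch size) is where the "proper batch sizes" trade-off shows up: larger messages mean fewer SQS calls but more work per invocation.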

#ServerlessTaskProcessing #GeneralAnalytics #RealTimeDataProcessing #BigDataAsAService

Amazon Athena

Query S3 Using SQL
PROS OF AMAZON ATHENA
  • Use SQL to analyze CSV files (14 votes)
  • Glue crawlers give an easy data catalog (8 votes)
  • Cheap (6 votes)
  • Query all my data without running servers 24x7 (5 votes)
  • No database servers, yay (4 votes)
  • Easy integration with QuickSight (3 votes)
  • Query and analyze CSV, Parquet, and JSON files in SQL (2 votes)
  • Glue and Athena use the same data catalog (2 votes)
  • No configuration required (1 vote)
  • Ad hoc checks on data made easy (0 votes)

related Amazon Athena posts

I use Amazon Athena because, similar to Google BigQuery, you can store and query data easily. In particular, since you can define data schemas in the Glue Data Catalog, there's a central way to define data models.

However, I would not recommend it for batch jobs. I typically use it to check intermediary datasets in data engineering workloads. It's good for getting a look and feel of the data along its ETL journey.

Hi all,

Currently, we need to ingest data from Amazon S3 into a database, either Amazon Athena or Amazon Redshift. The problem is that the data is in .PSV (pipe-separated values) format and is over 200 GB in size. Query performance in Athena/Redshift is not up to the mark: queries time out or are too slow compared to Google BigQuery. How can I optimize performance and query result time? Can anyone please help me out?
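One common way to speed up Athena over large delimited files is to reduce how much data each query scans: compress the files and lay them out in Hive-style partitions (`column=value/` directories) so the engine can skip partitions that a `WHERE` clause rules out. Converting to a columnar format such as Parquet (for example with an Athena CTAS query) usually helps further but needs more than the Python standard library. The sketch below is a minimal illustration, assuming the PSV file has a header row and a low-cardinality partition column such as a date (one output file stays open per partition); the function name and the `part-0000.csv.gz` file name are hypothetical.

```python
import csv
import gzip
import os
from contextlib import ExitStack
from typing import Any, Dict


def partition_psv(src_path: str, out_dir: str, partition_column: str) -> Dict[str, int]:
    """Stream a pipe-separated file into gzip-compressed, Hive-style partitions.

    Rows whose `partition_column` equals v are written (without that column)
    to out_dir/<partition_column>=v/part-0000.csv.gz. Returns rows per partition.
    """
    counts: Dict[str, int] = {}
    writers: Dict[str, Any] = {}
    with ExitStack() as stack:  # closes the source and every gzip file on exit
        src = stack.enter_context(open(src_path, newline=""))
        reader = csv.reader(src, delimiter="|")  # PSV: '|' instead of ','
        header = next(reader)
        key = header.index(partition_column)
        for row in reader:
            value = row[key]
            if value not in writers:
                # First row for this partition: create its directory and gzip file.
                part_dir = os.path.join(out_dir, f"{partition_column}={value}")
                os.makedirs(part_dir, exist_ok=True)
                out = stack.enter_context(
                    gzip.open(os.path.join(part_dir, "part-0000.csv.gz"), "wt", newline="")
                )
                writers[value] = csv.writer(out)
                counts[value] = 0
            # The partition value lives in the directory name, so drop it from the row.
            writers[value].writerow(row[:key] + row[key + 1 :])
            counts[value] += 1
    return counts
```

After uploading the output directory to S3 and defining a partitioned table over it, the partitions still have to be registered (for example with `MSCK REPAIR TABLE`) before queries can prune them.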

Amazon DynamoDB

Fully managed NoSQL database service
PROS OF AMAZON DYNAMODB
  • Predictable performance and cost (62 votes)
  • Scalable (56 votes)
  • Native JSON support (35 votes)
  • AWS Free Tier (21 votes)
  • Fast (7 votes)
  • NoSQL (3 votes)
  • To store data (3 votes)
  • Serverless (2 votes)
  • No stored procedures is GOOD (2 votes)
  • ORM with DynamoDBMapper (1 vote)
  • Elastic scalability using on-demand mode (1 vote)
  • Elastic scalability using autoscaling (1 vote)
  • DynamoDB Streams (1 vote)
CONS OF AMAZON DYNAMODB
  • Only sequential access for paginated data (3 votes)
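The "sequential access" con refers to how DynamoDB paginates `Query` and `Scan` results: a response may include a `LastEvaluatedKey`, and the next page is fetched by passing it back as `ExclusiveStartKey`, so there is no way to jump directly to page N. A minimal sketch of that loop, assuming a `query` callable that behaves like boto3's `Table.query` (the helper name `iterate_pages` is hypothetical):

```python
from typing import Any, Callable, Dict, Iterator, List


def iterate_pages(query: Callable[..., Dict[str, Any]], **params: Any) -> Iterator[List[Dict[str, Any]]]:
    """Yield result pages one after another.

    Page N+1 can only be requested with the key returned by page N,
    which is why pagination is strictly sequential.
    """
    start_key = None
    while True:
        if start_key is not None:
            params["ExclusiveStartKey"] = start_key  # resume where the last page stopped
        response = query(**params)
        yield response.get("Items", [])
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:  # no key means this was the last page
            return
```

With boto3 one would pass the table's bound method and its usual arguments, e.g. `iterate_pages(table.query, KeyConditionExpression=Key("pk").eq("user#1"))` with `Key` from `boto3.dynamodb.conditions`.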

related Amazon DynamoDB posts

Julien DeFrance
Principal Software Engineer at Tophatter

Back in 2014, I was given an opportunity to re-architect the SmartZip Analytics platform and its flagship product, SmartTargeting. This SaaS software helps real estate professionals keep up with their prospects and leads in a given neighborhood or territory, find out (thanks to predictive analytics) who is most likely to list or sell their home, and run cross-channel marketing automation against them: direct mail, online ads, email... The company also provides Data APIs to Enterprise customers.

I had inherited years and years of technical debt, and I knew things had to change radically. The first enabler was to make use of the cloud and go with AWS, so we would stop reinventing the wheel and build around managed, scalable services.

For the SaaS product, we kept working with Rails, as this was what my team had the most knowledge in. However, we broke up the monolith and decoupled the front-end application from the backend by using Rails API, so we'd get independently scalable microservices from then on.

Our various applications could now be deployed using AWS Elastic Beanstalk, so we wouldn't waste any more effort writing time-consuming Capistrano deployment scripts, for instance. We combined this with Docker so each application would run within its own container, independently of the underlying host configuration.

Storage-wise, we went with Amazon S3 and ditched any pre-existing local or network storage people used to deal with in our legacy systems. On the database side, we used Amazon RDS / MySQL initially and ultimately migrated to Amazon RDS for Aurora / MySQL when it was released. Once again, you want a managed service that your cloud provider handles for you.

Future improvements / technology decisions included:

  • Caching: Amazon ElastiCache / Memcached
  • CDN: Amazon CloudFront
  • Systems integration: Segment / Zapier
  • Data warehousing: Amazon Redshift
  • BI: Amazon QuickSight / Superset
  • Search: Elasticsearch / Amazon Elasticsearch Service / Algolia
  • Monitoring: New Relic

As our usage grew, patterns changed, and our business needs evolved, my role as Engineering Manager and then Director of Engineering was also to ensure my team kept learning and innovating while delivering business value.

One of these innovations was getting into serverless: adopting AWS Lambda was a big step forward. At the time, it was only available for Node.js (not Ruby), but it was a great way to handle cost efficiency, unpredictable traffic, and sudden bursts of traffic... Ultimately, you want the whole chain of services involved in a call to be serverless, and that's when we started leveraging Amazon DynamoDB on these projects so they'd be fully scalable.

Dmitry Mukhin

Uploadcare has built an infinitely scalable infrastructure by leveraging AWS. Building on top of AWS allows us to process 350M daily requests for file uploads, manipulations, and deliveries. When we started in 2011, the only cloud alternative to AWS was Google App Engine, which was a no-go for the rather complex solution we wanted to build. We also didn't want to buy any hardware or use co-location.

Our stack handles receiving files, communicating with external file sources, managing file storage, managing user and file data, processing files, file caching and delivery, and managing user interface dashboards.

At its core, Uploadcare runs on Python. The EuroPython 2011 conference in Florence really inspired us, and that, coupled with the fact that Python was general enough to solve all of our challenges, informed this decision. Additionally, we had prior experience working in Python.

We chose to build the main application with Django because of its feature completeness and large footprint within the Python ecosystem.

All the communications within our ecosystem occur via several HTTP APIs, Redis, Amazon S3, and Amazon DynamoDB. We decided on this architecture so that our system could be scalable in terms of storage and database throughput. This way, we only need Django running on top of our database cluster. We use PostgreSQL as our database because it is considered an industry standard when it comes to clustering and scaling.

Amazon Redshift Spectrum

Exabyte-Scale In-Place Queries of S3 Data

Hadoop

Open-source software for reliable, scalable, distributed computing
PROS OF HADOOP
  • Great ecosystem (38 votes)
  • One stack to rule them all (11 votes)
  • Great load balancer (4 votes)
  • Java syntax (1 vote)

related Hadoop posts

Conor Myhrvold
Tech Brand Mgr, Office of CTO at Uber

Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop:

Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse it to any sink, leveraging Apache Spark. The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:

https://eng.uber.com/marmaray-hadoop-ingestion-open-source/

(Direct GitHub repo: https://github.com/uber/marmaray)
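The "any source to any sink" plug-in idea described above boils down to two small interfaces plus a registry that new plug-ins add themselves to. The sketch below is a hypothetical Python illustration of that pattern only; Marmaray itself runs on Apache Spark and has its own interfaces, and none of these names come from it.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Type


class Source(ABC):
    """Anything that can produce records (a database, a Kafka topic, files...)."""

    @abstractmethod
    def read(self) -> Iterable[dict]: ...


class Sink(ABC):
    """Anything that can persist records; returns how many were written."""

    @abstractmethod
    def write(self, records: Iterable[dict]) -> int: ...


SOURCES: Dict[str, Type[Source]] = {}
SINKS: Dict[str, Type[Sink]] = {}


def plugin(registry: dict, name: str):
    """Class decorator that registers a plug-in class under `name`."""
    def register(cls):
        registry[name] = cls
        return cls
    return register


@plugin(SOURCES, "memory")
class InMemorySource(Source):
    def __init__(self, records: List[dict]) -> None:
        self.records = records

    def read(self) -> Iterable[dict]:
        return iter(self.records)


@plugin(SINKS, "memory")
class InMemorySink(Sink):
    def __init__(self) -> None:
        self.written: List[dict] = []

    def write(self, records: Iterable[dict]) -> int:
        before = len(self.written)
        self.written.extend(records)
        return len(self.written) - before


def run_pipeline(source: Source, sink: Sink) -> int:
    """Move every record from any registered source to any registered sink."""
    return sink.write(source.read())
```

Adding support for a new system then means writing one class and registering it, without touching the pipeline code.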

The early data ingestion pipeline at Pinterest used Kafka as the central message transporter, with the app servers writing messages directly to Kafka, which then uploaded log files to S3.

For databases, a custom Hadoop streamer pulled database data and wrote it to S3.

Challenges cited for this infrastructure included high operational overhead, as well as potential data loss occurring when Kafka broker outages led to an overflow of in-memory message buffering.
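The data-loss failure mode described above can be illustrated with a toy model of a bounded in-memory buffer between an app server and the broker. This is a hypothetical sketch, not Pinterest's code and not the Kafka client's actual behavior; real producers have their own buffering settings (the Java producer, for instance, exposes `buffer.memory`), and what happens when that buffer fills depends on configuration.

```python
from collections import deque
from typing import Deque, List


class BoundedProducerBuffer:
    """Toy app-server buffer: deliver when the broker is up, buffer when it is down.

    During an outage, messages accumulate in memory; once more than `capacity`
    are waiting, the oldest one is evicted and counted as lost.
    """

    def __init__(self, capacity: int) -> None:
        self._queue: Deque[str] = deque()
        self._capacity = capacity
        self.dropped = 0
        self.delivered: List[str] = []

    def send(self, message: str, broker_up: bool) -> None:
        self._queue.append(message)
        if broker_up:
            # Broker reachable again: flush everything buffered, in order.
            while self._queue:
                self.delivered.append(self._queue.popleft())
        elif len(self._queue) > self._capacity:
            self._queue.popleft()  # overflow: the oldest buffered message is lost
            self.dropped += 1
```

For example, with `capacity=2`, sending "a" while the broker is up, then "b", "c", "d" during an outage, then "e" after recovery delivers `["a", "c", "d", "e"]` and loses "b". The longer the outage relative to the buffer size, the more messages are lost.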

Microsoft Azure

Integrated cloud services and infrastructure to support computing, database, analytics, mobile, and web scenarios.
PROS OF MICROSOFT AZURE
  • Scales well and quite easy (111 votes)
  • Can use .NET or open source tools (93 votes)
  • Startup friendly (79 votes)
  • Startup plans via BizSpark (72 votes)
  • High performance (61 votes)
  • Wide choice of services (36 votes)
  • Lots of integrations (31 votes)
  • Low cost (31 votes)
  • Reliability (29 votes)
  • Twilio & GitHub are directly accessible (18 votes)
  • RESTful API (11 votes)
  • Enterprise grade (9 votes)
  • Startup support (9 votes)
  • PaaS (8 votes)
  • DocumentDB (7 votes)
  • In-person support (7 votes)
  • Free for students (6 votes)
  • Virtual Machines (5 votes)
  • Service Bus (5 votes)
  • It rocks (5 votes)
  • CDN (4 votes)
  • Infrastructure Services (4 votes)
  • Storage, Backup, and Recovery (4 votes)
  • SQL Databases (4 votes)
  • Redis Cache (4 votes)
  • Built on Node.js (3 votes)
  • Big Data (3 votes)
  • BizSpark 60k Azure Benefit (3 votes)
  • IaaS (3 votes)
  • Integration (3 votes)
  • HDInsight (3 votes)
  • Preview Portal (3 votes)
  • Scheduler (3 votes)
  • Mobile (2 votes)
  • Big Compute (2 votes)
  • SaaS (2 votes)
  • Storage (2 votes)
  • StorSimple (2 votes)
  • Machine Learning (2 votes)
  • Stream Analytics (2 votes)
  • Data Factory (2 votes)
  • Event Hubs (2 votes)
  • Virtual Network (2 votes)
  • ExpressRoute (2 votes)
  • Traffic Manager (2 votes)
  • Media Services (2 votes)
  • Automation (2 votes)
  • Operational Insights (2 votes)
  • Key Vault (2 votes)
  • Infrastructure near your customers (2 votes)
  • Media (2 votes)
  • Easy deployment (2 votes)
  • Dev-Test (2 votes)
  • BizTalk Services (2 votes)
  • Web (2 votes)
  • Backup (2 votes)
  • Site Recovery (2 votes)
  • Active Directory (2 votes)
  • Multi-Factor Authentication (2 votes)
  • Visual Studio Online (2 votes)
  • Application Insights (2 votes)
  • Documentation (1 vote)
  • Remote debugging (1 vote)
  • Enterprise customer preferences (1 vote)
  • Security (1 vote)
  • Open cloud (1 vote)
  • Best cloud platform (1 vote)
  • Easy and fast to start with (1 vote)
CONS OF MICROSOFT AZURE
  • Confusing UI (5 votes)
  • Expensive Plesk on Azure (2 votes)

related Microsoft Azure posts

Omar Mehilba
Co-Founder and COO at Magalix

We are hardcore Kubernetes users and contributors. We loved the automation it provides. However, as our team grew and added more clusters and microservices, capacity and resource management became a massive pain for us. We started suffering from a lot of outages and unexpected behavior as we promoted our code from dev to production environments. Luckily, we were working on our AI-powered tools to understand different dependencies, predict usage, and calculate the right resources and configurations that should be applied to our infrastructure and microservices. We dogfooded our agent (http://github.com/magalixcorp/magalix-agent) and were able to stabilize, as the #autopilot continuously recovered from any miscalculations we made or from unexpected changes in workloads. We are open sourcing our agent in a few days. Check it out and let us know what you think! We run workloads on Microsoft Azure, Google Kubernetes Engine, and Amazon EC2, and we're all about Go and Python!

Kestas Barzdaitis
Entrepreneur & Engineer

As CodeFactor is a #SAAS product, our goal from day one was to run on cloud-native infrastructure. We wanted to stay product-focused rather than work on the infrastructure that supports the application. We needed a cloud hosting provider that would be reliable, economical, and efficient for our product.

CodeFactor.io aims to provide an automated and frictionless code review service for software developers. That requires agility, instant provisioning, autoscaling, security, availability, and compliance management features. We looked at the top three #IAAS providers that take up the majority of market share: Amazon's Amazon EC2, Microsoft's Microsoft Azure, and Google Compute Engine.

AWS has been available since 2006 and has developed the most extensive variety of services and tools at a massive scale. Azure and GCP are about half the age of AWS, but also satisfied our technical requirements.

It is worth noting that even though all three providers support Docker containerization services, GCP has the most robust offering due to its investments in Kubernetes. Also, if you are a Microsoft shop and develop in .NET with Visual Studio, Azure shines at integration, and all your existing .NET code works seamlessly on Azure. All three providers have serverless computing offerings (AWS Lambda, Azure Functions, and Google Cloud Functions). Additionally, all three providers have machine learning tools, but GCP appears to be the most developer-friendly, intuitive, and complete when it comes to #Machinelearning and #AI.

The prices between providers are competitive across the board. For our requirements, AWS would have been the most expensive, GCP the least expensive, and Azure in the middle. Plus, if you #Autoscale frequently with large deltas, note that Azure and GCP have per-minute billing, whereas AWS bills you per hour (billing granularity at the time of writing; providers have since changed this, e.g. AWS now bills many EC2 instances per second). We also applied for the #Startup programs with all three providers, and this is where Azure shined. While AWS and GCP for startups would have covered us for about one year of infrastructure costs, Azure Sponsorship would cover about two years of CodeFactor's hosting costs. Moreover, the Azure team was terrific: I felt that they wanted to work with us, whereas for AWS and GCP we were just another startup.
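To see why billing granularity matters when you autoscale with large deltas, compare the cost of short-lived instances when usage is rounded up to whole hours versus whole minutes. This is a minimal sketch; the function name and the hourly rate used below are made up for illustration and are not any provider's price.

```python
import math


def billed_cost(runtime_s: float, hourly_rate: float, granularity_s: int, minimum_s: int = 0) -> float:
    """Cost of one instance run when usage is rounded up to whole billing units.

    granularity_s: 3600 for per-hour, 60 for per-minute, 1 for per-second billing.
    minimum_s: optional minimum charge per run (0 means none).
    """
    billable_s = max(runtime_s, minimum_s)
    units = math.ceil(billable_s / granularity_s)  # partial units are charged in full
    return units * granularity_s / 3600 * hourly_rate
```

For example, ten instances that each run for 10 minutes at a hypothetical $0.10/hour cost 10 × `billed_cost(600, 0.10, 3600)` = $1.00 with per-hour rounding, but only about $0.17 with per-minute rounding (`billed_cost(600, 0.10, 60)` each). The more often you scale out for short bursts, the larger this gap grows.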

In summary, we were leaning towards GCP. GCP's advantages in containerization, automation toolset, #Devops mindset, and pricing were the driving factors there. Nevertheless, we could not say no to Azure's financial incentives and its strong sense of partnership and support throughout the process.

The bottom line is that IAAS offerings from AWS, Azure, and GCP are evolving fast. At CodeFactor, we aim to be platform-agnostic where it is practical and retain the flexibility to cherry-pick the best products across providers.

Snowflake

The data warehouse built for the cloud
PROS OF SNOWFLAKE
  • Good performance (2 votes)
  • Public and private data sharing (1 vote)
  • Multicloud (1 vote)
  • Great documentation (1 vote)
  • Serverless (1 vote)

related Snowflake posts

I use Google BigQuery because it makes it super easy to query and store data for analytics workloads. If you're using GCP, you're likely using BigQuery. However, data viz tools connected directly to BigQuery run pretty slowly. Google recently announced BI Engine, which will hopefully compete well against big players like Snowflake when it comes to concurrency.

What's also nice is that it has SQL-based ML tools and great GIS support!

Apache Aurora

An Apache Mesos framework for scheduling jobs, originally developed by Twitter

related Apache Aurora posts

Docker containers on Mesos run their microservices with consistent configurations at scale, along with Aurora for long-running services and cron jobs.