Alternatives to Snowflake

MySQL, Cassandra, Amazon Redshift, Google BigQuery, and Amazon EMR are the most popular alternatives and competitors to Snowflake.

What is Snowflake and what are its top alternatives?

Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS): no infrastructure to manage and no knobs to turn.
Snowflake is a tool in the Big Data as a Service category of a tech stack.
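As a rough illustration of that "no knobs" model, here is a minimal sketch of querying Snowflake from Python with the snowflake-connector-python package; the account, credentials, and object names are all placeholders:

```python
# Minimal sketch of querying Snowflake from Python.
# pip install snowflake-connector-python
# All credentials and object names below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    user="YOUR_USER",
    password="YOUR_PASSWORD",
    account="YOUR_ACCOUNT",    # placeholder account identifier
    warehouse="ANALYTICS_WH",  # compute is just a named virtual warehouse
    database="DEMO_DB",
    schema="PUBLIC",
)
try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_VERSION()")
    print(cur.fetchone()[0])
finally:
    conn.close()
```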

Snowflake alternatives & related posts

MySQL

The world's most popular open source database
related MySQL posts

Tim Abbott, Founder at Zulip
Elasticsearch · MySQL · PostgreSQL

We've been using PostgreSQL since the very early days of Zulip, but we actually didn't use it from the beginning. Zulip started out as a MySQL project back in 2012 because we'd heard it was a good choice for a startup with a wide community. However, we found that even though we were using the Django ORM for most of our database access, we spent a lot of time fighting with MySQL. Issues ranged from bad collation defaults to bad query plans that required a lot of manual query tweaks.

We ended up getting so frustrated that we tried out PostgreSQL, and the results were fantastic. We didn't have to do any real customization (just some settings tuned for the size of our server), and all of our most important queries were faster out of the box. As a result, we were able to delete a bunch of custom queries escaping the ORM that we'd written to make the MySQL query planner happy (because Postgres just did the right thing automatically).

Since then, we've gotten a ton of value out of Postgres. We use its excellent built-in full-text search, which has helped us avoid needing to bring in a tool like Elasticsearch, and we've really enjoyed features like its partial indexes, which saved us from adding unnecessary extra tables to get good performance for things like our "unread messages" and "starred messages" indexes.

I can't recommend it highly enough.
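For concreteness, here is a hedged sketch of the two features mentioned above, a partial index and built-in full-text search, driven from Python with psycopg2; the table and column names are invented for illustration:

```python
# Hedged sketch of a partial index and built-in full-text search in
# PostgreSQL, driven from Python with psycopg2. The user_messages and
# messages tables are hypothetical.
# pip install psycopg2-binary
import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # placeholder DSN
cur = conn.cursor()

# Partial index: index only the rows the hot query touches (e.g. unread
# messages) instead of maintaining an extra table for performance.
cur.execute("""
    CREATE INDEX IF NOT EXISTS idx_unread_messages
    ON user_messages (user_id, message_id)
    WHERE NOT read
""")

# Built-in full-text search: tsvector/tsquery in place of an external
# search engine such as Elasticsearch.
cur.execute("""
    SELECT id FROM messages
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
""", ("deployment checklist",))
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```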

Julien DeFrance, Principal Software Engineer at Tophatter
Amazon DynamoDB · Ruby · Node.js · AWS Lambda · New Relic · Amazon Elasticsearch Service · Elasticsearch · Superset · Amazon Quicksight · Amazon Redshift · Zapier · Segment · Amazon CloudFront · Memcached · Amazon ElastiCache · Amazon RDS for Aurora · MySQL · Amazon RDS · Amazon S3 · Docker · Capistrano · AWS Elastic Beanstalk · Rails API · Rails · Algolia

Back in 2014, I was given an opportunity to re-architect SmartZip's analytics platform and its flagship product, SmartTargeting. This is SaaS software that helps real estate professionals keep up with their prospects and leads in a given neighborhood/territory, find out (thanks to predictive analytics) who is most likely to list or sell their home, and run cross-channel marketing automation against them: direct mail, online ads, email... The company also provides Data APIs to Enterprise customers.

I had inherited years and years of technical debt, and I knew things had to change radically. The first enabler was to make use of the cloud and go with AWS, so we would stop reinventing the wheel and build around managed, scalable services.

For the SaaS product, we kept working with Rails, as this was what my team had the most knowledge of. However, we broke up the monolith and decoupled the front-end application from the backend using Rails API, so from then on we'd have independently scalable microservices.

Our various applications could now be deployed with AWS Elastic Beanstalk, so we wouldn't waste any more effort writing time-consuming Capistrano deployment scripts. We combined this with Docker so each application would run in its own container, independently of the underlying host configuration.

Storage-wise, we went with Amazon S3 and ditched the local and network storage people used to deal with in our legacy systems. On the database side, we started with Amazon RDS / MySQL and ultimately migrated to Amazon RDS for Aurora / MySQL when it was released. Once again, you want a managed service your cloud provider handles for you.
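Part of what keeps that migration low-friction is that Aurora MySQL presents an ordinary MySQL endpoint to applications. A minimal sketch using the PyMySQL client, with a placeholder endpoint and credentials:

```python
# Sketch: to application code, RDS / Aurora MySQL is just a MySQL
# endpoint, which is what keeps the migration low-friction.
# pip install pymysql -- host, credentials, and database are placeholders.
import pymysql

conn = pymysql.connect(
    host="my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
    user="app",
    password="REDACTED",
    database="appdb",
)
with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")  # same driver, same SQL as plain MySQL
    print(cur.fetchone())
conn.close()
```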

Future improvements / technology decisions included:

- Caching: Amazon ElastiCache / Memcached
- CDN: Amazon CloudFront
- Systems integration: Segment / Zapier
- Data warehousing: Amazon Redshift
- BI: Amazon Quicksight / Superset
- Search: Elasticsearch / Amazon Elasticsearch Service / Algolia
- Monitoring: New Relic

As our usage grew, patterns changed, and our business needs evolved, my role as Engineering Manager and then Director of Engineering was also to ensure my team kept learning and innovating while delivering on business value.

One of these innovations was to get into serverless: adopting AWS Lambda was a big step forward. At the time it was only available for Node.js (not Ruby), but it was a great way to handle cost efficiency, unpredictable traffic, and sudden bursts of traffic... Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we started leveraging Amazon DynamoDB on these projects so they'd be fully scalable.
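A hedged sketch of that serverless pattern, written in Python for consistency with the other examples here (Lambda has since added more runtimes); the table name, key schema, and event shape are hypothetical:

```python
# Hedged sketch of the serverless pattern: a Lambda handler persisting
# events to DynamoDB via boto3 so the whole call chain scales without
# servers. Table name, key schema, and event shape are hypothetical.
import json

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("events")  # hypothetical table keyed on "id"

def handler(event, context):
    # Assumes an API Gateway proxy event, whose body is a JSON string.
    payload = json.loads(event["body"])
    table.put_item(Item={"id": payload["id"], "payload": payload})
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```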

Cassandra

A partitioned row store. Rows are organized into tables with a required primary key.

related Cassandra posts

Thierry Schellenbach, CEO at Stream
RocksDB · Cassandra · Redis

Version 1.0 of Stream leveraged Cassandra for storing the feed. Cassandra is a common choice for building feeds; Instagram, for instance, started out with Redis but eventually switched to Cassandra to handle its rapid usage growth. Cassandra can handle write-heavy workloads very efficiently.
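To make the write path concrete, here is a hedged sketch of feed writes with the DataStax Python driver; the keyspace and table schema are hypothetical:

```python
# Hedged sketch of a write-heavy feed table with the DataStax Python
# driver. Keyspace and schema are hypothetical:
#   CREATE TABLE feed_activities (
#     user_id uuid, activity_id timeuuid, payload text,
#     PRIMARY KEY (user_id, activity_id));
# pip install cassandra-driver
import uuid

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("feeds")  # hypothetical keyspace

# Rows are partitioned by user_id, so writes spread across the ring and
# adding nodes adds write capacity.
insert = session.prepare(
    "INSERT INTO feed_activities (user_id, activity_id, payload) "
    "VALUES (?, ?, ?)"
)
session.execute(insert, (uuid.uuid4(), uuid.uuid1(), '{"verb": "post"}'))
cluster.shutdown()
```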

Cassandra is a great tool that allows you to scale write capacity simply by adding more nodes, but it is also very complex. This complexity made it hard to diagnose performance fluctuations. Even though we had years of experience running Cassandra, it still felt like a bit of a black box. When building Stream 2.0 we decided to take a different approach and build Keevo, our in-house key-value store built upon RocksDB, gRPC and Raft.

RocksDB is a highly performant embeddable database library developed and maintained by Facebook's data engineering team. RocksDB started as a fork of Google's LevelDB that introduced several performance improvements for SSDs. Nowadays RocksDB is a project of its own and is under active development. It is written in C++ and it's fast. Have a look at how this benchmark handles 7 million QPS. In terms of technology it's much simpler than Cassandra.
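For a flavor of RocksDB's embedded key-value model (Keevo itself is C++/gRPC/Raft, so this is only an approximation), a minimal sketch using the community python-rocksdb binding:

```python
# Hedged sketch of RocksDB's embedded key-value model via the community
# python-rocksdb binding; the real Keevo store is C++/gRPC/Raft.
# pip install python-rocksdb
import rocksdb

db = rocksdb.DB("feed.db", rocksdb.Options(create_if_missing=True))
db.put(b"feed:user:42", b'[{"verb": "post"}]')  # keys and values are bytes
print(db.get(b"feed:user:42"))
```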

This translates into reduced maintenance overhead, improved performance and, most importantly, more consistent performance. It's interesting to note that LinkedIn also uses RocksDB for their feed.

#InMemoryDatabases #DataStores #Databases

Another Cassandra-related post lists only its stack: React / AngularJS / jQuery on the front end, Laravel / Zend Framework on the backend, MySQL / MongoDB / Cassandra for storage, all running on Docker and Linux.
Amazon Redshift

Fast, fully managed, petabyte-scale data warehouse service

related Amazon Redshift posts

Julien DeFrance's SmartZip write-up above (under the related MySQL posts) also covers Amazon Redshift as the data-warehousing choice.
Ankit Sobti, CTO at Postman
dbt · Amazon Redshift · Stitch · Looker

We recently moved our Data Analytics and Business Intelligence tooling to Looker. It's already helping us create a solid process for reusable SQL-based data modeling, with consistent definitions across the entire organization. Looker allows us to collaboratively build these version-controlled models and push the limits of what we've traditionally been able to accomplish with analytics with a lean team.

For Data Engineering, we're in the process of moving from maintaining our own ETL pipelines on AWS to a managed ELT system on Stitch. We're also evaluating the command-line tool dbt to manage data transformations. Our hope is that Stitch + dbt will streamline the ELT bit, allowing us to focus our energy on analyzing data rather than managing it.
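One detail worth noting: Amazon Redshift speaks the PostgreSQL wire protocol, so once Stitch has landed the data, the "analyzing" side can start from any ordinary Postgres client. A minimal sketch with placeholder endpoint, credentials, and table names:

```python
# Hedged sketch: connecting to Redshift with an ordinary Postgres client
# once Stitch has loaded the data. Endpoint, credentials, and the events
# table are placeholders.
# pip install psycopg2-binary
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,  # Redshift's default port
    dbname="analytics",
    user="analyst",
    password="REDACTED",
)
cur = conn.cursor()
cur.execute(
    "SELECT event_name, COUNT(*) AS n "
    "FROM events GROUP BY 1 ORDER BY 2 DESC LIMIT 10"
)
for row in cur.fetchall():
    print(row)
conn.close()
```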


related Google BigQuery posts

Tim Specht, Co-Founder and CTO at Dubsmash
Google BigQuery · Amazon SQS · AWS Lambda · Amazon Kinesis · Google Analytics

In order to accurately measure and track user behaviour on our platform, we quickly moved from our initial Google Analytics-based solution to a custom-built one due to resource and pricing concerns.

While this does sound complicated, it's as easy as clients sending JSON blobs of events to Amazon Kinesis, from where we use AWS Lambda and Amazon SQS to batch and process incoming events and then ingest them into Google BigQuery. Once events are stored in BigQuery (which usually takes only a second from the time the client sends the data until it's available), we can use almost-standard SQL to query the data, while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours. Before ingesting their data into the pipeline, our mobile clients aggregate events internally and, once a certain threshold is reached or the app goes to the background, send the events as a JSON blob into the stream.
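A minimal sketch of the client-to-Kinesis leg using boto3; the stream name and event shape are hypothetical:

```python
# Hedged sketch of the ingestion leg: a JSON blob of pre-aggregated
# events put onto a Kinesis stream via boto3. Stream name and event
# shape are hypothetical.
import json

import boto3

kinesis = boto3.client("kinesis")

events = [{"name": "video_watched", "user_id": "u1", "ts": 1554000000}]
kinesis.put_record(
    StreamName="analytics-events",     # hypothetical stream
    Data=json.dumps(events).encode(),  # the JSON blob of events
    PartitionKey="u1",                 # spreads records across shards
)
```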

In the past we had workers that continuously read from the stream, validated and post-processed the data, and then enqueued it for other workers to write to BigQuery. We implemented the Lambda-based approach so that Lambda functions are automatically triggered for incoming records, pre-aggregate events, and write them back to SQS, from which we then read them and persist the events to BigQuery. While this approach had a couple of bumps in the road, like re-triggering functions asynchronously to keep up with the stream and finding proper batch sizes, we finally got it running reliably and are very happy with this solution today.
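And a hedged sketch of the Lambda stage just described: a function triggered by Kinesis records that decodes and batches events, then forwards them to SQS; the queue URL and event shape are placeholders:

```python
# Hedged sketch of the Lambda stage: triggered by Kinesis records, it
# decodes and batches events, then forwards them to SQS for the final
# BigQuery write. Queue URL and event shape are placeholders.
import base64
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/events"  # placeholder

def handler(event, context):
    batch = []
    for record in event["Records"]:
        # Kinesis delivers record payloads base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        batch.extend(json.loads(payload))
    if batch:
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(batch))
    return {"records": len(batch)}
```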

#ServerlessTaskProcessing #GeneralAnalytics #RealTimeDataProcessing #BigDataAsAService

GitHub · Google Compute Engine · Google Cloud Storage · Google BigQuery · Google Cloud Bigtable · Google Cloud Run · Google Cloud Build · Google Cloud Deployment Manager · Python · Terraform · Google Cloud IoT Core

Context: I wanted to create an end-to-end IoT data pipeline simulation in Google Cloud IoT Core and other GCP services. I had never touched Terraform meaningfully until working on this project, and it has been one of the best explorations of my development career. The documentation and syntax are incredibly human-readable and friendly. I'm used to building infrastructure through the Google APIs via Python, but I'm so glad past Sung did not make that decision. I was tempted to use Google Cloud Deployment Manager, but the templates seemed a bit convoluted on first impression. I'm glad past Sung did not make this decision either.
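For contrast, here is roughly what "building infrastructure through the Google APIs via Python" looks like: a hedged sketch using google-api-python-client with placeholder project and zone, assuming application-default credentials are configured:

```python
# Hedged sketch of imperative "Google APIs via Python" provisioning that
# the post contrasts with Terraform's declarative approach. Project and
# zone are placeholders; assumes application-default credentials.
# pip install google-api-python-client
from googleapiclient import discovery

compute = discovery.build("compute", "v1")
result = (
    compute.instances()
    .list(project="my-project", zone="us-central1-a")
    .execute()
)
for instance in result.get("items", []):
    print(instance["name"], instance["status"])
```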

Solution: Leveraging Google Cloud Build, Google Cloud Run, Google Cloud Bigtable, Google BigQuery, Google Cloud Storage, and Google Compute Engine, along with some other fun tools, I can deploy over 40 GCP resources using Terraform!

Stitch

All your data. In your data warehouse. In minutes.

related Stitch posts

Ankit Sobti's Postman write-up above (under the related Amazon Redshift posts) also covers Stitch as part of the same ELT stack.
Cloudera Enterprise

Enterprise Platform for Big Data

Alooma

Integrate any data source like databases, applications, and any API - with your own Amazon Redshift

Xplenty

Code-free data integration, data transformation and ETL in the cloud

Azure HDInsight

A cloud-based service from Microsoft for big data analytics

etleap

User-friendly, sophisticated ETL-as-a-service on AWS

Matillion

An ETL tool for big data

Dremio

Self-service data for everyone