Alternatives to Snowflake logo

Alternatives to Snowflake

MySQL, Cassandra, Databricks, Hadoop, and Oracle are the most popular alternatives and competitors to Snowflake.
654
782
+ 1
16

What is Snowflake and what are its top alternatives?

Snowflake eliminates the administration and management demands of traditional data warehouses and big data platforms. Snowflake is a true data warehouse as a service running on Amazon Web Services (AWS)—no infrastructure to manage and no knobs to turn.
Snowflake is a tool in the Big Data as a Service category of a tech stack.

Top Alternatives to Snowflake

  • MySQL

    MySQL

    The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software. ...

  • Cassandra

    Cassandra

    Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent matter. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL. ...

  • Databricks

    Databricks

    Databricks Unified Analytics Platform, from the original creators of Apache Spark™, unifies data science and engineering across the Machine Learning lifecycle from data preparation to experimentation and deployment of ML applications. ...

  • Hadoop

    Hadoop

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. ...

  • Oracle

    Oracle

    Oracle Database is an RDBMS. An RDBMS that implements object-oriented features such as user-defined types, inheritance, and polymorphism is called an object-relational database management system (ORDBMS). Oracle Database has extended the relational model to an object-relational model, making it possible to store complex business models in a relational database. ...

  • Amazon Redshift

    Amazon Redshift

    It is optimized for data sets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions. ...

  • Google BigQuery

    Google BigQuery

    Run super-fast, SQL-like queries against terabytes of data in seconds, using the processing power of Google's infrastructure. Load data with ease. Bulk load your data using Google Cloud Storage or stream it in. Easy access. Access BigQuery by using a browser tool, a command-line tool, or by making calls to the BigQuery REST API with client libraries such as Java, PHP or Python. ...

  • Amazon EMR

    Amazon EMR

    It is used in a variety of applications, including log analysis, data warehousing, machine learning, financial analysis, scientific simulation, and bioinformatics. ...

Snowflake alternatives & related posts

MySQL logo

MySQL

85.6K
69.6K
3.7K
The world's most popular open source database
85.6K
69.6K
+ 1
3.7K
PROS OF MYSQL
  • 793
    Sql
  • 673
    Free
  • 556
    Easy
  • 527
    Widely used
  • 485
    Open source
  • 180
    High availability
  • 160
    Cross-platform support
  • 104
    Great community
  • 78
    Secure
  • 75
    Full-text indexing and searching
  • 25
    Fast, open, available
  • 14
    SSL support
  • 13
    Reliable
  • 13
    Robust
  • 8
    Enterprise Version
  • 7
    Easy to set up on all platforms
  • 2
    NoSQL access to JSON data type
  • 1
    Relational database
  • 1
    Easy, light, scalable
  • 1
    Sequel Pro (best SQL GUI)
  • 1
    Replica Support
CONS OF MYSQL
  • 14
    Owned by a company with their own agenda
  • 1
    Can't roll back schema changes

related MySQL posts

Tim Abbott

We've been using PostgreSQL since the very early days of Zulip, but we actually didn't use it from the beginning. Zulip started out as a MySQL project back in 2012, because we'd heard it was a good choice for a startup with a wide community. However, we found that even though we were using the Django ORM for most of our database access, we spent a lot of time fighting with MySQL. Issues ranged from bad collation defaults, to bad query plans which required a lot of manual query tweaks.

We ended up getting so frustrated that we tried out PostgresQL, and the results were fantastic. We didn't have to do any real customization (just some tuning settings for how big a server we had), and all of our most important queries were faster out of the box. As a result, we were able to delete a bunch of custom queries escaping the ORM that we'd written to make the MySQL query planner happy (because postgres just did the right thing automatically).

And then after that, we've just gotten a ton of value out of postgres. We use its excellent built-in full-text search, which has helped us avoid needing to bring in a tool like Elasticsearch, and we've really enjoyed features like its partial indexes, which saved us a lot of work adding unnecessary extra tables to get good performance for things like our "unread messages" and "starred messages" indexes.

I can't recommend it highly enough.

See more
Conor Myhrvold
Tech Brand Mgr, Office of CTO at Uber · | 21 upvotes · 1.1M views

Our most popular (& controversial!) article to date on the Uber Engineering blog in 3+ yrs. Why we moved from PostgreSQL to MySQL. In essence, it was due to a variety of limitations of Postgres at the time. Fun fact -- earlier in Uber's history we'd actually moved from MySQL to Postgres before switching back for good, & though we published the article in Summer 2016 we haven't looked back since:

The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL (https://eng.uber.com/schemaless-part-one/). In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL:

https://eng.uber.com/mysql-migration/

See more
Cassandra logo

Cassandra

3.2K
3.1K
492
A partitioned row store. Rows are organized into tables with a required primary key.
3.2K
3.1K
+ 1
492
PROS OF CASSANDRA
  • 114
    Distributed
  • 95
    High performance
  • 80
    High availability
  • 74
    Easy scalability
  • 52
    Replication
  • 26
    Multi datacenter deployments
  • 26
    Reliable
  • 8
    OLTP
  • 7
    Open source
  • 7
    Schema optional
  • 2
    Workload separation (via MDC)
  • 1
    Fast
CONS OF CASSANDRA
  • 2
    Reliability of replication
  • 1
    Updates

related Cassandra posts

Thierry Schellenbach
Shared insights
on
RedisRedisCassandraCassandraRocksDBRocksDB
at

1.0 of Stream leveraged Cassandra for storing the feed. Cassandra is a common choice for building feeds. Instagram, for instance started, out with Redis but eventually switched to Cassandra to handle their rapid usage growth. Cassandra can handle write heavy workloads very efficiently.

Cassandra is a great tool that allows you to scale write capacity simply by adding more nodes, though it is also very complex. This complexity made it hard to diagnose performance fluctuations. Even though we had years of experience with running Cassandra, it still felt like a bit of a black box. When building Stream 2.0 we decided to go for a different approach and build Keevo. Keevo is our in-house key-value store built upon RocksDB, gRPC and Raft.

RocksDB is a highly performant embeddable database library developed and maintained by Facebook’s data engineering team. RocksDB started as a fork of Google’s LevelDB that introduced several performance improvements for SSD. Nowadays RocksDB is a project on its own and is under active development. It is written in C++ and it’s fast. Have a look at how this benchmark handles 7 million QPS. In terms of technology it’s much more simple than Cassandra.

This translates into reduced maintenance overhead, improved performance and, most importantly, more consistent performance. It’s interesting to note that LinkedIn also uses RocksDB for their feed.

#InMemoryDatabases #DataStores #Databases

See more
Umair Iftikhar
Technical Architect at Vappar · | 3 upvotes · 142.5K views

Developing a solution that collects Telemetry Data from different devices, nearly 1000 devices minimum and maximum 12000. Each device is sending 2 packets in 1 second. This is time-series data, and this data definition and different reports are saved on PostgreSQL. Like Building information, maintenance records, etc. I want to know about the best solution. This data is required for Math and ML to run different algorithms. Also, data is raw without definitions and information stored in PostgreSQL. Initially, I went with TimescaleDB due to PostgreSQL support, but to increase in sites, I started facing many issues with timescale DB in terms of flexibility of storing data.

My major requirement is also the replication of the database for reporting and different purposes. You may also suggest other options other than Druid and Cassandra. But an open source solution is appreciated.

See more
Databricks logo

Databricks

292
478
8
A unified analytics platform, powered by Apache Spark
292
478
+ 1
8
PROS OF DATABRICKS
  • 1
    Best Performances on large datasets
  • 1
    True lakehouse architecture
  • 1
    Scalability
  • 1
    Databricks doesn't get access to your data
  • 1
    Usage Based Billing
  • 1
    Security
  • 1
    Data stays in your cloud account
  • 1
    Multicloud
CONS OF DATABRICKS
    Be the first to leave a con

    related Databricks posts

    Hadoop logo

    Hadoop

    2K
    2K
    55
    Open-source software for reliable, scalable, distributed computing
    2K
    2K
    + 1
    55
    PROS OF HADOOP
    • 38
      Great ecosystem
    • 11
      One stack to rule them all
    • 4
      Great load balancer
    • 1
      Amazon aws
    • 1
      Java syntax
    CONS OF HADOOP
      Be the first to leave a con

      related Hadoop posts

      Conor Myhrvold
      Tech Brand Mgr, Office of CTO at Uber · | 7 upvotes · 1M views

      Why we built Marmaray, an open source generic data ingestion and dispersal framework and library for Apache Hadoop :

      Built and designed by our Hadoop Platform team, Marmaray is a plug-in-based framework built on top of the Hadoop ecosystem. Users can add support to ingest data from any source and disperse to any sink leveraging the use of Apache Spark . The name, Marmaray, comes from a tunnel in Turkey connecting Europe and Asia. Similarly, we envisioned Marmaray within Uber as a pipeline connecting data from any source to any sink depending on customer preference:

      https://eng.uber.com/marmaray-hadoop-ingestion-open-source/

      (Direct GitHub repo: https://github.com/uber/marmaray Kafka Kafka Manager )

      See more
      Shared insights
      on
      KafkaKafkaHadoopHadoop
      at

      The early data ingestion pipeline at Pinterest used Kafka as the central message transporter, with the app servers writing messages directly to Kafka, which then uploaded log files to S3.

      For databases, a custom Hadoop streamer pulled database data and wrote it to S3.

      Challenges cited for this infrastructure included high operational overhead, as well as potential data loss occurring when Kafka broker outages led to an overflow of in-memory message buffering.

      See more
      Oracle logo

      Oracle

      1.6K
      1.3K
      108
      An RDBMS that implements object-oriented features such as user-defined types, inheritance, and polymorphism
      1.6K
      1.3K
      + 1
      108
      PROS OF ORACLE
      • 42
        Reliable
      • 31
        Enterprise
      • 15
        High Availability
      • 5
        Expensive
      • 5
        Hard to maintain
      • 4
        Maintainable
      • 3
        Hard to use
      • 3
        High complexity
      CONS OF ORACLE
      • 13
        Expensive

      related Oracle posts

      Hi. We are planning to develop web, desktop, and mobile app for procurement, logistics, and contracts. Procure to Pay and Source to pay, spend management, supplier management, catalog management. ( similar to SAP Ariba, gap.com, coupa.com, ivalua.com vroozi.com, procurify.com

      We got stuck when deciding which technology stack is good for the future. We look forward to your kind guidance that will help us.

      We want to integrate with multiple databases with seamless bidirectional integration. What APIs and middleware available are best to achieve this? SAP HANA, Oracle, MySQL, MongoDB...

      ASP.NET / Node.js / Laravel. ......?

      Please guide us

      See more
      Amazon Redshift logo

      Amazon Redshift

      1.3K
      1.1K
      103
      Fast, fully managed, petabyte-scale data warehouse service
      1.3K
      1.1K
      + 1
      103
      PROS OF AMAZON REDSHIFT
      • 37
        Data Warehousing
      • 27
        Scalable
      • 16
        SQL
      • 14
        Backed by Amazon
      • 5
        Encryption
      • 1
        Cheap and reliable
      • 1
        Isolation
      • 1
        Best Cloud DW Performance
      • 1
        Fast columnar storage
      CONS OF AMAZON REDSHIFT
        Be the first to leave a con

        related Amazon Redshift posts

        Julien DeFrance
        Principal Software Engineer at Tophatter · | 16 upvotes · 2.4M views

        Back in 2014, I was given an opportunity to re-architect SmartZip Analytics platform, and flagship product: SmartTargeting. This is a SaaS software helping real estate professionals keeping up with their prospects and leads in a given neighborhood/territory, finding out (thanks to predictive analytics) who's the most likely to list/sell their home, and running cross-channel marketing automation against them: direct mail, online ads, email... The company also does provide Data APIs to Enterprise customers.

        I had inherited years and years of technical debt and I knew things had to change radically. The first enabler to this was to make use of the cloud and go with AWS, so we would stop re-inventing the wheel, and build around managed/scalable services.

        For the SaaS product, we kept on working with Rails as this was what my team had the most knowledge in. We've however broken up the monolith and decoupled the front-end application from the backend thanks to the use of Rails API so we'd get independently scalable micro-services from now on.

        Our various applications could now be deployed using AWS Elastic Beanstalk so we wouldn't waste any more efforts writing time-consuming Capistrano deployment scripts for instance. Combined with Docker so our application would run within its own container, independently from the underlying host configuration.

        Storage-wise, we went with Amazon S3 and ditched any pre-existing local or network storage people used to deal with in our legacy systems. On the database side: Amazon RDS / MySQL initially. Ultimately migrated to Amazon RDS for Aurora / MySQL when it got released. Once again, here you need a managed service your cloud provider handles for you.

        Future improvements / technology decisions included:

        Caching: Amazon ElastiCache / Memcached CDN: Amazon CloudFront Systems Integration: Segment / Zapier Data-warehousing: Amazon Redshift BI: Amazon Quicksight / Superset Search: Elasticsearch / Amazon Elasticsearch Service / Algolia Monitoring: New Relic

        As our usage grows, patterns changed, and/or our business needs evolved, my role as Engineering Manager then Director of Engineering was also to ensure my team kept on learning and innovating, while delivering on business value.

        One of these innovations was to get ourselves into Serverless : Adopting AWS Lambda was a big step forward. At the time, only available for Node.js (Not Ruby ) but a great way to handle cost efficiency, unpredictable traffic, sudden bursts of traffic... Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we've started leveraging Amazon DynamoDB on these projects so they'd be fully scalable.

        See more
        Ankit Sobti

        Looker , Stitch , Amazon Redshift , dbt

        We recently moved our Data Analytics and Business Intelligence tooling to Looker . It's already helping us create a solid process for reusable SQL-based data modeling, with consistent definitions across the entire organizations. Looker allows us to collaboratively build these version-controlled models and push the limits of what we've traditionally been able to accomplish with analytics with a lean team.

        For Data Engineering, we're in the process of moving from maintaining our own ETL pipelines on AWS to a managed ELT system on Stitch. We're also evaluating the command line tool, dbt to manage data transformations. Our hope is that Stitch + dbt will streamline the ELT bit, allowing us to focus our energies on analyzing data, rather than managing it.

        See more
        Google BigQuery logo

        Google BigQuery

        1.2K
        1.1K
        146
        Analyze terabytes of data in seconds
        1.2K
        1.1K
        + 1
        146
        PROS OF GOOGLE BIGQUERY
        • 27
          High Performance
        • 24
          Easy to use
        • 21
          Fully managed service
        • 19
          Cheap Pricing
        • 16
          Process hundreds of GB in seconds
        • 11
          Full table scans in seconds, no indexes needed
        • 11
          Big Data
        • 8
          Always on, no per-hour costs
        • 5
          Good combination with fluentd
        • 4
          Machine learning
        CONS OF GOOGLE BIGQUERY
        • 1
          You can't unit test changes in BQ data

        related Google BigQuery posts

        Context: I wanted to create an end to end IoT data pipeline simulation in Google Cloud IoT Core and other GCP services. I never touched Terraform meaningfully until working on this project, and it's one of the best explorations in my development career. The documentation and syntax is incredibly human-readable and friendly. I'm used to building infrastructure through the google apis via Python , but I'm so glad past Sung did not make that decision. I was tempted to use Google Cloud Deployment Manager, but the templates were a bit convoluted by first impression. I'm glad past Sung did not make this decision either.

        Solution: Leveraging Google Cloud Build Google Cloud Run Google Cloud Bigtable Google BigQuery Google Cloud Storage Google Compute Engine along with some other fun tools, I can deploy over 40 GCP resources using Terraform!

        Check Out My Architecture: CLICK ME

        Check out the GitHub repo attached

        See more
        Tim Specht
        ‎Co-Founder and CTO at Dubsmash · | 14 upvotes · 622.9K views

        In order to accurately measure & track user behaviour on our platform we moved over quickly from the initial solution using Google Analytics to a custom-built one due to resource & pricing concerns we had.

        While this does sound complicated, it’s as easy as clients sending JSON blobs of events to Amazon Kinesis from where we use AWS Lambda & Amazon SQS to batch and process incoming events and then ingest them into Google BigQuery. Once events are stored in BigQuery (which usually only takes a second from the time the client sends the data until it’s available), we can use almost-standard-SQL to simply query for data while Google makes sure that, even with terabytes of data being scanned, query times stay in the range of seconds rather than hours. Before ingesting their data into the pipeline, our mobile clients are aggregating events internally and, once a certain threshold is reached or the app is going to the background, sending the events as a JSON blob into the stream.

        In the past we had workers running that continuously read from the stream and would validate and post-process the data and then enqueue them for other workers to write them to BigQuery. We went ahead and implemented the Lambda-based approach in such a way that Lambda functions would automatically be triggered for incoming records, pre-aggregate events, and write them back to SQS, from which we then read them, and persist the events to BigQuery. While this approach had a couple of bumps on the road, like re-triggering functions asynchronously to keep up with the stream and proper batch sizes, we finally managed to get it running in a reliable way and are very happy with this solution today.

        #ServerlessTaskProcessing #GeneralAnalytics #RealTimeDataProcessing #BigDataAsAService

        See more
        Amazon EMR logo

        Amazon EMR

        485
        558
        54
        Distribute your data and processing across a Amazon EC2 instances using Hadoop
        485
        558
        + 1
        54
        PROS OF AMAZON EMR
        • 15
          On demand processing power
        • 12
          Don't need to maintain Hadoop Cluster yourself
        • 7
          Hadoop Tools
        • 6
          Elastic
        • 4
          Backed by Amazon
        • 3
          Flexible
        • 3
          Economic - pay as you go, easy to use CLI and SDKs
        • 2
          Don't need a dedicated Ops group
        • 1
          Great support
        • 1
          Massive data handling
        CONS OF AMAZON EMR
          Be the first to leave a con

          related Amazon EMR posts