Alternatives to MySQL

PostgreSQL, Oracle, MariaDB, MongoDB, and Microsoft SQL Server are the most popular alternatives and competitors to MySQL.

What is MySQL and what are its top alternatives?

MySQL is a popular open-source relational database management system known for its ease of use, reliability, and scalability. It supports various platforms and has advanced features such as triggers, stored procedures, and views. However, MySQL can sometimes face performance issues with large datasets and complex queries, and it may lack some advanced functionalities compared to other database systems.

  1. PostgreSQL: PostgreSQL is a powerful open-source object-relational database system known for its robustness, extensibility, and standards compliance. It supports a wide range of data types, indexing, and advanced features like full-text search and JSON support. Pros include strict SQL compliance, support for complex queries, and strong data integrity. Cons include potentially slower performance than MySQL in certain scenarios.
  2. MariaDB: MariaDB is a community-developed fork of MySQL designed to maintain compatibility while offering additional features and performance improvements. It provides high availability, scalability, and compatibility with MySQL. Pros include improved performance optimization and additional storage engines. Cons may include potential compatibility issues with MySQL plugins.
  3. SQLite: SQLite is a lightweight, serverless, self-contained database engine that is widely used in embedded systems and mobile applications. It is known for its simplicity, speed, and low memory footprint. Pros include zero-configuration setup, ACID compliance, and compatibility with most programming languages. Cons may include limitations in scalability and concurrent user connections.
  4. Oracle Database: Oracle Database is a commercial, enterprise-grade relational database system known for its high performance, scalability, and security features. It offers advanced functionalities like partitioning, clustering, and advanced analytics. Pros include robust security features, multi-platform support, and comprehensive data management tools. Cons may include high licensing costs and complexity in setup and management.
  5. Microsoft SQL Server: Microsoft SQL Server is a relational database management system developed by Microsoft that offers a comprehensive set of features for data management, analytics, and business intelligence. It supports Windows and Linux platforms and integrates well with Microsoft applications and services. Pros include strong functionality for data analysis and reporting, integrated security features, and support for various programming languages. Cons may include licensing costs for enterprise features.
  6. Amazon Aurora: Amazon Aurora is a fully managed, MySQL-compatible relational database service built for the cloud. It offers high performance, scalability, and availability while remaining compatible with MySQL features. Pros include automatic scaling, fault-tolerance, and compatibility with MySQL tools and applications. Cons may include potential vendor lock-in and pricing based on resource consumption.
  7. CockroachDB: CockroachDB is a distributed SQL database system designed for consistency, scalability, and resilience. It supports ACID transactions, distributed SQL queries, and automatic data replication. Pros include horizontal scalability, high availability, and geo-replication capabilities. Cons may include complexity in setting up and maintaining a distributed system.
  8. Firebase Realtime Database: Firebase Realtime Database is a cloud-hosted NoSQL database that enables real-time synchronization and offline data handling for mobile and web applications. It offers seamless integration with Firebase services and SDKs for quick development of real-time applications. Pros include real-time data synchronization, offline support, and simple JSON-based data structure. Cons may include limitations in querying capabilities and scalability for complex applications.
  9. TimescaleDB: TimescaleDB is an open-source, time-series database extension for PostgreSQL that allows for efficient storage and retrieval of time-series data with SQL capabilities. It offers scalability, compression, and advanced functions for time-series data management. Pros include SQL support for time-series data, advanced aggregation functions, and compatibility with existing PostgreSQL ecosystem. Cons may include a learning curve for optimizing performance with time-series data.
  10. Cassandra: Apache Cassandra is a highly scalable and distributed NoSQL database system designed for handling large volumes of data across multiple nodes and data centers. It offers high availability, fault-tolerance, and linear scalability for write-heavy workloads. Pros include decentralized architecture, high write throughput, and built-in replication for data redundancy. Cons may include complexity in data modeling and query language compared to relational databases.
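
The SQLite entry above mentions zero-configuration setup and ACID compliance. As a rough illustration, here is a minimal Python sketch using the standard-library `sqlite3` module. The `accounts` table and the `transfer` and `demo` helpers are hypothetical names made up for this example; the point is only that a database can be opened with no server and that a failed transaction leaves the data unchanged.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts atomically (hypothetical schema)."""
    # `with conn` commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


def demo():
    # No server process or configuration: the database is just a file
    # (or, as here, an in-memory database for the sake of the example).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()

    transfer(conn, "alice", "bob", 30)       # succeeds and is committed
    try:
        transfer(conn, "alice", "bob", 500)  # fails; both updates are rolled back
    except ValueError:
        pass
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `demo()` should leave the balances as they were after the first transfer, since the second one is rolled back as a whole.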

Top Alternatives to MySQL

  • PostgreSQL

    PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions. ...

  • Oracle

    Oracle Database is an RDBMS. An RDBMS that implements object-oriented features such as user-defined types, inheritance, and polymorphism is called an object-relational database management system (ORDBMS). Oracle Database has extended the relational model to an object-relational model, making it possible to store complex business models in a relational database. ...

  • MariaDB

    Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most featureful, stable, and sanely licensed open SQL server in the industry. MariaDB is designed as a drop-in replacement of MySQL(R) with more features, new storage engines, fewer bugs, and better performance. ...

  • MongoDB

    MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding. ...

  • Microsoft SQL Server

    Microsoft® SQL Server is a database management and analysis system for e-commerce, line-of-business, and data warehousing solutions. ...

  • SQLite

    SQLite is an embedded SQL database engine. Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files. A complete SQL database with multiple tables, indices, triggers, and views, is contained in a single disk file. ...

  • Apache Aurora

    Apache Aurora is a service scheduler that runs on top of Mesos, enabling you to run long-running services that take advantage of Mesos' scalability, fault-tolerance, and resource isolation. ...

  • Cassandra

    Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL. ...
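
The Cassandra entry above describes data being spread across machines and repartitioned as nodes join or leave. One common way to get that behaviour is a consistent-hash ring. The sketch below is a simplified Python illustration of the idea, not Cassandra's actual partitioner; the `HashRing` class, the node names, and the choice of SHA-256 with 64 virtual nodes per machine are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: each node owns several tokens on the ring."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._tokens = []  # sorted token positions on the ring
        self._owner = {}   # token -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # Use the first 8 bytes of SHA-256 as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node):
        for i in range(self.vnodes):
            token = self._hash(f"{node}#{i}")
            self._owner[token] = node
            bisect.insort(self._tokens, token)

    def remove(self, node):
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def node_for(self, key):
        if not self._tokens:
            raise LookupError("ring is empty")
        # A key belongs to the first token clockwise from its hash (wrapping around).
        i = bisect.bisect_right(self._tokens, self._hash(key)) % len(self._tokens)
        return self._owner[self._tokens[i]]
```

The useful property is that adding a node only moves the keys that the new node takes over; every other key stays where it was, which keeps repartitioning cheap.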

MySQL alternatives & related posts

PostgreSQL

98.1K
82.1K
3.5K
A powerful, open source object-relational database system
PROS OF POSTGRESQL
  • 763
    Relational database
  • 510
    High availability
  • 439
    Enterprise class database
  • 383
    Sql
  • 304
    Sql + nosql
  • 173
    Great community
  • 147
    Easy to setup
  • 131
    Heroku
  • 130
    Secure by default
  • 113
    Postgis
  • 50
    Supports Key-Value
  • 48
    Great JSON support
  • 34
    Cross platform
  • 33
    Extensible
  • 28
    Replication
  • 26
    Triggers
  • 23
    Multiversion concurrency control
  • 23
    Rollback
  • 21
    Open source
  • 18
    Heroku Add-on
  • 17
    Stable, Simple and Good Performance
  • 15
    Powerful
  • 13
    Lets be serious, what other SQL DB would you go for?
  • 11
    Good documentation
  • 9
    Scalable
  • 8
    Free
  • 8
    Reliable
  • 8
    Intelligent optimizer
  • 7
    Transactional DDL
  • 7
    Modern
  • 6
    One stop solution for all things sql no matter the os
  • 5
    Relational database with MVCC
  • 5
    Faster Development
  • 4
    Full-Text Search
  • 4
    Developer friendly
  • 3
    Excellent source code
  • 3
    Free version
  • 3
    Great DB for Transactional system or Application
  • 3
    Relational database
  • 3
    search
  • 3
    Open-source
  • 2
    Text
  • 2
    Full-text
  • 1
    Can handle up to petabytes worth of size
  • 1
    Composability
  • 1
    Multiple procedural languages supported
  • 0
    Native
CONS OF POSTGRESQL
  • 10
    Table/index bloatings

related PostgreSQL posts

Simon Reymann
Senior Fullstack Developer at QUANTUSflow Software GmbH · 30 upvotes · 11.1M views

Our whole DevOps stack consists of the following tools:

  • GitHub (incl. GitHub Pages/Markdown for Documentation, GettingStarted and HowTo's) as collaborative review and code management tool
  • Git as revision control system
  • SourceTree as Git GUI
  • Visual Studio Code as IDE
  • CircleCI for continuous integration (automating the development process)
  • Prettier / TSLint / ESLint as code linter
  • SonarQube as quality gate
  • Docker as container management (incl. Docker Compose for multi-container application management)
  • VirtualBox for operating system simulation tests
  • Kubernetes as cluster management for docker containers
  • Heroku for deploying in test environments
  • nginx as web server (preferably used as facade server in production environment)
  • SSLMate (using OpenSSL) for certificate management
  • Amazon EC2 (incl. Amazon S3) for deploying in stage (production-like) and production environments
  • PostgreSQL as preferred database system
  • Redis as preferred in-memory database/store (great for caching)

The main reason we have chosen Kubernetes over Docker Swarm is related to the following artifacts:

  • Key features: Easy and flexible installation, Clear dashboard, Great scaling operations, Monitoring is an integral part, Great load balancing concepts, Monitors the condition and ensures compensation in the event of failure.
  • Applications: An application can be deployed using a combination of pods, deployments, and services (or micro-services).
  • Functionality: Kubernetes has a complex installation and setup process, but it is not as limited as Docker Swarm.
  • Monitoring: It supports multiple versions of logging and monitoring when the services are deployed within the cluster (Elasticsearch/Kibana (ELK), Heapster/Grafana, Sysdig cloud integration).
  • Scalability: All-in-one framework for distributed systems.
  • Other Benefits: Kubernetes is backed by the Cloud Native Computing Foundation (CNCF), has a huge community among container orchestration tools, and is an open-source, modular tool that works with any OS.

Jeyabalaji Subramanian

Recently we were looking at a few robust and cost-effective ways of replicating the data that resides in our production MongoDB to a PostgreSQL database for data warehousing and business intelligence.

We set ourselves the following criteria for the optimal tool that would do this job:

  • The data replication must be near real-time, yet it should NOT impact the production database
  • The data replication must be horizontally scalable (based on the load), asynchronous & crash-resilient

Based on the above criteria, we selected the following tools to perform the end to end data replication:

We chose MongoDB Stitch for picking up the changes in the source database. It is the serverless platform from MongoDB. One of the services offered by MongoDB Stitch is Stitch Triggers. Using Stitch Triggers, you can execute a serverless function (in Node.js) in real time in response to changes in the database. When there are a lot of database changes, Stitch automatically "feeds forward" these changes through an asynchronous queue.

We chose Amazon SQS as the pipe / message backbone for communicating the changes from MongoDB to our own replication service. Interestingly enough, MongoDB Stitch offers integration with AWS services.

In the Node.js function, we wrote minimal functionality to communicate the database changes (insert / update / delete / replace) to Amazon SQS.

Next we wrote a minimal micro-service in Python to listen to the message events on SQS, pick up the data payload & mirror the DB changes onto the target data warehouse. We implemented source data to target data translation by modelling target table structures through SQLAlchemy. We deployed this micro-service as AWS Lambda with Zappa. With Zappa, deploying your services as event-driven & horizontally scalable Lambda services is dumb-easy.

In the end, we got to implement a highly scalable, near real-time Change Data Replication service that "works" and deployed it to production in a matter of a few days!
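
The post does not show its code, so the following is only a hedged sketch of what the "mirror the DB changes" step might look like. It assumes a hypothetical JSON message shape (`op`, `id`, `doc`) in which every event carries the full document. To stay runnable with the standard library, it swaps in `sqlite3` for the PostgreSQL/SQLAlchemy target and a plain list for the SQS queue. The `customers` table and the `apply_change` function are illustrative names, not the author's.

```python
import json
import sqlite3


def apply_change(conn, message_body):
    """Apply one change event (a JSON string) to the target table.

    Assumed event shape: {"op": ..., "id": ..., "doc": {...}}.
    Upserts make the handler safe to run again if a message is redelivered.
    """
    event = json.loads(message_body)
    op, doc_id = event["op"], event["id"]
    with conn:  # one transaction per event
        if op in ("insert", "update", "replace"):
            conn.execute(
                "INSERT OR REPLACE INTO customers (id, doc) VALUES (?, ?)",
                (doc_id, json.dumps(event["doc"], sort_keys=True)),
            )
        elif op == "delete":
            conn.execute("DELETE FROM customers WHERE id = ?", (doc_id,))
        else:
            raise ValueError(f"unknown op: {op}")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, doc TEXT NOT NULL)")
    # Stand-in for message bodies received from the queue, in arrival order.
    queue = [
        json.dumps({"op": "insert", "id": "a1", "doc": {"name": "Ada"}}),
        json.dumps({"op": "insert", "id": "b2", "doc": {"name": "Bob"}}),
        json.dumps({"op": "update", "id": "a1", "doc": {"name": "Ada L."}}),
        json.dumps({"op": "delete", "id": "b2"}),
    ]
    for body in queue:
        apply_change(conn, body)
    return conn.execute("SELECT id, doc FROM customers ORDER BY id").fetchall()
```

Treating insert, update, and replace as the same upsert keeps the handler short, at the cost of requiring the full document in every event; a real pipeline that receives partial updates would need a merge step instead.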

Oracle

2.3K
1.7K
113
An RDBMS that implements object-oriented features such as user-defined types, inheritance, and polymorphism
PROS OF ORACLE
  • 44
    Reliable
  • 33
    Enterprise
  • 15
    High Availability
  • 5
    Hard to maintain
  • 5
    Expensive
  • 4
    Maintainable
  • 4
    Hard to use
  • 3
    High complexity
CONS OF ORACLE
  • 14
    Expensive

related Oracle posts

Hi. We are planning to develop web, desktop, and mobile apps for procurement, logistics, and contracts: Procure to Pay and Source to Pay, spend management, supplier management, and catalog management (similar to SAP Ariba, gap.com, coupa.com, ivalua.com, vroozi.com, and procurify.com).

We got stuck when deciding which technology stack is good for the future. We look forward to your kind guidance that will help us.

We want to integrate with multiple databases with seamless bidirectional integration. Which APIs and middleware are best suited to achieve this? SAP HANA, Oracle, MySQL, MongoDB...

ASP.NET / Node.js / Laravel. ......?

Please guide us

I recently started a new position as a data scientist at an e-commerce company. The company was founded about 4-5 years ago and is new to many data-related areas. Specifically, I'm their first data science employee, so I have to take care of data analysis tasks as well as bringing new technologies to the company.

  1. They have used Elasticsearch (and Kibana) to build reporting dashboards on daily purchases and user interactions on their e-commerce website.

  2. They also use the Oracle database system to keep records of their daily turnover and lists of their current products, clients, and sellers.

  3. They use Data-Warehouse with cockpit 10 for generating reports on different aspects of their business including number 2 in this list.

At the moment, I grab batches of data from their system to perform predictive analytics from data science perspectives. In some cases, I use a static form of data such as monthly turnover, client values, and high-demand products, and run my predictive analysis using Python (VS code). Also, I use Google Datastudio or Google Sheets to present my findings. In other cases, I try to do time-series analysis using offline batches of data extracted from Elastic Search to do user recommendations and user personalization.

I really want to use modern data science tools such as Apache Spark, Google BigQuery, AWS, Azure, or others where they really fit. I think these tools can improve my performance as a data scientist and can provide more continuous analytics of their business interactions. But honestly, I'm not sure where each tool is needed and what part of their system should be replaced by or combined with the current state of technology to improve productivity from the above perspectives.

MariaDB

16.3K
12.7K
468
An enhanced, drop-in replacement for MySQL
PROS OF MARIADB
  • 149
    Drop-in mysql replacement
  • 100
    Great performance
  • 74
    Open source
  • 55
    Free
  • 44
    Easy setup
  • 15
    Easy and fast
  • 14
    Lead developer is "monty" widenius the founder of mysql
  • 6
    Also an aws rds service
  • 4
    Consistent and robust
  • 4
    Learning curve easy
  • 2
    Native JSON Support / Dynamic Columns
  • 1
    Real Multi Threaded queries on a table/db
CONS OF MARIADB
    Be the first to leave a con

    related MariaDB posts

    Tassanai Singprom

    This is my stack in Application & Data

    JavaScript PHP HTML5 jQuery Redis Amazon EC2 Ubuntu Sass Vue.js Firebase Laravel Lumen Amazon RDS GraphQL MariaDB

    My Utilities Tools

    Google Analytics Postman Elasticsearch

    My Devops Tools

    Git GitHub GitLab npm Visual Studio Code Kibana Sentry BrowserStack

    My Business Tools

    Slack

    Joshua Dean Küpper
    CEO at Scrayos UG (haftungsbeschränkt) · 11 upvotes · 675.4K views

    We primarily use MariaDB but use PostgreSQL as a part of GitLab, Sentry and Nextcloud, which (initially) forced us to use it anyway. While this isn't much of a decision – because we didn't have one (ha ha) – we learned to love the perks and advantages of PostgreSQL anyway. PostgreSQL's extension system makes it even more flexible than a lot of the other SQL-based DBs (that only offer stored procedures), and the additional JOIN options, the enhanced role management and the different authentication options came in really handy when doing manual maintenance on the databases.

    MongoDB

    93.5K
    80.7K
    4.1K
    The database for giant ideas
    PROS OF MONGODB
    • 828
      Document-oriented storage
    • 593
      No sql
    • 553
      Ease of use
    • 464
      Fast
    • 410
      High performance
    • 255
      Free
    • 218
      Open source
    • 180
      Flexible
    • 145
      Replication & high availability
    • 112
      Easy to maintain
    • 42
      Querying
    • 39
      Easy scalability
    • 38
      Auto-sharding
    • 37
      High availability
    • 31
      Map/reduce
    • 27
      Document database
    • 25
      Easy setup
    • 25
      Full index support
    • 16
      Reliable
    • 15
      Fast in-place updates
    • 14
      Agile programming, flexible, fast
    • 12
      No database migrations
    • 8
      Easy integration with Node.Js
    • 8
      Enterprise
    • 6
      Enterprise Support
    • 5
      Great NoSQL DB
    • 4
      Support for many languages through different drivers
    • 3
      Schemaless
    • 3
      Aggregation Framework
    • 3
      Drivers support is good
    • 2
      Fast
    • 2
      Managed service
    • 2
      Easy to Scale
    • 2
      Awesome
    • 2
      Consistent
    • 1
      Good GUI
    • 1
      Acid Compliant
    CONS OF MONGODB
    • 6
      Very slow for connected models that require joins
    • 3
      Not acid compliant
    • 2
      Proprietary query language

    related MongoDB posts

    Robert Zuber

    We use MongoDB as our primary #datastore. Mongo's approach to replica sets enables some fantastic patterns for operations like maintenance, backups, and #ETL.

    As we pull #microservices from our #monolith, we are taking the opportunity to build them with their own datastores using PostgreSQL. We also use Redis to cache data we’d never store permanently, and to rate-limit our requests to partners’ APIs (like GitHub).
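
The post mentions using Redis to rate-limit requests to partner APIs but gives no details. One simple scheme is a fixed-window counter, which in Redis is typically built from `INCR` plus `EXPIRE` on a per-window key. The Python sketch below uses an in-memory dict as a stand-in for Redis so it runs on its own; the `FixedWindowLimiter` class, the key names, and the limits are assumptions for illustration, not the author's implementation.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per key in each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # (key, window index) -> count; stand-in for Redis keys with a TTL.
        self._counts = {}

    def allow(self, key):
        window_index = int(self.clock() // self.window)
        # Drop counters from earlier windows (Redis would expire them for us).
        self._counts = {k: v for k, v in self._counts.items() if k[1] == window_index}
        bucket = (key, window_index)
        self._counts[bucket] = self._counts.get(bucket, 0) + 1
        return self._counts[bucket] <= self.limit
```

A caller would check `limiter.allow("github")` before each outbound request and back off when it returns `False`. Fixed windows allow short bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more state.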

    When we’re dealing with large blobs of immutable data (logs, artifacts, and test results), we store them in Amazon S3. We handle any side-effects of S3’s eventual consistency model within our own code. This ensures that we deal with user requests correctly while writes are in process.

    Microsoft SQL Server

    19.8K
    15.3K
    540
    A relational database management system developed by Microsoft
    PROS OF MICROSOFT SQL SERVER
    • 139
      Reliable and easy to use
    • 101
      High performance
    • 95
      Great with .net
    • 65
      Works well with .net
    • 56
      Easy to maintain
    • 21
      Azure support
    • 17
      Always on
    • 17
      Full Index Support
    • 10
      Enterprise manager is fantastic
    • 9
      In-Memory OLTP Engine
    • 2
      Easy to setup and configure
    • 2
      Security is forefront
    • 1
      Great documentation
    • 1
      Faster Than Oracle
    • 1
      Columnstore indexes
    • 1
      Decent management tools
    • 1
      Docker Delivery
    • 1
      Max number of connections is 14000
    CONS OF MICROSOFT SQL SERVER
    • 4
      Expensive Licensing
    • 2
      Microsoft
    • 1
      Data pages is only 8k
    • 1
      Always On can lose data in asynchronous mode
    • 1
      Replication can lose data
    • 1
      The maximum number of connections is only 14000

    related Microsoft SQL Server posts

    We initially started out with Heroku as our PaaS provider because our original developer wanted to use it for our Ruby on Rails application/website at the time. Response times were painfully slow, with the main page sometimes taking 10 seconds to start loading. Moving up to the next "compute" level was going to be very expensive.

    We moved our site over to AWS Elastic Beanstalk. Not only did response times on the site become practically instant, but our cloud bill for the application was also cut in half.

    In the database world, we are currently using Amazon RDS for PostgreSQL, and we also have both MariaDB and Microsoft SQL Server hosted on Amazon RDS. The plan is to migrate to AWS Aurora Serverless for all 3 of those database systems.

    Additional services we use for our public applications: AWS Lambda, Python, Redis, Memcached, AWS Elastic Load Balancing (ELB), Amazon Elasticsearch Service, Amazon ElastiCache

    Farzeem Diamond Jiwani
    Software Engineer at IVP · 8 upvotes · 1.5M views

    Hey there! We are looking at Datadog, Dynatrace, AppDynamics, and New Relic as options for our web application monitoring.

    Current Environment: .NET Core Web app hosted on Microsoft IIS

    Future Environment: Web app will be hosted on Microsoft Azure

    Tech Stacks: IIS, RabbitMQ, Redis, Microsoft SQL Server

    Requirement: Infra Monitoring, APM, Real-User Monitoring (user activity monitoring, i.e., time spent on a page, most active page, etc.), Service Tracing, Root Cause Analysis, and Centralized Log Management.

    Please advise on the above. Thanks!

    SQLite

    19K
    15K
    535
    A software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine
    PROS OF SQLITE
    • 163
      Lightweight
    • 135
      Portable
    • 122
      Simple
    • 81
      Sql
    • 29
      Preinstalled on iOS and Android
    • 2
      Free
    • 2
      Tcl integration
    • 1
      Portable A database on my USB 'love it'
    CONS OF SQLITE
    • 2
      Not for multi-process or multithreaded apps
    • 1
      Needs different binaries for each platform

    related SQLite posts

    Dimelo Waterson
    Shared insights on PostgreSQL, MySQL, and SQLite

    I need to add a DBMS to my stack, but I don't know which. I'm tempted to learn SQLite since it would be useful to me with its focus on local access without concurrency. However, doing so feels like I would be defeating the purpose of trying to expand my skill set since it seems like most enterprise applications have the opposite requirements.

    To be able to apply what I learn to more projects, what should I try to learn? MySQL? PostgreSQL? Something else? Is there a comfortable middle ground between high applicability and ease of use?

    Pran B.
    Fullstack Developer at Growbox · 6 upvotes · 283.9K views

    Goal/Problem: A small mobile app (using Flutter) that saves some data offline, while the rest of the data needs to be synced with Cloud Firestore.

    Tools: Cloud Firestore, SQLite

    Decision/Considering/Need suggestions: There is no state management in the app yet. There is a requirement to store some data offline so that it is easily available when the phone is offline, and some data needs to be stored in the cloud. I am considering using sqflite for phone storage and Firestore to sync and manage the online database. I am using Flutter to build the app, and I couldn't find a reliable way to use the Firestore cache for reading the data when the phone is offline, so I came up with the above solution. Is this a good approach?
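
The approach described above (local SQLite storage plus a cloud sync) is often implemented by flagging each local row as synced or not. The sketch below illustrates that pattern in Python with the standard-library `sqlite3` module rather than Flutter's sqflite. The `notes` table, the `synced` column, and the `upload` callback are hypothetical stand-ins; a real app would call the Firestore SDK where `upload` is invoked.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the local store; rows start out unsynced."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS notes (
               id INTEGER PRIMARY KEY,
               body TEXT NOT NULL,
               synced INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def save_note(conn, body):
    """Always write locally first, so the data is available offline."""
    with conn:
        cur = conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    return cur.lastrowid


def push_pending(conn, upload):
    """Try to upload unsynced rows in order; stop at the first connection error."""
    pending = conn.execute(
        "SELECT id, body FROM notes WHERE synced = 0 ORDER BY id"
    ).fetchall()
    pushed = 0
    for note_id, body in pending:
        try:
            upload(note_id, body)  # hypothetical cloud call
        except ConnectionError:
            break  # still offline; the remaining rows stay flagged for a later retry
        with conn:
            conn.execute("UPDATE notes SET synced = 1 WHERE id = ?", (note_id,))
        pushed += 1
    return pushed
```

Reads always come from the local table, so the app behaves the same online and offline; `push_pending` can be called whenever connectivity returns. Conflict handling for data edited on several devices is out of scope for this sketch.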

    Apache Aurora

    69
    96
    0
    An Apache Mesos framework for scheduling jobs, originally developed by Twitter
    PROS OF APACHE AURORA
      Be the first to leave a pro
      CONS OF APACHE AURORA
        Be the first to leave a con

        related Apache Aurora posts

        Docker containers on Mesos run their microservices with consistent configurations at scale, along with Aurora for long-running services and cron jobs.

        Cassandra

        3.6K
        3.5K
        507
        A partitioned row store. Rows are organized into tables with a required primary key.
        PROS OF CASSANDRA
        • 119
          Distributed
        • 98
          High performance
        • 81
          High availability
        • 74
          Easy scalability
        • 53
          Replication
        • 26
          Reliable
        • 26
          Multi datacenter deployments
        • 10
          Schema optional
        • 9
          OLTP
        • 8
          Open source
        • 2
          Workload separation (via MDC)
        • 1
          Fast
        CONS OF CASSANDRA
        • 3
          Reliability of replication
        • 1
          Size
        • 1
          Updates

        related Cassandra posts

        Thierry Schellenbach
        Shared insights on Redis, Cassandra, and RocksDB

        Version 1.0 of Stream leveraged Cassandra for storing the feed. Cassandra is a common choice for building feeds. Instagram, for instance, started out with Redis but eventually switched to Cassandra to handle their rapid usage growth. Cassandra can handle write-heavy workloads very efficiently.

        Cassandra is a great tool that allows you to scale write capacity simply by adding more nodes, though it is also very complex. This complexity made it hard to diagnose performance fluctuations. Even though we had years of experience with running Cassandra, it still felt like a bit of a black box. When building Stream 2.0 we decided to go for a different approach and build Keevo. Keevo is our in-house key-value store built upon RocksDB, gRPC and Raft.

        RocksDB is a highly performant embeddable database library developed and maintained by Facebook’s data engineering team. RocksDB started as a fork of Google’s LevelDB that introduced several performance improvements for SSD. Nowadays RocksDB is a project on its own and is under active development. It is written in C++ and it’s fast. Have a look at how this benchmark handles 7 million QPS. In terms of technology it’s much simpler than Cassandra.

        This translates into reduced maintenance overhead, improved performance and, most importantly, more consistent performance. It’s interesting to note that LinkedIn also uses RocksDB for their feed.

        #InMemoryDatabases #DataStores #Databases

        Trying to establish a data lake (or maybe puddle) for my org's Data Sharing project. The idea is that outside partners would send cuts of their PHI data, regardless of format/variables/systems, to our Data Team who would then harmonize the data, create data marts, and eventually use it for something. End-to-end, I'm envisioning:

        1. Ingestion -> Secure, role-based, self-service portal for users to upload data (1a. bonus points if it can perform basic validations/masking)
        2. Storage -> Amazon S3 seems like the cheapest. We probably won't need anything very big, even at full capacity. Our current storage is a secure Box folder that has ~4GB with several batches of test data, code, presentations, and planning docs.
        3. Data Catalog -> AWS Glue? Azure Data Factory? Snowplow? Is the main difference basically based on the vendor? We also will have Data Dictionaries/Codebooks from submitters. Where would they fit in?
        4. Partitions -> I've seen Cassandra and YARN mentioned, but have no experience with either
        5. Processing -> We want to use SAS if at all possible. What will work with SAS code?
        6. Pipeline/Automation -> The check-in and verification processes that have been outlined are rather involved. Some sort of automated messaging or approval workflow would be nice
        7. I have very little guidance on what a "Data Mart" should look like, so I'm going with the idea that it would be another "experimental" partition. Unless there's an actual mart-building paradigm I've missed?
        8. An end user might use the catalog to pull certain de-identified data sets from the marts. Again, role-based access and a self-service GUI would be preferable.

        I'm the only full-time tech person on this project, but I'm mostly an OOP, HTML, JavaScript, and some SQL programmer. Most of this is out of my repertoire. I've done a lot of research, but I can't be an effective evangelist without hands-on experience. Since we're starting a new year of our grant, they've finally decided to let me try some stuff out. Any pointers would be appreciated!