What is MySQL?
What is PostgreSQL?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
In 2014, Uber began working on Big Data Generation 1. Prior to 2014, all of the company’s data could fit into “a few traditional online transaction processing (OLTP) databases (in [their] case, MySQL and PostgreSQL ).” At that time, engineers were required to write individual code to access and join data across databases, a take made more difficult by the fact that there was no global view of all stored data, which was scattered around several OLTP databases with a “total data size on the order of a few terabytes.”
They chose Vertica as the analytical data warehouse, because of “because of its fast, scalable, and column-oriented design. We also developed multiple ad hoc ETL (Extract, Transform, and Load) jobs that copied data from different sources (i.e. AWS S3, OLTP databases, service logs, etc.) into Vertica.
Uber's early architecture was a monolithic Python backend on top of Postgres. Between 2014 and 2016, Uber moved many applications off of Postgres and onto MySQL, accessed through a sharding layer built in-house called Schemaless.
As the company scaled, Uber encountered issues with scaling writes due to write amplification issues (1 logical write causing 4 physical writes) and inefficient data replication with Postgres. Postgres replication works at the physical level, which means that it's not possible to replicate data between data bases running different versions. This added complexity and risk to upgrades.
MySQL, and the InnoDB storage engine in particular, can do replication at the statement level, which poses less risks and works between versions. Uber also found MySQL's caching and connection pooling capabilities to provide superior performance for their use cases.
The majority of our Clojure microservices are simple web services that wrap a transactional database with CRUD operations and a little bit of business logic. We use both MySQL and PostgreSQL for transactional data persistence, having transitioned from the former to the latter for newer services to take advantage of the new features coming out of the Postgres community.
Most of our Clojure best practices can be summed up by the phrase "keep it simple." We avoid more complex web frameworks in favor of using the Ring library to build web service routes, and we prefer sending SQL directly to the JDBC library rather than using a complicated ORM or SQL DSL.
We went with MongoDB , almost by mistake. I had never used it before, but I knew I wanted the *EAN part of the MEAN stack, so why not go all in. I come from a background of SQL (first MySQL , then PostgreSQL ), so I definitely abused Mongo at first... by trying to turn it into something more relational than it should be. But hey, data is supposed to be relational, so there wasn't really any way to get around that.
There's a lot I love about MongoDB, and a lot I hate. I still don't know if we made the right decision. We've been able to build much quicker, but we also have had some growing pains. We host our databases on MongoDB Atlas , and I can't say enough good things about it. We had tried MongoLab and Compose before it, and with MongoDB Atlas I finally feel like things are in a good place. I don't know if I'd use it for a one-off small project, but for a large product Atlas has given us a ton more control, stability and trust.
In my opinion PostgreSQL is totally over MongoDB - not only works with structured data & SQL & strict types, but also has excellent support for unstructured data as separate data type (you can store arbitrary JSONs - and they may be also queryable, depending on one of format's you may choose). Both writes & reads are much faster, then in Mongo. So you can get best on Document NoSQL & SQL in single database..
Formal downside of PostgreSQL is clustering scalability. There's not simple way to build distributed a cluster. However, two points:
1) You will need much more time before you need to actually scale due to PG's efficiency. And if you follow database-per-service pattern, maybe you won't need ever, cause dealing few billion records on single machine is an option for PG.
2) When you need to - you do it in a way you need, including as a part of app's logic (e.g. sharding by key, or PG-based clustering solution with strict model), scalability will be very transparent, much more obvious than Mongo's "cluster just works (but then fails)" replication.
We've been using PostgreSQL since the very early days of Zulip, but we actually didn't use it from the beginning. Zulip started out as a MySQL project back in 2012, because we'd heard it was a good choice for a startup with a wide community. However, we found that even though we were using the Django ORM for most of our database access, we spent a lot of time fighting with MySQL. Issues ranged from bad collation defaults, to bad query plans which required a lot of manual query tweaks.
We ended up getting so frustrated that we tried out PostgresQL, and the results were fantastic. We didn't have to do any real customization (just some tuning settings for how big a server we had), and all of our most important queries were faster out of the box. As a result, we were able to delete a bunch of custom queries escaping the ORM that we'd written to make the MySQL query planner happy (because postgres just did the right thing automatically).
And then after that, we've just gotten a ton of value out of postgres. We use its excellent built-in full-text search, which has helped us avoid needing to bring in a tool like Elasticsearch, and we've really enjoyed features like its partial indexes, which saved us a lot of work adding unnecessary extra tables to get good performance for things like our "unread messages" and "starred messages" indexes.
I can't recommend it highly enough.
Our most popular (& controversial!) article to date on the Uber Engineering blog in 3+ yrs. Why we moved from PostgreSQL to MySQL. In essence, it was due to a variety of limitations of Postgres at the time. Fun fact -- earlier in Uber's history we'd actually moved from MySQL to Postgres before switching back for good, & though we published the article in Summer 2016 we haven't looked back since:
The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL (https://eng.uber.com/schemaless-part-one/). In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL:
One of our battles at the very beginning of the road was choosing the right database. In fact, our first prototype was built on MySQL and back then nothing else was even under a consideration (don't ask me why). At some point, I was working on a project which was running on PostgreSQL and it is only then I understood the full power of it. We have over a billion of records in production instance, and we are able to optimize it to run fast and reliable. Well, now my default DB is PostgreSQL :)
Much of our data model is relational, which makes MySQL or PostgreSQL (and family) fit the API's we need to build, in order to meet the needs of our customers.
Sometimes the flexibility of a NoSQL store like Amazon DynamoDB is very useful, but the lack of consistency really impacts usability and performance long-term, compared with viable alternatives. At our current scale, we've seen huge benefits from moving some of our tables out of Dynamo and doing more in SQL.
There will always be use cases for NoSQL and key-values stores, but if your model is well understood in your business/industry: relational databases are the way to go after finding product-market fit. Always understand the trade-offs (and a few intimate details) of any data store before you add to your company's stack!