Citus vs MySQL: What are the differences?
Developers describe Citus as "Worry-free Postgres for SaaS. Built to scale out". Citus is worry-free Postgres for SaaS. Made to scale out, Citus is an extension to Postgres that distributes queries across any number of servers. Citus is available as open source, as on-prem software, and as a fully-managed service. On the other hand, MySQL is detailed as "The world's most popular open source database". The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software.
Citus and MySQL belong to "Databases" category of the tech stack.
"Multi-core Parallel Processing" is the primary reason why developers consider Citus over the competitors, whereas "Sql" was stated as the key factor in picking MySQL.
Citus and MySQL are both open source tools. It seems that MySQL with 3.97K GitHub stars and 1.56K forks on GitHub has more adoption than Citus with 3.64K GitHub stars and 273 GitHub forks.
What is Citus?
What is MySQL?
Need advice about which tool to choose?Ask the StackShare community!
Sign up to add, upvote and see more prosMake informed product decisions
What are the cons of using Citus?
Sign up to get full access to all the companiesMake informed product decisions
Sign up to get full access to all the tool integrationsMake informed product decisions
At Shopify, over the years, we moved from shards to the concept of "pods". A pod is a fully isolated instance of Shopify with its own datastores like MySQL, Redis, Memcached. A pod can be spawned in any region. This approach has helped us eliminate global outages. As of today, we have more than a hundred pods, and since moving to this architecture we haven't had any major outages that affected all of Shopify. An outage today only affects a single pod or region.
As we grew into hundreds of shards and pods, it became clear that we needed a solution to orchestrate those deployments. Today, we use Docker, Kubernetes, and Google Kubernetes Engine to make it easy to bootstrap resources for new Shopify Pods.
PostgreSQL was an easy early decision for the founding team. The relational data model fit the types of analyses they would be doing: filtering, grouping, joining, etc., and it was the database they knew best.
Shortly after adopting PG, they discovered Citus, which is a tool that makes it easy to distribute queries. Although it was a young project and a fork of Postgres at that point, Dan says the team was very available, highly expert, and it wouldn’t be very difficult to move back to PG if they needed to.
The stuff they forked was in query execution. You could treat the worker nodes like regular PG instances. Citus also gave them a ton of flexibility to make queries fast, and again, they felt the data model was the best fit for their application.
At Heap, we searched for an existing tool that would allow us to express the full range of analyses we needed, index the event definitions that made up the analyses, and was a mature, natively distributed system.
After coming up empty on this search, we decided to compromise on the “maturity” requirement and build our own distributed system around Citus and sharded PostgreSQL. It was at this point that we also introduced Kafka as a queueing layer between the Node.js application servers and Postgres.
If we could go back in time, we probably would have started using Kafka on day one. One of the biggest benefits in adopting Kafka has been the peace of mind that it brings. In an analytics infrastructure, it’s often possible to make data ingestion idempotent.
In Heap’s case, that means that, if anything downstream from Kafka goes down, we won’t lose any data – it’s just going to take a bit longer to get to its destination. We also learned that you want the path between data hitting your servers and your initial persistence layer (in this case, Kafka) to be as short and simple as possible, since that is the surface area where a failure means you can lose customer data. We learned that it’s a very good fit for an analytics tool, since you can handle a huge number of incoming writes with relatively low latency. Kafka also gives you the ability to “replay” the data flow: it’s like a commit log for your whole infrastructure.
#MessageQueue #Databases #FrameworksFullStack
The majority of our Clojure microservices are simple web services that wrap a transactional database with CRUD operations and a little bit of business logic. We use both MySQL and PostgreSQL for transactional data persistence, having transitioned from the former to the latter for newer services to take advantage of the new features coming out of the Postgres community.
Most of our Clojure best practices can be summed up by the phrase "keep it simple." We avoid more complex web frameworks in favor of using the Ring library to build web service routes, and we prefer sending SQL directly to the JDBC library rather than using a complicated ORM or SQL DSL.
We went with MongoDB , almost by mistake. I had never used it before, but I knew I wanted the *EAN part of the MEAN stack, so why not go all in. I come from a background of SQL (first MySQL , then PostgreSQL ), so I definitely abused Mongo at first... by trying to turn it into something more relational than it should be. But hey, data is supposed to be relational, so there wasn't really any way to get around that.
There's a lot I love about MongoDB, and a lot I hate. I still don't know if we made the right decision. We've been able to build much quicker, but we also have had some growing pains. We host our databases on MongoDB Atlas , and I can't say enough good things about it. We had tried MongoLab and Compose before it, and with MongoDB Atlas I finally feel like things are in a good place. I don't know if I'd use it for a one-off small project, but for a large product Atlas has given us a ton more control, stability and trust.
Back at the start of 2017, we decided to create a web-based tool for the SEO OnPage analysis of our clients' websites. We had over 2.000 websites to analyze, so we had to perform thousands of requests to get every single page from those websites, process the information and save the big amounts of data somewhere.
Very soon we realized that the initial chosen script language and database, PHP, Laravel and MySQL, was not going to be able to cope efficiently with such a task.
By that time, we were doing some experiments for other projects with a language we had recently get to know, Go , so we decided to get a try and code the crawler using it. It was fantastic, we could process much more data with way less CPU power and in less time. By using the concurrency abilites that the language has to offers, we could also do more Http requests in less time.
Unfortunately, I have no comparison numbers to show about the performance differences between Go and PHP since the difference was so clear from the beginning and that we didn't feel the need to do further comparison tests nor document it. We just switched fully to Go.
There was still a problem: despite the big amount of Data we were generating, MySQL was performing very well, but as we were adding more and more features to the software and with those features more and more different type of data to save, it was a nightmare for the database architects to structure everything correctly on the database, so it was clear what we had to do next: switch to a NoSQL database. So we switched to MongoDB, and it was also fantastic: we were expending almost zero time in thinking how to structure the Database and the performance also seemed to be better, but again, I have no comparison numbers to show due to the lack of time.
As of now, we don't only use the tool intern but we also opened it for everyone to use for free: https://tool-seo.com
We've been using PostgreSQL since the very early days of Zulip, but we actually didn't use it from the beginning. Zulip started out as a MySQL project back in 2012, because we'd heard it was a good choice for a startup with a wide community. However, we found that even though we were using the Django ORM for most of our database access, we spent a lot of time fighting with MySQL. Issues ranged from bad collation defaults, to bad query plans which required a lot of manual query tweaks.
We ended up getting so frustrated that we tried out PostgresQL, and the results were fantastic. We didn't have to do any real customization (just some tuning settings for how big a server we had), and all of our most important queries were faster out of the box. As a result, we were able to delete a bunch of custom queries escaping the ORM that we'd written to make the MySQL query planner happy (because postgres just did the right thing automatically).
And then after that, we've just gotten a ton of value out of postgres. We use its excellent built-in full-text search, which has helped us avoid needing to bring in a tool like Elasticsearch, and we've really enjoyed features like its partial indexes, which saved us a lot of work adding unnecessary extra tables to get good performance for things like our "unread messages" and "starred messages" indexes.
I can't recommend it highly enough.
Our most popular (& controversial!) article to date on the Uber Engineering blog in 3+ yrs. Why we moved from PostgreSQL to MySQL. In essence, it was due to a variety of limitations of Postgres at the time. Fun fact -- earlier in Uber's history we'd actually moved from MySQL to Postgres before switching back for good, & though we published the article in Summer 2016 we haven't looked back since:
The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL (https://eng.uber.com/schemaless-part-one/). In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL:
I'm the CTO of a marketing automation SaaS. Because of the continuously increasing load we moved to the AWSCloud. We are using more and more features of AWS: Amazon CloudWatch, Amazon SNS, Amazon CloudFront, Amazon Route 53 and so on.
Our main Database is MySQL but for the hundreds of GB document data we use MongoDB more and more. We started to use Redis for cache and other time sensitive operations.
On the front-end we use jQuery UI + Smarty but now we refactor our app to use Vue.js with Vuetify. Because our app is relatively complex we need to use vuex as well.
On the development side we use GitHub as our main repo, Docker for local and server environment and Jenkins and AWS CodePipeline for Continuous Integration.
Back in 2014, I was given an opportunity to re-architect SmartZip Analytics platform, and flagship product: SmartTargeting. This is a SaaS software helping real estate professionals keeping up with their prospects and leads in a given neighborhood/territory, finding out (thanks to predictive analytics) who's the most likely to list/sell their home, and running cross-channel marketing automation against them: direct mail, online ads, email... The company also does provide Data APIs to Enterprise customers.
I had inherited years and years of technical debt and I knew things had to change radically. The first enabler to this was to make use of the cloud and go with AWS, so we would stop re-inventing the wheel, and build around managed/scalable services.
For the SaaS product, we kept on working with Rails as this was what my team had the most knowledge in. We've however broken up the monolith and decoupled the front-end application from the backend thanks to the use of Rails API so we'd get independently scalable micro-services from now on.
Our various applications could now be deployed using AWS Elastic Beanstalk so we wouldn't waste any more efforts writing time-consuming Capistrano deployment scripts for instance. Combined with Docker so our application would run within its own container, independently from the underlying host configuration.
Storage-wise, we went with Amazon S3 and ditched any pre-existing local or network storage people used to deal with in our legacy systems. On the database side: Amazon RDS / MySQL initially. Ultimately migrated to Amazon RDS for Aurora / MySQL when it got released. Once again, here you need a managed service your cloud provider handles for you.
Future improvements / technology decisions included:
Caching: Amazon ElastiCache / Memcached CDN: Amazon CloudFront Systems Integration: Segment / Zapier Data-warehousing: Amazon Redshift BI: Amazon Quicksight / Superset Search: Elasticsearch / Amazon Elasticsearch Service / Algolia Monitoring: New Relic
As our usage grows, patterns changed, and/or our business needs evolved, my role as Engineering Manager then Director of Engineering was also to ensure my team kept on learning and innovating, while delivering on business value.
One of these innovations was to get ourselves into Serverless : Adopting AWS Lambda was a big step forward. At the time, only available for Node.js (Not Ruby ) but a great way to handle cost efficiency, unpredictable traffic, sudden bursts of traffic... Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we've started leveraging Amazon DynamoDB on these projects so they'd be fully scalable.
Initial storage was traditional MySQL. The pace of changes during a startup mode made it very difficult to have a clean and consistent schema. Large portions ended up as unstructured data stuffed into CLOBs and BLOBs.
Moving to MongoDB definitely made this part much easier.
Accessing data for analysis is a little bit of a challenge - especially for people coming from the world of SQL Workbench. But with tools like Exploratory this is becoming less of a problem.
One of our battles at the very beginning of the road was choosing the right database. In fact, our first prototype was built on MySQL and back then nothing else was even under a consideration (don't ask me why). At some point, I was working on a project which was running on PostgreSQL and it is only then I understood the full power of it. We have over a billion of records in production instance, and we are able to optimize it to run fast and reliable. Well, now my default DB is PostgreSQL :)
Much of our data model is relational, which makes MySQL or PostgreSQL (and family) fit the API's we need to build, in order to meet the needs of our customers.
Sometimes the flexibility of a NoSQL store like Amazon DynamoDB is very useful, but the lack of consistency really impacts usability and performance long-term, compared with viable alternatives. At our current scale, we've seen huge benefits from moving some of our tables out of Dynamo and doing more in SQL.
There will always be use cases for NoSQL and key-values stores, but if your model is well understood in your business/industry: relational databases are the way to go after finding product-market fit. Always understand the trade-offs (and a few intimate details) of any data store before you add to your company's stack!
At uSwitch we use Vault to generate short lived database credentials for our applications running in Kubernetes. We wanted to move from an environment where we had 100 dbs with a variety of static passwords being shared around to a place where each pod would have credentials that only last for its lifetime.
We chose vault because:
It had built in Kubernetes support so we could use service accounts to permission which pods could access which database.
A terraform provider so that we could configure both our RDS instances and their vault configuration in one place.
A variety of database providers including MySQL/PostgreSQL (our most common dbs).
A good api/Go -sdk so that we could build tooling around it to simplify development worfklow.
It had other features we would utilise such as PKI
Hi! I needed to choose a full stack of tools for a web drop shipping site without the payment process for a family startup proyect. It will feed from several web services (JSON), I'm looking forward a 4,200 articles tops. For web use only and for a few clients at the beginning.
I'm considering C# with .NET Core 3.0 as is the one language I'm starting to learn. For the Database I haven´t made my mind yet, but could be MySQL or MongoDB any advice is welcome as I'm getting back to programming after year away from this awesome world. Thanks
We are used MySQL database to build the Online Food Ordering System
- Its best support normalization and all joins ( Restaurant details & Ordering, customer management, food menu, order transaction & food delivery).
- Best for performance and structured the data.
- Its help to stored the instant updates received from food delivery app ( update the real-time driver GPS location).
1.It's very popular. Heared about it in Database class 2. The most comprehensive set of advanced features, management tools and technical support to achieve the highest levels of MySQL scalability, security, reliability, and uptime. 3. MySQL is an open-source relational database management system. Its name is a combination of "My", the name of co-founder Michael Widenius's daughter, and "SQL", the abbreviation for Structured Query Language.
We use MySQL and variants thereof to store the data for our projects such as the community. MySQL being a well established product means that support is available whenever it is required along with an extensive list of support articles all over the web for diagnosing issues. Variants are also used where needed when, for example, better performance is needed.
MySQL is a freely available open source Relational Database Management System (RDBMS) that uses Structured Query Language (SQL). SQL is the most popular language for adding, accessing and managing content in a database. It is most noted for its quick processing, proven reliability, ease and flexibility of use.