Kafka vs MySQL: What are the differences?
Introduction
MySQL and Kafka are both popular technologies used for data storage and processing, but they have key differences that make them suited for different purposes.
- Data Structure and Model:
MySQL is a relational database management system (RDBMS) that follows a structured data model, with data organized into tables, rows, and columns. It enforces strict schema and constraints on the data, ensuring data consistency and integrity. On the other hand, Kafka is a distributed streaming platform that follows a publish-subscribe model. It stores and processes streams of records, treating data as an append-only log with no specific schema or structure.
- Scalability and Performance:
MySQL can scale vertically by adding more resources to a single server, such as CPU and memory. It supports indexing and caching mechanisms to optimize query performance. However, it can face limitations in terms of scalability due to the constraints of a single server. Kafka, on the other hand, is designed for horizontal scalability. It uses a distributed architecture that allows for scaling across multiple servers, making it highly scalable and capable of handling high volumes of data and concurrent operations.
- Data Processing Paradigm:
MySQL primarily focuses on transactional processing, providing ACID (Atomicity, Consistency, Isolation, Durability) properties. It is suitable for use cases that require strong data consistency and integrity, such as financial applications. Kafka, on the other hand, is designed for real-time stream processing. It emphasizes event-driven and data-intensive applications, enabling high-throughput, low-latency data processing and analysis.
- Data Persistence and Storage:
MySQL stores data persistently on disk and provides various storage engines, such as InnoDB and MyISAM, that offer different trade-offs in terms of performance and features. It supports both structured and unstructured data types. On the other hand, Kafka stores data in a distributed manner, leveraging the disk and memory of multiple servers in the cluster. It provides fault-tolerance and durability by replicating data across different brokers.
- Data Integration and Ecosystem:
MySQL has extensive support for SQL and provides connectors and drivers for various programming languages. It integrates well with other systems through ETL (Extract, Transform, Load) processes and can be used in a wide range of applications. Kafka, on the other hand, has a rich ecosystem and supports integration with various tools and frameworks, such as Apache Spark and Apache Flink, for real-time data processing and analytics. It can serve as a central data pipeline for collecting, streaming, and integrating data from multiple sources.
- Use Cases and Application Scenarios:
MySQL is commonly used for traditional OLTP (Online Transaction Processing) applications, where data consistency and reliability are crucial. It is suitable for applications that require complex querying, joins, and transactions. Kafka is more commonly used for stream processing, event sourcing, and real-time analytics. It excels in use cases that involve handling large volumes of data, processing data in real-time, and building scalable data pipelines.
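To make the contrast between the two data models concrete, here is a minimal Python sketch. It is not MySQL or Kafka code: the standard-library `sqlite3` module stands in for a relational database, and a plain in-memory list stands in for a single Kafka partition. The `users` table, its columns, and the `AppendOnlyLog` class are hypothetical names chosen only for illustration.

```python
import sqlite3


class AppendOnlyLog:
    """Toy stand-in for one Kafka partition: records are only ever appended."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Returns the offset (position) assigned to the new record.
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        # Consumers read sequentially, starting from an offset of their choice.
        return self._records[offset:]


def relational_demo():
    # The table enforces a schema and constraints; rows are updated in place.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
    )
    conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
    conn.execute("UPDATE users SET email = 'b@example.com' WHERE id = 1")
    rows = conn.execute("SELECT id, email FROM users").fetchall()
    conn.close()
    return rows


def log_demo():
    # The log keeps every event; a "change" is just another appended record.
    log = AppendOnlyLog()
    log.append({"user": 1, "email": "a@example.com"})
    log.append({"user": 1, "email": "b@example.com"})
    return log.read_from(0)
```

After the update, the table holds only the current state of the row, while the log still holds both events in order. That difference is why a log suits event sourcing and a table suits querying current state.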
In summary, MySQL and Kafka have significant differences in data structure, scalability, data processing paradigm, data persistence, integration, and application scenarios. MySQL is a relational database suited for transactional processing, while Kafka is a distributed stream processing platform focused on real-time data processing and analysis.
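The transactional side of that summary can be illustrated with a short sketch of atomicity, the "A" in ACID. It assumes SQLite (via Python's standard `sqlite3` module) as a stand-in for MySQL; the `accounts` table and the `transfer` helper are hypothetical. The point is only that both updates apply together or not at all.

```python
import sqlite3


def make_accounts():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
```

A transfer that would overdraw the source account fails as a whole, so the destination account never sees a credit without the matching debit.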
I need to add a DBMS to my stack, but I don't know which. I'm tempted to learn SQLite since it would be useful to me with its focus on local access without concurrency. However, doing so feels like I would be defeating the purpose of trying to expand my skill set since it seems like most enterprise applications have the opposite requirements.
To be able to apply what I learn to more projects, what should I try to learn? MySQL? PostgreSQL? Something else? Is there a comfortable middle ground between high applicability and ease of use?
You can easily start with SQLite. It is really easy to get going because it is self-contained and doesn't require you to install any additional software. It has interfaces in almost any language, as well as GUIs. Start by learning SQL basics and simpler data models and structures. There are many tutorials, including some on the official website. From there you will easily migrate to another database. MySQL could be next, since it's easier to learn at first and has more resources available. PostgreSQL is less widespread, more challenging, and has fewer resources, but once you have some experience with MySQL it is really easy to learn as well. All these technologies are widely used across the industry, so you won't make a wrong decision with any of them.
A question you might want to think about is "What kind of experience do I want to gain by using a DBMS?". If your aim is to have experience with SQL and any related libraries and frameworks for your language of choice (Python, I think?), then it doesn't matter too much which you pick. As others have said, SQLite would let you get started very easily, and would give you a reasonably standard (if a little basic) SQL dialect to work with.
If your aim is actually to gain a bit of "operational" experience, such as which command line tools are available as standard for the DBMS, how the DBMS handles multiple databases, when to use multiple schemas vs multiple databases, some basic privilege management, etc., then I would recommend PostgreSQL. SQLite's simplicity actually avoids most of these experiences, which is not helpful if that is what you hope to learn. MySQL has a few "quirks" in how it manages things like multiple databases, which may lead you to make poorer decisions if you try to carry your experience over to a different DBMS, especially in bigger enterprise roles. PostgreSQL is kind of a happy middle ground here: the ability to start PostgreSQL servers via docker or docker-compose makes the actual day-to-day management pretty easy, while still giving you experience of the kinds of considerations I have listed above.
At Vital Beats we make use of PostgreSQL, largely because it offers us a happy balance between good management and backup of data, and good standard command line tools, which is essential for us where we are deploying our solutions within Kubernetes / docker, and so more graphical tools are not always appropriate for us. PostgreSQL is also pretty universally supported in terms of language libraries and frameworks, without having to make compromises on how we want to store and layout our data.
MySQL is very popular, easy to install, and also available as a managed service across most popular cloud offerings. The support/default tooling (such as MySQL Workbench) is certainly a little more baked than what you'll find for Postgres.
Hello, I am developing a new project with an internal chat between users. There are also complex relationships between the other project entities, but I would like to build something scalable and fast, and right now I am designing the data model. What kind of database would you recommend to manage all the application data? Relational like MySQL, non-relational like MongoDB, or a mixed one? Thank you
In MongoDB, a write operation is atomic on the level of a single document, so it's harder to deal with consistency without transactions.
If you are dealing with "complex relationships", give ArangoDB and graph databases a chance. Their database structures allow you to handle this with faster and simpler queries. The database is not as strict as others and allows arbitrary data. The data model is really like a neural network, and you will never need foreign key tables anymore. There is a free course on Udemy to get started.
MongoDB supports horizontal scaling through sharding, which distributes data across several machines and facilitates high-throughput operations with large sets of data. Sharding allows you to add instances to increase capacity when required.
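As a rough illustration of how hash-based sharding spreads records across machines, here is a minimal Python sketch. It is not MongoDB's implementation: the CRC32 hash, the `shard_for` and `distribute` helpers, and the fixed shard count are all assumptions made for the example.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    # Stable hash of the shard key modulo the shard count. CRC32 is an
    # assumption for this sketch, not what any particular database uses.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def distribute(keys, num_shards):
    # Group keys by the shard they map to.
    shards = {s: [] for s in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

The same key always maps to the same shard, so reads know where to look. Note that with this naive modulo scheme, changing `num_shards` remaps most keys, which is one reason real systems manage data placement in a more elaborate way.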
The most important question is where are you planning to host? On-premise, or in the cloud.
Particularly if you are planning to host in either AWS or Azure, your first port of call should be the PaaS (Platform as a Service) databases supplied by these vendors. You will find they require a lot less effort to support and offer much easier Disaster Recovery options and, depending on how PAYG the database you use is, potentially much lower costs than a dedicated database server.
Your question regarding 'Relational or not' is obviously key. You need to consider your required data structure, the ACID requirements of your application model, and the non-functional requirements in terms of scalability, resilience, whether you want security authorisation at the highest application tier or right down to 'row' level in the database, etc. However, please don't fall into the trap of considering 'NoSQL' as being a single category. MongoDB, with its document-store type solution, is a very different model to key-value-pair stores (like AWS DynamoDB), column stores (like AWS RedShift), Entity Graph Stores for more complex data relationships (like AWS Neptune), or stores designed for tokenisation and text search (ElasticSearch), etc.
Also critical in all this is how many items you believe you need to index by. RDBMS/SQL stores are great for having as many indexes as you want, other than the slow-down in write speed, whereas databases like Amazon DynamoDB provide blisteringly fast read/write performance, but are very limited on key indexing capabilities.
It feels like you have most experience with SQL/RDBMS technologies, so for the simplest learning curve, and if your application fits it, then I'd personally start by looking at AWS Aurora https://aws.amazon.com/rds/aurora/ .
Firstly, it may help if you explain what you mean by "complex relationships between project entities". Secondly, you can build a fast and scalable solution using either. With that said, however, the data sounds relational, so I would recommend MySQL.
I think it depends on your project type and your skills. MySQL is good and simple to maintain, while MongoDB needs more skills and knowledge. If you are working on a small project, use MySQL. For your project type, MySQL is enough; later on you can migrate to PostgreSQL.
I am going to work on a real estate project and have to decide on a database. Now, SQL databases can be very efficient if appropriately designed. More relations between the data and less redundancy. But with a #NoSQL database, the development time is reduced, and it is easy to query. Since this is my first time working on the real estate domain, I would like to pick a database that would be efficient in the long run.
I recommend PostgreSQL as it’s the most powerful out of the 3 databases you mentioned. It supports JSON objects so you can mimic the MongoDB functionality, but I would also argue that SQL is actually quite powerful and in many cases significantly easier to work with than with NoSQL databases.
Stay away from foreign keys, keep it fast and simple. Define your data structures well in advance. Try to model your data structures based on your system’s vision; based on where it’s going and not based solely on what you currently need it to do. This will help you avoid drastic changes to your database after your system is launched. Populate the database with fake data and run tests. PostgreSQL allows you to create Views from multiple tables. Try to create those views and make sure you can easily create useful views from multiple tables. Run an Explain on those view queries to make sure you created your indexes correctly. Make sure it’s fast!
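The workflow described above (fake data, a view over several tables, then a query plan check) can be sketched as follows. This is an assumption-laden stand-in: it uses SQLite through Python's standard `sqlite3` module instead of PostgreSQL, and `EXPLAIN QUERY PLAN` instead of PostgreSQL's `EXPLAIN`, whose output looks different. The `agents` and `listings` tables, the `listing_overview` view, and the index name are hypothetical.

```python
import sqlite3


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE agents (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE listings (
            id INTEGER PRIMARY KEY,
            agent_id INTEGER NOT NULL,
            city TEXT NOT NULL,
            price INTEGER NOT NULL
        );
        CREATE INDEX idx_listings_city ON listings (city);

        -- A view that combines both tables.
        CREATE VIEW listing_overview AS
            SELECT l.id, l.city, l.price, a.name AS agent_name
            FROM listings AS l
            JOIN agents AS a ON a.id = l.agent_id;
        """
    )
    # Fake data for testing, generated deterministically.
    conn.executemany(
        "INSERT INTO agents (id, name) VALUES (?, ?)",
        [(i, f"agent-{i}") for i in range(1, 11)],
    )
    conn.executemany(
        "INSERT INTO listings (agent_id, city, price) VALUES (?, ?, ?)",
        [(i % 10 + 1, f"city-{i % 50}", 100_000 + i) for i in range(1000)],
    )
    conn.commit()
    return conn


def explain(conn, sql, params=()):
    # The human-readable plan step is the last column of each result row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
```

Calling `explain(conn, "SELECT * FROM listing_overview WHERE city = ?", ("city-7",))` returns the plan steps as text; if the index name does not show up in any step, the query is probably scanning the whole table and the index definition deserves a second look.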
Any of those three databases are going to be efficient, scalable, and reliable in the long term if you configure and use them correctly. They all also have solid hosting solutions.
All things being equal, I would agree with other posters that Postgres is my preference among the three, but there are caveats.
MongoDB and MySQL have better support for multi-region replication in the big three cloud environments. Microsoft recently bought Citus Data, which was a best-in-class Postgres replication solution, so Azure might be the only one I trust to provide cross-region replication for Postgres at the moment.
If you have a single region deployment and are on AWS, I can't recommend Aurora Postgres highly enough. It's a very good implementation and extremely performant.
That really depends on where you see your application in the long run. For any application, any of those choices is excellent. You could argue about good support for binary JSON, but even MySQL has excellent support for that in the latest versions.
In the long run, when your application gets hundreds of thousands of requests per second, you might start thinking about how many inputs (writes) you will have in the database compared to the outputs (reads). PostgreSQL is excellent at giving you outputs, but table corruption can happen when you start receiving this massive number of inputs (which was one of the reasons Uber gave for switching from Postgres to MySQL).
On our OPS Platform at CTO.ai, we decided to use Postgres because we need a reliable and agile way to send the output to our users, so that was our best choice in the long run for our product.
I'll second another piece of advice. PostgreSQL's JSON columns are a dream when it comes to productivity, and I use them frequently with our Rails application. In these cases, no migration is required to change the schema. We store payloads with dozens or hundreds of keys, and performance has not been an issue. We also have a lot of relational tables, so the joins we get with SQL are very important to us and hard to replicate with a NoSQL solution.
We are planning to migrate one of my applications from MSSQL to MySQL. Can someone help me with the version to select? I have a strong inclination towards MySQL 5.7, but I see there are some standout features added in MySQL 8.0, like JSON_TABLE. I just wanted to know whether the newer version has compromised on speed while adding these features.
MySQL AB doesn't implement anything in MySQL until they can find a way to do it efficiently and, often, more efficiently than other systems. So although I don't have experience with benchmarking JSON_TABLEs or similar new features, their development philosophy alone suggests that version 8 for the latest features would be a safe jump without sacrificing system performance.
MySQL 8.0 is significantly better than MySQL 5.7. For all InnoDB row operations, you'll see a great performance improvement. Also, the time taken to process transactions is lower in MySQL 8.0. Moreover, there has been an improvement in managing read and read/write workloads.
Hello,
I am trying to design an online ordering app similar to Doordash or Uber Eats. I'm having a hard time deciding which database (or mixture of databases) to use. I'm leaning towards using a relational database like MySQL or PostgreSQL. But when the application grows, I don't want to join 20 tables to get data. Any help would be greatly appreciated. Thank you for your time.
Hello Suhas, we built our product www.voilacabs.com, which is along the same lines as yours, using a combination of MySQL and MongoDB. When using MySQL, I would recommend the following:
1. Use MySQL for storage only; for real-time updates we recommend MongoDB.
2. Don't join more than 3 tables. The moment you reach 3 joins, stop there and denormalize the database instead.
3. Never, or only very rarely, use auto-increment keys; we recommend UUIDs for MySQL. If you are using PostgreSQL, I would suggest you check this article (https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c). It describes a stored procedure that generates unique keys instead of auto-increment keys, which helps you shard or cluster the database without sync errors.
4. For MongoDB, putting a Redis cache layer in front will boost your API performance under large loads.
5. Use Node.js, as it handles I/O asynchronously.
Let me know if you still need any suggestions. Thanks & Regards, Rupen Makhecha, CTO @ Voila Cab's www.voilacabs.com
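For readers wondering what keys without auto-increment can look like, here is a minimal Python sketch of two options. The first is a random UUID. The second packs a timestamp, a shard ID, and a per-shard sequence into one 64-bit integer, in the spirit of the linked Instagram article; the 41/13/10 bit split is how that layout is commonly described, but treat it, the `CUSTOM_EPOCH_MS` value, and all function names here as assumptions of this sketch. This is not the stored procedure from the article.

```python
import time
import uuid

# 2020-01-01T00:00:00Z in milliseconds; an arbitrary epoch for this sketch.
CUSTOM_EPOCH_MS = 1_577_836_800_000


def make_uuid_key():
    # Random UUID as a string, usable as a primary key without coordination.
    return str(uuid.uuid4())


def make_sharded_id(shard_id, sequence, now_ms=None):
    """Pack (timestamp, shard, sequence) into a single integer.

    Layout, high to low bits: 41 bits of milliseconds since the custom
    epoch, 13 bits of shard ID, 10 bits of sequence.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    millis = now_ms - CUSTOM_EPOCH_MS
    return (millis << 23) | ((shard_id % 8192) << 10) | (sequence % 1024)


def shard_of(sharded_id):
    # Recover the shard ID from bits 10..22.
    return (sharded_id >> 10) & 0x1FFF
```

Because the timestamp occupies the high bits, IDs generated later sort after earlier ones, and each shard can generate IDs without talking to the others.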
I would recommend a mixture of MySQL and MongoDB. Using MongoDB for the Content Distribution Network (CDN) will make it easy to store high volume incoming data. MySQL is recommended to be used for business logic. PostgreSQL is not recommended since you will be faced with inefficient database replication features and constant migration from one PostgreSQL version to another.
I asked my last question incorrectly. Rephrasing it here.
I am looking for the most secure open source database for my project I'm starting: https://github.com/SuPragma/SuPragma/wiki
Which database is more secure? MySQL or PostgreSQL? Are there others I should be considering? Is it possible to change the encryption keys dynamically?
Thanks,
Raj
PostgreSQL provides more tools and built-in features around security, e.g. row-level security and support for SELinux (through SE-PostgreSQL). Overall, whatever you choose, the important thing is to keep it updated and to have the skills to apply security best practices and update them regularly. Without this, it's like putting your money in Fort Knox but leaving the vault key in a public place.
It is open source and has more tools than MySQL. PostgreSQL is an object-relational database management system (ORDBMS) with an emphasis on extensibility and standards compliance. It is also good for small companies because the tools are freely available. PostgreSQL includes built-in support for regular B-tree and hash indexes. Indexes in PostgreSQL also support expression and partial indexes (indexing only a part of a table). An expression index is built on the result of an expression or function instead of simply the value of a column.
We are going to develop a microservices-based application. It consists of AngularJS, ASP.NET Core, and MSSQL.
We have 3 types of microservices. Emailservice, Filemanagementservice, Filevalidationservice
I am a beginner in microservices. I have read about RabbitMQ, but have come to know that there are Redis and Kafka also in the market. So, I want to know which is best.
Kafka is an enterprise messaging framework, whereas Redis is an enterprise cache broker and a high-performance in-memory database. Both have their own advantages, but they differ in usage and implementation. If you are creating microservices, check the user consumption volumes, the logs being generated, scalability, the systems to be integrated, and so on. I feel that for your scenario you can initially go with Kafka, and as throughput, consumption, and other factors scale, you can gradually add Redis accordingly.
I first recommend that you choose Angular over AngularJS if you are starting something new. AngularJs is no longer getting enhancements, but perhaps you meant Angular. Regarding microservices, I recommend considering microservices when you have different development teams for each service that may want to use different programming languages and backend data stores. If it is all the same team, same code language, and same data store I would not use microservices. I might use a message queue, in which case RabbitMQ is a good one. But you may also be able to simply write your own in which you write a record in a table in MSSQL and one of your services reads the record from the table and processes it. The most challenging part of doing it yourself is writing a service that does a good job of reading the queue without reading the same message multiple times or missing a message; and that is where RabbitMQ can help.
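The do-it-yourself approach described above, a table used as a queue, can be sketched as follows. This is a minimal illustration using SQLite via Python's standard `sqlite3` module as a stand-in for MSSQL; the `jobs` table and the helper names are hypothetical, and a production version on MSSQL would rely on that engine's own locking features.

```python
import sqlite3


def make_queue():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending', "
        "claimed_by TEXT)"
    )
    conn.commit()
    return conn


def enqueue(conn, payload):
    with conn:
        conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def claim_next(conn, worker):
    """Claim the oldest pending job, or return None if nothing was claimed."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status check in WHERE makes the claim conditional: if another
        # worker claimed the row first, zero rows are updated.
        cur = conn.execute(
            "UPDATE jobs SET status = 'processing', claimed_by = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker, row[0]),
        )
        if cur.rowcount != 1:
            return None
        return row


def complete(conn, job_id):
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

The conditional update is what prevents two workers from processing the same message. What this sketch leaves out, such as timeouts for workers that crash mid-job and retries, is the part where a broker like RabbitMQ saves effort.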
We found that the CNCF landscape is a good advisor when going into the cloud / microservices space: https://landscape.cncf.io/fullscreen=yes. When choosing a technology, one important criterion to me is whether it is cloud native or not. Neither Redis, RabbitMQ nor Kafka is cloud native. They try to adapt but will eventually be replaced by technologies that are cloud native.
We have gone with NATS and have never looked back. We haven't spent a single minute on server maintenance in the last year, and the setup of a cluster is very easy. With the new features NATS incorporates now (and the ones still on the roadmap), it already is, and will be, so much more than Redis, RabbitMQ and Kafka are. It can replace service discovery, load balancing, global multiclusters and failover, etc.
Your thought might be: But I don't need all of that! Well, at the same time it is much more lightweight than Redis, RabbitMQ and especially Kafka.
I think something is missing here, and you should consider answering it for yourself. You are building a couple of services. Why are you considering an event-sourcing architecture using message brokers such as the above? Won't a simple REST service based architecture suffice? Read about CQRS and the problems it entails (state vs command impedance, for example). Do you need Pub/Sub or Push/Pull? Is queuing of messages enough, or would you need querying or filtering of messages before consumption? Also, someone would have to manage these brokers (unless you use a managed, cloud provider based solution) and automate their deployment, and someone would need to take care of backups, clustering if needed, disaster recovery, etc. I have good past experience in terms of manageability/devops of the above options with Kafka and Redis, not so much with RabbitMQ. Both are very performant. But also note that Redis is not a pure message broker (at time of writing) but more of a general purpose in-memory key-value store. Kafka nowadays is much more than a distributed message broker. Long story short: to my taste, you should go with a minimalistic approach and try to avoid either of them if you can, especially if your architecture does not fall nicely into event sourcing. If you can't avoid them, I'd examine Kafka. If you need more capabilities, then I'd consider Redis and use it for all sorts of other things, such as a cache.
I am a Microsoft SQL Server programmer who is a bit out of practice. I have been asked to assist on a new project. The overall purpose is to organize a large number of recordings so that they can be searched. I have an enormous music library but my songs are several hours long. I need to include things like time, date and location of the recording. I don't have a problem with the general database design. I have two primary questions:
- I need to use either MySQL or PostgreSQL on a Linux based OS. Which would be better for this application?
- I have not dealt with a sound based data type before. How do I store that and put it in a table? Thank you.
Hi Erin,
Honestly both databases will do the job just fine. I personally prefer Postgres.
Much more important is how you store the audio. While you could technically use a blob type column, it's really not ideal to be storing audio files which are "several hours long" in a database row. Instead consider storing the audio files in an object store (hosted options include backblaze b2 or aws s3) and persisting the key (which references that object) in your database column.
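A minimal sketch of that pattern, with metadata and an object key in the database and the audio bytes elsewhere, might look like this. It is illustrative only: a local directory stands in for the object store (no Backblaze B2 or S3 client is used), SQLite via Python's `sqlite3` module stands in for Postgres or MySQL, and the `recordings` table and helper names are hypothetical.

```python
import hashlib
import sqlite3
from pathlib import Path


def make_catalog():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recordings ("
        "id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "recorded_at TEXT NOT NULL, "  # ISO 8601 date and time
        "location TEXT NOT NULL, "
        "object_key TEXT NOT NULL)"  # reference to the stored audio object
    )
    return conn


def store_recording(conn, store_dir, title, recorded_at, location, audio_bytes):
    # Derive the key from the content so identical uploads map to one object.
    object_key = hashlib.sha256(audio_bytes).hexdigest() + ".audio"
    Path(store_dir, object_key).write_bytes(audio_bytes)  # the "upload"
    with conn:
        conn.execute(
            "INSERT INTO recordings (title, recorded_at, location, object_key) "
            "VALUES (?, ?, ?, ?)",
            (title, recorded_at, location, object_key),
        )
    return object_key


def load_recording(conn, store_dir, title):
    # Look up the key in the database, then fetch the bytes from the store.
    row = conn.execute(
        "SELECT object_key FROM recordings WHERE title = ?", (title,)
    ).fetchone()
    return None if row is None else Path(store_dir, row[0]).read_bytes()
```

Searches on time, date, and location then run against small rows, and the multi-hour audio is only fetched when someone actually wants to listen to it.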
Hi Erin, chances are you would want to store the files in a blob type. Both MySQL and Postgres support this. Can you explain a little more about your need to store the files in the database? It may be more effective to store the files on a file system or something like S3. To answer your question, based on what you are describing I would slightly lean towards PostgreSQL since it tends to be a little better on the data warehousing side.
Hi Erin! First of all, you'd probably want to go with a managed service. Don't spin up your own MySQL installation on your own Linux box. If you are on AWS, they have different offerings for database services: standard RDS vs. Aurora. Aurora would be my preferred choice given the benefits it offers, the storage optimizations it comes with, etc. Such managed services easily allow you to apply new security patches and upgrades, set up backups, replication, etc. Doing this on your own would either be risky or inefficient, or you might just give up. As for which database to choose, you'll have the choice between PostgreSQL, MySQL, MariaDB, SQL Server, etc. I personally would recommend MySQL (latest version available), as the official tooling for it (MySQL Workbench) is great, stable, and moreover free. Other database services exist; I'd recommend you also explore DynamoDB.
Regardless, you'd certainly only keep high-level records, meta data in Database, and the actual files, most-likely in S3, so that you can keep all options open in terms of what you'll do with them.
Hey Erin! I would recommend checking out Directus before you start work on building your own app for them. I just stumbled upon it, and so far extremely happy with the functionalities. If your client is just looking for a simple web app for their own data, then Directus may be a great option. It offers "database mirroring", so that you can connect it to any database and set up functionality around it!
Hi Erin,
- Coming from "Big" DB engines, such as Oracle or MSSQL, go for PostgreSQL. You'll get all the features you need with PostgreSQL.
- Your case seems to point to a "NoSQL" or Document Database use case. Since you get covered on this with PostgreSQL which achieves excellent performances on JSON based objects, this is a second reason to choose PostgreSQL. MongoDB might be an excellent option as well if you need "sharding" and excellent map-reduce mechanisms for very massive data sets. You really should investigate the NoSQL option for your use case.
- Starting with AWS Aurora is excellent advice, since "vendor lock-in" is limited, but I did not check for JSON based object / NoSQL features.
- If you stick to Linux server, the PostgreSQL or MySQL provided with your distribution are straightforward to install (i.e. apt install postgresql). For PostgreSQL, make sure you're comfortable with the pg_hba.conf, especially for IP restrictions & accesses.
Regards,
I recommend Postgres as well. Superior performance overall and a more robust architecture.
I am looking into an IoT solution where we have an MQTT broker. This MQTT broker sits in one of our data centers. We are doing a lot of alert and alarm related processing on that data. Currently, we are looking for a solution that can do distributed persistence of logs/alerts, primarily on remote disk.
Our primary need is something lightweight, so that operational complexity and maintenance costs can be significantly reduced. We want to do it on-premise, so we are not considering cloud solutions.
We looked into the following alternatives:
- Apache Kafka: great choice, but very complex in terms of operation and maintenance.
- RabbitMQ: high availability is the issue.
- Apache Pulsar: operational complexity.
- NATS: absence of persistence.
- Akka Streams: big learning curve and operational streams.
So we are looking into a lightweight library that can do distributed persistence preferably with publisher and subscriber model. Preferable on JVM stack.
Kafka is the best fit here. Its advantages include ACLs (security), schema support (protobuf), scale, a consumer-driven model, and no single point of failure.
Operational complexity is manageable with open source monitoring tools.
Our backend application is sending some external messages to a third party application at the end of each backend (CRUD) API call (from UI) and these external messages take too much extra time (message building, processing, then sent to the third party and log success/failure), UI application has no concern to these extra third party messages.
So currently we are sending these third party messages by creating a new child thread at end of each REST API call so UI application doesn't wait for these extra third party API calls.
I want to integrate Apache Kafka for these extra third-party API calls, so that I can queue failed third-party calls and retry them, add logging, etc. (Currently the third-party messages are sent from multiple threads at the same time, which uses too much processing and too many resources.)
Question 1: Is this a use case of a message broker?
Question 2: If it is, which is better, Kafka or RabbitMQ?
RabbitMQ is great for queuing and retrying. You can send the requests to your backend, which will then queue these requests in RabbitMQ (or Kafka, too). The consumer on the other end can take care of processing. For a detailed analysis, check this blog about choosing between Kafka and RabbitMQ.
Well, first off, it's good practice to do as little non-UI work on the foreground thread as possible, regardless of whether the requests take a long time. You don't want the UI thread blocked.
This sounds like a good use case for RabbitMQ. Primarily because you don't need each message processed by more than one consumer. If you wanted to process a single message more than once (say for different purposes), then Apache Kafka would be a much better fit as you can have multiple consumer groups consuming from the same topics independently.
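The difference between the two consumption models can be shown with a toy in-memory sketch. It uses neither RabbitMQ nor Kafka; the `WorkQueue` and `Topic` classes are hypothetical and only model the semantics described above: a queue hands each message to one consumer, while a log lets every consumer group read every message at its own pace.

```python
from collections import deque


class WorkQueue:
    """Queue semantics: each message is handed to exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        # The message is removed once taken, so no other consumer sees it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Log semantics: messages stay put; each group tracks its own offset."""

    def __init__(self):
        self._log = []
        self._offsets = {}

    def publish(self, message):
        self._log.append(message)

    def poll(self, group):
        # Return the next unread message for this group, or None.
        offset = self._offsets.get(group, 0)
        if offset >= len(self._log):
            return None
        self._offsets[group] = offset + 1
        return self._log[offset]
```

With `Topic`, a "billing" group and an "audit" group would each receive the same message independently; with `WorkQueue`, whichever consumer asks first gets it and the other gets nothing.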
Have your API publish messages containing the data necessary for the third-party request to a Rabbit queue and have consumers reading off there. If it fails, you can either retry immediately, or publish to a deadletter queue where you can reprocess them whenever you want (shovel them back into the regular queue).
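The retry and dead-letter flow can be sketched without a broker. The snippet below is a simplified in-memory model, not RabbitMQ client code; `process_with_retries`, its `max_attempts` parameter, and the plain list used as a dead-letter queue are all hypothetical.

```python
from collections import deque


def process_with_retries(messages, handler, max_attempts=3):
    """Run handler on each message; park repeated failures in a dead-letter list."""
    queue = deque((msg, 1) for msg in messages)
    done, dead_letter = [], []
    while queue:
        msg, attempt = queue.popleft()
        try:
            handler(msg)
            done.append(msg)
        except Exception:
            if attempt < max_attempts:
                queue.append((msg, attempt + 1))  # requeue for another try
            else:
                dead_letter.append(msg)  # give up; keep for later reprocessing
    return done, dead_letter
```

Messages that end up in `dead_letter` are not lost: once the third party is healthy again, they can be fed back through the same function, which mirrors shovelling them back into the regular queue.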
In my opinion RabbitMQ fits better in your case because you don't need ordering in the queue. You can process your messages in any order, and you don't need to keep the data you sent. Kafka is persistent storage, like a blockchain. RabbitMQ is a message broker. Kafka is not a good solution for a system that relies on confirmation of message delivery.
As far as I understand, Kafka is like a persisted event state manager where you can plug in various sources of data and transform/query them as events via a stream API. Regarding your use case, I would consider using RabbitMQ if your intent is to implement service inter-communication. RabbitMQ is a good choice for one-to-one publisher/subscriber (or consumer) setups, and I think you can also have multiple consumers by configuring a fanout exchange. RabbitMQ also provides message retries, message cancellation, durable queues, message requeue, message ACK, and so on.
At Pushnami we were looking at several alternative databases that would support the following architectural requirements:
- very quick prototyping for an unknown domain
- ability to support large amounts of data
- native ability to replicate and fail over
- full stack approach for Node.js development
After careful consideration MongoDB came out on top, and 3 years later we are still very happy with that decision. Currently we keep almost 2TB of data in our cluster, and we are starting to think about sharding.
MySQL has a lot of strengths working for it. It's simple and easy to set up and use. Its JSON engine is also really good these days. Mongo is also simple to set up and use, and its speed as a document-object storage engine is first class.
Where Postgres has both beat is in combining all of the features that make both MySQL and Mongo great, while adding enterprise-grade scalability and replication. It's Postgres' stability and robustness, while still fulfilling the roles of its contemporaries extremely well, that give Postgres the edge for me.
When I was new to web development, I was using PHP for the backend and MySQL for the database. But after improving my JS skills, I chose Node.js, for many reasons including npm, Express, the community, fast coding, etc. MongoDB works very well with Node.js. If your JS skills are good enough, I recommend migrating to Node.js and MongoDB.
My data was inherently hierarchical, but there was not enough content in each level of the hierarchy to justify a relational DB (SQL) with a one-to-many approach. It was also far easier to share data between the frontend (Angular), backend (Node.js) and DB (MongoDB) as they all pass around JSON natively. This allowed me to skip the translation layer from relational to hierarchical. You do need to think about correct indexes in MongoDB, and make sure the objects have finite size. For instance, an object in your DB shouldn't have a property which is an array that grows over time, without limit. In addition, I did use MySQL for other types of data, such as a catalog of products which (a) has a lot of data, (b) is flat rather than hierarchical, and (c) needs very fast queries.
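One common way to keep objects at a finite size, as advised above, is to cap the array and spill into a new document once the cap is reached. The Python sketch below shows the idea on plain dictionaries rather than a live MongoDB collection. The field names (`device_id`, `seq`, `events`), the function `append_event`, and the cap of 3 are all hypothetical choices for illustration, not something the original poster described.

```python
MAX_EVENTS_PER_BUCKET = 3  # hypothetical cap; pick one that suits your document size


def append_event(buckets, device_id, event, cap=MAX_EVENTS_PER_BUCKET):
    """Add an event to the newest bucket for device_id.

    When that bucket already holds `cap` events, start a new bucket document
    instead of letting a single array grow without limit.
    """
    own = [b for b in buckets if b["device_id"] == device_id]
    if own and len(own[-1]["events"]) < cap:
        own[-1]["events"].append(event)
    else:
        # `seq` numbers the buckets for one device in creation order.
        buckets.append({"device_id": device_id, "seq": len(own), "events": [event]})
    return buckets


# Usage: seven events end up in three bounded documents, not one growing one.
buckets = []
for i in range(7):
    append_event(buckets, "d1", i)
```

Each document stays small and predictable, at the cost of reading several documents to reassemble the full history.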
PostgreSQL is an enterprise-level database with transactions, full-text indexes, vector indexes, JSON, BLOB, geo-spatial data and a lot more. It is highly scalable, configurable and easily maintainable. All that in an open source RDBMS, and you are still looking for GPL-licensed MySQL with limited features? Look again.
We wanted a JSON datastore that could save the state of our bioinformatics visualizations without destructive normalization. As a leading NoSQL data storage technology, MongoDB has been a perfect fit for our needs. Plus it's open source, and has an enterprise SLA scale-out path, with support of hosted solutions like Atlas. Mongo has been an absolute champ. So much so that SQL and Oracle have begun shipping JSON column types as a new feature for their databases. And when Fast Healthcare Interoperability Resources (FHIR) announced support for JSON, we basically had our FHIR datalake technology.
In the field of bioinformatics, we regularly work with hierarchical and unstructured document data. Unstructured text data from PDFs, image data from radiographs, phylogenetic trees and cladograms, network graphs, streaming ECG data... none of it fits into a traditional SQL database particularly well. As such, we prefer to use document oriented databases.
MongoDB is probably the oldest component in our stack besides JavaScript, having been in it for over 5 years. At the time, we were looking for a technology that could simply cache our data visualization state (stored in JSON) in a database as-is without any destructive normalization. MongoDB was the perfect tool, and has been exceeding expectations ever since.
Trivia fact: some of the earliest electronic medical records (EMRs) used a document-oriented database called MUMPS as early as the 1960s, prior to the invention of SQL. MUMPS is still in use today in systems like Epic and VistA, and stores upwards of 40% of all medical records at hospitals. So, we saw MongoDB as something of a 21st-century version of the MUMPS database.
While there have been some very clever techniques that have allowed non-natively supported geo querying to be performed, it is incredibly slow in the long run and error-prone at best.
MySQL finally introduced its own GEO functions and special indexing operations for GIS-type data. I prototyped with this, as MySQL is the most familiar database to me. But no matter what I did with it, how much tuning I gave it, or how much I played with it, the results would come back inconsistent.
It was very disappointing.
I figured, at this point, that SQL Server, being an enterprise solution authored by one of the biggest software developers in the world, Microsoft, might contain some decent GIS in it.
I was very disappointed.
Postgres is a database solution I'm still getting familiar with, but I noticed it had no built-in support for GIS. So I hilariously didn't pay it too much attention. That was until I stumbled upon PostGIS and my world changed forever.
I happened to repoint my ASP.NET Core web application from MSSQL to MySQL because of the infrastructure costs associated with the former. The application also had challenges creating a migration schema for ASP.NET membership on MySQL.
After thorough research, I figured out how to do it and also made a video and uploaded it to YouTube. You can check it out here: https://youtu.be/X4I0DUw6C84
The full source code for the demo template is available on GitHub here: http://bit.ly/2LWgacA
Pros of Kafka
- High-throughput (126)
- Distributed (119)
- Scalable (92)
- High-Performance (86)
- Durable (66)
- Publish-Subscribe (38)
- Simple-to-use (19)
- Open source (18)
- Written in Scala and Java. Runs on JVM (12)
- Message broker + Streaming system (9)
- KSQL (4)
- Avro schema integration (4)
- Robust (4)
- Supports multiple clients (3)
- Extremely good parallelism constructs (2)
- Partitioned, replayable log (2)
- Simple publisher / multi-subscriber model (1)
- Fun (1)
- Flexible (1)
Pros of MySQL
- SQL (800)
- Free (679)
- Easy (562)
- Widely used (528)
- Open source (490)
- High availability (180)
- Cross-platform support (160)
- Great community (104)
- Secure (79)
- Full-text indexing and searching (75)
- Fast, open, available (26)
- Reliable (16)
- SSL support (16)
- Robust (15)
- Enterprise Version (9)
- Easy to set up on all platforms (7)
- NoSQL access to JSON data type (3)
- Relational database (1)
- Easy, light, scalable (1)
- Sequel Pro (best SQL GUI) (1)
- Replica Support (1)
Cons of Kafka
- Non-Java clients are second-class citizens (32)
- Needs Zookeeper (29)
- Operational difficulties (9)
- Terrible Packaging (5)
Cons of MySQL
- Owned by a company with their own agenda (16)
- Can't roll back schema changes (3)