Kafka vs PostgreSQL


Kafka vs PostgreSQL: What are the differences?

Kafka: Distributed, fault-tolerant, high-throughput pub-sub messaging system. Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

PostgreSQL: A powerful, open source object-relational database system. PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, and user-defined types and functions.

Kafka belongs to the "Message Queue" category of the tech stack, while PostgreSQL is primarily classified under "Databases".

"High-throughput", "Distributed" and "Scalable" are the key factors why developers consider Kafka; whereas "Relational database", "High availability " and "Enterprise class database" are the primary reasons why PostgreSQL is favored.

Kafka and PostgreSQL are both open source tools. Kafka, with 12.7K GitHub stars and 6.81K forks, appears to be more popular than PostgreSQL, with 5.44K stars and 1.8K forks.

Uber Technologies, Spotify, and Netflix are some of the popular companies that use PostgreSQL, whereas Kafka is used by Uber Technologies, Spotify, and Slack. PostgreSQL has broader approval, being mentioned in 2739 company stacks and 2170 developer stacks, compared to Kafka, which is listed in 509 company stacks and 470 developer stacks.

What is Kafka?

Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.

What is PostgreSQL?

PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions.

What are some alternatives to Kafka and PostgreSQL?
ActiveMQ
Apache ActiveMQ is fast, supports many cross-language clients and protocols, and comes with easy-to-use Enterprise Integration Patterns and many advanced features, while fully supporting JMS 1.1 and J2EE 1.4. Apache ActiveMQ is released under the Apache 2.0 License.
RabbitMQ
RabbitMQ gives your applications a common platform to send and receive messages, and your messages a safe place to live until received.
Amazon Kinesis
Amazon Kinesis can collect and process hundreds of gigabytes of data per second from hundreds of thousands of sources, allowing you to easily write applications that process information in real-time, from sources such as web site click-streams, marketing and financial information, manufacturing instrumentation and social media, and operational logs and metering data.
Apache Spark
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
Akka
Akka is a toolkit and runtime for building highly concurrent, distributed, and resilient message-driven applications on the JVM.
Decisions about Kafka and PostgreSQL
Dan Robinson
at Heap, Inc. · 14 upvotes · 44.6K views
Tools: Node.js, Kafka, PostgreSQL, Citus
#FrameworksFullStack #Databases #MessageQueue

At Heap, we searched for an existing tool that would allow us to express the full range of analyses we needed, index the event definitions that made up those analyses, and do all of this as a mature, natively distributed system.

After coming up empty on this search, we decided to compromise on the “maturity” requirement and build our own distributed system around Citus and sharded PostgreSQL. It was at this point that we also introduced Kafka as a queueing layer between the Node.js application servers and Postgres.

If we could go back in time, we probably would have started using Kafka on day one. One of the biggest benefits in adopting Kafka has been the peace of mind that it brings. In an analytics infrastructure, it’s often possible to make data ingestion idempotent.

In Heap’s case, that means that, if anything downstream from Kafka goes down, we won’t lose any data – it’s just going to take a bit longer to get to its destination. We also learned that you want the path between data hitting your servers and your initial persistence layer (in this case, Kafka) to be as short and simple as possible, since that is the surface area where a failure means you can lose customer data. We learned that it’s a very good fit for an analytics tool, since you can handle a huge number of incoming writes with relatively low latency. Kafka also gives you the ability to “replay” the data flow: it’s like a commit log for your whole infrastructure.
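
The idempotent-ingestion pattern described above is easy to sketch. A minimal example, assuming Python with the kafka-python and psycopg2 libraries, a hypothetical events topic, and an events table with a unique event_id column (none of which are Heap's actual names): replaying the stream, or re-delivering after a crash, never creates duplicate rows.

```python
import json

import psycopg2                  # pip install psycopg2-binary
from kafka import KafkaConsumer  # pip install kafka-python

# Consume events and write them to Postgres idempotently: at-least-once
# delivery plus a unique key means replays never create duplicate rows.
consumer = KafkaConsumer(
    "events",                           # hypothetical topic name
    bootstrap_servers="localhost:9092",
    group_id="postgres-writer",
    enable_auto_commit=False,           # commit offsets only after a successful write
)

conn = psycopg2.connect("dbname=analytics")  # hypothetical DSN

for msg in consumer:
    event = json.loads(msg.value)
    with conn, conn.cursor() as cur:
        # The unique constraint on event_id is what makes ingestion idempotent.
        cur.execute(
            "INSERT INTO events (event_id, payload) VALUES (%s, %s) "
            "ON CONFLICT (event_id) DO NOTHING",
            (event["id"], json.dumps(event)),
        )
    consumer.commit()  # anything downstream going down only delays data, never loses it
```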

Anton Sidelnikov
Backend Developer at Beamery · 5 upvotes · 9K views
Tools: MongoDB, PostgreSQL

In my opinion PostgreSQL wins outright over MongoDB: it not only works with structured data, SQL, and strict types, but also has excellent support for unstructured data as a separate data type (you can store arbitrary JSON documents, and they are queryable, depending on which format you choose). Both writes and reads are much faster than in Mongo. So you can get the best of document NoSQL and SQL in a single database.
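
A small sketch of that "best of both" point, assuming Python with psycopg2 and a hypothetical docs table; the jsonb column stores arbitrary documents, and a GIN index makes them queryable with Postgres's JSON operators:

```python
import json

import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("dbname=demo")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # A strict relational key next to a schemaless JSONB document.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            id   serial PRIMARY KEY,
            body jsonb NOT NULL
        )
    """)
    # A GIN index makes containment queries on the documents fast.
    cur.execute("CREATE INDEX IF NOT EXISTS docs_body_idx ON docs USING gin (body)")

    cur.execute(
        "INSERT INTO docs (body) VALUES (%s)",
        (json.dumps({"user": "alice", "tags": ["postgres", "jsonb"]}),),
    )
    # The @> containment operator queries inside the stored JSON.
    cur.execute(
        "SELECT id, body FROM docs WHERE body @> %s",
        (json.dumps({"user": "alice"}),),
    )
    print(cur.fetchall())
```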

The formal downside of PostgreSQL is clustering scalability: there's no simple way to build a distributed cluster. However, two points:

1) You will need much more time before you actually have to scale, thanks to PG's efficiency. And if you follow the database-per-service pattern, you may never need to, since handling a few billion records on a single machine is an option for PG.

2) When you do need to scale, you can do it exactly the way you need, including as part of the app's logic (e.g. sharding by key, or a PG-based clustering solution with a strict model). Scalability will be very transparent, much more predictable than Mongo's "the cluster just works (but then fails)" replication.

Roman Bulgakov
Senior Back-End Developer, Software Architect at Chemondis GmbH · 3 upvotes · 10.5K views
Tools: Kafka

I use Kafka because it has almost infinite scalability in terms of processing events (it can be scaled to process hundreds of thousands of events) and great monitoring (all sorts of metrics are exposed via JMX).

Downsides of using Kafka:

  • you have to deal with Zookeeper
  • you have to implement advanced routing yourself (compared to RabbitMQ, it has no advanced routing)
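
Because the broker does no routing, the fan-out logic that RabbitMQ exchanges would handle typically lives in the consumer. A minimal sketch, assuming Python with kafka-python and hypothetical topic, type field, and handler names:

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical handlers: one per message type we care about.
def handle_order(event):
    print("order:", event)

def handle_payment(event):
    print("payment:", event)

ROUTES = {"order": handle_order, "payment": handle_payment}

consumer = KafkaConsumer(
    "events",                           # hypothetical topic name
    bootstrap_servers="localhost:9092",
    group_id="router",
)

# The broker only delivers; deciding where each message goes is our job.
for msg in consumer:
    event = json.loads(msg.value)
    handler = ROUTES.get(event.get("type"))
    if handler:
        handler(event)  # messages with unknown types are simply skipped
```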

John Kodumal
CTO at LaunchDarkly · 15 upvotes · 135.7K views
Tools: Kafka, Amazon Kinesis, Redis, Amazon EC2, Amazon ElastiCache, Consul, Patroni, TimescaleDB, PostgreSQL, Amazon RDS

As we've evolved or added additional infrastructure to our stack, we've biased towards managed services. Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data—this is made HA with the use of Patroni and Consul.

We also use managed Amazon ElastiCache instances instead of spinning up Amazon EC2 instances to run Redis workloads, as well as shifting to Amazon Kinesis instead of Kafka.

Tim Nolet
Founder, Engineer & Dishwasher at Checkly · 8 upvotes · 60.9K views
Tools: Amazon DynamoDB, MongoDB, Node.js, Heroku, PostgreSQL

When I started building Checkly, one of the first things on the agenda was how to actually structure our SaaS database model: think accounts, users, subscriptions, etc. Weirdly, there is not a lot of information on this in the "blogosphere" (cringe...). After research and some false starts with MongoDB and Amazon DynamoDB, we ended up with PostgreSQL and a schema consisting of just four tables that form the backbone of all the generic "SaaSy" stuff almost any B2B SaaS bumps into.

In a nutshell:

  • We use Postgres on Heroku.
  • We use a "one database, one schema" approach for partitioning customer data.
  • We use an accounts, memberships, and users table to create a many-to-many relation between users and accounts.
  • We completely decouple prices, payments, and the exact ingredients for a customer's plan.

All the details including a database schema diagram are in the linked blog post.
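
For flavor, a hedged sketch of what such a four-table backbone might look like, assuming Python with psycopg2; the table and column names are illustrative guesses, not Checkly's actual schema:

```python
import psycopg2  # pip install psycopg2-binary

# Illustrative guesses at a four-table SaaS backbone, not Checkly's schema.
DDL = """
CREATE TABLE accounts (
    id   serial PRIMARY KEY,
    name text NOT NULL
);
CREATE TABLE users (
    id    serial PRIMARY KEY,
    email text UNIQUE NOT NULL
);
-- memberships is the join table giving the many-to-many relation
-- between users and accounts described above.
CREATE TABLE memberships (
    account_id integer NOT NULL REFERENCES accounts (id),
    user_id    integer NOT NULL REFERENCES users (id),
    role       text NOT NULL DEFAULT 'member',
    PRIMARY KEY (account_id, user_id)
);
-- plans are kept apart from accounts so prices and payment details
-- can change without touching customer rows.
CREATE TABLE plans (
    id             serial PRIMARY KEY,
    name           text NOT NULL,
    price_cents_mo integer NOT NULL
);
"""

with psycopg2.connect("dbname=saas_demo") as conn, conn.cursor() as cur:
    cur.execute(DDL)
```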

Łukasz Korecki
CTO & Co-founder at EnjoyHQ · 12 upvotes · 37.7K views
Tools: PostgreSQL, MongoDB, RethinkDB

We initially chose RethinkDB because of its schema-less document-store features and a better durability/resilience story than MongoDB. In the end, it didn't work out quite as we expected: there are plenty of scalability issues, it's near impossible to run analytical workloads, and the small community makes working with Rethink a challenge. We're in the process of migrating all our workloads to PostgreSQL, and hopefully we will be able to decommission our RethinkDB deployment soon.

Eric Colson
Chief Algorithms Officer at Stitch Fix · 19 upvotes · 265.4K views
Tools: Amazon EC2 Container Service, Docker, PyTorch, R, Python, Presto, Apache Spark, Amazon S3, PostgreSQL, Kafka
#Data #DataStack #DataScience #ML #Etl #AWS

The algorithms and data infrastructure at Stitch Fix is housed in #AWS. Data acquisition is split between events flowing through Kafka and periodic snapshots of PostgreSQL DBs. We store data in an Amazon S3-based data warehouse. Apache Spark on Yarn is our tool of choice for data movement and #ETL. Because our storage layer (S3) is decoupled from our processing layer, we are able to scale our compute environment very elastically. We have several semi-permanent, autoscaling Yarn clusters running to serve our data processing needs. While the bulk of our compute infrastructure is dedicated to algorithmic processing, we also implemented Presto for ad hoc queries and dashboards.
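
As a flavor of that decoupled storage/processing split, a minimal PySpark sketch, assuming a hypothetical bucket and JSON event layout (not Stitch Fix's actual paths) and already-configured S3 credentials:

```python
from pyspark.sql import SparkSession

# Storage (S3) is decoupled from compute: any number of Spark clusters can
# read the same warehouse, so the compute layer scales elastically.
spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Hypothetical warehouse layout.
events = spark.read.json("s3a://example-warehouse/events/2019/")

daily_counts = events.groupBy("event_type").count()
daily_counts.write.parquet("s3a://example-warehouse/derived/daily_counts/")
```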

Beyond data movement and ETL, most #ML centric jobs (e.g. model training and execution) run in a similarly elastic environment as containers running Python and R code on Amazon EC2 Container Service clusters. The execution of batch jobs on top of ECS is managed by Flotilla, a service we built in house and open sourced (see https://github.com/stitchfix/flotilla-os).

At Stitch Fix, algorithmic integrations are pervasive across the business. We have dozens of data products actively integrated into systems. That requires a serving layer that is robust, agile, flexible, and allows for self-service. Models produced on Flotilla are packaged for deployment in production using Khan, another framework we've developed internally. Khan provides our data scientists the ability to quickly productionize those models they've developed with open source frameworks in Python 3 (e.g. PyTorch, sklearn), by automatically packaging them as Docker containers and deploying them to Amazon ECS. This gives our data scientists a one-click method of getting from their algorithms to production. We then integrate those deployments into a service mesh, which allows us to A/B test various implementations in our product.

For more info:


Mauro Bennici
CTO at You Are My GUide · 7 upvotes · 10.2K views
Tools: MongoDB, TimescaleDB, PostgreSQL

PostgreSQL plus TimescaleDB allows us to concentrate the business effort on how to analyze valuable data instead of managing it on the IT side. We are now able to ingest thousands of social-share "managed" data points without compromising the scalability of the system or query time. TimescaleDB is transparent to PostgreSQL, so we continue to use the same SQL syntax without any changes. At the same time, because we only need to manage a few document objects, we dismissed the MongoDB cluster.
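
That transparency is easy to demonstrate. A minimal sketch, assuming Python with psycopg2, a TimescaleDB-enabled Postgres, and a hypothetical shares table; the create_hypertable call is the only Timescale-specific step:

```python
import psycopg2  # pip install psycopg2-binary

with psycopg2.connect("dbname=metrics") as conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS shares (
            time    timestamptz NOT NULL,
            network text NOT NULL,
            count   integer NOT NULL
        )
    """)
    # The only Timescale-specific call: partition the table by time.
    cur.execute("SELECT create_hypertable('shares', 'time', if_not_exists => TRUE)")

    # Everything else is ordinary PostgreSQL syntax.
    cur.execute("INSERT INTO shares VALUES (now(), 'twitter', 42)")
    cur.execute("""
        SELECT time_bucket('1 day', time) AS day, network, sum(count)
        FROM shares GROUP BY day, network ORDER BY day
    """)
    print(cur.fetchall())
```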

Tor Hagemann
at Socotra · 2 upvotes · 2.2K views
Tools: Amazon DynamoDB, PostgreSQL, MySQL

Much of our data model is relational, which makes MySQL or PostgreSQL (and family) fit the APIs we need to build in order to meet the needs of our customers.

Sometimes the flexibility of a NoSQL store like Amazon DynamoDB is very useful, but the lack of consistency really impacts usability and performance long-term, compared with viable alternatives. At our current scale, we've seen huge benefits from moving some of our tables out of Dynamo and doing more in SQL.

There will always be use cases for NoSQL and key-value stores, but if your model is well understood in your business/industry, relational databases are the way to go after finding product-market fit. Always understand the trade-offs (and a few intimate details) of any data store before you add it to your company's stack!

Joseph Irving
DevOps Engineer at uSwitch · 8 upvotes · 7.3K views
Tools: Go, PostgreSQL, MySQL, Kubernetes, Vault

At uSwitch we use Vault to generate short lived database credentials for our applications running in Kubernetes. We wanted to move from an environment where we had 100 dbs with a variety of static passwords being shared around to a place where each pod would have credentials that only last for its lifetime.

We chose Vault because:

  • It had built-in Kubernetes support, so we could use service accounts to control which pods could access which database.

  • A Terraform provider, so that we could configure both our RDS instances and their Vault configuration in one place.

  • A variety of database providers, including MySQL/PostgreSQL (our most common DBs).

  • A good API/Go SDK, so that we could build tooling around it to simplify the development workflow.

  • It had other features we would utilise, such as PKI.
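
A minimal sketch of fetching such short-lived credentials over Vault's HTTP API, assuming Python with requests and a hypothetical database role named readonly; a real pod would first authenticate via the Kubernetes auth method rather than carry a static token:

```python
import os

import requests  # pip install requests

VAULT_ADDR = os.environ.get("VAULT_ADDR", "http://127.0.0.1:8200")
VAULT_TOKEN = os.environ["VAULT_TOKEN"]  # in-cluster, obtained via the Kubernetes auth method

# Each read mints a fresh username/password pair with its own lease;
# when the lease expires, Vault revokes the user inside the database.
resp = requests.get(
    f"{VAULT_ADDR}/v1/database/creds/readonly",  # 'readonly' is a hypothetical role
    headers={"X-Vault-Token": VAULT_TOKEN},
)
resp.raise_for_status()
secret = resp.json()

print("user:", secret["data"]["username"])
print("lease seconds:", secret["lease_duration"])  # credentials live no longer than this
```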

Daniel Quinn
Senior Developer at Founders4Schools · 2 upvotes · 21K views
at The Paperless Project
Tools: PostgreSQL, SQLite

SQLite is a tricky beast. It's great if you're working single-threaded, but a Terrible Idea if you've got more than one concurrent connection. You use it because it's easy to set up, light, and portable (it's just a file).

In Paperless, we've built a self-hosted web application, so it makes sense to standardise on something small & light, and as we don't have to worry about multiple connections (it's just you using the app), it's a perfect fit.

For users wanting to scale Paperless up to a multi-user environment though, we do provide the hooks to switch to PostgreSQL.
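
Since Paperless is a Django application, that switch is essentially a settings change. A hedged sketch with hypothetical setting and variable names, not Paperless's actual configuration:

```python
# settings_sketch.py - hypothetical names, not Paperless's actual configuration.
# Default to SQLite (zero setup, just a file); a few environment variables
# switch the same application over to PostgreSQL for multi-user deployments.
import os

if os.environ.get("APP_DBUSER"):
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": os.environ.get("APP_DBNAME", "paperless"),
            "USER": os.environ["APP_DBUSER"],
            "PASSWORD": os.environ.get("APP_DBPASS", ""),
            "HOST": os.environ.get("APP_DBHOST", "localhost"),
        }
    }
else:
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": "/data/paperless.sqlite3",  # portable: the whole DB is one file
        }
    }
```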

Robert Zuber
CTO at CircleCI · 22 upvotes · 145.4K views
Tools: Amazon S3, GitHub, Redis, PostgreSQL, MongoDB

We use MongoDB as our primary #datastore. Mongo's approach to replica sets enables some fantastic patterns for operations like maintenance, backups, and #ETL.

As we pull #microservices from our #monolith, we are taking the opportunity to build them with their own datastores using PostgreSQL. We also use Redis to cache data we’d never store permanently, and to rate-limit our requests to partners’ APIs (like GitHub).
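
The Redis rate-limiting idea can be sketched in a few lines, assuming Python with redis-py, a fixed-window counter, and hypothetical limits (CircleCI's actual implementation isn't described here):

```python
import redis  # pip install redis

r = redis.Redis()  # hypothetical deployment details

def allow_request(partner: str, limit: int = 100, window_s: int = 60) -> bool:
    """Fixed-window limiter: at most `limit` calls per `window_s` seconds."""
    key = f"ratelimit:{partner}"
    count = r.incr(key)          # atomic increment shared by all app servers
    if count == 1:
        r.expire(key, window_s)  # first hit starts the window
    return count <= limit

if allow_request("github"):
    pass  # safe to call the partner API; otherwise back off and retry later
```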

When we’re dealing with large blobs of immutable data (logs, artifacts, and test results), we store them in Amazon S3. We handle any side-effects of S3’s eventual consistency model within our own code. This ensures that we deal with user requests correctly while writes are in process.

Martin Johannesson
Senior Software Developer at IT Minds · 10 upvotes · 14.8K views
Tools: AMP, PWA, React, MongoDB, Next.js, GraphQL, Apollo, PostgreSQL, TypeORM, Node.js, TypeScript
#Serverless #Backend #B2B

At IT Minds we create customized internal or #B2B web and mobile apps. I have a go-to stack that I pitch to our customers, consisting of 3 core areas: 1) a data core #backend, 2) a micro #serverless #backend, and 3) a user client #frontend.

For the data core, I create a backend using TypeScript and Node.js, with TypeORM connecting to PostgreSQL, exposing an action-based API with Apollo GraphQL.

The micro serverless backend's purpose is authentication, authorization, logins, and the like. It is created with Next.js API pages, using MongoDB to store essential information, caching, etc.

Finally, the frontend is built with React using Next.js, TypeScript, and Apollo. We create the frontend as a PWA and have an AMP landing page by default.

Jelena Dedovic
Tools: MSSQL, PostgreSQL, AIOHTTP, asyncio, Tornado

Investigating Tortoise ORM and GINO ORM...

I need to introduce an async ORM into the current stack: Tornado with an asyncio loop, AIOHTTP, with PostgreSQL and MSSQL. This project revolves heavily around realtime, and due to the realtime requirements, blocking during database access is not acceptable.

Considering that these ORMs are both young projects, I wondered if anybody had some experience with a similar stack and these async ORMs?
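
For flavor, a minimal non-blocking query with Tortoise ORM, one of the two candidates, assuming a hypothetical Item model and a local Postgres DSN; GINO would look broadly similar:

```python
import asyncio

from tortoise import Tortoise, fields
from tortoise.models import Model

class Item(Model):  # hypothetical model
    id = fields.IntField(pk=True)
    name = fields.TextField()

async def main():
    # Hypothetical DSN; MSSQL support would need separate verification.
    await Tortoise.init(
        db_url="postgres://user:pass@localhost/demo",
        modules={"models": ["__main__"]},
    )
    await Tortoise.generate_schemas()

    await Item.create(name="widget")
    items = await Item.filter(name="widget")  # awaits instead of blocking the loop
    print(items)

    await Tortoise.close_connections()

asyncio.run(main())
```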

Nicolas Apx
CEO - FullStack Javascript at Apx Development Limited · 14 upvotes · 16.7K views
Tools: PostgreSQL, MongoDB, Node.js, Python

I am planning on building a microservice eCommerce back-end that is easy to reuse in any project as needed. I would like to use both Python and Node.js, and MongoDB & PostgreSQL. In your opinion, which would be best suited for the following services:

  • Users-service
  • Products-service
  • Auth-service
  • Inventory-service
  • Order-service
  • Payment-service
  • Sku-service
  • And more not yet defined....

Thanks

Nicolas

How developers use Kafka and PostgreSQL
AngeloR uses PostgreSQL

We use PostgreSQL for the merge between SQL/NoSQL. A lot of our data is unstructured JSON, or JSON that is currently in flux due to some MVP/iteration processes that are going on. PostgreSQL gives us the capability to do this.

At the moment, PostgreSQL on Amazon is only at 9.5, which is one minor version short of support for document fragment updates, something we are waiting for. However, that may be some ways away.

Other than that, we are using PostgreSQL as our main SQL store as a replacement for all the MSSQL databases that we have. Not only does it have great support through RDS (small ops team), but it also has some great ways for us to migrate off RDS to managed EC2 instances down the line if we need to.

Cloudcraft uses PostgreSQL

PostgreSQL combines the best aspects of traditional SQL databases, such as reliability, consistent performance, transactions, and querying power, with the flexibility of schemaless NoSQL systems that are all the rage these days. Through the powerful JSON column types and indexes, you can now have your cake and eat it too! PostgreSQL may seem a bit arcane and old-fashioned at first, but its developers have clearly shown that they understand databases and storage trends better than almost anyone else. It definitely deserves to be part of everyone's toolbox; when you find yourself needing rock-solid performance, operational simplicity, and reliability, reach for PostgreSQL.

Pinterest uses Kafka

Architecture diagram: http://media.tumblr.com/d319bd2624d20c8a81f77127d3c878d0/tumblr_inline_nanyv6GCKl1s1gqll.png

Front-end messages are logged to Kafka by our API and application servers. We have batch processing (on the middle-left) and real-time processing (on the middle-right) pipelines to process the experiment data. For batch processing, after the daily raw logs get to S3, we start our nightly experiment workflow to figure out experiment user groups and experiment metrics. We use our in-house workflow management system Pinball to manage the dependencies of all these MapReduce jobs.
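
The front-end logging step might look like this minimal sketch, assuming Python with kafka-python and a hypothetical topic name, purely to illustrate the pattern:

```python
import json
import time

from kafka import KafkaProducer  # pip install kafka-python

# API/application servers hand each front-end event to Kafka; the batch
# (S3/MapReduce) and real-time pipelines then consume the same stream.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # hypothetical brokers
    value_serializer=lambda v: json.dumps(v).encode(),
)

producer.send("frontend_events", {       # hypothetical topic name
    "event": "pin_click",
    "user_id": 12345,
    "ts": time.time(),
})
producer.flush()  # ensure the event is durably handed to the brokers
```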

Brandon Adams uses PostgreSQL

Relational data stores solve a lot of problems reasonably well. Postgres has some data types that are really handy, such as spatial, JSON, and a plethora of useful date and integer types. It has good availability of indexing solutions, and is well-supported for both custom modifications as well as hosting options (I like Amazon's Postgres for RDS). I use HoneySQL for Clojure as a composable AST that translates reliably to SQL. I typically use JDBC on Clojure, usually via org.clojure/java.jdbc.

ReviewTrackers uses PostgreSQL

PostgreSQL is responsible for nearly all data storage, validation and integrity. We leverage constraints, functions and custom extensions to ensure we have only one source of truth for our data access rules and that those rules live as close to the data as possible. Call us crazy, but ORMs only lead to ruin and despair.

Jeff Flynn uses PostgreSQL

Tried MongoDB - early euphoria - later dread. Tried MySQL - not bad at all. Found PostgreSQL - will never go back. There is so much support for it that it should be your first choice: simple local (free) installation, one-click setup in Heroku, and lots of options in terms of pricing/performance combinations.

Coolfront Technologies uses Kafka

Building out a real-time streaming server to present data insights to Coolfront Mobile customers and internal sales and marketing teams.

ShareThis uses Kafka

We are using Kafka as a message queue to process our widget logs.

Christopher Davison uses Kafka

Used for communications and triggering jobs across ETL systems.

theskyinflames uses Kafka

Used as integration middleware for message interchange.
