PostgreSQL

Decision at FundsCorner about Zappa, AWS Lambda, SQLAlchemy, Python, Amazon SQS, Node.js, MongoDB Stitch, PostgreSQL, MongoDB

jeyabalajis
Zappa
AWS Lambda
SQLAlchemy
Python
Amazon SQS
Node.js
MongoDB Stitch
PostgreSQL
MongoDB

Recently we were looking at a few robust and cost-effective ways of replicating the data that resides in our production MongoDB to a PostgreSQL database for data warehousing and business intelligence.

We set ourselves the following criteria for the optimal tool that would do this job:

  • The data replication must be near real-time, yet it should NOT impact the production database.
  • The data replication must be horizontally scalable (based on the load), asynchronous & crash-resilient.

Based on the above criteria, we selected the following tools to perform the end-to-end data replication:

We chose MongoDB Stitch for picking up the changes in the source database. Stitch is MongoDB's serverless platform, and one of the services it offers is Stitch Triggers. Using Stitch Triggers, you can execute a serverless function (in Node.js) in real time in response to changes in the database. When there are a lot of database changes, Stitch automatically "feeds forward" these changes through an asynchronous queue.

We chose Amazon SQS as the pipe / message backbone for communicating the changes from MongoDB to our own replication service. Interestingly enough, MongoDB Stitch offers integration with AWS services.

In the Node.js function, we wrote minimal functionality to communicate the database changes (insert / update / delete / replace) to Amazon SQS.

Next we wrote a minimal microservice in Python to listen to the message events on SQS, pick up the data payload, and mirror the DB changes onto the target data warehouse. We implemented the source-to-target data translation by modelling the target table structures through SQLAlchemy. We deployed this microservice as an AWS Lambda with Zappa. With Zappa, deploying your service as an event-driven, horizontally scalable Lambda is dead easy.
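
To make the shape of that microservice concrete, here is a minimal sketch of the idea (not our production code): the Lambda handler receives the MongoDB change events delivered via SQS and mirrors each one into a warehouse table modelled with SQLAlchemy. The table structure and payload field names below are illustrative assumptions.

```python
# Minimal sketch (illustrative only): mirror MongoDB change events delivered
# by SQS into a Postgres warehouse table modelled with SQLAlchemy.
# The table structure and payload field names are hypothetical.
import json

from sqlalchemy import Column, MetaData, String, Table, create_engine
from sqlalchemy.dialects.postgresql import JSONB, insert

engine = create_engine("postgresql://user:pass@warehouse-host/dw")  # hypothetical DSN
metadata = MetaData()

# Target table modelled through SQLAlchemy.
orders = Table(
    "orders", metadata,
    Column("doc_id", String, primary_key=True),
    Column("doc", JSONB),
)
metadata.create_all(engine)


def handler(event, context):
    """AWS Lambda entry point (deployed with Zappa): apply each change event."""
    for record in event.get("Records", []):  # standard SQS -> Lambda event shape
        change = json.loads(record["body"])
        with engine.begin() as conn:  # one transaction per change event
            if change["operationType"] in ("insert", "update", "replace"):
                stmt = insert(orders).values(
                    doc_id=change["docId"], doc=change["fullDocument"]
                )
                # Upsert so re-delivered messages don't create duplicates.
                stmt = stmt.on_conflict_do_update(
                    index_elements=["doc_id"], set_={"doc": stmt.excluded.doc}
                )
                conn.execute(stmt)
            else:  # delete
                conn.execute(orders.delete().where(orders.c.doc_id == change["docId"]))
```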

In the end, we implemented a highly scalable, near-real-time change data replication service that "works", and deployed it to production in a matter of a few days!


Decision at Sentry about Redis, PostgreSQL, Celery, Django, InMemoryDatabases, MessageQueue

jtcunning, Operations Engineer at Sentry
Redis
PostgreSQL
Celery
Django
#InMemoryDatabases
#MessageQueue

Sentry started as (and remains) an open-source project, growing out of an error logging tool built in 2008. That original build nine years ago was Django and Celery (Python's asynchronous task queue), with PostgreSQL as the database and Redis as the power behind Celery.

We displayed a truly shrewd notion of branding even then, giving the project a catchy name that companies the world over remain jealous of to this day: django-db-log. For the longest time, Sentry’s subtitle on GitHub was “A simple Django app, built with love.” A slightly more accurate description probably would have included Starcraft and Soylent alongside love; regardless, this captured what Sentry was all about.

#MessageQueue #InMemoryDatabases


Decision at Heap about Citus, PostgreSQL, Databases, DataStores

drob
Citus
PostgreSQL
#Databases
#DataStores

PostgreSQL was an easy early decision for the founding team. The relational data model fit the types of analyses they would be doing: filtering, grouping, joining, etc., and it was the database they knew best.

Shortly after adopting PG, they discovered Citus, a tool that makes it easy to distribute queries. Although it was a young project and a fork of Postgres at that point, Dan says the Citus team was very available and highly expert, and that it wouldn't have been very difficult to move back to plain PG if they needed to.

What Citus forked was the query execution layer; you could still treat the worker nodes like regular PG instances. Citus also gave them a ton of flexibility to make queries fast, and again, they felt the relational data model was the best fit for their application.
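
For a sense of what that looks like in practice (hypothetical table and column names, not Heap's schema), you create an ordinary Postgres table on the coordinator and then ask Citus to distribute it across the workers by a shard key:

```python
# Illustrative only (hypothetical table/column names): a plain Postgres table
# is turned into a distributed one with Citus' create_distributed_table(),
# after which queries against it are routed and parallelised across workers.
import psycopg2

conn = psycopg2.connect("dbname=analytics host=coordinator-host")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            user_id  bigint,
            event_id bigint,
            data     jsonb
        )
    """)
    cur.execute("SELECT create_distributed_table('events', 'user_id')")
```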

#DataStores #Databases


Decision at Stitch about PostgreSQL, MySQL, Clojure

jakestein, CEO at Stitch
PostgreSQL
MySQL
Clojure

The majority of our Clojure microservices are simple web services that wrap a transactional database with CRUD operations and a little bit of business logic. We use both MySQL and PostgreSQL for transactional data persistence, having transitioned from the former to the latter for newer services to take advantage of the new features coming out of the Postgres community.

Most of our Clojure best practices can be summed up by the phrase "keep it simple." We avoid more complex web frameworks in favor of using the Ring library to build web service routes, and we prefer sending SQL directly to the JDBC library rather than using a complicated ORM or SQL DSL.


Decision at Uploadcare about PostgreSQL, Amazon DynamoDB, Amazon S3, Redis, Python, Google App Engine

dmitry-mukhin
PostgreSQL
Amazon DynamoDB
Amazon S3
Redis
Python
Google App Engine

Uploadcare has built an infinitely scalable infrastructure by leveraging AWS. Building on top of AWS allows us to process 350M daily requests for file uploads, manipulations, and deliveries. When we started in 2011 the only cloud alternative to AWS was Google App Engine which was a no-go for a rather complex solution we wanted to build. We also didn’t want to buy any hardware or use co-locations.

Our stack handles receiving files, communicating with external file sources, managing file storage, managing user and file data, processing files, file caching and delivery, and managing user interface dashboards.

At its core, Uploadcare runs on Python. The EuroPython 2011 conference in Florence really inspired us, and that, coupled with the fact that Python was general enough to solve all of our challenges, informed this decision. Additionally, we had prior experience working in Python.

We chose to build the main application with Django because of its feature completeness and large footprint within the Python ecosystem.

All the communications within our ecosystem occur via several HTTP APIs, Redis, Amazon S3, and Amazon DynamoDB. We decided on this architecture so that our system could be scalable in terms of storage and database throughput. This way we only need Django running on top of our database cluster. We use PostgreSQL as our database because it is considered an industry standard when it comes to clustering and scaling.


Decision at Heap about Heap, Node.js, Kafka, PostgreSQL, Citus, FrameworksFullStack, Databases, MessageQueue

drob
Heap
Node.js
Kafka
PostgreSQL
Citus
#FrameworksFullStack
#Databases
#MessageQueue

At Heap, we searched for an existing tool that would allow us to express the full range of analyses we needed, index the event definitions that made up the analyses, and was a mature, natively distributed system.

After coming up empty on this search, we decided to compromise on the “maturity” requirement and build our own distributed system around Citus and sharded PostgreSQL. It was at this point that we also introduced Kafka as a queueing layer between the Node.js application servers and Postgres.

If we could go back in time, we probably would have started using Kafka on day one. One of the biggest benefits in adopting Kafka has been the peace of mind that it brings. In an analytics infrastructure, it’s often possible to make data ingestion idempotent.

In Heap’s case, that means that, if anything downstream from Kafka goes down, we won’t lose any data – it’s just going to take a bit longer to get to its destination. We also learned that you want the path between data hitting your servers and your initial persistence layer (in this case, Kafka) to be as short and simple as possible, since that is the surface area where a failure means you can lose customer data. We learned that it’s a very good fit for an analytics tool, since you can handle a huge number of incoming writes with relatively low latency. Kafka also gives you the ability to “replay” the data flow: it’s like a commit log for your whole infrastructure.
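
A minimal sketch of that idempotency idea (not Heap's actual pipeline; topic, table, and field names are hypothetical): consume events from Kafka and insert them keyed by an event id, so replaying the same message after a downstream failure is a no-op.

```python
# Minimal sketch of idempotent ingestion from Kafka into Postgres.
# Topic, table, and field names are hypothetical; assumes a unique
# constraint on events.event_id.
import json

import psycopg2
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "events",                                # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v),
)
conn = psycopg2.connect("dbname=analytics")  # hypothetical DSN

for message in consumer:
    event = message.value
    with conn, conn.cursor() as cur:
        # ON CONFLICT DO NOTHING makes the insert idempotent: replaying the
        # same message (same event_id) after a downstream failure is harmless.
        cur.execute(
            "INSERT INTO events (event_id, payload) VALUES (%s, %s) "
            "ON CONFLICT (event_id) DO NOTHING",
            (event["event_id"], json.dumps(event)),
        )
```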

#MessageQueue #Databases #FrameworksFullStack


Decision at ChecklyHQ about vuex, Knex.js, PostgreSQL, Amazon S3, AWS Lambda, Vue.js, hapi, Node.js, GitHub, Docker, Heroku

tim_nolet, Founder, Engineer & Dishwasher at Checkly
vuex
Knex.js
PostgreSQL
Amazon S3
AWS Lambda
Vue.js
hapi
Node.js
GitHub
Docker
Heroku

Checkly is a fairly young company and we're still working hard to find the correct mix of product features, price and audience.

We are focussed on tech B2B, but I always wanted to serve solo developers too. So I decided to make a $7 plan.

Why $7? Simply put, it seems to be a sweet spot for tech companies: Heroku, Docker, GitHub, and AppOptics (Librato) all offer $7 plans. They must have done a ton of research into this, so why not piggyback on that and try it out.

Enough biz talk, onto tech. The challenges were:

  • Slice off a portion of the functionality so a $7 plan is still profitable. We call this the "plan limits".
  • Update API and back end services to handle and enforce plan limits.
  • Update the UI to kindly state plan limits are in effect on some part of the UI.
  • Update the pricing page to reflect all changes.
  • Keep the actual processing backend, storage and APIs as untouched as possible.

In essence, we went from strictly volume based pricing to value based pricing. Here come the technical steps & decisions we made to get there.

  1. We updated our PostgreSQL schema so plans now have an array of "features". These are string constants that represent feature toggles.
  2. The Vue.js frontend reads these from the vuex store on login.
  3. Based on these values, the UI has simple v-if statements to either just show the feature or show a friendly "please upgrade" button.
  4. The hapi API has a hook on each relevant API endpoint that checks whether a user's plan has the feature enabled, or not.

Side note: We offer 10 SMS messages per month on the developer plan. However, we were not actually counting how many SMS messages people were sending. We had to update our alerting daemon (which runs on Heroku and triggers SMS messages via AWS SNS) to actually bump a counter.

What we built is basically feature toggling based on plan features. It is very extensible for future additions. Our scheduling and storage backend that actually runs users' monitoring requests (AWS Lambda) and stores the results (S3 and Postgres) has no knowledge of all of this and remained unchanged.
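
Our real hook lives in the hapi API, but purely to illustrate the check itself, here is a minimal Python sketch of the same idea (table, column, and feature names are made up): the plan row carries an array of feature strings, and an endpoint hook simply tests membership.

```python
# Illustrative sketch of the plan-feature check (our real hook lives in the
# hapi API in Node.js; table, column, and feature names are hypothetical).
import psycopg2

conn = psycopg2.connect("dbname=checkly")  # hypothetical DSN


def plan_has_feature(plan_id: int, feature: str) -> bool:
    """True if the plan's 'features' text[] column contains the given toggle."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT %s = ANY(features) FROM plans WHERE id = %s",
            (feature, plan_id),
        )
        row = cur.fetchone()
        return bool(row and row[0])


# In an endpoint hook: reject the request if the plan lacks the feature, e.g.
# if not plan_has_feature(current_plan_id, "sms_alerts"):
#     return a 403 asking the user to upgrade their plan
```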

Hope this helps anyone building out their SaaS and is in a similar situation.


Decision at Dubsmash about Amazon RDS for Aurora, Redis, Amazon DynamoDB, Amazon RDS, Heroku, PostgreSQL, PlatformAsAService, Databases, NosqlDatabaseAsAService, SqlDatabaseAsAService

tspecht, Co-Founder and CTO at Dubsmash
Amazon RDS for Aurora
Redis
Amazon DynamoDB
Amazon RDS
Heroku
PostgreSQL
#PlatformAsAService
#Databases
#NosqlDatabaseAsAService
#SqlDatabaseAsAService

Over the years we have added a wide variety of different storages to our stack including PostgreSQL (some hosted by Heroku, some by Amazon RDS) for storing relational data, Amazon DynamoDB to store non-relational data like recommendations & user connections, or Redis to hold pre-aggregated data to speed up API endpoints.

Since we started running Postgres ourselves on RDS instead of only using the managed offerings of Heroku, we've gained additional flexibility in scaling our application while reducing costs at the same time.

We are also heavily testing Amazon RDS for Aurora in its Postgres-compatible version and will also give the new release of Aurora Serverless a try!

#SqlDatabaseAsAService #NosqlDatabaseAsAService #Databases #PlatformAsAService


Decision at LaunchDarkly about Kafka, Amazon Kinesis, Redis, Amazon EC2, Amazon ElastiCache, Consul, Patroni, TimescaleDB, PostgreSQL, Amazon RDS

jkodumal, CTO at LaunchDarkly
Kafka
Amazon Kinesis
Redis
Amazon EC2
Amazon ElastiCache
Consul
Patroni
TimescaleDB
PostgreSQL
Amazon RDS

As we've evolved or added additional infrastructure to our stack, we've biased towards managed services. Most new backing stores are Amazon RDS instances now. We do use self-managed PostgreSQL with TimescaleDB for time-series data—this is made HA with the use of Patroni and Consul.

We also use managed Amazon ElastiCache instances instead of spinning up Amazon EC2 instances to run Redis workloads, as well as shifting to Amazon Kinesis instead of Kafka.


Decision at Zulip about Elasticsearch, MySQL, PostgreSQL

tabbott, Founder at Zulip
Elasticsearch
MySQL
PostgreSQL

We've been using PostgreSQL since the very early days of Zulip, but we actually didn't use it from the beginning. Zulip started out as a MySQL project back in 2012, because we'd heard it was a good choice for a startup with a wide community. However, we found that even though we were using the Django ORM for most of our database access, we spent a lot of time fighting with MySQL. Issues ranged from bad collation defaults to bad query plans that required a lot of manual query tweaks.

We ended up getting so frustrated that we tried out PostgreSQL, and the results were fantastic. We didn't have to do any real customization (just some tuning settings for how big a server we had), and all of our most important queries were faster out of the box. As a result, we were able to delete a bunch of custom queries escaping the ORM that we'd written to make the MySQL query planner happy (because Postgres just did the right thing automatically).

Since then, we've gotten a ton of value out of Postgres. We use its excellent built-in full-text search, which has helped us avoid needing to bring in a tool like Elasticsearch, and we've really enjoyed features like its partial indexes, which saved us from adding a lot of unnecessary extra tables to get good performance for things like our "unread messages" and "starred messages" indexes.
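
As a small illustration of the partial-index trick (hypothetical names, not Zulip's actual schema): index only the rows the hot query touches, so the "unread messages for this user" lookup stays fast without any extra table.

```python
# Illustrative partial index (hypothetical table/column names): only unread
# rows are indexed, keeping the index small and the unread-messages query fast.
import psycopg2

conn = psycopg2.connect("dbname=chat")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE INDEX IF NOT EXISTS usermessage_unread_idx
            ON usermessage (user_id, message_id)
            WHERE NOT read
    """)
```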

I can't recommend it highly enough.
