Alternatives to Azure Cosmos DB

Azure SQL Database, MongoDB Atlas, MongoDB, Neo4j, and MySQL are the most popular alternatives and competitors to Azure Cosmos DB.

What is Azure Cosmos DB and what are its top alternatives?

Azure Cosmos DB (formerly Azure DocumentDB) is a fully managed NoSQL database service built for fast and predictable performance, high availability, elastic scaling, global distribution, and ease of development.
Azure Cosmos DB is a tool in the NoSQL Database as a Service category of a tech stack.

Top Alternatives to Azure Cosmos DB

  • Azure SQL Database

    Azure SQL Database

    It is the intelligent, scalable, cloud database service that provides the broadest SQL Server engine compatibility and up to a 212% return on investment. It is a database service that can quickly and efficiently scale to meet demand, is automatically highly available, and supports a variety of third party software. ...

  • MongoDB Atlas

    MongoDB Atlas

    MongoDB Atlas is a global cloud database service built and run by the team behind MongoDB. Enjoy the flexibility and scalability of a document database, with the ease and automation of a fully managed service on your preferred cloud. ...

  • MongoDB

    MongoDB

    MongoDB stores data in JSON-like documents that can vary in structure, offering a dynamic, flexible schema. MongoDB was also designed for high availability and scalability, with built-in replication and auto-sharding. ...

  • Neo4j

    Neo4j

    Neo4j stores data in nodes connected by directed, typed relationships with properties on both, also known as a Property Graph. It is a high performance graph store with all the features expected of a mature and robust database, like a friendly query language and ACID transactions. ...

  • MySQL

    MySQL

    The MySQL software delivers a very fast, multi-threaded, multi-user, and robust SQL (Structured Query Language) database server. MySQL Server is intended for mission-critical, heavy-load production systems as well as for embedding into mass-deployed software. ...

  • Cassandra

    Cassandra

    Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent manner. Cassandra will automatically repartition as machines are added and removed from the cluster. Row store means that, like relational databases, Cassandra organizes data by rows and columns. The Cassandra Query Language (CQL) is a close relative of SQL. ...

  • PostgreSQL

    PostgreSQL

    PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions. ...

  • Amazon DynamoDB

    Amazon DynamoDB

    With Amazon DynamoDB, you can offload the administrative burden of operating and scaling a highly available distributed database cluster, while paying a low price for only what you use. ...

Azure Cosmos DB alternatives & related posts

Azure SQL Database

292
271
7
Managed, intelligent SQL in the cloud
PROS OF AZURE SQL DATABASE
  • 3
    Managed
  • 2
    Scalable
  • 2
    Secure

    MongoDB Atlas

    567
    620
    28
    Deploy and scale a MongoDB cluster in the cloud with just a few clicks
    PROS OF MONGODB ATLAS
    • 9
      MongoDB SaaS for and by Mongo, makes it so easy
    • 5
      Amazon VPC peering
    • 4
      MongoDB Atlas is a GUI tool through which you can manage all your databases
    • 3
      Granular role-based access controls
    • 3
      Use it anywhere
    • 3
      Built-in data browser
    • 1
      Simple and easy to integrate

      related MongoDB Atlas posts

      Overview: To put it simply, we plan to use the MERN stack to build our web application. MongoDB will be used as our primary database. We will use ExpressJS alongside Node.js to set up our API endpoints. Additionally, we plan to use React to build our SPA on the client side and use Redis on the server side as our primary caching solution. Initially, while working on the project, we plan to deploy both our server and client on Heroku. However, Heroku is very limited and we will need the benefits of Infrastructure as a Service, so we will later use Amazon EC2 to deploy the final version of our application.

      Server side: nodemon will allow us to automatically restart a running instance of our Node app when file changes take place. We decided to use MongoDB because it is a non-relational database that uses a document data model. This allows a lot of flexibility compared to an RDBMS such as a SQL database, which requires a rigidly structured data model that does not change much. Another strength of MongoDB is how easily it scales. We will use Mongoose alongside MongoDB to model our application data. Additionally, we will host our MongoDB cluster remotely on MongoDB Atlas. Bcrypt will be used to hash user passwords before they are stored in the DB. This is to avoid the risks of storing plain text passwords. Moreover, we will use Cloudinary to store images uploaded by the user. We will also use the Twilio SendGrid API to enable automated emails sent by our application. To protect private API endpoints, we will use JSON Web Token and Passport. Also, PayPal will be used as a payment gateway to accept payments from users.

      Client side: As mentioned earlier, we will use React to build our SPA. React uses a virtual DOM, which is very efficient in rendering a page. React will also allow us to reuse components. Furthermore, it is very popular and there is a large community that uses React, which can be helpful if we run into issues. We also plan to make a cross-platform mobile application later, and using React will allow us to reuse a lot of our code with React Native. Redux will be used to manage state. Redux works great with React and will help us manage a global state in the app and avoid the complications of each component having its own state. Additionally, we will use Bootstrap components and custom CSS to style our app.

      Other: Git will be used for version control. During the later stages of our project, we will use Google Analytics to collect useful data regarding user interactions. Moreover, Slack will be our primary communication tool. Also, we will use Visual Studio Code as our primary code editor because it is very lightweight and has a wide variety of extensions that will boost productivity. Postman will be used to interact with and debug our API endpoints.
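
The password-storage step in the server-side notes can be illustrated with a minimal sketch. The post names Bcrypt, which is a third-party package in Python, so this example swaps in PBKDF2 from the standard library to show the same idea: a salted, one-way hash is stored instead of the plain-text password. The function names and the iteration count are assumptions made for illustration, not part of the original stack.

```python
import hashlib
import hmac
import os

# Assumed work factor for this sketch; tune it for your own hardware.
ITERATIONS = 200_000
SALT_BYTES = 16


def hash_password(password: str) -> bytes:
    """Return salt + derived key; only this value is stored, never the password."""
    salt = os.urandom(SALT_BYTES)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt + key


def verify_password(password: str, stored: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    salt, key = stored[:SALT_BYTES], stored[SALT_BYTES:]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)
```

Because the salt is random, hashing the same password twice yields different stored values, yet `verify_password` still accepts the correct password for either one.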

      Gregory Koberger

      We went with MongoDB, almost by mistake. I had never used it before, but I knew I wanted the *EAN part of the MEAN stack, so why not go all in. I come from a background of SQL (first MySQL, then PostgreSQL), so I definitely abused Mongo at first... by trying to turn it into something more relational than it should be. But hey, data is supposed to be relational, so there wasn't really any way to get around that.

      There's a lot I love about MongoDB, and a lot I hate. I still don't know if we made the right decision. We've been able to build much quicker, but we also have had some growing pains. We host our databases on MongoDB Atlas, and I can't say enough good things about it. We had tried MongoLab and Compose before it, and with MongoDB Atlas I finally feel like things are in a good place. I don't know if I'd use it for a one-off small project, but for a large product Atlas has given us a ton more control, stability and trust.

      MongoDB

      51.6K
      41.2K
      4K
      The database for giant ideas
      PROS OF MONGODB
      • 822
        Document-oriented storage
      • 585
        No sql
      • 544
        Ease of use
      • 462
        Fast
      • 404
        High performance
      • 251
        Free
      • 212
        Open source
      • 177
        Flexible
      • 139
        Replication & high availability
      • 107
        Easy to maintain
      • 39
        Querying
      • 35
        Easy scalability
      • 34
        Auto-sharding
      • 33
        High availability
      • 29
        Map/reduce
      • 26
        Document database
      • 24
        Easy setup
      • 24
        Full index support
      • 15
        Reliable
      • 14
        Fast in-place updates
      • 13
        Agile programming, flexible, fast
      • 11
        No database migrations
      • 7
        Enterprise
      • 7
        Easy integration with Node.js
      • 5
        Enterprise Support
      • 4
        Great NoSQL DB
      • 3
        Aggregation Framework
      • 3
        Drivers support is good
      • 3
        Support for many languages through different drivers
      • 2
        Schemaless
      • 2
        Managed service
      • 2
        Easy to Scale
      • 2
        Fast
      • 2
        Awesome
      • 1
        Consistent
      CONS OF MONGODB
      • 5
        Very slow for connected models that require joins
      • 3
        Not ACID compliant
      • 1
        Proprietary query language

      related MongoDB posts

      Jeyabalaji Subramanian

      Recently we were looking at a few robust and cost-effective ways of replicating the data that resides in our production MongoDB to a PostgreSQL database for data warehousing and business intelligence.

      We set ourselves the following criteria for the optimal tool that would do this job:
      • The data replication must be near real-time, yet it should NOT impact the production database
      • The data replication must be horizontally scalable (based on the load), asynchronous & crash-resilient

      Based on the above criteria, we selected the following tools to perform the end to end data replication:

      We chose MongoDB Stitch for picking up the changes in the source database. It is the serverless platform from MongoDB. One of the services offered by MongoDB Stitch is Stitch Triggers. Using Stitch Triggers, you can execute a serverless function (in Node.js) in real time in response to changes in the database. When there are a lot of database changes, Stitch automatically "feeds forward" these changes through an asynchronous queue.

      We chose Amazon SQS as the pipe / message backbone for communicating the changes from MongoDB to our own replication service. Interestingly enough, MongoDB Stitch offers integration with AWS services.

      In the Node.js function, we wrote minimal functionality to communicate the database changes (insert / update / delete / replace) to Amazon SQS.

      Next we wrote a minimal micro-service in Python to listen to the message events on SQS, pick up the data payload & mirror the DB changes onto the target data warehouse. We implemented source data to target data translation by modelling target table structures through SQLAlchemy. We deployed this micro-service as an AWS Lambda function with Zappa. With Zappa, deploying your services as event-driven & horizontally scalable Lambda services is dumb-easy.

      In the end, we got to implement a highly scalable, near-real-time Change Data Replication service that "works" and deployed it to production in a matter of a few days!
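
The mirroring step described above (consume a change message, then apply the insert / update / delete / replace to the target) can be sketched roughly as follows. This is not the author's service: it uses the standard-library `sqlite3` module as a stand-in for the PostgreSQL warehouse and SQLAlchemy models, leaves out the SQS polling entirely, and assumes a hypothetical message shape of `{"op": ..., "id": ..., "doc": {...}}` and a hypothetical `orders` table.

```python
import json
import sqlite3


def apply_change(conn: sqlite3.Connection, message_body: str) -> None:
    """Mirror one change event onto the target table.

    Assumes (hypothetically) that every insert/update/replace message
    carries the full document, so each can be applied as an upsert.
    """
    event = json.loads(message_body)
    op, doc_id = event["op"], event["id"]
    if op in ("insert", "update", "replace"):
        # INSERT OR REPLACE keeps the handler idempotent if a message
        # is delivered more than once.
        conn.execute(
            "INSERT OR REPLACE INTO orders (id, payload) VALUES (?, ?)",
            (doc_id, json.dumps(event["doc"])),
        )
    elif op == "delete":
        conn.execute("DELETE FROM orders WHERE id = ?", (doc_id,))
    else:
        raise ValueError(f"unknown operation: {op}")
    conn.commit()


# In-memory target used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, payload TEXT)")
```

Making each operation idempotent is a deliberate choice here: queue-based pipelines commonly redeliver messages, and an upsert or a delete can be replayed without corrupting the target.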

      Robert Zuber

      We use MongoDB as our primary #datastore. Mongo's approach to replica sets enables some fantastic patterns for operations like maintenance, backups, and #ETL.

      As we pull #microservices from our #monolith, we are taking the opportunity to build them with their own datastores using PostgreSQL. We also use Redis to cache data we’d never store permanently, and to rate-limit our requests to partners’ APIs (like GitHub).

      When we’re dealing with large blobs of immutable data (logs, artifacts, and test results), we store them in Amazon S3. We handle any side-effects of S3’s eventual consistency model within our own code. This ensures that we deal with user requests correctly while writes are in process.

      Neo4j

      868
      945
      320
      The world’s leading Graph Database
      PROS OF NEO4J
      • 65
        Cypher – graph query language
      • 55
        Great graphdb
      • 31
        Open source
      • 29
        Rest api
      • 27
        High-Performance Native API
      • 21
        ACID
      • 19
        Easy setup
      • 14
        Great support
      • 10
        Clustering
      • 8
        Hot Backups
      • 7
        Powerful, flexible data model
      • 7
        Great Web Admin UI
      • 5
        Embeddable
      • 5
        Mature
      • 4
        Easy to Use and Model
      • 3
        Highly-available
      • 3
        Best Graphdb
      • 2
        It's awesome, I wanted to try it
      • 2
        Used by Crunchbase
      • 2
        Great query language and built in data browser
      • 1
        Great onboarding process
      CONS OF NEO4J
      • 4
        Can't store a vertex as JSON
      • 3
        Comparably slow

      related Neo4j posts

      We have an in-house-built experiment management system. Each step produces samples as input to the next step, which can in turn produce one sample (1-to-1) or many samples (1-to-many). There are many steps like this. So far, we have been tracking genealogy (limited tracking) in a MySQL database, but it is becoming hard to trace back to the original material or sample (I can give more details if required). So, we are considering a graph database. I am requesting advice from the experts.

      1. Is a graph database the right choice, or can we manage with an RDBMS?
      2. If RDBMS: which RDBMS, which feature, or which approach could make this manageable or sustainable?
      3. If graph database (Neo4j, OrientDB, Azure Cosmos DB, Amazon Neptune, ArangoDB): which one is good, and what are the best practices?

      I am sorry that this might be a loaded question.
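
To make the lineage problem in this question concrete, here is a small sketch of tracing a sample back to its origins. It is independent of any particular database: the sample IDs and the `PARENTS` mapping are hypothetical, and a graph database or a recursive SQL query would perform the same kind of traversal on stored data.

```python
from collections import deque

# Hypothetical lineage data: each sample maps to the samples it was derived from.
PARENTS = {
    "S4": ["S2"],
    "S5": ["S2"],
    "S2": ["S1"],
    "S3": ["S1"],
    "S1": [],
}


def trace_origins(sample: str, parents: dict[str, list[str]]) -> list[str]:
    """Return all ancestors of `sample`, nearest first (breadth-first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(parents.get(sample, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a sample reachable by two paths is reported once
        seen.add(current)
        order.append(current)
        queue.extend(parents.get(current, []))
    return order


print(trace_origins("S4", PARENTS))  # ['S2', 'S1']
```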

      I'm evaluating the use of RedisGraph vs Microsoft SQL Server 2019 graph features to build a social graph. One of the key criteria is high availability and cross data center replication of data. While Neo4j is a much more mature solution in general, I'm not considering it because of the cost & the introduction of a new stack into the ecosystem. Also, due to the nature of the data & org policies, using a cloud-based solution won't be a viable choice.

      We currently use Redis as a cache & SQL Server 2019 as our RDBMS.

      I'm leaning towards SQL Server 2019 graph, as we already use SQL Server extensively as a relational database & have all the HA and cross data center replication setup readily available. I still need to evaluate whether it fulfills our needs as a graph DB, though. I also learned that SQL Server 2019 is still a new player in this market and attempts to fit graph-like queries on top of a relational model (with node and edge tables). RedisGraph seems very promising. However, I'm not totally sure about its HA, graph data backup, and cross data center support.

      MySQL

      68.3K
      52.7K
      3.7K
      The world's most popular open source database
      PROS OF MYSQL
      • 789
        Sql
      • 674
        Free
      • 557
        Easy
      • 527
        Widely used
      • 485
        Open source
      • 180
        High availability
      • 158
        Cross-platform support
      • 103
        Great community
      • 77
        Secure
      • 75
        Full-text indexing and searching
      • 25
        Fast, open, available
      • 14
        SSL support
      • 13
        Robust
      • 13
        Reliable
      • 8
        Enterprise Version
      • 7
        Easy to set up on all platforms
      • 1
        Easy, light, scalable
      • 1
        Relational database
      • 1
        NoSQL access to JSON data type
      • 1
        Sequel Pro (best SQL GUI)
      • 1
        Replica Support
      CONS OF MYSQL
      • 13
        Owned by a company with their own agenda
      • 1
        Can't roll back schema changes

      related MySQL posts

      Tim Abbott

      We've been using PostgreSQL since the very early days of Zulip, but we actually didn't use it from the beginning. Zulip started out as a MySQL project back in 2012, because we'd heard it was a good choice for a startup with a wide community. However, we found that even though we were using the Django ORM for most of our database access, we spent a lot of time fighting with MySQL. Issues ranged from bad collation defaults, to bad query plans which required a lot of manual query tweaks.

      We ended up getting so frustrated that we tried out PostgreSQL, and the results were fantastic. We didn't have to do any real customization (just some tuning settings for how big a server we had), and all of our most important queries were faster out of the box. As a result, we were able to delete a bunch of custom queries escaping the ORM that we'd written to make the MySQL query planner happy (because postgres just did the right thing automatically).

      And then after that, we've just gotten a ton of value out of postgres. We use its excellent built-in full-text search, which has helped us avoid needing to bring in a tool like Elasticsearch, and we've really enjoyed features like its partial indexes, which saved us a lot of work adding unnecessary extra tables to get good performance for things like our "unread messages" and "starred messages" indexes.

      I can't recommend it highly enough.
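
The partial-index idea mentioned above can be sketched in a few lines. This is an illustration only, not Zulip's schema: the table and column names are hypothetical, and the example runs against SQLite through Python's standard library, which accepts a similar `CREATE INDEX ... WHERE` form to the one PostgreSQL uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, is_read INTEGER)"
)
# Partial index: only unread rows are indexed, so the index stays small
# even when most messages have already been read.
conn.execute("CREATE INDEX idx_unread ON messages (user_id) WHERE is_read = 0")
conn.executemany(
    "INSERT INTO messages (user_id, is_read) VALUES (?, ?)",
    [(1, 0), (1, 1), (2, 0), (1, 0)],
)


def unread_ids(user_id: int) -> list[int]:
    """Fetch unread message ids; the filter repeats the index's WHERE clause."""
    rows = conn.execute(
        "SELECT id FROM messages WHERE user_id = ? AND is_read = 0 ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

The point of the design is that no separate "unread messages" table has to be maintained: the index covers just the subset of rows the hot query cares about.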

      Conor Myhrvold
      Tech Brand Mgr, Office of CTO at Uber · 20 upvotes · 915.7K views

      Our most popular (& controversial!) article to date on the Uber Engineering blog in 3+ yrs. Why we moved from PostgreSQL to MySQL. In essence, it was due to a variety of limitations of Postgres at the time. Fun fact -- earlier in Uber's history we'd actually moved from MySQL to Postgres before switching back for good, & though we published the article in Summer 2016 we haven't looked back since:

      The early architecture of Uber consisted of a monolithic backend application written in Python that used Postgres for data persistence. Since that time, the architecture of Uber has changed significantly, to a model of microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a novel database sharding layer built on top of MySQL (https://eng.uber.com/schemaless-part-one/). In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL:

      https://eng.uber.com/mysql-migration/

      Cassandra

      3K
      2.9K
      463
      A partitioned row store. Rows are organized into tables with a required primary key.
      PROS OF CASSANDRA
      • 107
        Distributed
      • 90
        High performance
      • 77
        High availability
      • 71
        Easy scalability
      • 50
        Replication
      • 25
        Reliable
      • 24
        Multi datacenter deployments
      • 6
        Schema optional
      • 6
        OLTP
      • 5
        Open source
      • 2
        Workload separation (via MDC)
      CONS OF CASSANDRA
      • 1
        Reliability of replication
      • 1
        Updates

      related Cassandra posts

      Thierry Schellenbach

      1.0 of Stream leveraged Cassandra for storing the feed. Cassandra is a common choice for building feeds. Instagram, for instance, started out with Redis but eventually switched to Cassandra to handle their rapid usage growth. Cassandra can handle write-heavy workloads very efficiently.

      Cassandra is a great tool that allows you to scale write capacity simply by adding more nodes, though it is also very complex. This complexity made it hard to diagnose performance fluctuations. Even though we had years of experience with running Cassandra, it still felt like a bit of a black box. When building Stream 2.0 we decided to go for a different approach and build Keevo. Keevo is our in-house key-value store built upon RocksDB, gRPC and Raft.

      RocksDB is a highly performant embeddable database library developed and maintained by Facebook’s data engineering team. RocksDB started as a fork of Google’s LevelDB that introduced several performance improvements for SSD. Nowadays RocksDB is a project on its own and is under active development. It is written in C++ and it’s fast. Have a look at how this benchmark handles 7 million QPS. In terms of technology it’s much simpler than Cassandra.

      This translates into reduced maintenance overhead, improved performance and, most importantly, more consistent performance. It’s interesting to note that LinkedIn also uses RocksDB for their feed.

      #InMemoryDatabases #DataStores #Databases

      Umair Iftikhar
      Technical Architect at Vappar · 3 upvotes · 12.3K views

      I am developing a solution that collects telemetry data from different devices: nearly 1,000 devices at minimum and 12,000 at maximum. Each device sends 2 packets per second. This is time-series data; the data definitions and various reports are saved in PostgreSQL, along with things like building information and maintenance records. I want to know about the best solution. The data is needed to run different math and ML algorithms. The data itself is raw, without the definitions and information stored in PostgreSQL. Initially, I went with TimescaleDB because of its PostgreSQL support, but as the number of sites increased, I started facing many issues with TimescaleDB in terms of flexibility of storing data.

      Another major requirement is replication of the database for reporting and other purposes. You may also suggest options other than Druid and Cassandra, but an open source solution is appreciated.
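
For a sense of scale, the ingest rate implied by the figures in this question can be worked out directly. The sketch below uses only the numbers stated above (1,000 to 12,000 devices, 2 packets per second each); packet size is not given in the post, so storage volume is deliberately left out.

```python
PACKETS_PER_DEVICE_PER_SECOND = 2
SECONDS_PER_DAY = 24 * 60 * 60


def ingest_rate(devices: int) -> tuple[int, int]:
    """Return (packets per second, packets per day) for a device count."""
    per_second = devices * PACKETS_PER_DEVICE_PER_SECOND
    return per_second, per_second * SECONDS_PER_DAY


for devices in (1_000, 12_000):
    per_second, per_day = ingest_rate(devices)
    print(f"{devices:>6} devices: {per_second:,} packets/s, {per_day:,} packets/day")
```

At the upper bound this comes to 24,000 writes per second, or roughly 2.07 billion rows per day if each packet is stored as one row.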

      PostgreSQL

      51.6K
      39.8K
      3.5K
      A powerful, open source object-relational database system
      PROS OF POSTGRESQL
      • 755
        Relational database
      • 506
        High availability
      • 437
        Enterprise class database
      • 379
        Sql
      • 299
        Sql + nosql
      • 171
        Great community
      • 145
        Easy to setup
      • 129
        Heroku
      • 128
        Secure by default
      • 111
        Postgis
      • 48
        Supports Key-Value
      • 46
        Great JSON support
      • 32
        Cross platform
      • 29
        Extensible
      • 25
        Replication
      • 24
        Triggers
      • 22
        Rollback
      • 21
        Multiversion concurrency control
      • 20
        Open source
      • 17
        Heroku Add-on
      • 14
        Stable, Simple and Good Performance
      • 13
        Powerful
      • 12
        Let's be serious, what other SQL DB would you go for?
      • 9
        Good documentation
      • 7
        Scalable
      • 7
        Intelligent optimizer
      • 6
        Transactional DDL
      • 6
        Modern
      • 6
        Reliable
      • 5
        One stop solution for all things sql no matter the os
      • 5
        Free
      • 4
        Relational database with MVCC
      • 3
        Full-Text Search
      • 3
        Developer friendly
      • 3
        Faster Development
      • 2
        Excellent source code
      • 2
        Great DB for Transactional system or Application
      • 1
        Free version
      • 1
        Text
      • 1
        Open-source
      • 1
        search
      • 1
        Full-text
      CONS OF POSTGRESQL
      • 9
        Table/index bloat

      Amazon DynamoDB

      2.9K
      2.5K
      195
      Fully managed NoSQL database service
      PROS OF AMAZON DYNAMODB
      • 62
        Predictable performance and cost
      • 56
        Scalable
      • 35
        Native JSON Support
      • 21
        AWS Free Tier
      • 7
        Fast
      • 3
        No sql
      • 3
        To store data
      • 2
        Serverless
      • 2
        No Stored procedures is GOOD
      • 1
        ORM with DynamoDBMapper
      • 1
        Elastic Scalability using on-demand mode
      • 1
        Elastic Scalability using autoscaling
      • 1
        DynamoDB Stream
      CONS OF AMAZON DYNAMODB
      • 3
        Only sequential access for paginated data

      related Amazon DynamoDB posts

      Julien DeFrance
      Principal Software Engineer at Tophatter · 16 upvotes · 2.2M views

      Back in 2014, I was given an opportunity to re-architect the SmartZip Analytics platform and its flagship product, SmartTargeting. This is SaaS software that helps real estate professionals keep up with their prospects and leads in a given neighborhood/territory, find out (thanks to predictive analytics) who is most likely to list/sell their home, and run cross-channel marketing automation against them: direct mail, online ads, email... The company also provides Data APIs to Enterprise customers.

      I had inherited years and years of technical debt and I knew things had to change radically. The first enabler to this was to make use of the cloud and go with AWS, so we would stop re-inventing the wheel, and build around managed/scalable services.

      For the SaaS product, we kept on working with Rails as this was what my team had the most knowledge in. We've however broken up the monolith and decoupled the front-end application from the backend thanks to the use of Rails API so we'd get independently scalable micro-services from now on.

      Our various applications could now be deployed using AWS Elastic Beanstalk so we wouldn't waste any more efforts writing time-consuming Capistrano deployment scripts for instance. Combined with Docker so our application would run within its own container, independently from the underlying host configuration.

      Storage-wise, we went with Amazon S3 and ditched any pre-existing local or network storage people used to deal with in our legacy systems. On the database side: Amazon RDS / MySQL initially. Ultimately migrated to Amazon RDS for Aurora / MySQL when it got released. Once again, here you need a managed service your cloud provider handles for you.

      Future improvements / technology decisions included:

      • Caching: Amazon ElastiCache / Memcached
      • CDN: Amazon CloudFront
      • Systems Integration: Segment / Zapier
      • Data-warehousing: Amazon Redshift
      • BI: Amazon Quicksight / Superset
      • Search: Elasticsearch / Amazon Elasticsearch Service / Algolia
      • Monitoring: New Relic

      As our usage grew, patterns changed, and/or our business needs evolved, my role as Engineering Manager and then Director of Engineering was also to ensure my team kept on learning and innovating while delivering business value.

      One of these innovations was to get ourselves into serverless: adopting AWS Lambda was a big step forward. At the time it was only available for Node.js (not Ruby), but it was a great way to handle cost efficiency, unpredictable traffic, sudden bursts of traffic... Ultimately you want the whole chain of services involved in a call to be serverless, and that's when we started leveraging Amazon DynamoDB on these projects so they'd be fully scalable.

      Dmitry Mukhin

      Uploadcare has built an infinitely scalable infrastructure by leveraging AWS. Building on top of AWS allows us to process 350M daily requests for file uploads, manipulations, and deliveries. When we started in 2011 the only cloud alternative to AWS was Google App Engine which was a no-go for a rather complex solution we wanted to build. We also didn’t want to buy any hardware or use co-locations.

      Our stack handles receiving files, communicating with external file sources, managing file storage, managing user and file data, processing files, file caching and delivery, and managing user interface dashboards.

      At its core, Uploadcare runs on Python. The EuroPython 2011 conference in Florence really inspired us; that, coupled with the fact that Python was general enough to solve all of our challenges, informed this decision. Additionally, we had prior experience working in Python.

      We chose to build the main application with Django because of its feature completeness and large footprint within the Python ecosystem.

      All the communications within our ecosystem occur via several HTTP APIs, Redis, Amazon S3, and Amazon DynamoDB. We decided on this architecture so that our system could be scalable in terms of storage and database throughput. This way we only need Django running on top of our database cluster. We use PostgreSQL as our database because it is considered an industry standard when it comes to clustering and scaling.
