In mid-2018 we made a big push for speed on the site. The site, running on PHP, was taking about 7 seconds to load. It had already been running through CloudFlare for some time, but on a shared host in Sydney (which is also where most of the customers are). When developing the @TuffTruck site we had found DigitalOcean to be fast: even though it's located overseas, it was still 2 seconds faster for Australian users. We also found that some WordPress plugins were really slowing the TTFB: with all plugins off, WordPress would respond 1.5-2 seconds faster. An on/off walk-through of each plugin showed that 2 plugins by Ontraport (a CRM-type service that some forms were populating) were the main culprits. Out they went, and we built our own plugin to push the data to Ontraport only when required. With the TTFB acceptable, we moved on to getting the complete page load time down. By turning on CloudFlare's HTML/CSS/JS minification and Rocket Loader we could get our group of test pages, including the homepage, loading in full in just over 2 seconds. We then moved the images off to imgix and put the CSS, JS and fonts onto a mirrored subdomain (so that cookies weren't exchanged), but this only shaved about another 0.2 seconds off. We are keeping it running for the moment, but the $10 a month minimum for imgix is hardly worth it (it would be different if new images were going up all the time and needed processing). The client is very happy with the ~70% improvement and has already seen the site move up the ranks of Google's SERP and their PPC costs come down. And all the new hosting providers still come in at half the price of the previous Sydney hosting service. We have a few more ideas that we are testing on our staging site and will roll them out soon.
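The plugin walk-through described above can be sketched in a few lines of Python. This is only an illustration, not the tooling we actually used: `measure_ttfb`, `rank_plugins_by_cost` and the `ttfb_without` callback are hypothetical names, and how a plugin gets switched off between measurements (for example by toggling it in the WordPress admin) is left to the caller.

```python
import time
import urllib.request
from statistics import median


def measure_ttfb(url, samples=5):
    """Median seconds from sending the request until the first body byte arrives."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=30) as response:
            response.read(1)  # stop the clock once the first byte is available
            timings.append(time.perf_counter() - start)
    return median(timings)


def rank_plugins_by_cost(plugins, ttfb_without, baseline):
    """Return (plugin, seconds_saved) pairs, biggest saving first.

    ttfb_without(plugin) is a caller-supplied (hypothetical) function that
    returns the TTFB measured with only that plugin switched off.
    baseline is the TTFB with every plugin switched on.
    """
    savings = [(name, baseline - ttfb_without(name)) for name in plugins]
    return sorted(savings, key=lambda pair: pair[1], reverse=True)
```

The plugins at the top of the ranking are the ones worth replacing first. Taking the median of several samples helps smooth out network jitter, which matters when the server is overseas.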
We use AWS mostly because Amazon RDS for PostgreSQL is a very good and cheap way to get a managed PostgreSQL.
We also use Amazon EC2 for the servers, with the Arch Linux images from Uplink Labs. We chose Arch Linux because of its up-to-date packages, especially for Python. We use Pacman to package and deploy our services.
When it comes to continuous integration services, the choice is hard. There are several solutions available and the dev scene looks very split. We read and reviewed several solutions and ended up choosing between Codeship and Semaphore. Although Semaphore is used by slightly more developers, we experienced a faster and easier flow with Codeship. Both integrate with Slack and GitHub very well, so this does not set them apart. Both have a complex pricing system that is not easy to calculate or predict. However, out in the wild, we found Codeship to have a better price point at heavy use.
The 350M API requests we handle daily include many processing tasks such as image enhancements, resizing, filtering, face recognition, and GIF to video conversions.
Tornado is the one we currently use and aiohttp is the one we intend to implement in production in the near future. Both tools can handle huge numbers of requests, but aiohttp is preferable as it uses asyncio, which is Python-native. Since Python is at the heart of our service, we initially used PIL followed by Pillow. We kind of still do. When we figured out that resizing was the most taxing processing operation, Alex, our engineer, created the fork named Pillow-SIMD and implemented a good number of optimizations in it to make it 15 times faster than ImageMagick.
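To show why an asyncio-based server copes well with many simultaneous requests, here is a minimal sketch using only the standard library. It is not our production code: `handle_request`, `serve_batch` and `timed_batch` are made-up names, and `asyncio.sleep` merely stands in for the time a real handler spends waiting on storage or the network.

```python
import asyncio
import time


async def handle_request(request_id, io_delay=0.05):
    """Stand-in for a handler that waits on I/O (storage, upstream fetch)."""
    await asyncio.sleep(io_delay)  # yields to the event loop while "waiting"
    return f"processed {request_id}"


async def serve_batch(count):
    """Run `count` handlers concurrently on a single event loop."""
    return await asyncio.gather(*(handle_request(i) for i in range(count)))


def timed_batch(count):
    """Return the results and the wall-clock seconds the whole batch took."""
    start = time.perf_counter()
    results = asyncio.run(serve_batch(count))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_batch(100)
    print(f"{len(results)} requests in {elapsed:.2f}s")
```

Because the handlers overlap their waiting time, the batch finishes in roughly the time of a single request instead of the sum of all of them. CPU-heavy work such as resizing does not benefit in the same way and would still need to be kept off the event loop.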
Thanks to the optimizations, Uploadcare now needs six times fewer servers to process images. Here, by servers I also mean separate Amazon EC2 instances handling processing and the first layer of caching. The processing instances are also paired with AWS Elastic Load Balancing (ELB) which helps ingest files to the CDN.
SQLite is a tricky beast. It's great if you're working single-threaded, but a Terrible Idea if you've got more than one concurrent connection. You use it because it's easy to set up, light, and portable (it's just a file).
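The concurrency caveat is easy to reproduce with Python's built-in `sqlite3` module. The sketch below is illustrative only and is not taken from Paperless: the function name and the `documents` table are invented. It opens two connections to the same file and shows that the second one cannot start a write while the first holds the write lock.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path):
    """Return True if a second connection is refused the write lock."""
    # isolation_level=None leaves transaction control to us;
    # timeout=0 makes a busy database fail immediately instead of waiting.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute("CREATE TABLE IF NOT EXISTS documents (title TEXT)")
        first.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
        first.execute("INSERT INTO documents VALUES ('invoice')")
        try:
            second.execute("BEGIN IMMEDIATE")  # second writer asks for the same lock
        except sqlite3.OperationalError as exc:
            return "locked" in str(exc)
        second.execute("ROLLBACK")
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".sqlite3")
    os.close(fd)
    try:
        print("second writer blocked:", second_writer_is_blocked(path))
    finally:
        os.remove(path)
```

With a single user this never comes up, which is why the trade-off works for a self-hosted app. With a non-zero `timeout` the second connection would wait rather than fail straight away, but writes would still be serialized.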
In Paperless, we've built a self-hosted web application, so it makes sense to standardise on something small & light, and as we don't have to worry about multiple connections (it's just you using the app), it's a perfect fit.
For users wanting to scale Paperless up to a multi-user environment though, we do provide the hooks to switch to PostgreSQL.