The Paperless Project

Decision at The Paperless Project about Docker

danielquinn
Senior Developer at Founders4Schools

We use Docker because Paperless is a stand-alone application with some complicated dependencies. Rather than expecting our users to figure out how to install and set up Tesseract on their local systems (if those systems even support it), we can just tell them to run docker-compose up and everything else is just magic!

14 upvotes, 1.1K views

Decision at The Paperless Project about Django, Tesseract OCR, Python

I use Python because it's a beautiful (both visually and in terms of function) and multi-purpose language. In Paperless, Python is the primary connecting tissue holding all of the parts together: it's the basis of the consumption engine (communicating with Tesseract OCR via pyOCR) and the user-interface (based on Django).

6 upvotes, 993 views

Decision at The Paperless Project about Django

Django is an amazing web framework that comes with everything you need, and makes it easy to turn off the stuff you don't.

In the case of Paperless, I needed a simple way to build a site that had a pre-built "CRUD" interface, and that's the Django admin. Django also supports things like management commands and signals, which we use throughout the project along with its built-in testing framework.

5 upvotes, 414 views

Decision at The Paperless Project about Tesseract OCR

I needed a tool that could convert a rasterised image into text. There are a few out there, but I don't think any of them match Tesseract OCR for cross-language capability, community support and freedom (it's Free as in freedom and beer).

The setup isn't super-obvious, but once you've got it figured out, all of it can be automated. On top of that, there are lots of language-specific libraries out there that'll help plug your stuff into it.

5 upvotes, 254 views

Decision at The Paperless Project about Travis CI

I decided on Travis CI because its required permissions were reasonable. Where most GitHub apps require insane stuff like write access to all repos, public & private, Travis only needed a webhook setup.

On top of that, the interface is slick and easy to follow and their support for Free projects is free :-)

5 upvotes, 147 views

Decision at The Paperless Project about GitLab, GitHub

We use GitHub because it's the default go-to place for the Free software community. Currently, GitHub is enjoying the network effect: you write code there because everyone writes their code there, so this was less of a choice than "what we all end up doing".

Personally, I prefer GitLab for its bundled-in tools like CI, boards, packaging, and Docker repo, but so long as the vast majority of talented nerds out there are on GitHub, that's where Paperless will be.

3 upvotes, 5.4K views

Decision at The Paperless Project about PostgreSQL, SQLite

SQLite is a tricky beast. It's great if you're working single-threaded, but a Terrible Idea if you've got more than one concurrent connection writing to it. You use it because it's easy to set up, light, and portable (it's just a file).
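To make the concurrency caveat concrete, here's a minimal sketch using Python's standard sqlite3 module. It isn't code from Paperless: the function name, table and file name are made up for illustration. It opens two connections to the same database file and shows that the second can't start a write while the first is still holding one.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path):
    """Return True if a second connection can't start a write while the
    first connection still holds its write transaction."""
    # isolation_level=None: we issue BEGIN/ROLLBACK ourselves.
    # timeout=0: fail straight away instead of waiting for the lock.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        # Hypothetical table, just so there's something to write to.
        first.execute("CREATE TABLE IF NOT EXISTS documents (title TEXT)")
        first.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
        first.execute("INSERT INTO documents VALUES ('invoice.pdf')")
        try:
            second.execute("BEGIN IMMEDIATE")  # second writer, same file
        except sqlite3.OperationalError:
            return True  # SQLite reports "database is locked"
        second.execute("ROLLBACK")
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


with tempfile.TemporaryDirectory() as tmp:
    blocked = second_writer_is_blocked(os.path.join(tmp, "db.sqlite3"))
    print("second writer blocked:", blocked)
```

With a non-zero timeout the second connection would wait for the lock rather than fail immediately, which is fine for one person but gets painful once several writers pile up.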

In Paperless, we've built a self-hosted web application, so it makes sense to standardise on something small & light, and as we don't have to worry about multiple connections (it's just you using the app), it's a perfect fit.

For users wanting to scale Paperless up to a multi-user environment though, we do provide the hooks to switch to PostgreSQL.
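As a rough idea of what such a hook can look like in a Django settings module, here's a sketch that picks the database backend from environment variables. The APP_DB_* variable names and the defaults are hypothetical, invented for this example, and are not Paperless's actual configuration keys; only the two ENGINE strings are Django's own backend paths.

```python
import os


def database_settings(env=os.environ):
    """Build a Django-style DATABASES dict from environment variables.

    The APP_DB_* names are hypothetical, chosen for this sketch only.
    """
    if env.get("APP_DB_ENGINE", "sqlite") == "postgresql":
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env.get("APP_DB_NAME", "paperless"),
                "USER": env.get("APP_DB_USER", "paperless"),
                "PASSWORD": env.get("APP_DB_PASSWORD", ""),
                "HOST": env.get("APP_DB_HOST", "localhost"),
                "PORT": env.get("APP_DB_PORT", "5432"),
            }
        }
    # Default: a single SQLite file, which suits a one-user install.
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": env.get("APP_DB_PATH", "db.sqlite3"),
        }
    }


# In a settings module, Django would pick this up as the DATABASES setting.
DATABASES = database_settings()
```

Keeping SQLite as the default means a fresh install needs no database setup at all, while a multi-user deployment only has to set a handful of variables.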

2 upvotes, 589 views

Decision at The Paperless Project about Alpine Linux

Alpine Linux is what you use when you care about disk space and not so much about features. As Paperless uses Docker to run its various components (read: single-purpose virtual machines), Alpine makes perfect sense, as each component can be custom-built and stripped down to just what we need.

1 upvote, 14 views

Decision at The Paperless Project about Sphinx

We use Sphinx because it's the standard for Python documentation. As Paperless is Python-based, this only made sense.

Additionally, readthedocs.org plays very well with it, so documentation hosting comes for free.

Finally, it supports reStructuredText, which is amazingly powerful.

1 upvote, 14 views

Decision at The Paperless Project about Git

We use Git because it's the de facto way to share & collaboratively write code. It's leaps and bounds ahead of the technologies it replaced, like Subversion & CVS, in that it's truly decentralised and makes it easy to pick & share bits of code across branches.

1 upvote, 4 views