The Paperless Project
Avatar of danielquinn
Senior Developer at Workfinder

We use Docker because Paperless is a stand-alone application with some complicated dependencies. Rather than expecting our users to figure out how to install and set up Tesseract on their local systems (if said systems even support it), we can just tell them to run docker-compose up and everything else is just magic!

I decided on Travis CI because its required permissions were reasonable. Where most GitHub apps require insane stuff like write access to all repos, public & private, Travis only needed a webhook setup.

On top of that, the interface is slick and easy to follow, and their support for Free projects is free :-)

I use Python because it's a beautiful (both visually and in terms of function) and multi-purpose language. In Paperless, Python is the primary connective tissue holding all of the parts together: it's the basis of the consumption engine (communicating with Tesseract OCR via pyOCR) and the user interface (based on Django).

We use GitHub because it's the default go-to place for the Free software community. Currently, GitHub is enjoying the network effect: you write code there because everyone writes their code there, so this choice was less of a choice than "what we all end up doing".

Personally, I prefer GitLab for its bundled-in tools like CI, boards, packaging, and Docker repo, but so long as the vast majority of talented nerds out there are on GitHub, that's where Paperless will be.

I needed a tool that could convert a rasterised image into text. There are a few out there, but I don't think any of them match Tesseract OCR for cross-language capability, community support, and freedom (it's Free as in freedom and beer).

The setup isn't super-obvious, but once you've got it figured out, all of that can be automated. On top of that, there are lots of language-specific libraries out there that'll help plug your stuff into it.
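
As a rough illustration of that automation, here's a minimal sketch that shells out to the tesseract command-line tool from Python. It assumes the binary is installed and on your PATH, and the function names are made up for the example. It is not how Paperless itself talks to Tesseract (that goes through pyOCR); it's just the shortest way to show the idea with nothing but the standard library.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for a one-off Tesseract run.

    Passing "stdout" as the output base makes Tesseract print the
    recognised text instead of writing it to a file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Return the text Tesseract finds in image_path (hypothetical helper)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout
```

Calling ocr_image("scan.png", lang="deu") would then run Tesseract with its German language data on a file called scan.png, assuming both the file and the language pack exist on your machine.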

Django is an amazing web framework that comes with everything you need, and makes it easy to turn off the stuff you don't.

In the case of Paperless, I needed a simple way to build a site that had a pre-built "CRUD" interface, and that's the Django admin. Django also supports things like management commands and signals, which we use throughout the project along with its built-in testing framework.

SQLite is a tricky beast. It's great if you're working single-threaded, but a Terrible Idea if you've got more than one concurrent connection trying to write. You use it because it's easy to set up, light, and portable (it's just a file).
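
To see what that looks like in practice, here's a small sketch using Python's built-in sqlite3 module. Two connections try to write to the same database file at once. The table name and the zero-second timeout are just choices made for the demonstration, not anything taken from Paperless.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked():
    """Return True if a second connection can't start a write transaction
    while the first one is still holding the write lock."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")

        # isolation_level=None: we manage transactions by hand.
        # timeout=0: fail immediately rather than waiting for the lock.
        first = sqlite3.connect(path, timeout=0, isolation_level=None)
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            first.execute("CREATE TABLE documents (title TEXT)")
            first.execute("BEGIN IMMEDIATE")  # take the write lock
            first.execute("INSERT INTO documents VALUES ('invoice')")
            try:
                second.execute("BEGIN IMMEDIATE")  # wants the same lock
            except sqlite3.OperationalError:
                return True  # SQLite reports "database is locked"
            return False
        finally:
            first.close()
            second.close()
```

With a longer timeout the second connection would wait its turn instead of failing straight away, which is fine for a handful of writers but gets painful as their number grows.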

In Paperless, we've built a self-hosted web application, so it makes sense to standardise on something small & light, and as we don't have to worry about multiple connections (it's just you using the app), it's a perfect fit.

For users wanting to scale Paperless up to a multi-user environment though, we do provide the hooks to switch to PostgreSQL.
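
What such a hook might look like in a Django settings file is sketched below. The PAPERLESS_DB* variable names and the defaults are invented for the example and may not match what Paperless actually reads; only the two ENGINE strings are Django's own.

```python
import os


def database_settings(env=os.environ):
    """Build Django's DATABASES setting from environment variables.

    Falls back to a local SQLite file unless a PostgreSQL host is given.
    All variable names here are hypothetical.
    """
    host = env.get("PAPERLESS_DBHOST")
    if not host:
        return {
            "default": {
                "ENGINE": "django.db.backends.sqlite3",
                "NAME": env.get("PAPERLESS_DBFILE", "db.sqlite3"),
            }
        }
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "HOST": host,
            "PORT": env.get("PAPERLESS_DBPORT", "5432"),
            "NAME": env.get("PAPERLESS_DBNAME", "paperless"),
            "USER": env.get("PAPERLESS_DBUSER", "paperless"),
            "PASSWORD": env.get("PAPERLESS_DBPASS", ""),
        }
    }


# In settings.py you would then write:
DATABASES = database_settings()
```

The point of the design is that the single-user default needs no configuration at all, while a multi-user install only has to set a few environment variables.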

Alpine Linux is what you use when you care about disk space and not so much about features. As Paperless uses Docker to run its various components (read: single-purpose virtual machines), Alpine makes perfect sense, as each component can be custom-built and stripped down to just what we need.

We use Sphinx because it's the standard for Python documentation. As Paperless is Python-based, this only made sense.

Additionally, readthedocs.org plays very well with it, so that's your documentation hosting sorted, for free.

Finally, it supports reStructuredText, which is amazingly powerful.
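
For a sense of how little configuration that takes, here's a minimal conf.py sketch. The project name and author are placeholders and the extension list is just one plausible choice, not the actual Paperless configuration.

```python
# docs/conf.py: a minimal Sphinx configuration (illustrative values only)

project = "Example Project"   # placeholder name
author = "Example Author"     # placeholder author

# Pull API documentation out of Python docstrings.
extensions = ["sphinx.ext.autodoc"]

# Sphinx's default theme, named explicitly here.
html_theme = "alabaster"

# Don't treat build output as documentation source.
exclude_patterns = ["_build"]
```

With that file sitting in a docs directory next to your .rst sources, running sphinx-build -b html docs docs/_build/html produces the HTML.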

We use Git because it's the de facto way to share & collaboratively write code. It's leaps and bounds ahead of the technologies it replaced, like Subversion & CVS, in that it's truly decentralised and makes it easy to pick & share bits of code across branches.
