Picking MongoDB

Mar 20, 2020

Needs advice

We have thousands of .pdf docs generated from the same form but with lots of variability. We need to extract data from open text and, more importantly, from tables inside the docs. The output of Couchbase/Mongo will be one row per document for backend processing. Adobe renders the tables in an unusable form.
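To make the "one row per document" requirement concrete, here is a minimal sketch of how the extracted text fields and tables from one PDF might be collapsed into a single record. The function name `build_record`, the field names, and the sample data are all hypothetical, and the extraction step itself is assumed to have already produced plain Python lists and dicts.

```python
import json
from typing import Any, Dict, List


def build_record(
    pdf_name: str,
    fields: Dict[str, str],
    tables: List[List[List[str]]],
) -> Dict[str, Any]:
    """Collapse one PDF's extracted fields and tables into a single record."""
    record: Dict[str, Any] = {"_id": pdf_name, "fields": dict(fields), "tables": []}
    for raw in tables:
        if not raw:
            continue  # skip tables the extractor returned empty
        header, *rows = raw
        # Turn each row into a dict keyed by the header cells, so that
        # variable table layouts can still live inside one record.
        record["tables"].append([dict(zip(header, row)) for row in rows])
    return record


# Hypothetical sample: one open-text field and one two-row table.
sample = build_record(
    "form-0001.pdf",
    {"applicant": "Jane Doe"},
    [[["item", "qty"], ["widget", "2"], ["gadget", "5"]]],
)
print(json.dumps(sample, indent=2))
```

A dict shaped like this could then be stored as one document per PDF, for example with pymongo's `collection.insert_one(record)`, though the right schema depends on what the backend processing expects.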

9 upvotes·239.4K views

Replies (3)

Petr Havlicek

Freelancer at havlicekpetr.cz·Mar 21, 2020

Recommends

MongoDB

I prefer MongoDB, based on my own experience migrating an old archive of PDFs and metadata to a new "archive". The biggest advantage is the speed of filtered output: the new archive is much faster and more reliable than the old one. MongoDB is also easy to program against, with many code snippets and examples available. I have no personal experience with Couchbase so far. From an architecture point of view both options are OK, so go for the one you like.

12 upvotes·232K views

Ivan Begtin

Founder - Dateno, Director - NGO "Informational Culture" / Ambassador - OKFN Armenia at Infoculture·Mar 23, 2020

Recommends

ArangoDB

I would suggest MongoDB or ArangoDB (I can't choose both, so ArangoDB). MongoDB is more mature, but ArangoDB is more interesting if you need to bring graph database ideas into the solution. For example, if some of the data or documents are interlinked, then ArangoDB is probably the better solution.

To process tables we used the ABBYY software stack. It is great at table extraction.

7 upvotes·232.1K views
