Paperless-ng: automated OCR and filing of image PDFs on Docker
I would have loved to have this application a decade ago, when bills still came in print and I scanned and filed them along with other important documents. Nowadays everything is digital, thankfully, and I rarely need OCR on new documents. I still need to redo the OCR on my old scans from a decade or more ago, though: the original OCR was heavily botched, and it's nearly impossible to search for anything in them.
This is where Paperless-ng from Jonas Winkler shines. You docker-compose your way to it, edit a few lines of config, and fire away.
The workflow is as follows: you scan a document with a phone app, a proper scanner or any other means, then drop the file either into a specific folder or onto the web GUI. Paperless-ng takes over from there, running OCR on your image-based PDF and filing it automatically.
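If your scanner software can't save straight into the watched folder, a tiny script can do the drop for you. This is only a sketch: the function name `drop_into_consume` and the folder names are my own placeholders, so point it at whatever folder your Paperless-ng setup actually watches.

```python
import shutil
import tempfile
from pathlib import Path


def drop_into_consume(scan: Path, consume_dir: Path) -> Path:
    """Copy a scanned file into the watched folder without overwriting anything."""
    consume_dir.mkdir(parents=True, exist_ok=True)
    target = consume_dir / scan.name
    counter = 1
    # If a file with the same name is already waiting, add a numeric suffix.
    while target.exists():
        target = consume_dir / f"{scan.stem}_{counter}{scan.suffix}"
        counter += 1
    shutil.copy2(scan, target)
    return target


if __name__ == "__main__":
    # Demo with throwaway folders (assumption); swap in your real scan and consume paths.
    with tempfile.TemporaryDirectory() as tmp:
        scan = Path(tmp) / "bill.pdf"
        scan.write_bytes(b"%PDF-1.4 placeholder")
        print(drop_into_consume(scan, Path(tmp) / "consume"))
```

The numeric suffix is there because scanners love to reuse names like scan.pdf, and I'd rather keep both files than silently lose one.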
The automatic part needs some input from you at first, so that the app can tag and file your newly scanned documents accordingly.
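To give a feel for what that initial input amounts to, here is a toy illustration of keyword-based tagging. It is not Paperless-ng's own matching code; the rule table `TAG_RULES` and the function `suggest_tags` are made up for this example, and the app's real matching options are set up in its web GUI.

```python
import re

# Hypothetical rules: tag name -> keywords that should trigger it.
TAG_RULES = {
    "electricity": ["kwh", "electricity"],
    "insurance": ["policy", "premium"],
}


def suggest_tags(ocr_text: str, rules: dict = TAG_RULES) -> list:
    """Return every tag whose keyword list shares a word with the OCR text."""
    # Lowercase and split on anything that is not a letter or digit.
    words = set(re.findall(r"[a-z0-9]+", ocr_text.lower()))
    return sorted(tag for tag, keywords in rules.items() if words & set(keywords))


if __name__ == "__main__":
    print(suggest_tags("Your electricity usage this month: 320 kWh"))
```

The point is simply that once you have told the app which words belong to which tag, every later scan containing those words can be filed without you touching it.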
Paperless-ng is a fork of Paperless that adds a better GUI, and the OCR part is flawless thanks to Tesseract.