Is there a set of standard best practices for compressing a scan of a book chapter and adding text recognition to it? I'd prefer open source tools, and I'm hoping there is an existing script that handles both steps reasonably well, bringing PDFs down to a reasonable size and making them more useful (ideally in batch).



More posts by @Debbie

1 Comment


@James


The answer depends on your needs: how much time you are willing to invest, how much money you are willing to spend on commercial software, and what you plan to do with the result (is it just for your own reading, or do you plan to distribute it?).

Two open source tools that might help are k2pdfopt and ScanTailor. Both can clean up scanned images, de-skew them, and crop them. k2pdfopt can also perform OCR (using the Tesseract OCR engine). I believe both support batch processing. If you have MS Office, MS Word will also OCR a scanned PDF automatically and convert it to .docx format, if that is helpful to you.
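
Since the question asks about batch use, here is a minimal sketch of a wrapper that walks a folder of scans and prepares one k2pdfopt command per PDF. It is an illustration, not a tested recipe: the function names, the `_ocr` output suffix, and the default flags (`-ui-`, `-x`, `-o`) are assumptions on my part, and no OCR or compression options are included. Check `k2pdfopt -?` for the options your version supports and add them to the template before running anything.

```python
import subprocess
from pathlib import Path

# Hypothetical command template. The flags are assumptions to verify against
# the tool's own help; "{src}" and "{dst}" are filled in for each file.
DEFAULT_TEMPLATE = ["k2pdfopt", "{src}", "-ui-", "-x", "-o", "{dst}"]


def build_jobs(input_dir, output_dir, template=DEFAULT_TEMPLATE):
    """Return one (command, output_path) pair per *.pdf in input_dir, sorted by name."""
    in_dir, out_dir = Path(input_dir), Path(output_dir)
    jobs = []
    for src in sorted(in_dir.glob("*.pdf")):
        # The "_ocr" suffix is just an illustrative naming choice.
        dst = out_dir / f"{src.stem}_ocr.pdf"
        cmd = [part.format(src=src, dst=dst) for part in template]
        jobs.append((cmd, dst))
    return jobs


def run_jobs(jobs, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd, dst in jobs:
        if dry_run:
            print(" ".join(cmd))
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        # check=True stops the batch at the first file the tool fails on.
        subprocess.run(cmd, check=True)
```

Calling `run_jobs(build_jobs("scans", "processed"))` only prints the commands, so you can inspect them first; pass `dry_run=False` once they look right. The same wrapper works with any other command-line tool if you pass a different template list.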

