
Is it possible to tell whether a given PDF is digital native or has gone through some kind of OCR or digitization process?

If yes, is it possible to do so programmatically?



More posts by @Debbie

1 Comments


PDF documents usually embed their fonts within the document, and most PDF libraries let you read this font information programmatically.

If there are no embedded fonts, the PDF is a scanned one (its pages are just images).
If there are only one or two fonts, the document has been OCRed.
If there are three or more fonts, the document is digital native.

A digital-native PDF can have only one or two fonts and would then be misclassified, but luckily for me this is an acceptable error rate.
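
The font-count rule can be sketched in Python roughly as follows. This is only an illustration under some assumptions: it uses PyMuPDF (imported as `fitz`) as the PDF library, which is my choice and not something the rule depends on, and the function names and the labels "scanned", "ocr", and "digital-native" are made up for the example. Note that it counts the fonts referenced by the pages, which may include fonts that are not actually embedded.

```python
def strip_subset_prefix(font_name: str) -> str:
    """Drop a subset tag such as 'ABCDEF+' so 'ABCDEF+Arial' counts as 'Arial'."""
    return font_name.split("+", 1)[-1]


def classify_by_font_count(font_count: int) -> str:
    """Apply the thresholds from the answer above (labels are hypothetical)."""
    if font_count < 0:
        raise ValueError("font_count must not be negative")
    if font_count == 0:
        return "scanned"
    if font_count <= 2:
        return "ocr"
    return "digital-native"


def count_fonts(pdf_path: str) -> int:
    """Count distinct font names referenced across all pages of a PDF."""
    # Assumption: PyMuPDF is installed (pip install pymupdf).
    import fitz

    doc = fitz.open(pdf_path)
    try:
        names = set()
        for page in doc:
            # Each entry is a tuple; index 3 holds the base font name.
            for font in page.get_fonts():
                names.add(strip_subset_prefix(font[3]))
        return len(names)
    finally:
        doc.close()


def classify_pdf(pdf_path: str) -> str:
    """Classify a PDF file by the number of fonts it uses."""
    return classify_by_font_count(count_fonts(pdf_path))
```

Calling `classify_pdf("some_file.pdf")` (a placeholder path) would return one of the three labels. The thresholds live in their own function so they can be tuned to whatever error rate is acceptable for a given collection.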

