
DjVu compresses efficiently, but sometimes the text ends up as an image and cannot be selected. Is there any way to make sure that the text remains selectable? Which program should I use for the conversion?



More posts by @Karen

3 Comments


 

@Carla


You can use djvuxmlparser (part of DjVuLibre) to insert XML containing the OCR text into the DjVu file.

You can prepare this XML in FineReader, for example.
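
As a rough sketch of that round trip, assuming the DjVuLibre command-line tools `djvutoxml` and `djvuxmlparser` are installed and on the PATH: the helper names and file names here are made up for illustration, and the options are worth checking against the man pages of your installed version.

```python
import subprocess


def export_xml_command(djvu_path, xml_path):
    """Command that dumps a DjVu file's structure to a DjVuXML file."""
    return ["djvutoxml", djvu_path, xml_path]


def import_xml_command(djvu_path, xml_path):
    """Command that writes the text from a DjVuXML file back into the DjVu file."""
    # Assumption: -o names the DjVu file to modify; verify with the man page.
    return ["djvuxmlparser", "-o", djvu_path, xml_path]


def run(command):
    """Run one command and raise CalledProcessError if it exits with an error."""
    subprocess.run(command, check=True)
```

For example, `run(export_xml_command("book.djvu", "book.xml"))` would produce the XML, and after the OCR text has been added to it, `run(import_xml_command("book.djvu", "book.xml"))` would push it back into the file.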



You can also use ocrodjvu with a selected OCR engine, e.g. tesseract.
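
A minimal sketch of calling ocrodjvu from Python, assuming ocrodjvu and Tesseract are installed. The function names, the file names and the default language `eng` are illustrative assumptions, and the short options used here (`-e` for the engine, `-l` for the language, `-o` for the output file) should be confirmed with `ocrodjvu --help`.

```python
import shutil
import subprocess


def build_ocrodjvu_command(src, dst, engine="tesseract", language="eng"):
    """Build an ocrodjvu command line that writes an OCRed copy of src to dst."""
    return ["ocrodjvu", "-e", engine, "-l", language, "-o", dst, src]


def add_text_layer(src, dst, engine="tesseract", language="eng"):
    """Run ocrodjvu; raises if the tool is missing or exits with an error."""
    if shutil.which("ocrodjvu") is None:
        raise FileNotFoundError("ocrodjvu was not found on PATH")
    subprocess.run(build_ocrodjvu_command(src, dst, engine, language), check=True)
```

Calling `add_text_layer("scan.djvu", "scan-ocr.djvu")` leaves the original file untouched and writes the searchable copy next to it.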



DjVu files are normally image-only. Sections of such a file can be selected as images, but not as text¹.

If OCR was applied during or after the conversion to DjVu, extra information is stored in the file that associates image areas with text. Only then can you select text from the file².
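
One way to check whether a given file already carries that text information is to ask DjVuLibre's `djvutxt` tool to extract it and see whether anything comes out. This is only a sketch, assuming `djvutxt` is installed and on the PATH; the function names are made up for illustration.

```python
import subprocess


def has_text(extracted):
    """True if the extracted text contains anything besides whitespace."""
    # djvutxt may emit page separators and newlines even for empty pages,
    # so whitespace alone is treated as "no text".
    return bool(extracted.strip())


def djvu_has_text_layer(path):
    """Run djvutxt on the file and report whether any hidden text came out."""
    result = subprocess.run(
        ["djvutxt", path], capture_output=True, text=True, check=True
    )
    return has_text(result.stdout)
```

If `djvu_has_text_layer("book.djvu")` returns `False`, the file is image-only and needs OCR before its text can be selected.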

OCR can be applied to a DjVu file online. If you would rather not do that, you could try this script, which uses Tesseract. Or you can go for commercial software such as Document Express.

¹ e.g. using the djview program.
² In theory one could do OCR on the fly in the DjVu viewer, but I don't think any of the currently available viewers can do that.

