
I've been converting PDF to other formats for over 3 years. I've converted at least 20 books.

A PDF file is not easy to convert, except for plain text paragraphs. I have a lot of experience extracting text from PDFs. PDF is an endpoint format: it is not meant for further processing or text extraction. Expect a LOT of problems with images and tables. Images generally don't extract at all. Tables come out badly scrambled, and converting them to other formats takes a LOT of manual cleanup.

That said, if you want to extract the text, I found this site to be better than the others I tried: online-convert.com. It supports many input and output formats. For comparison, I tried at least 5 other sites and 5 OCR programs for the PC.

As far as I know, no program will give you the accuracy you want. There may be better programs, such as ABBYY FineReader, but they are not free, and there is no such thing as a perfect conversion from PDF to anything else, especially for images and tabular data.

Books of fiction are easy to extract the text from because they are just wrapped paragraphs with few or no images and no tabular data. So to test your conversion, find a more challenging book with tabular data and images and see how it goes.

Some PDF books contain the actual text, and these extract fairly well. Other PDF files are just a bunch of images, one scanned page per image. For these you need OCR.
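To tell the two kinds of PDF apart before choosing a tool, one option is to extract the text page by page and see which pages come back nearly empty. Here is a minimal sketch of that check. The function names and the 20-character threshold are my own assumptions, not anything standard, and the optional helper assumes the third-party pypdf library is installed.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is nearly empty.

    page_texts: one string per page, as returned by a text extractor.
    min_chars:  hypothetical threshold; pages with less text than this
                are assumed to be scanned images that need OCR.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(pdf_path):
    """Extract the embedded text of each page using pypdf (assumed installed)."""
    # Imported here so pages_needing_ocr works without any dependency.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

If nearly every page is flagged, the file is probably a scan and needs OCR. If only a few are flagged, those are likely full-page images or blank pages. A page with very little real text (a title page, say) will also be flagged, so treat the result as a hint.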

Also, there is no OCR that is 100% accurate. You might get 90-95% accuracy. But we don't know which letters or words are inaccurate, so that means we have to check every word in the output. This takes a lot of time.
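To put a number on that, here is a rough calculation. It assumes the accuracy figure is per word and that errors are spread evenly, which real OCR output may not match. The 80,000-word book length is just an illustrative figure I picked.

```python
def expected_bad_words(total_words, word_accuracy):
    """Estimate how many words come out wrong at a given per-word accuracy.

    word_accuracy is a fraction between 0 and 1 (0.95 means 95%).
    """
    if not 0.0 <= word_accuracy <= 1.0:
        raise ValueError("word_accuracy must be between 0 and 1")
    return round(total_words * (1.0 - word_accuracy))


# Hypothetical 80,000-word book:
print(expected_bad_words(80_000, 0.95))  # 4000
print(expected_bad_words(80_000, 0.90))  # 8000
```

So even at the high end you would be hunting for thousands of errors scattered through the whole book, with nothing to tell you where they are.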



More posts by @Kelli
