
Is there any software that facilitates scanning of a paper book into an ebook?
I have some old sci-fi novels that are out of print and the rights owners are not creating them in ebook form. Is there any software that facilitates scanning and conversion of these paper books into ebook format?
More posts by @Lorelei

8 Comments
I think ABBYY FineReader is very good software. I have used a localized version (Tamil OCR) for land record digitization. The results are only as good as the scan quality and the condition of the documents. For printed text the success rate is good (more than 75% error-free). The main problem is that mostly only English OCR is available; if you need other languages, you have to ask ABBYY.
Here is a project that handles every step of the process.
There is an open source project called Homer that installs a suite of software to help with this, including ScanTailor and tesseract-ocr. The final result is a searchable PDF. You can copy the text layer from the PDF (or from the related HTML file created in the process) and paste it into an editor like Sigil to create an EPUB. Then, if you want MOBI, convert using Calibre.
And there is a paper I wrote that explains the software in more detail.
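As a rough sketch of the OCR step in a pipeline like this, the Tesseract command-line tool can write a searchable PDF directly from a cleaned page image. This is not Homer's own code; the helper names and file names are hypothetical, and it assumes only that `tesseract` is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_pdf_command(image, out_base, lang="eng"):
    """Build the command line: tesseract <image> <outputbase> -l <lang> pdf."""
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def ocr_page_to_pdf(image, out_base, lang="eng"):
    """Run Tesseract on one page image and return the path of the PDF it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.run(tesseract_pdf_command(image, out_base, lang), check=True)
    # Tesseract appends ".pdf" to the output base name.
    return Path(str(out_base) + ".pdf")
```

Calling `ocr_page_to_pdf("page_001.tif", "page_001")` would, under those assumptions, leave a `page_001.pdf` with a hidden text layer next to the image.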
Scan Tailor is free software that helps optimize the images resulting from scanning books.
Its features include:
Fix orientation
Split pages (very useful when you scan two pages at once and want to make a single page view ebook)
Deskew
Select content (can be used to remove the pagination and any other content that does not make sense in flowable ebook formats)
Change margins
Erase spots
If you also want to convert these books to EPUB or a similar format, you will need OCR software and a tool like Calibre to turn the resulting text and images into an ebook file.
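To illustrate what the "split pages" step in the feature list amounts to, here is a minimal sketch of the geometry only. Scan Tailor locates the split line itself; the hypothetical helper `split_spread` below simply assumes the gutter is at the horizontal midpoint unless told otherwise.

```python
def split_spread(width, height, gutter_x=None):
    """Return (left_box, right_box) crop boxes for a two-page scan.

    Each box is (left, upper, right, lower) in pixels, the same order
    that Pillow's Image.crop() expects.
    """
    if gutter_x is None:
        gutter_x = width // 2  # assumption: the gutter sits in the middle
    if not 0 < gutter_x < width:
        raise ValueError("gutter_x must lie strictly inside the image")
    left_box = (0, 0, gutter_x, height)
    right_box = (gutter_x, 0, width, height)
    return left_box, right_box
```

For a 3000 x 2000 pixel spread this yields two 1500 x 2000 boxes; a real scan usually needs the gutter position adjusted by hand or detected, which is the work Scan Tailor does for you.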
The point of failure in all these toolchains is the OCR.
It is well worth the time spent tracking down clean, undamaged,
unfaded, non-yellowed copies with a good-quality print impression,
if this is at all possible.
If you can, get a local craft bookbinder to trim the spine off the
book with a power guillotine, and then use an autofeed scanner on
the sheets, so that every page image is exactly upright.
This maximises the chances of getting a very low error rate in character recognition. Time spent up front saves time spent later making corrections.
Scanned texts are most efficiently stored in the DjVu format, if lossy compression is acceptable (if not, use a multi-page format like TIFF).
If you convert scans to DjVu with OCR enabled, you can extract the OCR-ed text and use it for EPUB generation.
On Linux you can do so using djvutxt to get the text and then convert that to EPUB.
A more comfortable way of extracting and converting is to let Calibre convert the text in the DjVu file to EPUB. This works on Linux and Windows. The Linux version uses djvutxt to extract the text if it is available; if not, it falls back to Python-based extraction of the (non-standard compressed) text stream. Windows always uses the slower Python-based extraction.
(This is a shameless plug for the calibre plug-in that I wrote a few years ago for exactly this purpose).
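For the manual route, the text can first be dumped with `djvutxt book.djvu book.txt` (file names are placeholders) and then wrapped into an EPUB. The following is a minimal sketch of that second step using only the Python standard library. It is not what the Calibre plug-in does internally; the function names are hypothetical, and the output is meant to follow the EPUB 3 container layout but has not been run through a validator here.

```python
import html
import io
import re
import uuid
import zipfile
from datetime import datetime, timezone

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

OPF_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{book_id}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>{language}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="text" href="text.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="text"/>
  </spine>
</package>
"""

XHTML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body>
{body}
</body>
</html>
"""


def text_to_paragraphs(text):
    """Turn blank-line-separated plain text into escaped <p> elements."""
    blocks = (" ".join(block.split()) for block in re.split(r"\n\s*\n", text))
    return "\n".join(f"<p>{html.escape(block)}</p>" for block in blocks if block)


def build_epub_bytes(title, text, language="en"):
    """Return the bytes of a single-chapter EPUB built from plain text."""
    safe_title = html.escape(title)
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    opf = OPF_TEMPLATE.format(book_id=f"urn:uuid:{uuid.uuid4()}", title=safe_title,
                              language=language, modified=modified)
    chapter = XHTML_TEMPLATE.format(
        title=safe_title,
        body=f"<h1>{safe_title}</h1>\n{text_to_paragraphs(text)}")
    nav = XHTML_TEMPLATE.format(
        title="Contents",
        body='<nav epub:type="toc"><ol><li><a href="text.xhtml">'
             f"{safe_title}</a></li></ol></nav>")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # The mimetype entry must be the first file and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in [("META-INF/container.xml", CONTAINER_XML),
                           ("OEBPS/content.opf", opf),
                           ("OEBPS/nav.xhtml", nav),
                           ("OEBPS/text.xhtml", chapter)]:
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```

Writing the result of `build_epub_bytes("My Title", open("book.txt", encoding="utf-8").read())` to a `.epub` file gives one long chapter; splitting into chapters and adding a cover are left out to keep the sketch short.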
ABBYY FineReader to a text file
proofread text file against images
use NoteTab Pro to HTMLize the text
create ePub structure in Oxygen, cut and paste HTML into ePub files
view with Calibre and Adobe Digital Editions
check with ePub Validator (http://validator.idpf.org/)
If you don't proofread, you're going to get scannos (the equivalent of typos, resulting from imperfect optical character recognition of a scanned document).
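A small script can point the proofreader at likely scannos before the line-by-line check against the images. The sketch below is only a heuristic under stated assumptions: the `KNOWN_SCANNOS` table is a tiny illustrative list, not an established word list, and the helper name is hypothetical. It does not replace proofreading.

```python
import re

# Illustrative examples of OCR misreads of common English words.
# Some are real words ("arid", "modem"), so every hit needs a human check.
KNOWN_SCANNOS = {"tbe": "the", "arid": "and", "modem": "modern"}

# Tokens such as "1950s" or "2nd" mix digits and letters legitimately.
NUMERIC_FORM = re.compile(r"^\d+(st|nd|rd|th|s)$", re.IGNORECASE)


def find_suspects(text):
    """Return (line_number, token, reason) for tokens worth a second look."""
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in re.findall(r"[A-Za-z0-9']+", line):
            lower = token.lower()
            if lower in KNOWN_SCANNOS:
                reason = f"possibly '{KNOWN_SCANNOS[lower]}'"
                suspects.append((line_no, token, reason))
            elif (any(c.isdigit() for c in token)
                  and any(c.isalpha() for c in token)
                  and not NUMERIC_FORM.match(token)):
                suspects.append((line_no, token, "digit inside a word"))
    return suspects


if __name__ == "__main__":
    sample = "It was tbe best of t1mes,\nin the 1950s arid beyond."
    for line_no, token, reason in find_suspects(sample):
        print(f"line {line_no}: {token} ({reason})")
```

The line numbers make it easy to jump to the matching spot in the text file while the page image is open alongside.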
BTW, I have many disintegrating SF novels, from the 50s onwards, that I fully intend to scan and save ... when I get a round tuit.
Tesseract is an open source OCR engine that gives fairly good results. It's my understanding that Google use it for Google Books. OCRFeeder is a project for document layout analysis that works as a nice GUI for Tesseract.
OCRopus is another well-known open source OCR system.
For commercial software, you could try ABBYY FineReader or OmniPage, or alternatives to either.
Both can output searchable PDF, which keeps the original page image alongside the recognized text. That is useful because OCR on books will never give you 100% correct content without proofreading.