How to OCR tables of contents to proper outputs?


Usually, when you OCR a table of contents, the columns are separated by a large space, so the output is not properly ordered. For example, for a table like this:

The output would be:

The Rank Function
Permutations of Atoms
Pure Set Theory and Axiom System ZF
3.5
3.6
3.7

I'd like it to be:

3.5 The Rank Function 112
3.6 Permutations of Atoms 116
3.7 Pure Set Theory and Axiom System ZF 118

But different TOCs produce different output patterns, so there is no way to write a regex script that automatically fixes every book. The best approach is to fix it in the first place. But how?


@Tiffany


2 Comments


@Megan


Please define what you mean by fixing it "in the first place".

If you want to repair wrong output after the OCR analysis, you will never find a simple solution that works for an unbounded set of TOCs, because you can never cover all the variations. You would have to build a machine learning algorithm that analyzes each TOC variant.

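Alternatively, for a simple TOC, count the substrings that share the same characteristics:

Chapter number
Chapter number
Chapter number
Chapter number
Chapter number
...

= 5

Chapter title
Chapter title
Chapter title
Chapter title
Chapter title
...

= 5

If both counts match, the n-th chapter number can be paired with the n-th chapter title.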
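The counting idea above could be sketched roughly as follows, in Python. This is only an illustration: the function name `pair_toc_lines` and the pattern used to recognise a chapter number are assumptions, and it deliberately ignores page numbers, which would also look like numbers and would need separate handling.

```python
import re

# Assumption: a chapter number is digits with optional dotted parts, e.g. "3.5" or "12".
NUMBER_RE = re.compile(r"^\d+(\.\d+)*\.?$")


def pair_toc_lines(lines):
    """Pair chapter numbers with titles that OCR emitted as separate runs.

    Works only for simple TOCs where both groups keep their original order
    and contain the same number of items.
    """
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    numbers = [ln for ln in cleaned if NUMBER_RE.match(ln)]
    titles = [ln for ln in cleaned if not NUMBER_RE.match(ln)]

    # The count check from the answer: refuse to guess if the groups differ in size.
    if len(numbers) != len(titles):
        raise ValueError(
            f"found {len(numbers)} numbers but {len(titles)} titles; cannot pair"
        )
    return [f"{num} {title}" for num, title in zip(numbers, titles)]


if __name__ == "__main__":
    ocr_output = [
        "The Rank Function",
        "Permutations of Atoms",
        "Pure Set Theory and Axiom System ZF",
        "3.5",
        "3.6",
        "3.7",
    ]
    for row in pair_toc_lines(ocr_output):
        print(row)
```

Raising an error when the counts differ is a deliberate choice here: a mismatch usually means the TOC is not "simple" in the sense above, and silently pairing would produce wrong entries.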

If you want to fix the OCR analysis itself, it would help to know which OCR tool you use.

For example, Tesseract can be configured so that text is processed by rows instead of by columns.
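As a hedged illustration of the Tesseract suggestion: the answer does not name a specific setting, so the choice below is an assumption. One candidate is the page segmentation mode `--psm 6`, which tells Tesseract to assume a single uniform block of text, together with the `preserve_interword_spaces` variable so the wide gaps between columns are kept. The helper names `tesseract_command` and `ocr_rows` are made up for this sketch, and whether this mode gives row-wise output for a particular scan has to be checked by trying it.

```python
import subprocess


def tesseract_command(image_path, psm=6, keep_spaces=True):
    """Build a Tesseract CLI call that prints recognised text to stdout.

    psm=6 ("assume a single uniform block of text") is an assumed choice;
    other page segmentation modes may suit a given TOC better.
    """
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm)]
    if keep_spaces:
        # Keep the wide gaps between columns instead of collapsing them.
        cmd += ["-c", "preserve_interword_spaces=1"]
    return cmd


def ocr_rows(image_path, psm=6):
    """Run Tesseract (it must be installed and on PATH) and return its text."""
    result = subprocess.run(
        tesseract_command(image_path, psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # "toc.png" is a placeholder file name; only the command is shown here.
    print(" ".join(tesseract_command("toc.png")))
```

Comparing the output of a few `psm` values on one page is usually the quickest way to see which mode keeps each TOC entry on a single line.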


@Bryan


This doesn't really answer the question, but some books on Google Books already have a TOC:
