Re: How to OCR tables of contents to proper outputs?
Usually, when you OCR a table of contents, the columns are separated by a large space, so the output is not properly ordered. For example, for an


First, define what you mean by "fix it at the first place".

If you want to fix wrong output after the OCR analysis, you will never find a simple solution that works for the infinite set of possible TOCs.
You can never cover all variations. You would have to create a machine learning algorithm that analyzes each TOC variant.

Or, for a simple TOC, count the substrings that share the same characteristics and check that the counts match:

Chapter number
Chapter number
Chapter number
Chapter number
Chapter number
...

= 5

Chapter title
Chapter title
Chapter title
Chapter title
Chapter title
...

= 5
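
The counting idea above can be sketched roughly as follows. This is only a minimal illustration in Python, assuming the OCR tool emitted all chapter numbers as one run of lines and all chapter titles as another. The pattern NUMBER_LINE and the function pair_toc_columns are hypothetical names made up for this example, and the pattern only recognizes plain digits such as "1", "12." or "Chapter 3".

```python
import re

# Hypothetical pattern for a "chapter number" line: "1", "12.", "Chapter 3".
# Real TOCs may need Roman numerals, "1.2.3" numbering, and so on.
NUMBER_LINE = re.compile(r"^(?:chapter\s+)?\d+\.?$", re.IGNORECASE)


def pair_toc_columns(lines):
    """Pair chapter numbers with titles when OCR split them into two runs.

    Returns a list of (number, title) tuples, or None when the two counts
    differ, which signals that this simple heuristic does not apply.
    """
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    numbers = [ln for ln in cleaned if NUMBER_LINE.match(ln)]
    titles = [ln for ln in cleaned if not NUMBER_LINE.match(ln)]
    if len(numbers) != len(titles):
        return None  # counts do not match, do not guess
    return list(zip(numbers, titles))


if __name__ == "__main__":
    ocr_lines = ["1", "2", "3", "", "Introduction", "Methods", "Results"]
    for number, title in pair_toc_columns(ocr_lines):
        print(number, title)
```

If the counts differ, the function returns None instead of guessing, which is the point where a simple rule stops working and a per-variant approach would be needed.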

If you want to fix the OCR analysis itself, it helps to answer this first:
What OCR tool do you use?

For example, in Tesseract you can set the page segmentation mode so that the text is processed by rows instead of columns.
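
As a hedged sketch of what that could look like: the Tesseract command line accepts a `--psm` option, and mode 6 ("assume a single uniform block of text") is one setting that tends to keep each TOC row together. Whether it is the right mode for a given scan is an assumption to verify on your own pages. The helper names tesseract_command and ocr_rows and the file name toc.png are made up for this example, and the tesseract binary is assumed to be installed and on the PATH.

```python
import subprocess


def tesseract_command(image_path, psm=6, lang="eng"):
    """Build a Tesseract CLI call that prints the recognized text to stdout.

    psm=6 is an assumption for TOC pages; other modes (for example 4,
    a single column of text of variable sizes) may suit other layouts.
    """
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "-l", lang]


def ocr_rows(image_path, psm=6, lang="eng"):
    """Run Tesseract and return the recognized text as a string."""
    result = subprocess.run(
        tesseract_command(image_path, psm, lang),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # "toc.png" is a placeholder file name.
    print(ocr_rows("toc.png"))
```

Trying a few `--psm` values on the same page and comparing the output is usually the quickest way to see which one preserves the row order.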


