
: Re: How to remove white space from the text layer in DjVu How to remove spaces and tabs from the text layer in DjVu document for better text search using the DjVuLibre library? By removing unnecessary
First, use the djvutoxml tool to extract a text layer in XML format from the DjVu document.
At the command prompt, type:
path\djvutoxml.exe path\book.djvu path\book.xml
Replace each path with the actual location on your disk.
Press Enter...
Then, using regular expressions, remove the whitespace characters (the ones placed between the angle brackets > <) by searching for the patterns below and replacing each match with nothing. You can use any text editor that supports regex.
String to remove spaces:
<WORD><CHARACTER coords="\d*,\d*,\d*,\d*"> </CHARACTER></WORD>
String to remove tabs:
<WORD><CHARACTER coords="\d*,\d*,\d*,\d*">\t</CHARACTER></WORD>
The regular expression can also be written in this form, which matches both spaces and tabs in one pass:
<WORD><CHARACTER coords="([0-9,]*?)">(\t| )</CHARACTER></WORD>
Original fragment:
<WORD coords="318,262,706,190">Hallo</WORD>
<WORD><CHARACTER coords="707,262,760,190"> </CHARACTER></WORD>
<WORD coords="761,262,813,190">World!</WORD>
<WORD><CHARACTER coords="814,262,860,190"> </CHARACTER></WORD>
Fixed fragment:
<WORD coords="318,262,706,190">Hallo</WORD>
[here was the code for the space]
<WORD coords="761,262,813,190">World!</WORD>
[here was the code for the space]
(Note how much text has been removed: it took 62 characters to describe a single space!)
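If you prefer a script to a text editor, the same substitution can be sketched in Python. This is only an illustration of the combined pattern above: the function name strip_whitespace_words is made up for this example, and it assumes the whitespace elements look exactly like the ones in the fragment (one space or one tab inside a single CHARACTER element).

```python
import re

# Combined pattern from the post: a WORD that holds one CHARACTER element
# whose only content is a single space or a single tab.
WHITESPACE_WORD = re.compile(
    r'<WORD><CHARACTER coords="[0-9,]*?">(?:\t| )</CHARACTER></WORD>'
)


def strip_whitespace_words(xml_text: str) -> str:
    """Return xml_text with every whitespace-only WORD element deleted."""
    return WHITESPACE_WORD.sub("", xml_text)


# The fragment from the post, with the second space swapped for a tab
# so that both alternatives of the pattern are exercised.
fragment = (
    '<WORD coords="318,262,706,190">Hallo</WORD>\n'
    '<WORD><CHARACTER coords="707,262,760,190"> </CHARACTER></WORD>\n'
    '<WORD coords="761,262,813,190">World!</WORD>\n'
    '<WORD><CHARACTER coords="814,262,860,190">\t</CHARACTER></WORD>\n'
)

cleaned = strip_whitespace_words(fragment)
print(cleaned)
```

Ordinary words that merely contain a space are left alone, because the pattern only matches the WORD/CHARACTER wrapper around a lone space or tab.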
Finally, use the djvuxmlparser tool to merge the modified XML back into the DjVu document.
At the command prompt, type:
path\djvuxmlparser.exe -o path\final.djvu path\book.xml
Replace each path with the actual location on your disk.
The -o parameter specifies the target file.
Press Enter...
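The three steps can also be chained in one script. The following is a rough sketch, not a tested recipe: it assumes djvutoxml and djvuxmlparser are on your PATH, that the exported XML is UTF-8, and the names build_commands and clean_text_layer are invented for this example. The command lines are the same ones shown above.

```python
import re
import subprocess
from pathlib import Path

# Same pattern as in the post: a WORD wrapping a lone space or tab.
WHITESPACE_WORD = re.compile(
    r'<WORD><CHARACTER coords="[0-9,]*?">(?:\t| )</CHARACTER></WORD>'
)


def build_commands(djvu_in, xml_path, djvu_out):
    """Build the two command lines used in the post (hypothetical helper)."""
    extract = ["djvutoxml", str(djvu_in), str(xml_path)]
    merge = ["djvuxmlparser", "-o", str(djvu_out), str(xml_path)]
    return extract, merge


def clean_text_layer(djvu_in, xml_path, djvu_out):
    """Export the text layer, delete whitespace-only words, merge it back."""
    extract, merge = build_commands(djvu_in, xml_path, djvu_out)

    # Step 1: extract the text layer as XML.
    subprocess.run(extract, check=True)

    # Step 2: delete the whitespace-only WORD elements.
    # Assumption: the XML file is encoded as UTF-8.
    xml_file = Path(xml_path)
    text = xml_file.read_text(encoding="utf-8")
    xml_file.write_text(WHITESPACE_WORD.sub("", text), encoding="utf-8")

    # Step 3: merge the modified XML into the target DjVu file.
    subprocess.run(merge, check=True)


# Example call (uncomment after adjusting the file names):
# clean_text_layer("book.djvu", "book.xml", "final.djvu")
```

Keep a backup of the original DjVu file until you have checked the result in a viewer.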
