
How to remove white space from the text layer in DjVu
How can I remove spaces and tabs from the text layer of a DjVu document, for better text search, using the DjVuLibre library?
Removing the unnecessary characters (and their XML tags) also reduces the file size.
1 Comment
First, use the djvutoxml tool to extract a text layer in XML format from the DjVu document.
At the command prompt, type:
path\djvutoxml.exe path\book.djvu path\book.xml
Replace path with the actual location on your disk.
Press Enter...
Then, using regular expressions, remove the selected characters (the ones placed between the angle brackets > <). You can use any text editor that supports regular expressions.
Pattern to remove spaces:
<WORD><CHARACTER coords="\d*,\d*,\d*,\d*"> </CHARACTER></WORD>
Pattern to remove tabs:
<WORD><CHARACTER coords="\d*,\d*,\d*,\d*">\t</CHARACTER></WORD>
The regular expression can also be written in the following form, which deletes both at once:
<WORD><CHARACTER coords="([0-9,]*?)">(\t| )</CHARACTER></WORD>
Original fragment:
<WORD coords="318,262,706,190">Hallo</WORD>
<WORD><CHARACTER coords="707,262,760,190"> </CHARACTER></WORD>
<WORD coords="761,262,813,190">World!</WORD>
<WORD><CHARACTER coords="814,262,860,190"> </CHARACTER></WORD>
Fixed fragment:
<WORD coords="318,262,706,190">Hallo</WORD>
[here was the code for the space]
<WORD coords="761,262,813,190">World!</WORD>
[here was the code for the space]
(You can see how much text has been removed. Here it takes 62 characters to describe a single space!)
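If you prefer a script to a text editor, the same substitution can be sketched in Python. This is only an illustration: the function name strip_whitespace_words is made up for this example, and the pattern additionally swallows the line break after each removed element, which the editor patterns above do not do.

```python
import re

# Matches a <WORD> element whose only content is a single space or tab,
# plus the line break that follows it (an addition made for this sketch).
WHITESPACE_WORD = re.compile(
    r'<WORD><CHARACTER coords="[0-9,]*">[\t ]</CHARACTER></WORD>\n?'
)


def strip_whitespace_words(xml_text: str) -> str:
    """Return xml_text with the space-only and tab-only WORD elements removed."""
    return WHITESPACE_WORD.sub("", xml_text)


# The original fragment from the answer.
fragment = (
    '<WORD coords="318,262,706,190">Hallo</WORD>\n'
    '<WORD><CHARACTER coords="707,262,760,190"> </CHARACTER></WORD>\n'
    '<WORD coords="761,262,813,190">World!</WORD>\n'
    '<WORD><CHARACTER coords="814,262,860,190"> </CHARACTER></WORD>\n'
)

cleaned = strip_whitespace_words(fragment)
print(cleaned)  # only the "Hallo" and "World!" lines remain
```

For a whole book, you would read the XML file into a string, pass it through the function, and write the result back before merging.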
Finally, use the djvuxmlparser tool to merge the modified XML with the DjVu document.
At the command prompt, type:
path\djvuxmlparser.exe -o path\final.djvu path\book.xml
Replace path with the actual location on your disk.
The -o parameter defines the target file.
Press Enter...
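The three steps can also be chained in one script. The following is a rough sketch, not a tested tool: it assumes djvutoxml and djvuxmlparser are on your PATH, that the extracted XML is UTF-8 encoded, and the function names build_commands and clean_text_layer are invented for this example.

```python
import re
import subprocess
from pathlib import Path

# Same idea as the combined expression above: a WORD holding one space or tab.
WHITESPACE_WORD = re.compile(
    r'<WORD><CHARACTER coords="[0-9,]*">[\t ]</CHARACTER></WORD>\n?'
)


def build_commands(djvu, xml, out):
    """Return the extract and merge command lines used in the answer."""
    return [
        ["djvutoxml", str(djvu), str(xml)],
        ["djvuxmlparser", "-o", str(out), str(xml)],
    ]


def clean_text_layer(djvu, xml, out):
    """Extract the text layer, strip whitespace-only words, merge it back."""
    extract, merge = build_commands(djvu, xml, out)
    # Step 1: dump the text layer to XML.
    subprocess.run(extract, check=True)
    # Step 2: remove the space/tab elements (UTF-8 encoding is an assumption).
    xml_path = Path(xml)
    text = xml_path.read_text(encoding="utf-8")
    xml_path.write_text(WHITESPACE_WORD.sub("", text), encoding="utf-8")
    # Step 3: merge the modified XML into the target file given by -o.
    subprocess.run(merge, check=True)
```

You would call it as clean_text_layer("book.djvu", "book.xml", "final.djvu"), using your own file locations. Try it on a copy of the book first.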