Re: Different size pages in DJVU book

I have a DjVu book that has different sized pages. How can I solve this by making all pages the same size? I am familiar with Python programming and I'm ready


Since all pages have the same or very similar width and height, this seems to be a "simple" problem of some pages having the wrong resolution. Most pages have metadata that specifies 600 DPI; others specify only 96 DPI. The latter are then of course displayed much larger.
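To make the size difference concrete, here is a minimal arithmetic sketch. The pixel width of 4800 is a hypothetical value chosen for illustration, not one taken from the book; only the two DPI values come from the file.

```python
def displayed_inches(pixels: int, dpi: int) -> float:
    """Physical size a viewer derives from a pixel count and a DPI value."""
    return pixels / dpi


width_px = 4800  # hypothetical page width in pixels, for illustration only

at_600 = displayed_inches(width_px, 600)  # size when the metadata says 600 DPI
at_96 = displayed_inches(width_px, 96)    # size when the metadata says 96 DPI
scale = at_96 / at_600                    # how much larger the 96 DPI page appears

print(f'600 DPI: {at_600} in, 96 DPI: {at_96} in, factor: {scale}')
```

With the same pixel dimensions, a page tagged 96 DPI comes out 600/96 = 6.25 times larger in each direction than one tagged 600 DPI.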

My Linux distribution comes with djvutoxml and the corresponding djvuxmlparser (from the package djvulibre-bin), which can extract the metadata and merge it back in, respectively. Those should be available for Windows as well (http://djvu.sourceforge.net/; make sure the executables are in your PATH). That metadata includes the DPI information from the file. The actual changing of the XML is fast, but extracting and merging take a long time (several minutes).
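Before starting a run that takes several minutes, it may be worth checking that both executables can actually be found. A minimal sketch using the standard library's shutil.which; the helper name missing_tools is made up for this example:

```python
import shutil


def missing_tools(tools=('djvutoxml', 'djvuxmlparser')):
    """Return the names of the given executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == '__main__':
    missing = missing_tools()
    if missing:
        print('not found on PATH:', ', '.join(missing))
    else:
        print('both tools found')
```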

Make sure you have a copy of your book, in case the merging breaks something, and run python program.py book.djvu with this program.py:

import sys
import subprocess

book = sys.argv[1]  # the DjVu file; the merge step writes back into it
xml_in = 'in.xml'
xml_out = 'out.xml'

print('extracting XML')
subprocess.check_output(['djvutoxml', book, xml_in])

print('converting XML')
with open(xml_in) as inf:
    with open(xml_out, 'w') as outf:
        for line in inf:
            # only the DPI lines of the 96 DPI pages are changed
            if line.startswith('<PARAM name="DPI" value="96" />'):
                line = line.replace('96', '600')
            outf.write(line)

print('merging XML')
subprocess.check_output(['djvuxmlparser', '-o', book, xml_out])

print('done')

In general I am against parsing XML without a real parser, but you don't need regex or anything that easily breaks to get this information fixed.

The intermediate XML (two files) is of the same order of magnitude in size as the DjVu file itself. Although the XML has no image information, it is just inefficient. Make sure you have enough room (and run this program on a fast/local drive).
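A rough way to check for room beforehand, as a sketch only. It assumes each of the two XML files is about as large as the book, so the factor of 2 is an estimate and not a guarantee; the helper name enough_room is made up for this example.

```python
import os
import shutil


def enough_room(book_path, work_dir='.', factor=2):
    """Compare free space in work_dir with factor times the size of the book.

    factor=2 is an assumption: two XML files, each roughly the size of the book.
    Returns (ok, bytes_needed, bytes_free).
    """
    needed = factor * os.path.getsize(book_path)
    free = shutil.disk_usage(work_dir).free
    return free >= needed, needed, free
```

Calling enough_room('book.djvu') from the directory where in.xml and out.xml will be written gives a quick yes/no plus the two byte counts.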

There are 367 incorrect pages out of 1201. You might be able to speed up the process by only including the incorrect pages in the output XML, but then you should use an XML parser. If this is a one-off conversion, I would not bother with such an optimisation.
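As a first step towards a parser-based version, the sketch below only counts pages per DPI value with xml.etree.ElementTree; it does not write a reduced XML file or merge anything. The sample document is a hand-made assumption about the layout (one OBJECT element per page holding PARAM children), inferred from the PARAM line the program above matches; check it against your own in.xml.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_dpi(source):
    """Count pages per DPI value; source is a filename or a binary file object."""
    counts = Counter()
    # iterparse streams the file, which matters for XML as large as the book
    for _event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == 'PARAM' and elem.get('name') == 'DPI':
            counts[elem.get('value')] += 1
        elif elem.tag == 'OBJECT':
            elem.clear()  # drop the finished page's content to keep memory low
    return counts


# Hand-made sample; the element layout is an assumption, not real tool output.
sample = b'''<DjVuXML><BODY>
<OBJECT width="100" height="200"><PARAM name="DPI" value="600" /></OBJECT>
<OBJECT width="100" height="200"><PARAM name="DPI" value="96" /></OBJECT>
<OBJECT width="100" height="200"><PARAM name="DPI" value="600" /></OBJECT>
</BODY></DjVuXML>'''

counts = count_dpi(io.BytesIO(sample))
print(dict(counts))
```

Running count_dpi('in.xml') on the extracted file should show how many pages carry each DPI value before anything is changed.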
