
More posts by @Tiffany

2 Comments


The short answer: PDF is not a good format for storing image data. It is often used for that only because it is one of the few formats (the other well-known ones being DjVu and TIFF) that let you store multiple scanned images in a single file.

The longer answer is that PDF stores image data less efficiently than DjVu does. So in general, unless the DjVu file was compressed inefficiently to begin with, converting it will give you either a bigger PDF file or a much lower-quality image.

The primary difference, apart from the compression algorithms themselves, is that an image in a DjVu file consists of multiple layers. Each layer is compressed separately with an algorithm optimized for that layer's kind of data (monochrome, color, etc.), and the layers are recombined for display or printing. Especially on a page that combines images with characters (stored as pixels, not as selectable text), DjVu easily produces files 20x smaller at similar lossy quality.
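
To make the layering idea concrete, here is a toy sketch in Python. It is not how a real DjVu encoder segments a page: the function name `split_layers`, the fixed `threshold` and the block-averaging `factor` are all assumptions made for illustration, and the separate foreground color layer is left out. It only shows the principle of keeping sharp marks in a full-resolution bilevel mask while storing everything else at lower resolution.

```python
from typing import List, Tuple

Gray = List[List[int]]  # grayscale page: 0 = black, 255 = white


def split_layers(page: Gray, threshold: int = 128,
                 factor: int = 2) -> Tuple[Gray, Gray]:
    """Split a page into a full-resolution bilevel mask and a
    low-resolution background (toy stand-in for a real segmenter)."""
    height, width = len(page), len(page[0])

    # Mask: 1 where the pixel is dark enough to count as text or line art.
    mask = [[1 if px < threshold else 0 for px in row] for row in page]

    # Background: average the non-mask pixels of each factor x factor block.
    background = []
    for by in range(0, height, factor):
        row_out = []
        for bx in range(0, width, factor):
            vals = [page[y][x]
                    for y in range(by, min(by + factor, height))
                    for x in range(bx, min(bx + factor, width))
                    if not mask[y][x]]
            # A block fully covered by the mask has no background; use white.
            row_out.append(sum(vals) // len(vals) if vals else 255)
        background.append(row_out)
    return mask, background
```

The mask keeps one bit per pixel at the original resolution, so character edges stay crisp, while the background holds only a quarter as many samples when `factor` is 2. Each layer could then go to a compressor suited to it.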



The DjVu file format, and its compression in particular, is documented in detail elsewhere, but the main difference is how the two formats compress the data.

Here are some relevant excerpts from Wikipedia:

DjVu has been promoted as an alternative to PDF, promising smaller
files than PDF for most scanned documents.[4] The main difference
between DjVu and PDF is that DjVu is a pure raster file format while a
PDF file can contain both vector and raster graphics.

DjVu divides a single image into many different images, then
compresses them separately. To create a DjVu file, the initial image
is first separated into three images: a background image, a foreground
image, and a mask image. The background and foreground images are
typically lower-resolution color images (e.g., 100 dpi); the mask
image is a high-resolution bilevel image (e.g., 300 dpi) and is
typically where the text is stored. The background and foreground
images are then compressed using a wavelet-based compression algorithm
named IW44.[5] The mask image is compressed using a method called JB2
(similar to JBIG2). The JB2 encoding method identifies nearly
identical shapes on the page, such as multiple occurrences of a
particular character in a given font, style, and size. It compresses
the bitmap of each unique shape separately, and then encodes the
locations where each shape appears on the page. Thus, instead of
compressing a letter "e" in a given font multiple times, it compresses
the letter "e" once (as a compressed bit image) and then records every
place on the page it occurs.
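
The shape-dictionary idea in the last part of that excerpt can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the JB2 codec: the names `encode_page` and `decode_page` are made up, glyphs are matched only when their bitmaps are exactly identical (the real method also groups nearly identical shapes), and no entropy coding is applied to the result.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[str, ...]           # rows of '0'/'1' characters
Glyph = Tuple[Bitmap, int, int]    # (bitmap, x, y) of one mark on the page
Placement = Tuple[int, int, int]   # (dictionary index, x, y)


def encode_page(glyphs: List[Glyph]) -> Tuple[List[Bitmap], List[Placement]]:
    """Store each distinct bitmap once and record where it occurs."""
    dictionary: List[Bitmap] = []
    index_of: Dict[Bitmap, int] = {}
    placements: List[Placement] = []
    for bitmap, x, y in glyphs:
        if bitmap not in index_of:
            # First time this exact shape is seen: add it to the dictionary.
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return dictionary, placements


def decode_page(dictionary: List[Bitmap],
                placements: List[Placement]) -> List[Glyph]:
    """Rebuild the list of placed glyphs from the dictionary."""
    return [(dictionary[i], x, y) for i, x, y in placements]
```

For a page where the same "e" bitmap appears three times, the dictionary holds that bitmap once and the placement list holds three small (index, x, y) entries, which is where the saving comes from on text-heavy scans.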

As for converting a DjVu file to PDF without increasing its size, a program called Scientific and Technical Document Utility claims to be able to do so (it is not free). I have never used it myself, though.

