

PDFs on archive.org are high resolution yet have relatively small file sizes. For example, this book: its resolution is 500 ppi and its size is 46 MB.

I extracted the pages to PNG at 500 ppi using Adobe Acrobat, compressed them with PNGGauntlet, took 150 of the pages, and combined them using Acrobat. The result was a 107 MB PDF file. This is ridiculous, because it has far fewer pages yet is more than twice as big.

How does one make a small PDF like the original one?



More posts by @Terence

3 Comments


The size of a PDF file depends on its content. A PDF file is a bundle of streams, most of which hold compressed data.

If you generate a PDF file from, e.g., a Word or OpenOffice document, the file tends to be relatively small, especially if you do not embed font data and instead rely on system-provided fonts or font substitution.
Adding images to your text will make the file much larger.

Since PDF is one of the few file formats that can hold multiple images, it is often (mis-)used to store sets of images, e.g. the pages of a scan. Those scans are often already-compressed JPEG images, and for those the PDF file works only as a container (little or no further compression is possible). Such PDF files can be very large, depending on the pixel dimensions of the images (scan resolution × paper size) and, in the case of lossy compression (JPEG), the quality setting of the compression.

Extracting such lossy image files to a lossless format like PNG immediately blows up each image, often by an order of magnitude. So your results are not surprising.
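
To put rough numbers on this, the sketch below estimates the uncompressed pixel data of one scanned page and compares it with the per-page average of the PNG re-export from the question. The 6 × 9 inch page size and the 8-bit grayscale depth are assumptions chosen only for illustration; the page count of the original book is not given, so its per-page average is not computed.

```python
def raw_page_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Uncompressed size in bytes of the pixel data for one scanned page."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page geometry (hypothetical): 6 x 9 inches, 8-bit grayscale, 500 ppi.
raw = raw_page_bytes(6, 9, 500, 8)

# Figures from the question: 150 pages re-exported as PNG gave a 107 MB PDF.
png_per_page = 107e6 / 150

print(f"raw pixels per page:    {raw / 1e6:.1f} MB")
print(f"PNG re-export per page: {png_per_page / 1e6:.2f} MB")
print(f"lossless ratio reached: {raw / png_per_page:.0f}:1")
```

Under these assumptions the PNG route already compresses considerably, but a lossless encoder has to preserve every pixel, including scan noise, while a lossy encoder is free to discard it. That is why the lossy original can be much smaller per page.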

It would be much better to just extract the individual pages of the file into separate PDF files and recombine only the pages that you need. This can be done without decompressing the streams that contain the imagery, e.g. with a program like pdftk. If you pick half of the pages of a book, you can expect a document of about half the size (on average).
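
As a minimal sketch of that approach, the snippet below builds a pdftk command that copies a page range into a new file, and runs it only if pdftk is installed. The file names `book.pdf` and `excerpt.pdf` and the helper names are hypothetical; pdftk's `cat` operation with an `output` file also selects and recombines pages in a single step.

```python
import shutil
import subprocess


def pdftk_extract_command(src, page_range, dst):
    """Build the pdftk argument list that copies `page_range` of `src` into `dst`."""
    return ["pdftk", src, "cat", page_range, "output", dst]


def extract_pages(src, page_range, dst):
    """Run pdftk if it is available on PATH; raise otherwise."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk not found on PATH")
    subprocess.run(pdftk_extract_command(src, page_range, dst), check=True)


# Hypothetical file names; "1-150" selects the first 150 pages.
cmd = pdftk_extract_command("book.pdf", "1-150", "excerpt.pdf")
print(" ".join(cmd))
```

Calling `extract_pages("book.pdf", "1-150", "excerpt.pdf")` would then write the excerpt without going through an intermediate PNG export.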



@Ted


It might be the image format you chose; PNG files tend to run a fair bit larger than JPG files do. It might be worth trying an export to JPG instead.



@Jamie


They are small because their content is largely just plain text (including equations).

Since they are generated directly from source files (e.g. LaTeX or Microsoft Word documents), the text is embedded as a bunch of strings in the PDF.

If you instead scan a document to PDF, the PDF just contains one large image per page. This is much less efficient in terms of disk space.
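
A back-of-the-envelope comparison makes the gap concrete. The sketch below assumes a hypothetical page of 3,000 single-byte characters and a 6 × 9 inch page scanned at 500 ppi in 1-bit black and white; both figures are illustrative, and embedded fonts and compression are ignored on both sides.

```python
def text_page_bytes(chars_per_page):
    """Rough upper bound for a page stored as single-byte text strings."""
    return chars_per_page


def bitonal_scan_bytes(width_in, height_in, ppi):
    """Uncompressed size of a 1-bit-per-pixel scan of one page."""
    return round(width_in * ppi) * round(height_in * ppi) // 8


text_bytes = text_page_bytes(3000)          # assumed character count
scan_bytes = bitonal_scan_bytes(6, 9, 500)  # assumed page size, 500 ppi

print(f"text page: {text_bytes} bytes")
print(f"scan page: {scan_bytes} bytes")
```

Even in the most compact scan mode, the image starts out hundreds of times larger than the text it depicts, so image compression has a lot of ground to make up.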

