
Is Ghostscript the best option if you want to optimize a PDF file and reduce the file size?

I need to store a lot of PDF files, so I need to optimize them and reduce their file size as much as possible.

Does anyone have experience with Ghostscript and/or other tools?

Command line:

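// escapeshellarg() quotes the paths so file names with spaces or shell metacharacters are passed safely
exec('gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 '
    . '-dPDFSETTINGS=/screen -sOutputFile=' . escapeshellarg($file_new) . ' ' . escapeshellarg($file));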
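For comparison, here is a minimal Python sketch of the same Ghostscript call. It passes the arguments as a list to `subprocess.run`, so no shell is involved and file names need no quoting. The helper names `build_gs_command` and `compress_pdf` are hypothetical, and the sketch assumes a `gs` executable on the PATH.

```python
import subprocess


def build_gs_command(src, dst, preset="/screen", compat="1.4"):
    """Return the argument list for the Ghostscript call used in the question."""
    return [
        "gs",
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=pdfwrite",
        f"-dCompatibilityLevel={compat}",
        f"-dPDFSETTINGS={preset}",  # /screen, /ebook, /printer, ...
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src, dst, preset="/screen"):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(build_gs_command(src, dst, preset), check=True)
```

Assuming Ghostscript is installed, `compress_pdf("in.pdf", "out.pdf")` should then behave like the `exec()` call above.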
Asked by clarkk

7 Answers


If you are looking for Free (as in 'libre') software, Ghostscript is surely your best choice. However, it is not always easy to use -- some of its (very powerful) processing options are hard to find in the documentation.

Have a look at this answer, which explains how to exercise more detailed control over image resolution downsampling than the generic -dPDFSETTINGS=/screen gives you (that setting only defines a few overall defaults, which you may want to override):

Basically, it tells you how to make Ghostscript downsample all images to a resolution of 72dpi (this value is what -dPDFSETTINGS=/screen uses -- you may want to go even lower):

-dDownsampleColorImages=true \
-dDownsampleGrayImages=true \
-dDownsampleMonoImages=true \
-dColorImageResolution=72 \
-dGrayImageResolution=72 \
-dMonoImageResolution=72 \
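The six switches above follow a fixed pattern, so a small helper can generate them for any target resolution. `downsample_flags` is a hypothetical Python helper, not part of Ghostscript. The returned list can be spliced into an argument list for `gs`.

```python
def downsample_flags(dpi=72):
    """Build the Ghostscript switches that downsample colour, grey and mono images to `dpi`."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    flags = []
    # First switch on downsampling for each image class...
    for kind in ("Color", "Gray", "Mono"):
        flags.append(f"-dDownsample{kind}Images=true")
    # ...then set the target resolution for each class.
    for kind in ("Color", "Gray", "Mono"):
        flags.append(f"-d{kind}ImageResolution={dpi}")
    return flags
```

For example, `downsample_flags(50)` returns the same six switches with 50 in place of 72, which covers the case where you want to go lower than `/screen`.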

If you want to see whether Ghostscript can also 'un-embed' the fonts used (sometimes it works, sometimes not, depending on the complexity of the embedded font and on the font type), you can add the following to your gs command:

gs \
  -o output.pdf \
   [...other options...] \
  -dEmbedAllFonts=false \
  -dSubsetFonts=true \
  -dConvertCMYKImagesToRGB=true \
  -dCompressFonts=true \
  -c ".setpdfwrite <</AlwaysEmbed [ ]>> setdistillerparams" \
  -c ".setpdfwrite <</NeverEmbed [/Courier /Courier-Bold /Courier-Oblique /Courier-BoldOblique /Helvetica /Helvetica-Bold /Helvetica-Oblique /Helvetica-BoldOblique /Times-Roman /Times-Bold /Times-Italic /Times-BoldItalic /Symbol /ZapfDingbats /Arial]>> setdistillerparams" \
  -f input.pdf
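The same command can be assembled programmatically. Below is a rough Python sketch that builds the two `-c` PostScript snippets from a list of font names and returns the full argument list. The helper names are hypothetical. The sketch adds `-sDEVICE=pdfwrite`, which the command above leaves to `[...other options...]`, and copies the `.setpdfwrite ... setdistillerparams` syntax verbatim: whether a particular Ghostscript release still accepts it has not been verified here.

```python
def distiller_params(key, fonts):
    """Build a `-c` PostScript snippet that sets one font-list distiller parameter.

    `key` is e.g. "AlwaysEmbed" or "NeverEmbed"; font names are given without the leading slash.
    """
    names = " ".join("/" + name for name in fonts)
    return f".setpdfwrite <</{key} [{names}]>> setdistillerparams"


def unembed_command(src, dst, never_embed):
    """Argument list mirroring the font un-embedding command above (hypothetical helper)."""
    return [
        "gs", "-o", dst, "-sDEVICE=pdfwrite",
        "-dEmbedAllFonts=false", "-dSubsetFonts=true",
        "-dConvertCMYKImagesToRGB=true", "-dCompressFonts=true",
        "-c", distiller_params("AlwaysEmbed", []),
        "-c", distiller_params("NeverEmbed", never_embed),
        "-f", src,  # -f ends the -c PostScript and names the input file
    ]
```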

Note: Be aware that downsampling images will reduce quality (irreversibly), and un-embedding fonts will make it difficult or impossible to display and print the PDFs correctly unless the same fonts are installed on the viewing machine.


Update

One option which I had overlooked in my original answer is to add

-dDetectDuplicateImages=true

to the command line. This parameter makes Ghostscript try to detect any images that are embedded in the PDF multiple times. This can happen if you use an image as a logo or page background and the PDF-generating software is not optimized for this situation. This used to be the case with older versions of OpenOffice/LibreOffice (I tested the latest release of LibreOffice, v4.3.5.2, and it no longer does this).

It also happens if you concatenate PDF files with the help of pdftk. To show you the effect, and how you can discover it, let's look at a sample PDF file:

pdfinfo p1.pdf

 Producer:       libtiff / tiff2pdf - 20120922
 CreationDate:   Tue Jan  6 19:36:34 2015
 ModDate:        Tue Jan  6 19:36:34 2015
 Tagged:         no
 UserProperties: no
 Suspects:       no
 Form:           none
 JavaScript:     no
 Pages:          1
 Encrypted:      no
 Page size:      595 x 842 pts (A4)
 Page rot:       0
 File size:      20983 bytes
 Optimized:      no
 PDF version:    1.1

Recent versions of Poppler's pdfimages utility have added support for a -list parameter, which can list all images included in a PDF file:

pdfimages -list p1.pdf

 page num  type width height color comp bpc  enc interp objectID x-ppi y-ppi size ratio
 --------------------------------------------------------------------------------------
    1   0 image    423   600   rgb    3   8 jpeg     no     7  0    52    52 19.2K 2.6%

This sample PDF is a 1-page document, containing an image, which is compressed with JPEG-compression, has a width of 423 pixels and a height of 600 pixels and renders at a resolution of 52 PPI on the page.

If we concatenate 3 copies of this file with the help of pdftk like so:

pdftk p1.pdf p1.pdf p1.pdf cat output p3.pdf

then the result shows these image properties via pdfimages -list:

pdfimages -list p3.pdf

 page num  type width height color comp bpc  enc interp objectID x-ppi y-ppi size ratio
 --------------------------------------------------------------------------------------
    1   0 image   423    600   rgb    3   8 jpeg     no     4  0    52    52 19.2K 2.6%
    2   1 image   423    600   rgb    3   8 jpeg     no     8  0    52    52 19.2K 2.6%
    3   2 image   423    600   rgb    3   8 jpeg     no    12  0    52    52 19.2K 2.6%

This shows that there are 3 identical PDF objects (with the IDs 4, 8 and 12) which are embedded in p3.pdf now. p3.pdf consists of 3 pages:

pdfinfo p3.pdf | grep Pages:

 Pages:          3

Optimize PDF by replacing duplicate images with references

Now we can apply the optimization mentioned above with Ghostscript:

 gs -o p3-optim.pdf -sDEVICE=pdfwrite -dDetectDuplicateImages=true p3.pdf

Checking:

 pdfimages -list p3-optim.pdf

 page num  type width height color comp bpc  enc interp objectID x-ppi y-ppi size ratio
 --------------------------------------------------------------------------------------
    1   0 image   423    600   rgb    3   8 jpeg     no    10  0    52    52 19.2K 2.6%
    2   1 image   423    600   rgb    3   8 jpeg     no    10  0    52    52 19.2K 2.6%
    3   2 image   423    600   rgb    3   8 jpeg     no    10  0    52    52 19.2K 2.6%

There is still one image listed per page -- but the PDF object ID is always the same now: 10.
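To make this check scriptable, here is a rough Python sketch. It runs `pdfimages -list` (this assumes a Poppler version that supports `-list`, as above) and compares the number of image rows with the number of distinct object IDs. Fewer objects than rows means images are shared. The function names are hypothetical, and the parser assumes the column layout shown in the listings above, where the objectID column prints as two fields (object and generation number).

```python
import subprocess


def list_images(path):
    """Return the raw text printed by `pdfimages -list` for one PDF."""
    return subprocess.run(["pdfimages", "-list", path],
                          capture_output=True, text=True, check=True).stdout


def parse_pdfimages_list(text):
    """Parse `pdfimages -list` output, skipping the header and dashed separator lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        rows.append({
            "page": int(fields[0]),
            "type": fields[2],
            "width": int(fields[3]),
            "height": int(fields[4]),
            "object": (int(fields[10]), int(fields[11])),
        })
    return rows


def image_summary(text):
    """Return (number of image rows, number of distinct image objects)."""
    rows = parse_pdfimages_list(text)
    return len(rows), len({row["object"] for row in rows})
```

Fed the two listings above, this should report (3, 3) for p3.pdf and (3, 1) for p3-optim.pdf. Note that (3, 3) alone does not prove the images are identical. It only shows that nothing is being shared yet.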

 ls -ltrh p1.pdf p3.pdf p3-optim.pdf

   -rw-r--r--@ 1 kp  staff    20K Jan  6 19:36 p1.pdf
   -rw-r--r--  1 kp  staff    60K Jan  6 19:37 p3.pdf
   -rw-r--r--  1 kp  staff    16K Jan  6 19:40 p3-optim.pdf

As you can see, the "dumb" concatenation made with pdftk tripled the original file size. The Ghostscript optimization brought it down to 16K, which is even smaller than the 20K single-page original.

The most recent versions of Ghostscript may even enable -dDetectDuplicateImages by default. (AFAIR, v9.02, which first introduced it, didn't enable it by default.)

Answered by Kurt Pfeifle
  • thanks for the answer :) have tested it, but when you set the dpi to `72` manually the quality is lower when the setting `/screen` is set and the file size is still lower with `/screen` :) – clarkk May 04 '12 at 17:29
  • what I meant was.. The quality is both better with `/screen` and the file size is lower compared to manually setting the dpi to `72` – clarkk May 04 '12 at 17:37
  • @clarkk: I'd be interested to see a sample PDF which shows this happening. Can you provide one (or is this invading someone's privacy)? – Kurt Pfeifle May 04 '12 at 17:40
  • here http://www.dynaccount.com/tmp/35.pdf and here http://www.dynaccount.com/tmp/36.pdf.. Look at the logo in the top of the document.. 35.pdf (44.81kb - manually dpi) and 36.pdf (44.73kb - /screen) – clarkk May 05 '12 at 08:50
  • @clarkk: To make sure I do understand -- these two files are the results of the two conversion commands? (I was interested in one of your original PDFs so I could play with the conversion parameters myself....) – Kurt Pfeifle May 06 '12 at 05:41
  • For the sake of completeness, a list of options that can be used for converting PDFs with GhostScript/ps2pdf is available here: http://ghostscript.com/doc/current/Ps2pdf.htm – Simon A. Eugster May 23 '12 at 06:40

You can obtain good results by converting from PDF to PostScript, then back to PDF:

pdf2ps file.pdf file.ps
ps2pdf -dPDFSETTINGS=/ebook file.ps file-optimized.pdf

The value of argument -dPDFSETTINGS defines the quality of the images in the resulting PDF. Options are, from low to high quality: /screen, /default, /ebook, /printer, /prepress, see http://milan.kupcevic.net/ghostscript-ps-pdf/ for a reference.

The Postscript file can become quite large, but the results are worth it. I went from a 60 MB PDF to a 140 MB Postscript file, but ended up with a 1.1 MB optimized PDF.
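Because the intermediate PostScript file can be so large, it can help to write it to a temporary directory that is removed automatically. Below is a minimal Python sketch of the same two commands under that arrangement. It assumes `pdf2ps` and `ps2pdf` are on the PATH, and the helper names are hypothetical.

```python
import os
import subprocess
import tempfile


def roundtrip_commands(src, ps_path, dst, preset="/ebook"):
    """Return the two commands from the answer: pdf2ps, then ps2pdf with a PDFSETTINGS preset."""
    return [
        ["pdf2ps", src, ps_path],
        ["ps2pdf", f"-dPDFSETTINGS={preset}", ps_path, dst],
    ]


def roundtrip(src, dst, preset="/ebook"):
    """Convert src to PostScript in a temporary directory and back to PDF.

    The (possibly very large) intermediate .ps file is deleted together
    with the temporary directory when the function returns.
    """
    with tempfile.TemporaryDirectory() as tmp:
        ps_path = os.path.join(tmp, "intermediate.ps")
        for cmd in roundtrip_commands(src, ps_path, dst, preset):
            subprocess.run(cmd, check=True)
```

`dst` should point outside the temporary directory, for example `roundtrip("file.pdf", "file-optimized.pdf")`.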

Answered by Martijn de Milliano (edited by likeitlikeit)
  • It would be awesome to get some help with how to do this in a Windows environment... – Serj Sagan Sep 25 '14 at 17:54
  • Any reason why this would result in smaller files than just using `gs` with suitable settings? In addition, doing this will result in some issues caused by Postscript missing some features (e.g. alpha transparency, gradients, ICC profiles). – Mikko Rantalainen Aug 13 '15 at 12:05
  • I don't know, just reporting what worked well in my case hoping that others might also benefit from it. Feel free to post a better solution or help improving the existing ones. – Martijn de Milliano Aug 14 '15 at 21:18
  • The first step is unnecessary. ps2pdf will accept pdf input files. – frabjous Oct 01 '15 at 14:08
  • @frabjous Converting to ps first makes a huge difference for me. This is with version 9.26 of ps2pdf and pdf2ps – ariddell Mar 05 '19 at 13:26
  • @ariddell For me using ps2pdf directly resulted in the smaller file. So it seems it might be worth to try both (or understand the rule behind it). – dalanicolai Jun 05 '20 at 08:44

I use Ghostscript with the following options, taken from here:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
 -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf
Answered by Primoz Rome (edited by Jakuje)

You may find that pdftocairo (from Poppler) can make smaller PDFs but beware that it will strip some features (such as hyperlinks) away.

Answered by Anon
  • Thanks, I found that ps2pdf14 sometimes changes the output, and in this case, pdftocairo made the PDF smaller (500K to 110K) but cropped, so I added explicit margin in Inkscape before saving as PDF, **then** ran it through `pdftocairo` and **then** through `pdfcrop` (from Teχ) shrinking it to 90K. – mirabilos Mar 02 '16 at 17:10

You will lose quality, but if that's not an issue, ImageMagick's convert may prove helpful:

convert original.pdf reduced.pdf

Note that it doesn't always work: I once converted a 126 MB file into a 14 MB one using this command, but another time it doubled the size of a 350 KB file.

Anyway, it's worth a try.

As mentioned in the comments, there is of course no point in applying this command to a vector-based PDF; it is only useful for PDFs made of raster images.

See also this post for related options.
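Before running convert, you can roughly check whether a PDF is made of raster images. Below is a hypothetical Python heuristic, which is not part of ImageMagick or Poppler. It reads the page count from `pdfinfo` output and the image rows from `pdfimages -list` output, and reports whether every page contains at least one image. A page without any image is certainly vector-only (or empty), so rasterizing it is unlikely to help. A page with an image may still carry vector text, so a True result is only a hint.

```python
def page_count_from_pdfinfo(info):
    """Read the number after `Pages:` in `pdfinfo` output."""
    for line in info.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Pages":
            return int(value)
    raise ValueError("no Pages: line found")


def pages_with_images(listing):
    """Page numbers that have at least one row of type "image" in `pdfimages -list` output."""
    pages = set()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[2] == "image":
            pages.add(int(fields[0]))
    return pages


def looks_image_based(listing, page_count):
    """Heuristic: True if every page 1..page_count contains at least one image."""
    return pages_with_images(listing) >= set(range(1, page_count + 1))
```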

Answered by Skippy le Grand Gourou
  • This only makes sense for PDF files based on scanned images; otherwise ImageMagick will convert your vector-based PDF into a raster image, and the resulting file might actually be bigger than the original. – yms Jun 01 '15 at 13:25
  • @yms : I guess you're right about vector-based PDFs of course, but I believe it does make sense for any kind of raster images, of which scanned images are only a small subset. In my case the document was made from plain digital photographs. – Skippy le Grand Gourou Jun 01 '15 at 20:51
  • Yes, of course, I meant scanned images as the most common use-case of PDF files with just raster images (and maybe some transparent text from OCR) inside. I just wanted to add that comment as a remark for anyone wanting to use your solution. – yms Jun 01 '15 at 20:55

Ghostscript comes with two useful utilities: pdfopt and ps2pdf14. Both can be used to optimise PDF files, but on some occasions the "optimised" file may be bigger than the original.
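Because the "optimised" output can be larger, a batch job should keep whichever file is smaller. Below is a minimal Python sketch of that guard. `keep_if_smaller` is a hypothetical helper that only compares sizes on disk, and it does not care which tool produced the candidate.

```python
import os


def keep_if_smaller(original, candidate):
    """Return the path of the file to keep; delete `candidate` if it did not shrink.

    Ties count as "not smaller", so the original is kept.
    """
    if os.path.getsize(candidate) < os.path.getsize(original):
        return candidate
    os.remove(candidate)
    return original
```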

Answered by Onlyjob
  • `ps2pdf14 input.pdf output.pdf` did the same as `gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf`. For pure text content, output.pdf is 25% of the size of input.pdf – code_angel Feb 17 '15 at 16:02
  • `pdfopt` produced bigger output – code_angel Feb 17 '15 at 16:06
  • pdfopt no longer comes with ghostscript – frabjous Oct 01 '15 at 14:09

This worked for me:

Convert your PDF to PS (this creates a large file):

pdf2ps large.pdf very_large.ps

Convert the new PS back to a PDF:

ps2pdf very_large.ps small.pdf

Source: https://pandemoniumillusion.wordpress.com/2008/05/07/compress-a-pdf-with-pdftk/

Answered by Lukas Hillebrand