I need to refactor some reports (generated with Jasper) using MS Reporting Services. Copies of original reports are available in PDF. The requirement is to make the new reports "pixel perfect" which is very cumbersome...

To make life easier, I would like a tool that overlays the original and generated report PDFs so that I can measure whether they are pixel perfect.

Is such a tool out there?

Kurt Pfeifle
Oliver Vogel
  • Interesting question. Not sure how much this has to do with reporting, SSRS or Jasper. You may get better response by replacing those tags with others, and updating the title to reflect that you want to do a "visual PDF diff". – Jeroen Sep 17 '12 at 13:55
  • Who came up with that as a requirement, and what was the business justification? Extract each page as a bitmap and compare pixel by pixel. Are they black and white? – Tony Hopkinson Sep 17 '12 at 13:55
  • See [Tool to compare large numbers of PDF files?](http://stackoverflow.com/q/145657) – Martin Schröder Sep 17 '12 at 21:45
  • Arrggghhh... The question Martin Schröder linked to has been 'closed as not constructive' !! -- ***I really hate it when mods who don't seem to have a personal clue or interest about the topics/tags in question (as can be seen from their respective reputation scores, their expertise is elsewhere) close down community stuff which has (a) good content, (b) lots of upvotes, (c) multiple answers and (d) above-average views.*** – Kurt Pfeifle Sep 18 '12 at 21:39
  • I too need to compare pdf files - I have come up with a jar using apache pdfbox. Check this http://www.testautomationguru.com/introducing-pdfutil-to-compare-pdf-files-extract-resources/ for example & download. – vins Jun 14 '15 at 00:12

4 Answers

The simplest, most readily available way to do this is to use ImageMagick's compare (which is available on Windows, Linux, Mac and other platforms).

It can even compare PDF pages (though it uses Ghostscript as its delegate to render the PDF pages to pixel images first):

 compare.exe         ^
    tested.pdf[0]    ^
    reference.pdf[0] ^
   -compose src      ^
    delta.pdf

In the resulting delta.pdf, every pixel whose color differs between the two compared PDF pages is shown in red. All identical pixels are pure white. The [0] tells compare to use the first page of each file for the comparison (page numbering is zero-based).

You can see how this works out with the following example:

 compare.exe                      ^
    http://qtrac.eu/boson1.pdf[1] ^
    http://qtrac.eu/boson2.pdf[1] ^
   -compose src                   ^
    delta.pdf

Here are the respective pages (converted to scaled-down PNGs for web display). The reference page is on the left, the modified page is the middle one, the 'delta-pixel-are-red' image is on the right:

[Images: first page, second page, delta image]

You get a slightly different visual result by omitting the -compose src parameter. The original file's pixels then appear as a gray-shaded background (for context), with the delta pixels in red:

 compare.exe                      ^
    http://qtrac.eu/boson1.pdf[1] ^
    http://qtrac.eu/boson2.pdf[1] ^
    delta.pdf

[Images: first page, second page, delta.pdf]

If you don't like the red color for pixel differences, use -highlight-color:

 compare.exe                      ^
    http://qtrac.eu/boson1.pdf[1] ^
    http://qtrac.eu/boson2.pdf[1] ^
   -highlight-color green         ^
    delta.pdf

The default resolution used to render the PDF pages is 72 dpi. Should you need a higher precision, you can switch to 300 dpi using the -density parameter like this:

 compare.exe                      ^
   -density 300                   ^
    http://qtrac.eu/boson1.pdf[1] ^
    http://qtrac.eu/boson2.pdf[1] ^
    delta.pdf

Note, switching to higher densities will slow down the process and create bigger files.
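
All of the commands above compare a single page. For multi-page reports, one option is to generate one compare call per page index. The following is only a minimal sketch in POSIX shell syntax: the function name `print_page_compares` and the `delta-N.pdf` output names are made up for illustration, and the page count has to be supplied by hand. It merely prints the commands so that they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: print one ImageMagick compare command per page.
# $1 = tested PDF, $2 = reference PDF, $3 = number of pages to compare.
print_page_compares() {
    tested=$1
    reference=$2
    pages=$3
    i=0    # page indices are zero-based, as in the commands above
    while [ "$i" -lt "$pages" ]; do
        printf 'compare "%s[%d]" "%s[%d]" -compose src "delta-%d.pdf"\n' \
            "$tested" "$i" "$reference" "$i" "$i"
        i=$((i + 1))
    done
}
```

Calling `print_page_compares tested.pdf reference.pdf 3` prints three commands, one each for pages 0, 1 and 2. Once they look right, the output can be piped into `sh`.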

You can even create a *.txt file for the delta image which describes each pixel's coordinates and the respective color values:

 compare                          ^
    http://qtrac.eu/boson1.pdf[1] ^
    http://qtrac.eu/boson2.pdf[1] ^
   -compose src                   ^
   -highlight-color black         ^
    delta.txt

Then simply count the number of total vs. black pixels (sorry, this is Unix/Linux/MacOSX syntax):

 total_pixels=$(( $(cat delta.txt | wc -l) - 1))
 black_pixels=$(( $(grep black delta.txt | wc -l) -1 ))

In the example used for the illustrations above, I get

 total_pixels=500990
 black_pixels=8727
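
The two counting commands can also be wrapped in a small function. This is a sketch under assumptions, not part of ImageMagick: the name `count_delta_pixels` is made up, and it assumes that every pixel line in the text dump starts with `x,y:` coordinates and that differing pixels carry the highlight color's name (which may vary between ImageMagick versions). Only lines starting with coordinates are counted, so the leading `#` header line is excluded.

```shell
#!/bin/sh
# Hypothetical helper: count pixels in an ImageMagick text dump.
# $1 = text file written by compare, $2 = highlight color name (default: black).
count_delta_pixels() {
    file=$1
    color=${2:-black}
    # Pixel lines start with "x,y:"; the "#" header line does not match.
    pixel_line='^[0-9][0-9]*,[0-9][0-9]*:'
    total=$(grep -c "$pixel_line" "$file")
    delta=$(grep "$pixel_line" "$file" | grep -c -- "$color")
    echo "total_pixels=$total"
    echo "delta_pixels=$delta"
}
```

For example, `count_delta_pixels delta.txt black` prints both counts for the file created above.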

Of course the 'ideal' result would be

 black_pixels=0
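
For an automated pass/fail check, compare can also report through its exit status. With `-metric AE` it counts the differing pixels, and according to the ImageMagick documentation it exits with 0 when the images match, 1 when they differ and 2 on error. Below is a minimal sketch, assuming ImageMagick with a working Ghostscript delegate is installed; the function name `check_page` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: classify one page comparison by compare's exit status.
# $1 and $2 are page specs such as tested.pdf[0] and reference.pdf[0].
check_page() {
    # null: discards the difference image; the pixel count that compare
    # prints on stderr is silenced here because only the status is used.
    compare -metric AE "$1" "$2" null: >/dev/null 2>&1
    case $? in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}
```

A call such as `check_page 'tested.pdf[0]' 'reference.pdf[0]'` then prints one word per page, which is easy to use in a build script.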
sidon
Kurt Pfeifle
  • Here's a script to visually diff two PDFs page-by-page using ImageMagick and Poppler tools (for speed): https://gist.github.com/brechtm/891de9f72516c1b2cbc1. It outputs one JPG for each page of the PDFs in a `pdfdiff` directory and additionally prints the numbers of the pages which differ between the two PDFs. – Brecht Machiels Mar 31 '16 at 13:36
  • This appears not to work anymore? Or possibly requires ImageMagick v7+. v6.9.6-5 complains: `unrecognized image type 'pdf'`. – Monkpit Nov 18 '16 at 19:38
  • @Monkpit: Of course it works. Your ImageMagick is not configured to consume PDF as an input format. You need to have Ghostscript installed and configured as a *delegate* for ImageMagick, so that it renders the PDF input into a raster image and hands that over to IM. Most installations are already set up this way out of the box. For more details see also *[this series of answers](http://stackoverflow.com/search?q=user%3A359307+%5Bimagemagick%5D+delegate+pdf)*. – Kurt Pfeifle Nov 18 '16 at 21:01

diffpdf allows you to compare two PDFs side by side.

Martin Schröder
  • This is what easily worked for me, unlike `i-net PDFC` and `compare` from ImageMagick. No idea why `compare` reported no difference at all even though the difference is not insignificant. – akostadinov Mar 09 '19 at 07:06

This question already has an accepted answer, but I'd like to add my two cents. We made i-net PDFC, which perfectly matches your scenario. It was built to check that reports made with another reporting tool match the output of our reporting software, but it's even more powerful. What PDFC does not do is check image-based pixel perfection. Instead it checks, with certain settings, that a document is basically (and visually) the same based on its content. That is far more powerful than a pure pixel-based comparison.

i-net PDFC can run visually or from the command line (e.g. for batch processing) and works with continuous integration systems. The visual component even lets you overlay the two PDF files semi-transparently so that the user can check for pixel perfection.

The software is fresh out of beta. Give it a try and let us know what you think. (Yep. I work for the company who made this.)

gamma

I recommend printing the reports with PDFCreator as PNG images. You can then use a graphics program like Paint.NET to make the background transparent and layer the two reports on top of each other.

Applying a color transformation to one or both of the images (e.g. coloring one red and the other blue) should make the differences easy to see.

You can find PDFCreator at http://de.pdfforge.org/pdfcreator. It's completely free to use.

aKzenT