I have some documents that I digitized into a PDF file with a Xerox scanner. Using Java, I am trying to extract RGB pixel data from them for use in image recognition applications. Developing this from scratch is somewhat beyond my level, so I am relying on third-party libraries for the PDF processing.
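Once a page is available as a java.awt.image.BufferedImage, the RGB values can be pulled out with the standard getRGB call and some bit masking. Below is a minimal sketch using only the JDK. The class name PixelExtractor and the [row][column][channel] layout are hypothetical choices for illustration, not part of any PDF library.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: copies a BufferedImage into an int[height][width][3] array of 0-255 values.
public class PixelExtractor {

    public static int[][][] toRgbArray(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        int[][][] rgb = new int[height][width][3];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // getRGB returns a packed ARGB int in the default sRGB color space
                int argb = image.getRGB(x, y);
                rgb[y][x][0] = (argb >> 16) & 0xFF; // red
                rgb[y][x][1] = (argb >> 8) & 0xFF;  // green
                rgb[y][x][2] = argb & 0xFF;         // blue
            }
        }
        return rgb;
    }

    public static void main(String[] args) {
        // Small in-memory image instead of a rendered PDF page
        BufferedImage img = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0xFF8000); // orange pixel at (0, 0); (1, 0) stays black
        int[][][] rgb = toRgbArray(img);
        System.out.println(rgb[0][0][0] + "," + rgb[0][0][1] + "," + rgb[0][0][2]); // prints 255,128,0
    }
}
```

Calling getRGB per pixel is simple but slow on large scans; the bulk overload getRGB(startX, startY, w, h, rgbArray, offset, scansize) returns a whole region at once if performance matters.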
So far I have tried two different libraries: PDFBox and PDF Clown.
With PDFBox, I am trying to use the convertToImage() method to obtain a BufferedImage. With PDF Clown, I am trying to use the render(page, size) method of the Renderer class to obtain a BufferedImage. In both cases the returned image is blank: all pixels are white, (r, g, b) = (255, 255, 255).
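To confirm programmatically that a rendered page really is blank (rather than, say, very faint), a quick check over every pixel is enough. This is a minimal JDK-only sketch; the class and method names are hypothetical, and it assumes "blank" means every pixel is exactly (255, 255, 255), ignoring alpha.

```java
import java.awt.image.BufferedImage;

// Hypothetical diagnostic: reports whether a rendered page contains only pure white pixels.
public class BlankCheck {

    public static boolean isAllWhite(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                // Mask off the alpha byte and compare only the RGB bits
                if ((image.getRGB(x, y) & 0xFFFFFF) != 0xFFFFFF) {
                    return false; // found a non-white pixel
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(3, 3, BufferedImage.TYPE_INT_RGB);
        // Fill the image with white to mimic the blank renders described above
        for (int y = 0; y < 3; y++) {
            for (int x = 0; x < 3; x++) {
                img.setRGB(x, y, 0xFFFFFF);
            }
        }
        System.out.println(isAllWhite(img)); // prints true
        img.setRGB(1, 1, 0x000000);          // one black pixel
        System.out.println(isAllWhite(img)); // prints false
    }
}
```

The same check can be run on images from the scanned PDF and from a non-scanned PDF to compare the two cases side by side.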
I have been able to get non-blank BufferedImages from other PDF documents that don't originate from a scan, so I suspect the problem lies in the format of the scanned document.
Here is a sample PDF file: http://www.filedropper.com/innlevering1
Does anyone know how to solve this, or can you suggest a different approach?