0

I want to convert user-supplied single-page PDFs to black-and-white bitmaps at a suitably high resolution for further processing (eventually ending in a proprietary printing solution). All of this must run in headless mode.

For political and technical reasons this must be a pure Java library (i.e. no Ghostscript wrapper). At this point we are interested in a royalty-free open source solution, and performance is not very important. If this project is successful we might need an upgrade path to a faster proprietary library, but not now.

I have looked around and found that most PDF library projects focus on either manipulating or viewing PDFs, not on serving as a render engine, which is the only thing I need. At least one engine has deliberately crippled the font engine in the open source version compared to the commercial version.

Hence, I need a recommendation for a PDF library that:

  • Renders input files to bitmaps in headless mode.
  • Is all Java, with no native code.
  • Renders all PDF files commonly found in the wild (except invalid or incorrectly formatted ones).
  • Is open source with a business-friendly license.
  • Is robust.
  • Is actively maintained.
  • May be slow, or unable to handle more than a few pages (a limit lifted in the commercial version).

Suggestions?

Thorbjørn Ravn Andersen
  • Please do not just mention projects, that can go in comments. – Thorbjørn Ravn Andersen Mar 12 '11 at 18:04
  • @Thorbjørn Ravn Andersen - I dwelled on that myself too and removed my post. T – CoolBeans Mar 12 '11 at 18:23
  • One option is [Apache PDFBox](http://pdfbox.apache.org/), with an example [here](http://kickjava.com/src/org/pdfbox/PDFToImage.java.htm). – CoolBeans Mar 12 '11 at 18:24
  • Do you have actual experience with the packages you mention, and know first hand that they can do what I need? – Thorbjørn Ravn Andersen Mar 12 '11 at 18:31
  • I have used it for text extraction only and it worked fine for my purposes. But unfortunately not an expert on its inner details by any means. – CoolBeans Mar 12 '11 at 19:06
  • The only open-source, pure-Java, solution that I know of is [pdf-renderer](http://java.net/projects/pdf-renderer). [Apparently](http://www.javaworld.com/javaworld/jw-06-2008/jw-06-opensourcejava-pdf-renderer.html), this project was started at Sun and was passed to the Swing Labs team. – Uriah Carpenter Mar 12 '11 at 21:05
  • When I answered, you hadn't specified "no native code", only that the API be Java. JOGL is an example where you'd expect driver requirements even though the API is 100% Java. Also, the third paragraph makes it sound like you need editing, viewing, and rendering, since you are searching for conversion in such libraries when rendering is all you need. – Quaternion Mar 12 '11 at 21:31
  • @Quaternion, I took the opportunity to reiterate "Due to political and technical reasons this must be a pure Java library" in the list. – Thorbjørn Ravn Andersen Mar 12 '11 at 21:41
  • @Thorbjørn, political... I'd recommend a new election, but it's probably a dictatorship. Your question is now very clear, but "Renders all PDF-files commonly found in the wild" should probably be dropped. I have a number of PDF files that Acrobat viewer will not even render, but that my favorite PDF viewer (not Java) handles without issue. To meet that condition in good faith you would need the code used in popular PDF viewing software. If a survey of Java PDF viewers does not find a robust candidate, it's hardly reasonable to expect any API to do better. – Quaternion Mar 12 '11 at 22:03
  • @Quaternion, are these PDF-files valid? If so, why can't they be rendered by Adobe? – Thorbjørn Ravn Andersen Mar 13 '11 at 09:55

5 Answers

2

There is no such library. Java libraries able to do correct rendering of embedded fonts are all commercial (I had to do an exhaustive search for a similar problem half a year ago).

I don't know the exact reasons, but I believe that TrueType rendering of embedded fonts might somehow be protected by licensing from Adobe, which holds some patents on TrueType. At best it is just very hard to implement, so everyone who has gone through it wants some money for it. I chose PDFOne because it is very cheap (~$400 for a single-seat redistribution license) and relatively good. It still has problems with some encodings, but it works for us.

I wouldn't go with Java here anyway; I would prefer Ghostscript for its speed and robustness. But beware that if you don't use the library "at arm's length", you will violate the GPL under which it is released.

Daniel
  • 1
    The Truetype font patents are no longer an issue. See http://www.jpedal.org/PDFblog/2010/08/why-the-truetype-hinting-patent-expiration-matters/ – mark stephens Mar 13 '11 at 20:32
1

Your list is contradictory. The less restrictive the license, the less revenue there is to fund support and development. Which is more important to you? You can use Multivalent or PDFRenderer (which are very free but not supported), or iText, ICEpdf, or JPedal, which have open source and commercial versions and are actively developed because they have revenue streams.

mark stephens
  • As I wrote, I don't mind a commercial upgrade path to get e.g. faster prints for high-volume customers, but the entry point I'm looking for now must also work well for low volumes. – Thorbjørn Ravn Andersen Mar 13 '11 at 16:11
0

Have you considered iText?

IBorysov
0

Apache PDFBox meets all criteria. https://pdfbox.apache.org/

The PDF to Image conversion is documented here: https://stackoverflow.com/a/23327024/3196753
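Whichever library renders the page, the question's other two requirements (headless operation and a 1-bit black-and-white bitmap at a chosen resolution) can be covered with plain JDK classes. The following is a minimal sketch under assumptions, not the method from the linked answer: the class name `MonoBitmap`, its helper methods, and the fixed threshold of 128 are all hypothetical, and a drawn test page stands in for the `BufferedImage` a renderer would supply (in PDFBox 2.x that would come from `PDFRenderer.renderImageWithDPI`, which is not called here).

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class MonoBitmap {

    /** Pixels needed for a length given in PDF points (1/72 inch) at the given DPI. */
    static int pointsToPixels(double points, int dpi) {
        return (int) Math.ceil(points * dpi / 72.0);
    }

    /** Converts a rendered page to 1 bit per pixel using a fixed luminance threshold. */
    static BufferedImage toBlackAndWhite(BufferedImage page, int threshold) {
        int w = page.getWidth();
        int h = page.getHeight();
        // TYPE_BYTE_BINARY defaults to a two-entry black/white palette.
        BufferedImage mono = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = page.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of perceived brightness (0..255).
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                mono.setRGB(x, y, luma < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return mono;
    }

    public static void main(String[] args) {
        // Must be set before any AWT class initializes the toolkit.
        System.setProperty("java.awt.headless", "true");

        int dpi = 300;
        int w = pointsToPixels(612, dpi); // US Letter width in points
        int h = pointsToPixels(792, dpi); // US Letter height in points

        // Stand-in for a rendered page (assumption: a real renderer supplies this).
        BufferedImage page = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, w, h);
        g.setColor(Color.DARK_GRAY);
        g.fillRect(w / 4, h / 4, w / 2, h / 2);
        g.dispose();

        BufferedImage mono = toBlackAndWhite(page, 128);
        System.out.println(mono.getWidth() + " x " + mono.getHeight()
                + ", 1 bit per pixel: " + (mono.getColorModel().getPixelSize() == 1));
    }
}
```

A fixed threshold is the simplest choice and suits text and line art; pages with photographs or gradients would probably need dithering instead, which this sketch does not attempt.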

tresf
0

Have a look at ICEpdf and here

OhadR