I want to get the first page of a PDF as a JPG image. The program fails with the following errors:

Apr 18, 2016 1:18:40 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: BDC
Apr 18, 2016 1:18:40 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: EMC
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/fontbox/afm/AFMParser
    at org.apache.pdfbox.pdmodel.font.PDFont.addAdobeFontMetric(PDFont.java:165)
    at org.apache.pdfbox.pdmodel.font.PDFont.addAdobeFontMetric(PDFont.java:152)
    at org.apache.pdfbox.pdmodel.font.PDFont.getAdobeFontMetrics(PDFont.java:122)
    at org.apache.pdfbox.pdmodel.font.PDFont.<clinit>(PDFont.java:114)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:108)
    at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:213)
    at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:607)
    at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:59)
    at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:557)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
    at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
    at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
    at org.apache.pdfbox.pdfviewer.PageDrawer.drawPage(PageDrawer.java:139)
    at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:801)
    at org.apache.pdfbox.pdmodel.PDPage.convertToImage(PDPage.java:732)
    at Main.main(Main.java:26)
Caused by: java.lang.ClassNotFoundException: org.apache.fontbox.afm.AFMParser
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 16 more

Main.java:26 : BufferedImage image = firstPage.convertToImage();

Is it possible to get the first page as an image using PDFBox?

Full code:

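// Imports needed by this fragment (PDFBox 1.8 API)
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

try {
    String sourceDir = "/home/linux/Downloads/test.pdf";
    String destinationDir = "/home/linux/Downloads/testImage";
    File sourceFile = new File(sourceDir);

    PDDocument document = PDDocument.load(sourceDir);
    // The page list is zero-based: index 0 is the first page (index 1 was the second page)
    PDPage firstPage = (PDPage) document.getDocumentCatalog().getAllPages().get(0);

    String fileName = sourceFile.getName().replace(".pdf", "");

    BufferedImage image = firstPage.convertToImage();
    // File(parent, child) adds the path separator that plain string concatenation left out;
    // this assumes destinationDir is an existing directory
    ImageIO.write(image, "jpg", new File(destinationDir, fileName + "_.jpg"));

    document.close();
} catch (Exception e) {
    e.printStackTrace();
}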

I just need to render the first page as an image.

Munchmallow
  • mostly duplicate of https://stackoverflow.com/questions/18503159/getting-java-lang-noclassdeffounderror-org-pdfbox-pdfparser , fontbox is missing, see also https://pdfbox.apache.org/1.8/dependencies.html – Tilman Hausherr Apr 18 '16 at 11:36
  • It is because of font type as I understood. But how can I remove warnings? [link](http://s4.postimg.org/eag2cufct/Warnings.jpg) – Munchmallow Apr 18 '16 at 12:33
  • That is a different question. The warning is typical for 1.8. Solution: update to 2.0. Don't forget to read the migration guide. – Tilman Hausherr Apr 18 '16 at 12:36
  • One can get even 1.8 to not show such warnings. Thus, @Munchmallow, are you bound to use a pre-2.0.0 version or not? – mkl Apr 18 '16 at 12:50
  • I am using FontBox 1.8.5 and PDFBox 1.8.10. I tried ten different PDFs. I get the first page as an image, but I also get some warning or info messages like the ones I showed in my first comment. Should I update both fontbox and pdfbox to 2.0? – Munchmallow Apr 18 '16 at 13:43
  • @Munchmallow if you're just starting with PDFBox, i.e. not bound to 1.8, then use 2.0 and delete all older versions. You'll get better quality images. See here https://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images how to convert. – Tilman Hausherr Apr 18 '16 at 14:00
  • I have followed the link in your last comment. There are some font problems when converting (or parsing) to the image, e.g. with French words: [original 1st page](http://s3.postimg.org/jpolrl8c3/original_Picture.jpg) and [parsed 1st page as jpg](http://s1.postimg.org/3xdyo011b/bad_Parsing.jpg). I am also getting warnings for the characters that cannot be parsed. – Munchmallow Apr 18 '16 at 14:36
  • @Munchmallow did you get this image with 2.0 or with 1.8? It's only 36 minutes between the two comments so I wonder whether you used 2.0. – Tilman Hausherr Apr 18 '16 at 14:45
  • No, I used 1.8. I will be using the images in my search engine. I am trying to make a search table like Amazon's, and I am using 1.8 there. I believe it will take some time to update the code to 2.0, so for now I prefer using 1.8. – Munchmallow Apr 18 '16 at 14:51
  • @Munchmallow then you'll have to live with the rendering problems. Btw getting rid of the warnings, see https://stackoverflow.com/questions/311408/turning-off-hibernate-logging-console-output and use the appropriate class. – Tilman Hausherr Apr 18 '16 at 15:27
  • @Munchmallow The 1.8 rendering code has numerous deficiencies fixed in 2.0.0 but works fine for a fair portion of the PDFs out there. Furthermore, you can simply ignore most of the INFOs *unsupported/disabled operation* (they are not WARNINGs, merely INFOs!) because they refer to instructions without influence on the optical appearance of the rendered page image. – mkl Apr 18 '16 at 15:33
  • I upgraded it to 2.0. I do not want to live with rendering problems. The Stackoverflow rules say that I need to open a new topic instead of asking it here. So I have just opened a topic. [link](http://stackoverflow.com/questions/36698492/pdf-rendering-with-pdfbox-2-0-and-decrypting). I believe font problem will be solved when I start using 2.0 – Munchmallow Apr 18 '16 at 15:39
  • *I believe font problem will be solved when I start using 2.0* - As you did not share a sample PDF, that is hard to tell. But there definitely are numerous improvements. – mkl Apr 18 '16 at 15:50
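
As the first comment points out, the NoClassDefFoundError means the fontbox jar is not on the classpath. One way to confirm this before rendering is to probe for the classes named in the stack trace. The following is only a minimal sketch: `ClasspathCheck` and `isOnClasspath` are hypothetical helper names for illustration and are not part of PDFBox.

```java
// Sketch: check whether the classes PDFBox 1.8 needs for rendering can be loaded.
// ClasspathCheck and isOnClasspath are hypothetical names, not a PDFBox API.
final class ClasspathCheck {

    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.pdfbox.pdmodel.PDDocument", // provided by the pdfbox jar
            "org.apache.fontbox.afm.AFMParser"      // provided by the fontbox jar (missing in the trace)
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If the second class is reported as missing, adding a fontbox jar that matches the pdfbox version to the classpath should resolve the error shown in the question.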

1 Answer

Convert a PDF file to image

Try looking at this question; the answer marked as correct shows how to do what you want. :)

Sh0ck
  • The referenced answer neither explains about the missing resource nor does it prevent the warnings as desired by the OP here. – mkl Apr 18 '16 at 12:46
  • I have followed that one and I do see these warnings [link](http://s4.postimg.org/eag2cufct/Warnings.jpg) – Munchmallow Apr 18 '16 at 13:36
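
For the INFO messages themselves, a possible approach is to raise the log level for the `org.apache.pdfbox` logger hierarchy. This sketch assumes that PDFBox 1.8's logging ends up in java.util.logging, which the timestamp format of the messages in the question suggests; if another logging backend is configured, that backend's own configuration would be needed instead. `QuietPdfBoxLogging` and `silenceInfoMessages` are hypothetical names used for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: drop INFO-level messages from org.apache.pdfbox.* under java.util.logging.
// QuietPdfBoxLogging and silenceInfoMessages are hypothetical names, not a PDFBox API.
final class QuietPdfBoxLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly, so an
    // unreferenced logger (and the level set on it) may be garbage collected.
    private static final Logger PDFBOX_LOGGER = Logger.getLogger("org.apache.pdfbox");

    /** Raises the threshold so that only WARNING and above pass for org.apache.pdfbox.*. */
    static void silenceInfoMessages() {
        PDFBOX_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        silenceInfoMessages();
        // Child loggers without their own level inherit the parent's level.
        Logger engine = Logger.getLogger("org.apache.pdfbox.util.PDFStreamEngine");
        engine.info("unsupported/disabled operation: BDC"); // filtered out
        engine.warning("warnings are still reported");
    }
}
```

Calling `silenceInfoMessages()` once, before the first page is rendered, would be enough in this setup. As noted in the comments, the "unsupported/disabled operation" lines are INFOs rather than warnings, so hiding them does not change the rendered image.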