I am trying to extract image from the pdf using pdfbox. I have taken help from this post . It worked for some of the pdfs but for others/most it did not. For example, I am not able to extract the figures in this file
After doing some research I found that PDResources.getImages is deprecated. So, I am using PDResources.getXObjects(). With this, I am not able to extract any image from the PDF and instead get this message at the console:
org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm cannot be cast to org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage
Now I am stuck and unable to find the solution. Please assist if anyone can.
//////UPDATE AS REPLY ON COMMENTS///
I am using pdfbox-1.8.10
Here is the code:
public void getimg ()throws Exception {
try {
String sourceDir = "C:/Users/admin/Desktop/pdfbox/mypdfbox/pdfbox/inputs/Yavaa.pdf";
String destinationDir = "C:/Users/admin/Desktop/pdfbox/mypdfbox/pdfbox/outputs/";
File oldFile = new File(sourceDir);
if (oldFile.exists()){
PDDocument document = PDDocument.load(sourceDir);
List<PDPage> list = document.getDocumentCatalog().getAllPages();
String fileName = oldFile.getName().replace(".pdf", "_cover");
int totalImages = 1;
for (PDPage page : list) {
PDResources pdResources = page.getResources();
Map pageImages = pdResources.getXObjects();
if (pageImages != null){
Iterator imageIter = pageImages.keySet().iterator();
while (imageIter.hasNext()){
String key = (String) imageIter.next();
Object obj = pageImages.get(key);
if(obj instanceof PDXObjectImage) {
PDXObjectImage pdxObjectImage = (PDXObjectImage) obj;
pdxObjectImage.write2file(destinationDir + fileName+ "_" + totalImages);
totalImages++;
}
}
}
}
} else {
System.err.println("File not exist");
}
}
catch (Exception e){
System.err.println(e.getMessage());
}
}
//// PARTIAL SOLUTION/////
I have solved the problem of the error message. I have updated the correct code in the post as well. However, the problem remains the same. I am still not able to extract the images from few of the files. Like the one, I have mentioned in this post. Any solution in that regards.