Recently I asked THIS QUESTION about saving all the images in a PDF file to the file system, and I was able to save the images successfully.

I tested my code on a lot of PDF files and it ran just fine. But today I came across THIS PDF file, from which it is not able to extract some images (attached below).

Can anyone please tell me what else I can do to extract these images? Is it even possible to extract them? Are they really images or something else? I would really appreciate the help.

My code (please ignore the hardcoding, as I am still testing this out):

function fn_getAllImages()
{
    var strPdf = "C:\\Users\\a614923\\Desktop\\haka\\Work\\2017\\10. October\\31\\test.PDF";
    var strout = "C:\\Users\\a614923\\Desktop\\haka\\Work\\2017\\10. October\\31\\Newfolder\\img";
    var intPage = 2;      //for the 2nd page(the image is present in the 2nd page)
    var objPdf = JavaClasses.org_apache_pdfbox_pdmodel.PDDocument.load_3(strPdf);
    var objPage = objPdf.getDocumentCatalog().getAllPages().get(intPage-1);
    var objImages = objPage.getResources().getXObjects().values().toArray();
    var objImage, objImgBuffer, objImageFile;
    for(var i=0; i<objImages.length; i++)
    {
        objImage = objImages.items(i);
        Log.Message(objImage.toString());
        if(aqString.Find(objImage.toString(),"PDXObjectForm",0,false) != -1)   //Find returns -1 when there is no match; skip form XObjects
        {
            continue;
        }
        else
        {
            objImage.write2file_2(strout+i);
            //objImgBuffer = objImage.getRGBImage();
            //objImageFile = JavaClasses.java_io.File.newInstance(strout+i+".png");
            //JavaClasses.javax_imageio.ImageIO.write(objImgBuffer,"png",objImageFile); 
        }
    }
}

The image in the PDF file which I want to save (the one inside the red box below):

enter image description here

Gurmanjot Singh
  • That "image" is not a bitmap image (which in your previous question you learned to export) but a collection of vector image drawing instructions. These instructions are native to PDF, so there is no separate entity (like a separate VRML file) to find and export. – mkl Nov 02 '17 at 11:26
  • By the way, how is your question related to Microsoft's JScript or SmartBear's TestComplete? – mkl Nov 02 '17 at 11:29
  • There are also 59 inline images that produce the shaded objects. – Tilman Hausherr Nov 02 '17 at 11:30
  • @mkl Actually I am using the tool TestComplete for my projects. The language in the code is JScript. – Gurmanjot Singh Nov 02 '17 at 11:30
  • @TilmanHausherr I am sorry, I didn't get what you are trying to imply here. Can you please elaborate, as all this is very new to me? – Gurmanjot Singh Nov 02 '17 at 11:36
  • With "shaded objects" I mean the three objects that are red, green and blue (cube, sphere and cone). "Inline images" are images that are not in the resources dictionary; they are in the content stream. Run the ExtractImages tool of PDFBox 2.0 and you'll get them all. – Tilman Hausherr Nov 02 '17 at 11:39
  • @TilmanHausherr Thanks. I am looking into it. – Gurmanjot Singh Nov 02 '17 at 11:41
  • @Gurman *"Actually I am using"* - ah, ok. That fact doesn't seem to be important here, though, in particular you could ask the question without mentioning them at all... – mkl Nov 02 '17 at 11:46
  • @mkl Thanks. I have removed them. – Gurmanjot Singh Nov 02 '17 at 11:47
  • @TilmanHausherr Wow, indeed, those smaller versions of the "shaded objects" are really created in a weird way... – mkl Nov 02 '17 at 11:58
  • @TilmanHausherr I was able to get those 59 inline images using the ExtractImages class, but again I didn't get the "image" I needed. – Gurmanjot Singh Nov 02 '17 at 15:24
  • I never said you would. See mkl's comment. The 59 images are little bits that would have to be put together and you'd get the three objects I mentioned, but not the structure that you incorrectly assumed was an image. The only thing you could do is to render the PDF page (https://stackoverflow.com/questions/23326562/apache-pdfbox-convert-pdf-to-images) and then cut the part you want to reuse. – Tilman Hausherr Nov 02 '17 at 15:29
  • Yes. I just wanted to mention it for the record. – Gurmanjot Singh Nov 02 '17 at 15:30
  • Should we create an answer? Or do you want to drop the question? – mkl Nov 02 '17 at 17:33
  • @mkl I would really appreciate it if you could give an answer. – Gurmanjot Singh Nov 02 '17 at 17:36
  • @mkl Is there anything we can do to get that drawing? In short, is it possible in any way? If yes, can you point me in the right direction? – Gurmanjot Singh Nov 03 '17 at 08:21
  • As **what** do you want to get it? The drawing is pure PDF vector graphics instructions. Thus the only obvious format to get it as is PDF, everything else is not extraction but instead a transformation. If you don't want to extract but to transform, into which format do you want to transform? If some bitmap format, then @Tilman already explained: "render the PDF page and then cut the part you want to reuse." – mkl Nov 03 '17 at 09:05
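
The "render the PDF page and then cut the part you want to reuse" suggestion from the comments could look roughly like the Java sketch below. It covers only the cutting step and uses nothing but the JDK. It assumes the page has already been rendered to a `BufferedImage` (for example with PDFBox 2.0's `PDFRenderer.renderImageWithDPI`), that the position of the drawing on the page is known in PDF points, and that the page is unrotated with its lower-left corner at (0, 0). The class `PageCrop` and its methods `toPixels` and `crop` are hypothetical names chosen for illustration; they are not part of PDFBox.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper (not a PDFBox class): cuts a region out of a rendered page.
class PageCrop {

    // Converts a rectangle given in PDF points (origin at the bottom-left of the
    // page, 72 points per inch) into pixel coordinates of the rendered bitmap
    // (origin at the top-left). Assumes an unrotated page starting at (0, 0).
    public static Rectangle toPixels(double xPt, double yPt, double widthPt, double heightPt,
                                     double pageHeightPt, double dpi) {
        double scale = dpi / 72.0;
        int x = (int) Math.round(xPt * scale);
        // Flip the y axis: PDF y grows upwards, image y grows downwards.
        int y = (int) Math.round((pageHeightPt - yPt - heightPt) * scale);
        int w = (int) Math.round(widthPt * scale);
        int h = (int) Math.round(heightPt * scale);
        return new Rectangle(x, y, w, h);
    }

    // Returns the part of the rendered page that lies inside the region,
    // clipped to the bitmap bounds so getSubimage cannot go out of range.
    public static BufferedImage crop(BufferedImage page, Rectangle region) {
        Rectangle bounds = new Rectangle(0, 0, page.getWidth(), page.getHeight());
        Rectangle clipped = region.intersection(bounds);
        if (clipped.isEmpty()) {
            throw new IllegalArgumentException("Region lies outside the rendered page");
        }
        return page.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height);
    }
}
```

With a page rendered at 144 dpi, `PageCrop.crop(page, PageCrop.toPixels(72, 72, 144, 72, 842, 144))` would cut out a 288 x 144 pixel region, which can then be saved with `javax.imageio.ImageIO.write(cut, "png", file)`. The result is a bitmap transformation of the drawing, not an extraction of the original vector instructions.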

0 Answers