I am able to fetch all the images and get their coordinates from the PDF using PDFBox. But when I parse the PDF using Tika server, I get only the text. How can I tell where an image occurs, so that I can place the image exactly after the text that precedes it? I am using the code given in the first answer to this question: extract images from pdf using pdfbox
I am using Tika server 1.7. I pass the PDF data to the parser and use the plain-text output. I just want to know how, while parsing, I can detect that an image has been encountered.
I also got the HTML output using parseToHTML() from the examples at https://tika.apache.org/1.10/examples.html. But this still does not give me the images present in the PDF, nor does it give any tag for them.
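
To make clear what I am after: if the HTML output contained some marker for each image, I could interleave the text and the images in document order with something like the sketch below. It uses only the JDK SAX parser on a hand-written sample string. The `<img>` element, its `src` attribute, and the names `ImagePositionSketch` and `order` are my assumptions for illustration, not what Tika currently gives me.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ImagePositionSketch {

    /** Collects text runs and image markers in the order they appear. */
    static class OrderHandler extends DefaultHandler {
        final List<String> items = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        // Emit the text gathered so far as a single item, if there is any.
        private void flushText() {
            String t = text.toString().trim();
            if (!t.isEmpty()) {
                items.add("TEXT: " + t);
            }
            text.setLength(0);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Assumption: an image shows up as an <img src="..."/> element.
            if ("img".equals(qName)) {
                flushText();
                items.add("IMAGE: " + atts.getValue("src"));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endDocument() {
            flushText();
        }
    }

    /** Parses well-formed XHTML and returns text and image items in order. */
    static List<String> order(String xhtml) throws Exception {
        OrderHandler handler = new OrderHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.items;
    }

    public static void main(String[] args) throws Exception {
        // Hand-written sample, not real Tika output.
        String sample = "<html><body><p>Before the image.</p>"
                + "<img src=\"image0.png\"/><p>After the image.</p></body></html>";
        for (String item : order(sample)) {
            System.out.println(item);
        }
    }
}
```

With markers like these I could match each image item to the images and coordinates I already get from PDFBox. What I am missing is a way to make the parser emit such a marker in the first place.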