How to extract data from a PDF file using Java

Question

I am working on a project that extracts data from PDF files. How can I extract all the data present in a PDF file?

For example, given an IEEE paper as a PDF file, I want to extract the abstract separately, then the introduction, and so on. – vinod, Mar 12 '14 at 12:55

score 1 · Answer 1 · answered Mar 11 '14 at 18:49


You might look into using PDFBox - http://pdfbox.apache.org/

It's an open-source Java library and can be used to extract content from PDF documents.
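As a rough illustration of the "abstract, then introduction" idea from the question, here is a minimal sketch that splits already-extracted text into sections. It uses only the standard library so it can be tried without PDFBox. The class name `SectionSplitter` and the heading pattern are hypothetical: the pattern assumes IEEE-style headings ("Abstract" at the start of a line, Roman-numeral headings such as "I. INTRODUCTION" on their own line), which may not hold for every paper. The comment at the top shows how the text would typically be obtained with PDFBox 2.x; in PDFBox 3.x the document is loaded with `Loader.loadPDF(file)` instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// With PDFBox 2.x on the classpath, the input text would come from something like:
//   try (PDDocument doc = PDDocument.load(new File("paper.pdf"))) {
//       String text = new PDFTextStripper().getText(doc);
//   }
final class SectionSplitter {

    // Assumed heading shapes (not guaranteed for every paper):
    //   "Abstract" at the start of a line, or
    //   a Roman numeral, a dot, and an upper-case title alone on a line.
    private static final Pattern HEADING = Pattern.compile(
            "^(Abstract)\\b|^[IVX]+\\.\\s+([A-Z][A-Z ]+)$",
            Pattern.MULTILINE);

    // Returns section bodies keyed by upper-cased heading, in document order.
    static Map<String, String> split(String text) {
        Map<String, String> sections = new LinkedHashMap<>();
        Matcher m = HEADING.matcher(text);
        String name = null;
        int bodyStart = 0;
        while (m.find()) {
            if (name != null) {
                // The previous section ends where this heading begins.
                sections.put(name, clean(text.substring(bodyStart, m.start())));
            }
            String title = m.group(1) != null ? m.group(1) : m.group(2);
            name = title.trim().toUpperCase();
            bodyStart = m.end();
        }
        if (name != null) {
            // The last section runs to the end of the text.
            sections.put(name, clean(text.substring(bodyStart)));
        }
        return sections;
    }

    // Drops leading whitespace, dashes, colons, and dots (as in "Abstract: ..."),
    // then trims the remainder.
    private static String clean(String body) {
        return body.replaceFirst("^[\\s\\p{Pd}:.]+", "").trim();
    }
}
```

Calling `SectionSplitter.split(text).get("ABSTRACT")` would then return the abstract body, provided the headings match the assumed shapes. Two-column layouts or mixed-case headings would likely need a different pattern.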


tino

Can you give some sample code showing how to use PDFBox? – vinod Mar 12 '14 at 13:12
Thanks a lot. I have tried it, but it takes a long time to execute. I also have a small question: can I get the font size using PDFBox? – vinod Mar 13 '14 at 03:06
Maybe this link will help - http://stackoverflow.com/questions/3203790/parsing-pdf-files-especially-with-tables-with-pdfbox – tino Mar 13 '14 at 15:41
