I am working on a project that extracts data from a PDF file. Can anyone tell me how to extract all the data present in a PDF file?
Start somewhere, then come to us with the issues you run into. – Caffeinated Mar 11 '14 at 18:41
-
What do you mean by "*all the data*"? – PM 77-1 Mar 11 '14 at 18:42
-
For example, given an IEEE paper as a PDF file, I want to extract only the abstract on its own, then the introduction, and so on. – vinod Mar 12 '14 at 12:55
1 Answer
1
You might look into PDFBox: http://pdfbox.apache.org/
It's an open-source Java library that can be used to extract content from PDF documents.
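Once you have the document text as a plain string (for instance from PDFBox's `PDFTextStripper`), pulling out the abstract and the introduction is mostly a matter of finding the section headings. Here is a minimal sketch using only the Java standard library. The class name `SectionSplitter`, the list of section names, and the rule that each heading sits alone on its own line with optional "1." or "I." numbering are all my assumptions; real papers vary (IEEE abstracts often start inline on the same line as the word "Abstract"), so you will probably need to adapt the pattern.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits already-extracted text into named sections.
final class SectionSplitter {

    // Assumed heading shape: optional "1." / "I." numbering, then a known
    // section name alone on its line. Extend the name list as needed.
    private static final Pattern HEADING = Pattern.compile(
            "^[ \\t]*(?:(?:\\d+|[IVX]+)\\.?[ \\t]+)?"
            + "(abstract|introduction|conclusions?|references)[ \\t]*$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Returns section name (lower case) -> section body, in document order.
    // Text before the first recognized heading is ignored.
    static Map<String, String> split(String text) {
        Map<String, String> sections = new LinkedHashMap<>();
        Matcher m = HEADING.matcher(text);
        String current = null;
        int bodyStart = 0;
        while (m.find()) {
            if (current != null) {
                // The previous section ends where this heading begins.
                sections.put(current, text.substring(bodyStart, m.start()).trim());
            }
            current = m.group(1).toLowerCase(Locale.ROOT);
            bodyStart = m.end();
        }
        if (current != null) {
            // The last section runs to the end of the text.
            sections.put(current, text.substring(bodyStart).trim());
        }
        return sections;
    }

    public static void main(String[] args) {
        String text = "Abstract\nWe study X.\n\n1. Introduction\nPDFs are hard.\n";
        split(text).forEach((name, body) -> System.out.println(name + ": " + body));
    }
}
```

If a heading name appears more than once, the later body overwrites the earlier one, which is good enough for a first pass but worth knowing about.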
tino
Thanks a lot. I have tried it, but it takes a long time to execute. I also have a small question: can I find out the font size using PDFBox? – vinod Mar 13 '14 at 03:06
-
Maybe this link will help - http://stackoverflow.com/questions/3203790/parsing-pdf-files-especially-with-tables-with-pdfbox – tino Mar 13 '14 at 15:41