
I've been looking for an answer for quite a long time, but I haven't found anything. My problem is with parsing a PDF: I have a page laid out as some kind of tables. I've already written code that extracts information from a specified rectangle, but I am hard-coding those values, so it is not as dynamic as it should be. I want to find information about the cells; with that information I will be able to get the strings I need. I haven't found anything useful in the PDFBox API. I would be grateful for any tips.

  • Parsing pages whose structures make visual sense but have no representation of that sense in the binary data is always difficult and very hard to do in general. As soon as you have some information that makes recognizing the structures easier, it might become more feasible. – mkl Aug 07 '13 at 20:12
  • In fact, I'm considering whether it is possible to get the coordinates of the lines that are drawn in my file. After that I could extract the strings between them. Maybe someone has another idea for how to get text from such a file? – user2062882 Aug 08 '13 at 12:04
  • Possible duplicate of [Parsing PDF files (especially with tables) with PDFBox](https://stackoverflow.com/questions/3203790/parsing-pdf-files-especially-with-tables-with-pdfbox) – beldaz Oct 11 '17 at 01:14
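
The idea from the comments (collect the coordinates of the drawn lines, then extract the strings between them) can be sketched as follows. This is only a minimal illustration in plain Java, not a PDFBox feature: the class `TableGrid`, the method `cells`, and the `tolerance` parameter are hypothetical names, and the sketch assumes the x positions of the vertical lines and the y positions of the horizontal lines have already been collected somehow and that the table is a regular grid without merged cells.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns ruling-line positions into cell rectangles.
public class TableGrid {

    // Returns the cells row by row. Each cell spans the gap between two
    // neighbouring vertical lines and two neighbouring horizontal lines.
    public static List<List<Rectangle2D>> cells(double[] verticalXs,
                                                double[] horizontalYs,
                                                double tolerance) {
        double[] xs = sortAndMerge(verticalXs, tolerance);
        double[] ys = sortAndMerge(horizontalYs, tolerance);
        List<List<Rectangle2D>> rows = new ArrayList<>();
        for (int r = 0; r + 1 < ys.length; r++) {
            List<Rectangle2D> row = new ArrayList<>();
            for (int c = 0; c + 1 < xs.length; c++) {
                row.add(new Rectangle2D.Double(
                        xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r]));
            }
            rows.add(row);
        }
        return rows;
    }

    // Sorts the positions and drops any value that lies within `tolerance`
    // of the previously kept one, since a visible line is often drawn as
    // several segments at almost the same coordinate.
    static double[] sortAndMerge(double[] values, double tolerance) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        List<Double> kept = new ArrayList<>();
        for (double v : sorted) {
            if (kept.isEmpty() || v - kept.get(kept.size() - 1) > tolerance) {
                kept.add(v);
            }
        }
        return kept.stream().mapToDouble(Double::doubleValue).toArray();
    }
}
```

Each resulting rectangle could then replace one of the hard-coded rectangles, for example by registering it with `PDFTextStripperByArea.addRegion(name, rect)`, calling `extractRegions(page)`, and reading the cell text with `getTextForRegion(name)`. Whether the line coordinates need their y axis flipped before that step depends on how they were collected, so that part is left open here.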

0 Answers