I have thousands of PDF files that I need to convert into txt files while preserving the original line breaks. Let me give an example to explain. The files are in this format:
(example A)
1. Lorem ipsum dolor sit amet, consectetur adipiscing elit
2. Lorem ipsum dolor sit amet, consectetur adipiscing elit
3. Lorem ipsum dolor sit amet, consectetur adipiscing elit
4. Lorem ipsum dolor sit amet, consectetur adipiscing elit
The lines are very long, much longer than the ones I use above, so in the PDF they get wrapped like this:
(example B)
1. Lorem ipsum dolor sit amet,
consectetur adipiscing elit
2. Lorem ipsum dolor sit amet,
consectetur adipiscing elit
3. Lorem ipsum dolor sit amet,
consectetur adipiscing elit
4. Lorem ipsum dolor sit amet,
consectetur adipiscing elit
How do I get the text as in example A, without the line breaks introduced by wrapping? I have tried the PHP PDFParser library, Python PDFMiner, and XPDF pdftotext. None of them worked: they gave me either example B or a downright mess.
What makes me think this is possible is that the online service http://pdf2doc.com/ gives me example A, just the way I want it, and then I can simply save the doc as txt.
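
To make the goal concrete, here is a minimal Python sketch of the transformation from example B to example A. It works on text that has already been extracted, not on the PDF itself, and it assumes that every logical line starts with an item number followed by a period and a space, as in the examples. The function name `unwrap_numbered_lines` and the pattern `ITEM_START` are hypothetical, chosen only for illustration.

```python
import re

# Assumption: every logical line begins with an item number such as "12. ".
ITEM_START = re.compile(r"^\d+\.\s")


def unwrap_numbered_lines(text):
    """Join wrapped continuation lines onto the numbered line they belong to."""
    logical_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines left behind by the extractor
        if ITEM_START.match(line) or not logical_lines:
            # A new numbered item (or the very first line) starts a logical line.
            logical_lines.append(line)
        else:
            # Anything else is treated as a wrapped continuation.
            logical_lines[-1] += " " + line
    return "\n".join(logical_lines)


example_b = """1. Lorem ipsum dolor sit amet,
consectetur adipiscing elit
2. Lorem ipsum dolor sit amet,
consectetur adipiscing elit"""

print(unwrap_numbered_lines(example_b))
```

This heuristic would misjoin lines whenever a wrapped continuation happens to begin with a number and a period, so it only illustrates the desired output and is not a general solution.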