I'm wondering whether there is a way to obtain the content of a PDF file (its raw bytes) as a String using Apache PDFBox 2.0.8. What I'm doing is saving the PDDocument object to a ByteArrayOutputStream and then creating a new String from the ByteArrayOutputStream's byte array. But if I save that String to a file, the result is a blank PDF. The reason is that the bytes in the PDF's stream sections differ from those of a PDF saved directly from the PDDocument object to a file. After finding this out, I tried to detect the character encoding of the ByteArrayOutputStream's bytes using juniversalchardet, but with no luck. So, is there a way to accomplish this? This is what I have tried so far:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PDDocument doc = new PDDocument();
// ... add a page, a font, a PDPageContentStream and text only to doc, with some Latin characters (áéíóú)
doc.save(baos);
If I create a file from the bytes of the baos object, the PDF file looks as expected, but if I do this:
String str = new String(baos.toByteArray());
and then create a file from the bytes of str, the PDF file shows only a blank page. I hope this is clear enough this time :)
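To narrow it down, here is a minimal sketch without PDFBox that shows the same kind of byte change. The class and method names (BytesAsString, survivesRoundTrip) are made up for illustration, and the sample bytes are only a stand-in for binary stream data, not real PDF content. It assumes the default charset used by new String(byte[]) behaves like UTF-8, which may not hold on every platform. ISO-8859-1 is included only for comparison, since it maps every byte value to exactly one char.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesAsString {

    // Hypothetical helper: decode the bytes to a String with the given charset,
    // encode the String back, and report whether the bytes are unchanged.
    static boolean survivesRoundTrip(byte[] raw, Charset charset) {
        String asText = new String(raw, charset);
        byte[] back = asText.getBytes(charset);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        // Stand-in for binary stream data: these bytes are not valid UTF-8.
        byte[] raw = {(byte) 0xE1, (byte) 0xFF, (byte) 0x80, 0x00, 0x25};

        System.out.println("UTF-8 round trip intact: "
                + survivesRoundTrip(raw, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 round trip intact: "
                + survivesRoundTrip(raw, StandardCharsets.ISO_8859_1));
    }
}
```

With these sample bytes the UTF-8 round trip reports false, because malformed sequences are replaced during decoding, while the ISO-8859-1 round trip reports true. That looks consistent with the differing stream bytes I see, but I don't know whether it is the whole story for the PDF case.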