I am unable to parse a PDF document into (key, value) pairs. Can anyone please help me parse a PDF file in a structured manner?
I was able to extract the text from a PDF file using the Java code below (Apache PDFBox 2.x):
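```java
// filePathAv, ExtractedText and PRRuntimeException are provided by the surrounding environment.
org.apache.pdfbox.pdmodel.PDDocument doc = null;
java.io.File pdfFile = new java.io.File(filePathAv);
try {
    // load(File) already tries the empty user password, so a PDF protected only by an
    // owner password opens here. The former doc.load(pdfFile, "") was a static call whose
    // result was discarded, so it had no effect and has been removed.
    doc = org.apache.pdfbox.pdmodel.PDDocument.load(pdfFile);
    if (doc.isEncrypted()) {
        // Drop encryption and permission restrictions on this in-memory copy.
        doc.setAllSecurityToBeRemoved(true);
    }
    org.apache.pdfbox.text.PDFTextStripper pdfStripper = new org.apache.pdfbox.text.PDFTextStripper();
    ExtractedText = pdfStripper.getText(doc);
} catch (Exception e) {
    throw new PRRuntimeException(e);
} finally {
    // Always release the document, even if extraction failed.
    if (doc != null) {
        try {
            doc.close();
        } catch (java.io.IOException e) {
            throw new PRRuntimeException(e);
        }
    }
}
```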
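Once the plain text is available, one way to get (key, value) pairs is to post-process it line by line. The sketch below assumes that each pair sits on its own line in the form `Key: Value`. That layout is an assumption about the document, and `KeyValueParser` is a hypothetical helper, not part of PDFBox. Lines without a colon are skipped, and everything after the first colon is kept as the value, so `Time: 10:30` still works.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyValueParser {

    // Assumed layout: "Key: Value" on one line. The key is everything before the first colon.
    private static final Pattern PAIR =
            Pattern.compile("^\\s*([^:\\s][^:]*?)\\s*:\\s*(.*?)\\s*$");

    /** Parses text (e.g. the result of PDFTextStripper.getText) into an ordered key/value map. */
    public static Map<String, String> parse(String extractedText) {
        Map<String, String> pairs = new LinkedHashMap<>();
        // \R matches \r\n, \n or \r, so Windows line separators are handled too.
        for (String line : extractedText.split("\\R")) {
            Matcher m = PAIR.matcher(line);
            if (m.matches()) {
                pairs.put(m.group(1), m.group(2));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                Arrays.asList("Invoice No: 12345", "Customer : ACME Corp", "Thank you!"));
        System.out.println(parse(sample)); // {Invoice No=12345, Customer=ACME Corp}
    }
}
```

With the code above, this would be called as `KeyValueParser.parse(ExtractedText)`. If the PDF uses a different separator (tabs, `=`, or just a wide gap), the pattern has to be adapted accordingly.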
If there is a table in the PDF file, can we extract the left-hand side (LHS) and right-hand side (RHS) columns separately?
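Plain `getText` output loses the column positions, so splitting a table into LHS and RHS generally needs the coordinates of each word. A possible approach, sketched here under assumptions: subclass `PDFTextStripper`, override `writeString(String text, List<TextPosition> textPositions)` (it receives a chunk of text, usually a word, with its glyph positions), and record `getXDirAdj()` and `getYDirAdj()` of the first `TextPosition`. The pure-Java helper below then groups those words into rows by their y coordinate and splits each row at a column boundary x. `Word`, `Row`, `TwoColumnSplitter` and the boundary value are all hypothetical; the boundary must be measured for your particular PDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoColumnSplitter {

    /** Hypothetical word record; x/y would come from TextPosition.getXDirAdj()/getYDirAdj(). */
    public static final class Word {
        final String text;
        final float x;
        final float y;
        public Word(String text, float x, float y) { this.text = text; this.x = x; this.y = y; }
    }

    /** One table row split into its left-hand and right-hand cell text. */
    public static final class Row {
        public final String lhs;
        public final String rhs;
        Row(String lhs, String rhs) { this.lhs = lhs; this.rhs = rhs; }
        @Override public String toString() { return lhs + " -> " + rhs; }
    }

    /**
     * Groups words into rows by y rounded to whole points (assumes cells in one row share a
     * baseline) and puts each word left or right of columnBoundaryX.
     */
    public static List<Row> split(List<Word> words, float columnBoundaryX) {
        // TreeMap orders rows by y; with getYDirAdj() y grows downwards, i.e. top to bottom.
        Map<Integer, StringBuilder[]> rows = new TreeMap<>();
        List<Word> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> Float.compare(a.x, b.x)); // left to right within each row
        for (Word w : sorted) {
            StringBuilder[] cells = rows.computeIfAbsent(Math.round(w.y),
                    k -> new StringBuilder[] {new StringBuilder(), new StringBuilder()});
            StringBuilder cell = w.x < columnBoundaryX ? cells[0] : cells[1];
            if (cell.length() > 0) {
                cell.append(' ');
            }
            cell.append(w.text);
        }
        List<Row> result = new ArrayList<>();
        for (StringBuilder[] cells : rows.values()) {
            result.add(new Row(cells[0].toString(), cells[1].toString()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Word> words = Arrays.asList(
                new Word("Name", 50, 100), new Word("John", 300, 100), new Word("Smith", 340, 100),
                new Word("Account", 50, 120), new Word("No.", 105, 120), new Word("98765", 300, 120));
        for (Row r : split(words, 250f)) {
            System.out.println(r); // "Name -> John Smith", then "Account No. -> 98765"
        }
    }
}
```

If the column boundaries are fixed and known in advance, PDFBox's `PDFTextStripperByArea` is a simpler alternative: register one rectangle per column with `addRegion`, call `extractRegions(page)`, and read each column with `getTextForRegion`. The LHS text could then be paired with the corresponding RHS text line by line.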