
I have to check whether a PDF file is in PDF/A-1a format, using PDFBox or any other free Java library. I have searched a lot on Google, but I still couldn't find any code or technique for doing this.

How can I check this in Java?

sameer singh
  • What about [PDFBox PDF/A validation](https://pdfbox.apache.org/cookbook/pdfavalidation.html)? – Joop Eggen Dec 31 '14 at 12:15
  • That page only covers validation for the PDF/A-1b format. No proper documentation or explanation is given. – sameer singh Dec 31 '14 at 12:23
  • As far as I know, **iText** is the unbeatable champion (thanks, Lowagie), so you could search there. Apart from that, PDFLib 7 is used for validation, but you probably know that already. – Joop Eggen Dec 31 '14 at 12:55
  • @Sameer Do you really have to check whether it **is** PDF/A-1a, or does it suffice to check whether it **claims to be** PDF/A-1a? The former is not trivial. – mkl Dec 31 '14 at 13:02
  • @sameersingh What "documentation" do you need? PDFBox preflight tells you that it is PDF/A-1b, or why it is not. For further information, you will need to buy the PDF/A-1b specification. – Tilman Hausherr Jan 01 '15 at 08:14
  • @tilman I actually wanted to check whether it's PDF/A-1a or not. The code below works fine; I have checked it with a few sample PDFs. – sameer singh Jan 01 '15 at 11:03
  • PDFBox preflight does not check for PDF/A-1a, only PDF/A-1b. The small "b" means "basic", the small "a" means "accessible", i.e. good for people with vision problems who need a screenreader. – Tilman Hausherr Jan 01 '15 at 11:07
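
The weaker check raised in the comments above, whether a file merely *claims* to be PDF/A-1a, can be sketched with the Java standard library alone. This is a minimal sketch, not a validator: it assumes the claim is carried by the `pdfaid:part` and `pdfaid:conformance` properties of the PDF/A identification schema in the XMP metadata, and that the XMP packet is stored unfiltered in an ASCII-compatible encoding, so a plain text scan can find it. The class name `PdfAClaimCheck` and its methods are hypothetical names chosen for illustration. A file that passes only declares a level; it says nothing about whether the file actually conforms.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the PDF/A level a file declares in its XMP metadata.
// It does NOT validate conformance.
class PdfAClaimCheck {

    // Match both XMP serializations:
    //   element form:   <pdfaid:part>1</pdfaid:part>
    //   attribute form: pdfaid:part="1"
    private static final Pattern PART =
            Pattern.compile("pdfaid:part\\s*(?:=\\s*[\"']|>)\\s*([0-9]+)");
    private static final Pattern CONFORMANCE =
            Pattern.compile("pdfaid:conformance\\s*(?:=\\s*[\"']|>)\\s*([A-Za-z])");

    /** Returns the declared level such as "1A" or "1B", or null if none is found. */
    static String claimedLevel(String rawPdfText) {
        Matcher part = PART.matcher(rawPdfText);
        Matcher conformance = CONFORMANCE.matcher(rawPdfText);
        if (part.find() && conformance.find()) {
            return part.group(1) + conformance.group(1).toUpperCase(Locale.ROOT);
        }
        return null;
    }

    /** True only if the metadata declares part 1, conformance level A. */
    static boolean claimsPdfA1a(String rawPdfText) {
        return "1A".equals(claimedLevel(rawPdfText));
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to one char, so ASCII metadata survives intact.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println("Declared PDF/A level: " + claimedLevel(raw));
        System.out.println("Claims PDF/A-1a: " + claimsPdfA1a(raw));
    }
}
```

Run it with the path of the PDF as the only argument. A scan like this will miss metadata that is compressed or stored in UTF-16, so a `null` result does not prove that the file makes no claim.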

1 Answer


The PDFBox cookbook shows how to do PDF/A-1b validation:

https://pdfbox.apache.org/cookbook/pdfavalidation.html

To do PDF/A-1a validation, you simply change:

    parser.parse();

to:

    parser.parse(Format.PDF_A1A);

I was able to ascertain this from reading the parser source code located here:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.pdfbox/preflight/1.8.2/org/apache/pdfbox/preflight/parser/PreflightParser.java

Mark Lybarger
  • Source code is here: https://svn.apache.org/viewvc/pdfbox/trunk/preflight/?sortby=date – Tilman Hausherr Jan 01 '15 at 08:16
  • Can we check whether the PDF, or the images in the PDF, are in color, and also find their pixel density? – sameer singh Jan 06 '15 at 07:16
  • That's another question altogether. But you can extract the images and determine their pixel density (h x w): http://stackoverflow.com/questions/8705163/extract-images-from-pdf-using-pdfbox . I don't know how to tell whether an image is "color", though. – Mark Lybarger Jan 09 '15 at 15:10
  • It seems that this has been removed in 2.0.4, with no documentation and apparently no way around it. Any ideas? I'll share my solution if I find one. – linuxatico Sep 19 '17 at 16:54