
I would appreciate help with the following: I have two partially accessible PDFs (both contain tags), and I want to concatenate them with a command-line tool (such as PDFtk or Ghostscript, or a Perl module). I have tried both PDFtk and Ghostscript, and each produces a non-accessible PDF that lacks the original tags, even though both input PDFs were tagged.

Do you know of a way to do this with one of the tools mentioned, or with some other command-line tool for Linux? It does not have to be freeware, and Perl modules are also an option.

Thanks!

user2522941
  • iText can concatenate tagged PDFs and retain the tagging with PdfCopy (as long as the PDFs are not fillable forms) – Kevin Brown Aug 20 '13 at 02:06
  • Thank you, Kevin, this solved the problem: 1. I used this example for concatenating: [link](http://itextpdf.com/examples/iia.php?id=123) 2. I applied the following changes to keep the tags: added `copy.setTagged();` and changed the page import to `copy.addPage(copy.getImportedPage(reader, pageN, true));` – user2522941 Aug 26 '13 at 06:56
  • Update: this doesn't fully solve the problem. I still need to verify that the generated tags make sense, and Read Out Loud doesn't work after the concatenation (although it does work on the original PDFs) – user2522941 Aug 26 '13 at 07:23
  • I would point out that Read Out Loud is not a test for proper tag structure. Analyze both input files, and also the output, with an accessibility checker. I would bet you have untagged content in the source documents. – Kevin Brown Aug 26 '13 at 16:48
  • Hi Kevin. Thanks for your response. I've done that: the major difference between the report for the original PDF and the one for the concatenated version (= the original x2...) is that the concatenated version fails on: 1. Primary language 2. Title. This is also true for the iText read-out-loud demo: http://examples.itextpdf.com/results/part4/chapter15/read_out_loud.pdf You can find its report here: http://itext-general.2136553.n4.nabble.com/file/n4659005/read_out_loud.pdf.accreport.html Read Out Loud also doesn't work with this demo PDF. – user2522941 Aug 26 '13 at 21:42
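
As a rough first pass before running a full checker such as PAC, one could scan the input and output files for the names a tagged PDF normally carries in its catalog. The sketch below is only a crude byte-level heuristic and not part of the original discussion: the class name `TagMarkerScan` and the marker list are assumptions chosen for illustration, the scan does not validate the tag structure, and it can miss markers stored inside compressed object streams.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from iText or any PDF library): reports which
// tagging-related names occur anywhere in the raw bytes of a PDF file.
public class TagMarkerScan {
    // Names a tagged PDF usually has in its catalog. Assumption: they are
    // stored uncompressed; catalogs inside object streams will be missed.
    static final String[] MARKERS = {"/StructTreeRoot", "/MarkInfo", "/Lang"};

    static Map<String, Boolean> scan(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary content
        // survives the conversion and substring search stays byte-accurate.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Map<String, Boolean> found = new LinkedHashMap<>();
        for (String marker : MARKERS) {
            found.put(marker, raw.contains(marker));
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Print one line per file given on the command line.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + " -> " + scan(bytes));
        }
    }
}
```

Running it over both inputs and the merged file (for example `java TagMarkerScan.java in-1.pdf in-2.pdf out.pdf` on Java 11 or later) gives a quick hint whether the merge dropped the structure tree or the language entry. A missing marker is only a hint, and a present one says nothing about whether the tags are meaningful, so an accessibility checker is still needed.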

2 Answers

`pdfunite in-1.pdf in-2.pdf in-n.pdf out.pdf`

You can read more in a similar question

Community
Mohammad Izady
  • I'll try the second option and update. The first one doesn't work: I had already tested the first command before writing the post, and according to PAC2 (http://www.access-for-all.ch/en/pdf-lab/pdf-accessibility-checker-pac.html) the result isn't accessible at all and doesn't include tags, although the original PDFs were rated "partial accessible" by this software. I will update regarding the second option soon. Thanks. – user2522941 Aug 20 '13 at 08:54

Solved: a newer version of iText works. The previous version, which was the latest when I wrote the question, did not; it works only from 5.4.4 onward.

It is important to mention (this was missing from the documentation in the past) that when concatenating documents in tagged mode, you must keep all readers open until the resulting document is closed, i.e.:

first `document.close();` and only after that `reader.close();`

user2522941