1237

How could I merge / convert multiple PDF files into one large PDF file?

I tried the following, but the content of the target file was not as expected:

convert file1.pdf file2.pdf merged.pdf

I need a very simple command-line (CLI) solution. Ideally, I could pipe the output of the merge/convert straight into pdf2ps (as originally attempted in my previous question: Linux piping (convert -> pdf2ps -> lp)).

Community
alcohol
  • 3
    ymmv, but this doesn't seem to have as good of a resolution in the output file as pdfunite and it also results in a file size larger than the output from pdfunite – sabujp Nov 17 '15 at 23:02
  • 1
    related: [linux command merge pdf files with numerical sort](http://stackoverflow.com/q/23643274/395857) – Franck Dernoncourt Feb 11 '17 at 22:06
  • Whether links are preserved by those solutions is discussed [in this post](https://tex.stackexchange.com/a/531215/34551). If you want to preserve the links (probably along with other annotations), use pdftk if you want a command-line interface, pdfsam if you want a graphical user interface, or sejda if you want a web interface. – Clément Mar 05 '20 at 03:02

23 Answers

1622

Since pdfunite is part of Poppler, it is more likely to be installed already, and its usage is simpler than pdftk's:

pdfunite in-1.pdf in-2.pdf in-n.pdf out.pdf
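
Because pdfunite treats its last argument as the output file, a thin wrapper that takes the output path as a separate, explicit parameter can guard against overwriting an input by accident. The following is a minimal Python sketch, assuming `pdfunite` is on the PATH; the helper names `build_pdfunite_command` and `merge_pdfs` are hypothetical and not part of Poppler:

```python
import os
import subprocess


def build_pdfunite_command(inputs, output):
    """Return the pdfunite argument list, refusing unsafe combinations."""
    if not inputs:
        raise ValueError("need at least one input PDF")
    # Compare absolute paths so "./a.pdf" and "a.pdf" count as the same file.
    resolved_inputs = {os.path.abspath(path) for path in inputs}
    if os.path.abspath(output) in resolved_inputs:
        raise ValueError("output path is also an input: " + output)
    # pdfunite takes the inputs first and the output file last.
    return ["pdfunite", *inputs, output]


def merge_pdfs(inputs, output):
    """Run pdfunite, refusing to replace an existing output file."""
    if os.path.exists(output):
        raise FileExistsError(output)
    subprocess.run(build_pdfunite_command(inputs, output), check=True)
```

A call such as `merge_pdfs(["in-1.pdf", "in-2.pdf"], "out.pdf")` then either produces `out.pdf` or raises before anything is written.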
Hans Ginzel
Hubert Kario
  • Poppler is also very fast, from my tests – taxilian Jul 20 '13 at 18:59
  • 21
    It is fast, but it seems to break hyperlinks. See http://blog.dbrgn.ch/2013/8/14/merge-multiple-pdfs/ – Danilo Bargen Aug 14 '13 at 09:46
  • 487
    Just make sure you remember to provide out.pdf, or else it will overwrite the last file in your command, sigh. – mlissner Oct 19 '13 at 22:20
  • 10
    package for pdfunite is poppler-utils in debian but may not be present in old debian releases. – Jocelyn delalande Nov 10 '13 at 12:16
  • 19
    Cannot recommend this. The size of the resulting PDF is far too big. For example: pdfunite gives me a 75MB file while Ghostscript packs everything into 1MB. – Torben Dec 06 '13 at 11:58
  • 7
    Hmmm... @Torben I just packed 300+ pdfs (total of 13MB) into a single PDF using this utility and I got a 12MB file at the end. Maybe it was the version you used? I am on OpenSUSE 12.2, using pdfunite version 0.20.0. – Aaron R. Jan 13 '14 at 17:19
  • 3
    @Aaron My comment was a bit misleading. What I meant is that pdfunite doesn't optimize the file size. For example: 10 similar PDFs (slides for a presentation) of 1MB each result in a ~10MB PDF when I use pdfunite. With Ghostscript the resulting PDF is <1MB. – Torben Jan 16 '14 at 13:03
  • 1
    In my case pdfunite did not produce a usable PDF file. When I loaded it with `evince` I got lots of errors. The `gs` solution worked. – FroMage May 12 '14 at 10:03
  • 72
    You can use: `pdfunite *.pdf out.pdf` assuming no other PDF exists in that directory and their order is preserved by "*". If it's not preserved, using ranges solves it: filename_{0..9}.pdf. – lepe Jan 05 '15 at 05:48
  • pdfunite worked much worse for my use case than pdftk. I was trying to combine together copies of a particular form into the same pdf. In pdfunite they were linked, in pdftk they were separated and could be filled out individually. – ryantm Jun 09 '15 at 00:46
  • 1
    @Torben The file is smaller with gs because with default settings image has reduced quality (screen-view-only quality, 72 dpi images). With `-dPDFSETTINGS=/printer` file size is almost identical (high quality, 300 dpi images). – MariuszS Jul 22 '15 at 12:59
  • 1
    @MariuszS Even though you might be right for some special cases, you're wrong with your general assumption. I just merged 133MB of PDFs (84 files exported from Inkscape), which contain bitmaps, vector graphics and text, into one PDF with a size of 1.6MB. I used `/prepress` for that and `/printer` even reduced the size to 1.3MB. Even though I zoomed in and printed a part, I couldn't find any visible difference between the single PDF and the merged version. I'm pretty sure Ghostscript compares the merged PDFs and stores shared content just once. – Torben Jul 22 '15 at 21:08
  • 1
    @DaniloBargen `pdfunite` doesn't break external hyperlinks. I have merged documents with links. Links were kept functional. But `pdfunite` might break internal hyperlinks as indicated in the [blog you mentioned](https://blog.dbrgn.ch/2013/8/14/merge-multiple-pdfs/). – Paul Rougieux Apr 20 '16 at 13:07
  • 1
    pdfunite worked well for me. preserved resolution of original pdfs. pdf size simple addition of original pdfs. convert -compress lossless did the same as convert, resolution lost and file size much increased. pdfunite version 0.22.1 vs ImageMagick 6.7.8-9. 69k + 130k pdf => 198k with pdfunite, => 771k with convert. text+gfx(originally odt) + gfx(pdf print to pdf) pdf. – gaoithe Oct 20 '16 at 10:08
  • 3
    Doesn't work with a pdf I have gives `Unimplemented Feature: Could not merge encrypted files ('MR1418_introduction.pdf')`. But `pdftk` was able to handle it, albeit admonishing me for not having some passwords you don't need. – salotz May 18 '17 at 21:29
  • Incredibly much faster than `convert`, and the resolution does not get worse. TOP – Campa Aug 29 '17 at 08:05
  • Still works to perfection. I didn't see any size change either; 9.2 mb of files made a single 9.2mb file. Didn't have any hyperlinks, so can't comment on that aspect. – Nostradamnit Jan 03 '18 at 17:14
  • Just put the following in a bash script: `pdfunite "$@" out.pdf` – bjd2385 May 06 '18 at 08:52
  • @AaronR Happy to report working well on openSuse Leap 42.3. Its speed is quite blazing, combining 100-odd pages in less than a second. – Tom Russell Oct 15 '18 at 07:05
  • Breaks hyperlinks. – Winny Dec 04 '18 at 04:37
  • It seems that page numbers from individual documents are also not preserved. (pdfunite version 0.73.0) – minexew Dec 22 '19 at 22:11
  • 2
    `sudo apt-get install poppler-utils` – ostrokach Dec 30 '19 at 02:28
  • 1
    WATCH OUT! The last input parameter is where the program writes the output, so if you accidentally forget `out.pdf` and do `pdfunite in1.pdf in2.pdf` instead of `pdfunite in1.pdf in2.pdf out.pdf`, you'll have just accidentally written `in1.pdf` right on top of `in2.pdf`, ruining it! Don't do that. – Gabriel Staples Aug 13 '20 at 18:52
  • 1
    Different page sizes, want to "shrink to fit"? Try another tool included in Poppler: `pdftocairo -pdf -paper A4 in.pdf out.pdf` ... and make backups since, as people point out, you can overwrite inputs – KCD Dec 03 '20 at 06:22
  • I did not find that pdfunite created unreasonably large files. My two input files were both about 500k and the resulting output file was about 1 MB as one would expect. – Ethan Brown Feb 24 '21 at 20:34
  • I was quick to use it without reading the comments, so it overwrote the last file. Silly me ran "pdfunite *.pdf" and then wondered where the output file was until I realized... Luckily I had a backup; it's always a good idea to have a backup. – xZero Mar 24 '21 at 15:46
638

Try good old Ghostscript:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf mine1.pdf mine2.pdf

or, for better results with low-resolution PDFs (thanks to Adriano for pointing this out):

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged.pdf mine1.pdf mine2.pdf

In both cases the output resolution is much higher than with convert used this way:

convert -density 300x300 -quality 100 mine1.pdf mine2.pdf merged.pdf

This way you don't need to install anything else; you just work with what is already installed on your system (at least, both come by default on my box).
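
For scripted use, the Ghostscript invocations above can be assembled programmatically. The sketch below is only an illustration under stated assumptions: `build_gs_merge_command` is a hypothetical helper name, the flags are exactly those shown in the commands above, and the preset list contains the standard `-dPDFSETTINGS` values.

```python
import subprocess

# Standard values accepted by -dPDFSETTINGS (passed with a leading slash).
PDF_PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def build_gs_merge_command(inputs, output, preset=None):
    """Return a Ghostscript argument list that merges inputs into output."""
    if not inputs:
        raise ValueError("need at least one input PDF")
    cmd = ["gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite"]
    if preset is not None:
        if preset not in PDF_PRESETS:
            raise ValueError("unknown preset: " + preset)
        cmd.append("-dPDFSETTINGS=/" + preset)
    cmd.append("-sOutputFile=" + output)
    return cmd + list(inputs)


def merge_with_gs(inputs, output, preset=None):
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    subprocess.run(build_gs_merge_command(inputs, output, preset), check=True)
```

Calling `merge_with_gs(["mine1.pdf", "mine2.pdf"], "merged.pdf", "prepress")` corresponds to the second command above.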

Hope this helps,

UPDATE: First of all, thanks for all your nice comments! Just a tip that may work for you: after some googling, I found a superb trick to shrink the size of PDFs. With it I reduced one PDF from 300 MB to just 15 MB with an acceptable resolution, all with good old Ghostscript. Here it is:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/default -dNOPAUSE -dQUIET -dBATCH -dDetectDuplicateImages -dCompressFonts=true -r150 -sOutputFile=output.pdf input.pdf

cheers!!

Hans Ginzel
Gery
  • 30
    Nice tip, `gs` runs very fast and it compresses a lot. However, the quality improved a lot after I used this param: `-dPDFSETTINGS=/prepress` – Adriano P Dec 15 '13 at 23:39
  • 3
    I found that `-dPDFSETTINGS=/prepress` has the very nice effect of rotating pages that are too wide and force annoying horizontal scroll bars. – Robert Smith Aug 21 '14 at 03:40
  • 30
    Add the following line to your `.bash_profile` and you have a nice shortcut: `pdfmerge() { gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile="$@" ; }` This saves you some typing if you have to use the command a lot. The usage looks like this: `pdfmerge merged.pdf mine1.pdf mine2.pdf` – Torben Jul 22 '15 at 21:36
  • 2
    I tried to find description for -dBATCH flag but couldn't. Even man gs doesn't say anything. But great and without any additional programs! – Michal Gonda Sep 10 '15 at 08:21
  • Since Imagemagick is based upon Ghostscript, if you already have it, you can use it as well; `convert file1.pdf file2.pdf outputfile.pdf`. – Sablefoste Nov 23 '15 at 14:59
  • 1
    @RobertSmith Interestingly, when I tried the first command it rotated some pages (that seemed to be the same proportions as the others) while the `/prepress` version did no rotation. – JAB Feb 06 '17 at 16:55
  • 1
    I like this solution because it keeps section headers that I can use to jump around with my pdf software – awelkie Aug 13 '17 at 14:36
  • 1
    Beautiful answer. I was having a ton of frustration ever since I heard Mac's Preview application could help, but I barely got it to work the first time, and not at all the second time I needed it. The biggest trouble was it not being deterministic in its actions. I used to be able to drag entire PDF files over, but tried individual thumbnails, and also had saving issues between a new copy and at all! So this solution was a nice break. The only additional task I had was throwing in a GIF image at the end using Preview since Ghostscript could not handle that. Also installed from Homebrew! – Pysis Jul 31 '18 at 18:17
  • 1
    This preserves hyperlinks. Nice! – Winny Dec 04 '18 at 04:37
  • 1
    I'm just gonna run this command so that it will be saved to my fish history! I'll definitely need it in the future. – adonese Dec 13 '18 at 07:17
  • @Winny, for me it didn't preserve hyperlinks. any idea why? –  Feb 20 '19 at 18:10
  • 1
    @EnanAjmain I used `gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=output.pdf a.pdf b.pdf c.pdf` This worked for me – Winny Feb 20 '19 at 18:14
  • I used it too. But in my case it didn't work. I then used an online service. Thanks for replying. –  Feb 21 '19 at 03:26
  • 9
    The `gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged.pdf mine1.pdf mine2.pdf` can be shortened to the `gs -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -o merged.pdf mine1.pdf mine2.pdf`. From [Documentation](https://www.ghostscript.com/doc/current/Use.htm#File_output): "As a convenient shorthand you can use the `-o` option followed by the output file specification as discussed above. The `-o` option also sets the `-dBATCH` and `-dNOPAUSE` options. This is intended to be a quick way to invoke `ghostscript` to convert one or more input files." – MiniMax Apr 24 '19 at 21:35
  • 1
    This also works well in windows with WSL. Install ghostscript from apt first. – JonShipman Oct 11 '19 at 14:22
  • 1
    I was used to `pdfunite`, but it resulted today in a 850 MB PDF (original 24 MB). `pdftk` did it with 44 MB, `gs` with only 32 MB (both with top resolution as far as I can tell). – doak Oct 28 '19 at 15:16
  • 1
    Strangely enough, after merging with `gs`, some of the pages are rotated. – Yai0Phah Jan 01 '20 at 11:10
  • 1
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=merged.pdf mine1.pdf mine2.pdf This worked perfectly on cygwin @Win 10 64 bit machine. Thanks – alphaGeek Jun 25 '20 at 16:29
  • 3
    @Winny I needed to add `-dPrinted=false` to preserve hyperlinks. Otherwise it broke the links for all but the first PDF. See https://tex.stackexchange.com/questions/245801/local-hyperlinks-broken-after-pdf-processing-with-ghostscript – qdread Nov 20 '20 at 02:11
  • Ghostscript does many lossy transformations (e.g. page rotation, image quality degradation and dropping some metadata) during PDF-to-PDF conversion. If you want peace of mind that all PDF features stay intact, then don't use Ghostscript. – pts Feb 03 '21 at 12:16
  • @pts sure, or you can just use custom flags, so that ghostscript doesn't use lossy compression. https://superuser.com/questions/360216/use-ghostscript-but-tell-it-to-not-reprocess-images – Rainb Feb 13 '21 at 10:33
  • @Rainb: The PDF output of Ghostscript is lossy even if it is configured to use lossless image compression, because Ghostscript ignores some PDF features (e.g. interactive). – pts Feb 14 '21 at 11:15
572

I'm sorry, I managed to find the answer myself using google and a bit of luck : )

For those interested:

I installed pdftk (the PDF Toolkit) on our Debian server, and the following command produced the desired output:

pdftk file1.pdf file2.pdf cat output output.pdf

OR

gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...

This in turn can be piped directly into pdf2ps.

WonderLand
alcohol
  • 82
    Using ghostscript also might work: `gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf in1.pdf in2.pdf in3.pdf ...` – Nate Kohl Mar 24 '10 at 13:08
  • I'd have to look more into the possible options/flags (because I don't want output in a file), but probably yes. Thank you for the suggestion. – alcohol Mar 24 '10 at 13:33
  • 18
    It is worth mentioning that pdftk can merge encrypted PDFs while pdfunite can't – Thomas Apr 28 '13 at 18:54
  • 3
    pdftk gives better resolution than convert with default options. – Kiran K Telukunta Mar 18 '14 at 09:44
  • 13
    `pdftk file1.pdf file2.pdf cat output out.pdf` will output the merged file as `out.pdf` – jmiserez Sep 28 '15 at 19:44
  • 2
    `pdftk` is not available for EL7 systems due to missing dependency `libgcj`. – a coder Mar 22 '16 at 20:03
  • @NateKohn @IcyFlame Unfortunately it did not work for me - said `Unknown device: pdfwriter`. Like you I preferred that approach because I had it installed already. `pdftk` worked like a charm though. – Asfand Qazi Apr 27 '16 at 10:01
  • If you don't want to type each file name : `pdftk ``ls *.pdf`` cat output out.pdf` (with just one backticks instead of two, but I didn't manage to render it with the stackoverflow markdown parser) – Charles-Édouard Coste Jan 28 '17 at 13:11
  • @Charles-EdouardCoste You can use two backticks as surrounding backticks if you want to use backticks inside. Or three, if you want to use double backticks. Not sure if it goes even higher. But backticks is not recommended anymore in Bash, so better would be to use `$(ls *.pdf)` – Zelphir Kaltstahl Feb 23 '17 at 09:46
  • @alcohol What does the `cat` in this command (not in general) do? I was able to merge two pdfs without that and don't see any issues with the resulting pdf. – Zelphir Kaltstahl Feb 23 '17 at 09:47
  • @Zelphir cat is the operation here (that pdftk performs), not the shell command. – alcohol Mar 03 '17 at 13:52
  • 2
    `pdftk` is neat because you can easily [select page ranges](https://www.pdflabs.com/docs/pdftk-cli-examples/) to merge: `pdftk A=file1.pdf B=file2.pdf cat A1-3 B1 output out.pdf` – z0r Jul 14 '17 at 06:28
  • For my usage, `pdftk` somehow just got stuck and did not produce anything, while `gs` gave a perfect result. `convert`'s resolution downgrade is real (with default settings). – Mingwei Zhang Nov 30 '17 at 23:47
  • Since `GCJ` is deprecated ([link](https://gcc.gnu.org/wiki/GCJ)), `pdftk` is as well (most distros have phased it out already). A nice alternative is `pdfunite` and the other poppler-utils. – Diego 72 Jan 16 '18 at 15:50
  • `gs` for joining documents apparently created under Windows gave me this error `Missing glyph CID=48, glyph=0030 in the font EAAAAB+Tahoma,Bold . The output PDF may fail with some viewers.` and indeed, on the default Ubuntu viewer I couldn't browse it – Antek May 31 '18 at 13:10
  • *The version from the official website doesn't work*. [This answer does](https://stackoverflow.com/questions/39750883/pdftk-hanging-on-macos-sierra) as of January 2019 (still official). – Jimbo Jan 08 '19 at 13:07
  • A note about **PDF Forms**. Thus far, `pdftk` is the only tool I have tried that preserves my PDF forms exactly as they were (functional in any PDF reader *plus* Acrobat Reader). – mvreijn Feb 06 '19 at 13:09
  • `pdftk` is deprecated and no longer available in Fedora repos, for example. If you can't use `pdfunite` because of encryption, then try pdf-stapler: https://github.com/hellerbarde/stapler/ – Freedom_Ben Sep 21 '19 at 21:30
  • `pdftk` also allows to duplicate pages within a single document : `pdftk input.pdf cat 1-10 10 10 10 10-20 output output.pdf` would produce `input.pdf` with 5 times the page 10. – Skippy le Grand Gourou Nov 20 '19 at 18:34
  • `pdftk` is a hell to install – Rainb Nov 30 '20 at 11:12
117

This is the easiest solution if you have multiple files and do not want to type in the names one by one:

qpdf --empty --pages *.pdf -- out.pdf
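
A shell wildcard expands in plain lexical order, so `10.pdf` lands before `2.pdf`. If numbered files need to merge in numeric order, the list can be sorted first. The following Python sketch shows one way to do that; `natural_key` and `build_qpdf_merge_command` are hypothetical helper names, and only the qpdf arguments already shown above are used:

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    parts = re.split(r"(\d+)", name)
    # re.split with a capture group puts the digit runs at odd indices.
    return [int(part) if index % 2 else part.lower()
            for index, part in enumerate(parts)]


def build_qpdf_merge_command(inputs, output):
    """Return the qpdf argument list for merging inputs in natural order."""
    ordered = sorted(inputs, key=natural_key)
    return ["qpdf", "--empty", "--pages", *ordered, "--", output]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. When the inputs come from `glob.glob("*.pdf")`, make sure an existing `out.pdf` in the same directory is not picked up as an input.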
William Miller
SaTa
  • 5
    qpdf seems to break hyperlinks in the document – David Granqvist Oct 29 '19 at 12:51
  • 6
    Although difficult to get your head around the complex options to start with, qpdf is a very handy and powerful tool. Online documentation is available [here](http://qpdf.sourceforge.net/files/qpdf-manual.html) – Jonathan Holvey Dec 18 '19 at 11:33
  • Came here looking for a `qpdf` solution but didn't want to wade through the documentation yet again to figure it out, thank you. – Hashim Aziz Jul 15 '20 at 23:27
  • Using a shell wildcard is great as long as the order works for you! Check the order first with `echo *.pdf | tr ' ' $'\n'` or so! – lmat - Reinstate Monica Oct 24 '20 at 10:06
  • Nice script. You can arrange the order by prefixing each page with "A_", "B_" etc or **simply add `'z'`** if one document is at page 1 and you want it to be at the last page, assuming your files are named with letters and digits, and not starting with 'z'. – Antonin GAVREL Mar 10 '21 at 17:51
55

Also pdfjoin a.pdf b.pdf will create a new b-joined.pdf with the contents of a.pdf and b.pdf

rodrigob
  • 7
    This is nice and succinct, but breaks hyperlinks. – bright-star Oct 20 '14 at 01:36
  • 3
    pdfjoin (pdflatex) fails with files with lots of pages. Failed to merge to 1k pages files. – mdrozdziel Dec 09 '14 at 12:05
  • 1
    pdfjoin breaks annotations or additional non graphics items – sabujp Mar 08 '16 at 21:19
  • The "URW Palladio L" font became invisible after pdfjoin'ing the pages. – v_2e Nov 05 '16 at 07:37
  • 9
    pdfunite usually works well, but if it says "Unimplemented Feature: Could not merge encrypted files ", pdfjoin is a nice alternative. For whatever reason, pdfjoin doesn't complain of encryption. – Calaf Feb 24 '17 at 05:59
  • Works for me natively (probably) without any installation on MacOs Sierra (OsX). – Bhoom Suktitipat Mar 13 '17 at 06:08
  • `v2.08` of `pdfjoin` doesn't work properly for me. One of the input PDF files contains filled-in forms, but none of the form inputs appear in the resulting PDF file. – palik Jan 21 '18 at 12:38
  • 1
    `pdfjam` package doesn't include `pdfjoin` script anymore. You can find the script [here](https://github.com/DavidFirth/pdfjam-extras) – Henrik Pingel Jul 15 '20 at 13:46
41

pdfunite is fine to merge entire PDFs. If you want, for example, pages 2-7 from file1.pdf and pages 1,3,4 from file2.pdf, you have to use pdfseparate to split the files into separate PDFs for each page to give to pdfunite.

At that point you probably want a program with more options. qpdf is the best utility I've found for manipulating PDFs. pdftk is bigger and slower and Red Hat/Fedora don't package it because of its dependency on gcj. Other PDF utilities have Mono or Python dependencies. I found qpdf produced a much smaller output file than using pdfseparate and pdfunite to assemble pages into a 30-page output PDF, 970kB vs. 1,6450 kB. Because it offers many more options, qpdf's command line is not as simple; the original request to merge file1 and file2 can be performed with

qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf
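
The page-selection case mentioned at the start (pages 2-7 of file1.pdf plus pages 1, 3 and 4 of file2.pdf) maps onto the same `--pages` option, which accepts an optional page range after each file name. A small Python sketch that assembles such a command; `build_qpdf_pages_command` is a hypothetical helper name used only for illustration:

```python
def build_qpdf_pages_command(selections, output):
    """Build a qpdf command from (path, page_range) pairs.

    page_range is a qpdf range string such as "2-7" or "1,3,4";
    None means "all pages of that file".
    """
    cmd = ["qpdf", "--empty", "--pages"]
    for path, page_range in selections:
        cmd.append(path)
        if page_range is not None:
            cmd.append(page_range)
    # "--" closes the page list; the output file comes last.
    return cmd + ["--", output]
```

For the example above, `build_qpdf_pages_command([("file1.pdf", "2-7"), ("file2.pdf", "1,3,4")], "merged.pdf")` yields the argument list for `qpdf --empty --pages file1.pdf 2-7 file2.pdf 1,3,4 -- merged.pdf`.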
skierpage
  • 3
    So much this. Parabola, for instance, doesn’t package `pdftk` anymore either because of its dependence on `gcj`, for which support has been dropped, I believe. Despite searching for PDF manipulation tools via `pacman -Ss pdf`, I missed this. Thanks for this answer! It should receive way more upvotes, so it shows up right next to suggestions for `pdfunite` or `pdftk`. – k.stm Sep 19 '18 at 20:39
  • 1
    On my fresh install of Linux Mint, this ran in the Terminal window without requiring any installs or path adjustments. Nice! – Wallace Kelly Jun 14 '19 at 14:03
  • This worked perfectly and also gave a clearer merged document than the other commands I tried out. Thanks for the post. – Siwoku Adeola Mar 29 '20 at 19:07
  • If pages in the even.pdf file are reversed (typical when you scan on a non-double-sided scanner), you will want to use this instead: `qpdf --collate --empty --pages odd.pdf even.pdf z-1 -- merged.pdf` – caram Feb 13 '21 at 22:23
40

You can use the convert command directly,

e.g.

convert sub1.pdf sub2.pdf sub3.pdf merged.pdf
Noor
  • 46
    This is not lossless. – Ben Ruijl Jun 03 '14 at 14:47
  • 13
    You can `convert -compress lossless sub1.pdf sub2.pdf sub3.pdf merged.pdf`, but the resulting file size could be way too big. I'd suggest `convert -compress jpeg -quality 90 sub1.pdf sub2.pdf sub3.pdf merged.pdf` instead. – arielnmz Aug 05 '14 at 19:53
  • 23
    This involves converting everything to raster images, it seems, which is definitely not the best, especially when dealing with text-based PDFs. – jtebert Aug 28 '14 at 18:38
  • 8
    almost a copy of what the OP has described as not working – user829755 Sep 29 '15 at 12:05
  • 17
    Do not use convert for postscript or PDF files unless you go from vector to raster and never go back. It is hard to overstate what a bad idea this is. – markgalassi Nov 29 '15 at 00:01
15

Use pdftools from Python: https://pypi.python.org/pypi/pdftools/1.0.6

Download the tar.gz file, uncompress it, and run a command like the one below:

python pdftools-1.1.0/pdfmerge.py -o output.pdf -d file1.pdf file2.pdf file3.pdf

Install Python 3 before running the command above.

The tool supports the following operations:

  • Add
  • Insert
  • Remove
  • Rotate
  • Split
  • Merge
  • Zip

More details are available at the link below; the tool is open source.

https://github.com/MrLeeh/pdftools

Ravikiran Reddy Kotapati
  • 1
    This is perfect. Using `gs` (all variants listed above), a simple merge of two PDFs, 2MB and 500Kb, was taking minutes to complete and resulting in a 40MB file! `pdftools` completes instantaneously with identical file size. – supergra Nov 16 '18 at 18:47
  • Or you can install it anyway. Total size of dependencies is < 100 kb. – tejasvi88 Jan 26 '21 at 14:46
13

Apache PDFBox http://pdfbox.apache.org/

PDFMerger This application will take a list of pdf documents and merge them, saving the result in a new document.

usage: java -jar pdfbox-app-x.y.z.jar PDFMerger "Source PDF files (2 ..n)" "Target PDF file"

lumpchen
9

You can use sejda-console, free and open source. Unzip it and run sejda-console merge -f file1.pdf file2.pdf -o merged.pdf

It preserves bookmarks, link annotations, AcroForms, etc. It has quite a lot of options you can play with; just run sejda-console merge -h to see them all.

Andrea Vacondio
7

If you want to convert all the downloaded images into one pdf then execute

convert img{0..19}.jpg slides.pdf

Martin Seeler
Trupti Kini
  • 6
    Do not use convert for postscript or PDF files unless you go from vector to raster and never go back. It is hard to overstate what a bad idea this is. – markgalassi Nov 29 '15 at 00:02
6

I am biased being one of the developers of PyMuPDF (a Python binding of MuPDF).

You can easily do what you want with it (and much more). Skeleton code works like this:

#-------------------------------------------------
import fitz         # the binding PyMuPDF
fout = fitz.open()  # new PDF for joined output
flist = ["1.pdf", "2.pdf"]  # list of filenames to be joined (extend as needed)

for f in flist:
    fin = fitz.open(f)    # open an input file
    fout.insert_pdf(fin)  # append f (the method was named insertPDF in older PyMuPDF releases)
    fin.close()

fout.save("joined.pdf")
#-------------------------------------------------

That's about it. Several options are available for selecting only page ranges, maintaining a joint table of contents, reversing page sequence, changing page rotation, etc.

We are on PyPI.

Jorj McKie
5

I second the pdfunite recommendation. I was however getting Argument list too long errors as I was attempting to merge > 2k PDF files.

I turned to Python for this and two external packages: PyPDF2 (to handle all things PDF related) and natsort (to do a "natural" sort of the directory's file names). In case this can help someone:

from PyPDF2 import PdfFileMerger
import natsort
import os

DIR = "dir-with-pdfs/"
OUTPUT = "output.pdf"

file_list = filter(lambda f: f.endswith('.pdf'), os.listdir(DIR))
file_list = natsort.natsorted(file_list)

# 'strict' used because of
# https://github.com/mstamy2/PyPDF2/issues/244#issuecomment-206952235
merger = PdfFileMerger(strict=False)

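for f_name in file_list:
    # Passing a path lets the merger open the file itself,
    # so merger.close() below can release the handle.
    merger.append(os.path.join(DIR, f_name))

# Write the result, then close every input the merger opened.
with open(OUTPUT, "wb") as output:
    merger.write(output)
merger.close()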
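
If you would rather stay with pdfunite, the "Argument list too long" error can also be avoided by merging in batches and then merging the intermediate files. The sketch below only plans the commands and runs nothing itself; `plan_batched_merge`, the batch size of 500, and the `.partNNNN.pdf` naming are illustrative assumptions:

```python
def plan_batched_merge(inputs, output, batch_size=500):
    """Return (commands, parts): pdfunite argument lists and temporary files.

    Each batch is merged into a numbered part file, and a final command
    merges the parts, in order, into the requested output.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    commands = []
    parts = []
    for index, start in enumerate(range(0, len(inputs), batch_size)):
        batch = inputs[start:start + batch_size]
        # Zero-padded numbering keeps the part files in merge order.
        part = "{}.part{:04d}.pdf".format(output, index)
        commands.append(["pdfunite", *batch, part])
        parts.append(part)
    commands.append(["pdfunite", *parts, output])
    return commands, parts
```

Each command can then be executed in order with `subprocess.run(cmd, check=True)`, and the files listed in `parts` deleted afterwards.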
Greg Sadetsky
  • 6
    "Argument list too long" indicates that the command line exceeds the operating system's limit on the total size of arguments and environment -- it's not actually a limitation of the tool. In such a case, switching to Python may be overkill, since you can just batch: find input -name \*.pdf | xargs -P1 -n500 sh -c 'pdfunite "$@" output-`date +%s`.pdf' _ && pdfunite output-*.pdf output.pdf (The `_` fills `$0` so that no file name is dropped from "$@". This will create batches of 500 files processed serially, make the resulting temporary files sort in the right order, and produce an appropriate output file; you'll need to clean up the temporary files afterwards.) – enkiv2 Nov 01 '17 at 11:30
  • `pdftools` is a wrapper for PyPDF2. See [this](https://stackoverflow.com/a/44946530/8211365) answer. – tejasvi88 Jan 26 '21 at 14:48
3

Here's a method I use which works and is easy to implement. This will require both the fpdf and fpdi libraries which can be downloaded here:

require('fpdf.php');
require('fpdi.php');

$files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf'];

$pdf = new FPDI();

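foreach ($files as $file) {
    // setSourceFile() returns the page count, so import every page, not just the first
    $pageCount = $pdf->setSourceFile($file);
    for ($page = 1; $page <= $pageCount; $page++) {
        $tpl = $pdf->importPage($page, '/MediaBox');
        $pdf->addPage();
        $pdf->useTemplate($tpl);
    }
}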

$pdf->Output('F','merged.pdf');
billynoah
3

Although it's not a command-line solution, it may help macOS users:

  1. Select your PDF files
  2. Right-click on your highlighted files
  3. Select Quick actions > Create PDF
DevonDahon
3

You can use the free and open source pdftools (disclaimer: I am its author).

It is basically a Python interface to the LaTeX pdfpages package.

To merge pdf files one by one, you can run:

pdftools --input-file file1.pdf --input-file file2.pdf --output output.pdf

To merge together all the pdf files in a directory, you can run:

pdftools --input-dir ./dir_with_pdfs --output output.pdf
robertspierre
2

I like Chasmo's idea, but I prefer to use the advantages of things like

convert $(ls *.pdf) ../merged.pdf

Giving multiple source files to convert merges them into a single PDF. This command merges all files with the .pdf extension in the current directory into merged.pdf in the parent directory.

peterh
user3709983
  • 5
    Given how similar this looks to the original question, it seems like this should have been a comment, not an answer. With a bit more rep, [you will be able to post comments](http://stackoverflow.com/privileges/comment). Until then, please do not use answers as a workaround. – Nathan Tuggy May 16 '15 at 02:02
  • 1
    @Silfheed No, it answers the question! Although the answer maybe should have more elaborated. – peterh May 16 '15 at 08:33
  • 7
    Do not use convert for postscript or PDF files unless you go from vector to raster and never go back. It is hard to overstate what a bad idea this is. – markgalassi Nov 29 '15 at 00:02
  • 14
    What is the point of using `$(ls *.pdf)` in place of simple wildcard `*.pdf`? – firegurafiku Dec 18 '15 at 04:56
  • Additionally, with reference to @firegurafiku's comment: with the `ls *.pdf` wildcard you lose control over the order of the merged files. For example, the list 1.pdf, 2.pdf, 3.pdf, ..., 10.pdf, ..., 100.pdf will actually be merged as 1.pdf, 10.pdf, 100.pdf, 2.pdf, 3.pdf (due to the default Linux way of ordering files; more details about this problem are here: https://stackoverflow.com/q/22948042/1977012). – Egel Jun 28 '18 at 08:31
1

bash-script, which checks for merging errors

I had the problem that a few PDF merges produced error messages. Since finding the corrupt PDFs takes a lot of trial and error, I wrote a script for it.

The following bash script merges all PDFs in a folder one by one and reports a status after each merge. Just copy it into the folder with the PDFs and execute it from there.

    #!/bin/bash
    
    PDFOUT=_all_merged.pdf
    rm -f "$PDFOUT"
    
    # Iterate with a glob instead of parsing ls, so names with spaces survive.
    for f in *.pdf
    do
      printf "processing %-50s" "$f  ..."
      if [ -f "$PDFOUT" ]; then
        # https://stackoverflow.com/questions/8158584/ghostscript-to-merge-pdfs-compresses-the-result
        #  -dPDFSETTINGS=/prepress
        # With -q, anything gs still prints on stdout is treated as an error.
        status=$(gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile="${PDFOUT}.new" "$PDFOUT" "$f" 2> /dev/null)
        if [ -n "$status" ]
        then
          echo "gs ERROR"
        else
          echo "successfully"
        fi
        mv "${PDFOUT}.new" "$PDFOUT"
      else
        cp "$f" "$PDFOUT"
        echo "successfully"
      fi
    done

example output:

processing inp1.pdf  ...                                     successfully
processing inp2.pdf  ...                                     successfully
Markus Dutschke
1

PdfCpu works great:

pdfcpu merge c.pdf a.pdf b.pdf

https://pdfcpu.io/core/merge

Steven Penny
1
pdfconcat -o out.pdf 1.pdf 2.pdf

"pdfconcat is a small and fast command-line utility written in ANSI C that can concatenate (merge) several PDF files into a long PDF document."

kleinbottle4
1

I used qpdf from the terminal, and it worked for me on Windows (MobaXterm) and Linux. For example, the command to join A.pdf and B.pdf into a new file C.PDF is:

qpdf --empty --pages oficios/A.pdf informes/B.pdf -- salida/C.PDF

For more documentation, see https://net2.com/how-to-merge-or-split-pdf-files-on-linux/

Doberon
0

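If you want to join all PDF files in a directory with Ghostscript, you can use find to do just that. Here's an example:

find . -name '*.pdf' -exec gs -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -o ../out.pdf {} +

This finds all PDFs in the current directory (including subdirectories) and creates out.pdf in the parent directory. Note that `-o` takes the output file as its next argument, so it must be followed directly by the file name. It might be useful if you're looking for a quick way to process an entire directory with Ghostscript. Keep in mind that find does not guarantee any particular file order.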

Rainb
-1

Yet another option, useful if you also want to select the pages inside the documents to be merged:

pdfjoin image.jpg '-' doc_only_first_pages.pdf '1,2' doc_with_all_pages.pdf '-'

It comes with the texlive-extra-utils package.

jgpATs2w