35

I am trying to split a multipage PDF with Ghostscript. I found the same solution on several sites, and even on ghostscript.com:

gs -sDEVICE=pdfwrite -dSAFER -o outname.%d.pdf input.pdf

But it does not seem to work for me: it produces a single file containing all pages, named outname.1.pdf.

When I add the start and end pages, it works fine, but I want it to work without knowing those parameters.

In the gs-devel archive, I found a solution for this: http://ghostscript.com/pipermail/gs-devel/2009-April/008310.html -- but I would prefer to do it without pdf_info.

When I use a different device, for example pswrite, with the same parameters, it works correctly and produces as many ps files as my input.pdf has pages.

Is this normal when using pdfwrite? Am I doing something wrong?

Kurt Pfeifle
zseder

6 Answers

24

I found this script written by Mr Weimer super useful:

#!/bin/sh
#
# pdfsplit [input.pdf] [first_page] [last_page] [output.pdf] 
#
# Example: pdfsplit big_file.pdf 10 20 pages_ten_to_twenty.pdf
#
# written by: Westley Weimer, Wed Mar 19 17:58:09 EDT 2008
#
# The trick: ghostscript (gs) will do PDF splitting for you, it's just not
# obvious and the required defines are not listed in the manual page. 

if [ $# -lt 4 ] 
then
        echo "Usage: pdfsplit input.pdf first_page last_page output.pdf"
        exit 1
fi
gs -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$4" -dFirstPage=$2 -dLastPage=$3 -sDEVICE=pdfwrite "$1"

Source: http://www.cs.virginia.edu/~weimer/pdfsplit/pdfsplit

Save it as pdfsplit.sh, make it executable (chmod +x pdfsplit.sh), and run it.
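
For anyone who would rather call the same thing from Python, here is a minimal sketch of the script above. The function names `extract_range_command` and `extract_range` are my own, not part of Ghostscript; the flags are exactly the ones the shell script passes, and I assume `gs` is on your PATH.

```python
import subprocess


def extract_range_command(input_pdf, first_page, last_page, output_pdf, gs="gs"):
    """Build the same gs argument list that the pdfsplit shell script uses."""
    return [
        gs, "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={output_pdf}",
        f"-dFirstPage={first_page}",
        f"-dLastPage={last_page}",
        "-sDEVICE=pdfwrite",
        input_pdf,
    ]


def extract_range(input_pdf, first_page, last_page, output_pdf):
    """Run gs; raises CalledProcessError if gs exits with a non-zero status."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    command = extract_range_command(input_pdf, first_page, last_page, output_pdf)
    subprocess.run(command, check=True)
```

Calling `extract_range("big_file.pdf", 10, 20, "pages_ten_to_twenty.pdf")` should then match the example in the script header. Passing a list instead of a shell string means file names with spaces need no extra quoting.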

PDFsam can also do the job; it is available for Windows and Mac.

OlivierBlanvillain
Juanito Fatas
14

What you see is "normal" behaviour: the current version of Ghostscript's pdfwrite output device does not support this feature. This is also (admittedly, somewhat vaguely) documented in Use.htm:

"Note, however that the one page per file feature may not be supported by all devices...."

I seem to remember that one of the Ghostscript developers mentioned on IRC that they may add this feature to pdfwrite in some future release, but it seems to necessitate some major code rewrite, which is why they haven't done it yet...


Update: As Gordon's comment already hinted, as of version 9.06 (released on July 31st, 2012), Ghostscript supports the command line quoted in the question for pdfwrite as well. (Gordon must have discovered unofficial support for this in 9.05, or he compiled his own executable from pre-release sources that were not yet tagged as 9.06.)
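
If a script has to cope with both old and new installations, a version check along these lines may help. This is only a sketch: the helper names are hypothetical, the 9.06 threshold is the one given above, and I assume the version string has the form that `gs --version` prints (for example `9.06`).

```python
import re


def parse_gs_version(text):
    """Turn a version string such as '9.06' into a comparable (major, minor) tuple."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised Ghostscript version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def pdfwrite_splits_with_percent_d(version_text):
    """True if this version officially supports '-o name.%d.pdf' with pdfwrite."""
    return parse_gs_version(version_text) >= (9, 6)


def per_page_command(input_pdf, output_pattern="outname.%d.pdf", gs="gs"):
    """Build the command from the question; %d is replaced by gs with the page number."""
    return [gs, "-sDEVICE=pdfwrite", "-dSAFER", "-o", output_pattern, input_pdf]
```

On versions where the check fails, the fallback is to loop over pages with `-dFirstPage` and `-dLastPage`, as the other answers do.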

Kurt Pfeifle
  • Yeah, I read this line, but my phrase "normal behaviour" wants to mean that "is pdfwrite one of those who may not support this feature?" Your remembering of this IRC is okay for me, Thank you. – zseder Apr 19 '12 at 15:42
  • 4
    For people finding this answer in searches: As of 9.05, one-page-per-file works for me with the OP's command. – Gordon Jul 04 '12 at 19:05
  • 1
    @Gordon: Support for the `-o out_%d.pdf` syntax (to split multipage PDF into individual files per page) became official in 9.06. I hinted at this already in other answers (f.e. *[Split multi page PDF file into single pages](http://stackoverflow.com/a/12744923/359307)*). I forgot to update this answer. Thanks for the hint. – Kurt Pfeifle Nov 26 '12 at 14:21
5
#!/bin/bash
# Usage: pass the input filename as $1

# Ask Ghostscript for the page count. The filename stays inside the quotes so
# that spaces survive. Ghostscript 9.50+ may also need -dNOSAFER here.
ournum=$(gs -q -dNODISPLAY -c "($1) (r) file runpdfbegin pdfpagecount = quit" 2>/dev/null)
echo "Processing $ournum pages"
# strip the trailing .pdf once, outside the loop
newname="${1%.pdf}"
counter=1
while [ "$counter" -le "$ournum" ] ; do
    reallynewname="$newname-$counter.pdf"
    # make the individual pdf page; -dNOPAUSE replaces piping "yes" into gs
    gs -dBATCH -dNOPAUSE -sOutputFile="$reallynewname" -dFirstPage="$counter" -dLastPage="$counter" -sDEVICE=pdfwrite "$1" >/dev/null 2>&1
    counter=$((counter+1))
done
Community
5

Here is a script for Windows command prompt (working also with drag and drop) assuming you have Ghostscript installed:

@echo off
chcp 65001
setlocal enabledelayedexpansion

rem Customize or remove this line if you already have Ghostscript folders in your system PATH
set path=C:\Program Files\gs\gs9.22\lib;C:\Program Files\gs\gs9.22\bin;%path%

:start

echo Splitting "%~n1%~x1" into standalone single pages...
cd /d "%~dp1"
rem getting number of pages of PDF with GhostScript
for /f "usebackq delims=" %%a in (`gswin64c -q -dNODISPLAY -c "(%~n1%~x1) (r) file runpdfbegin pdfpagecount = quit"`) do set "numpages=%%a"

for /L %%n in (1,1,%numpages%) do (
echo Extracting page %%n of %numpages%...
set "x=00%%n"
set "x=!x:~-3!"
gswin64c.exe -dNumRenderingThreads=2 -dBATCH -dNOPAUSE -dQUIET -dFirstPage=%%n -dLastPage=%%n -sDEVICE=pdfwrite -sOutputFile="%~d1%~p1%~n1-!x!.pdf" "%~1"
)

shift
if not "%~1"=="" goto start

pause

Name this script something like split PDF.bat and put it on your desktop. Drag and drop one (or even more) multipage PDF on it and it will create one standalone PDF file for each page of your PDF, appending the suffix -001, -002 and so on to the name to distinguish the pages.
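
The -001, -002 naming described above can be sketched in Python as follows. The function name `page_output_path` is hypothetical, and this only illustrates the naming scheme, not the batch script itself. Unlike the three-character slice in the batch file, a format width does not cut off page numbers above 999.

```python
from pathlib import Path


def page_output_path(input_pdf, page, width=3):
    """Return the per-page output path next to the input, e.g. report-007.pdf."""
    source = Path(input_pdf)
    # zero-pad the page number to at least `width` digits
    return source.with_name(f"{source.stem}-{page:0{width}d}.pdf")
```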

You might need to customize (with relevant Ghostscript version) or remove the set path=... line if you already have Ghostscript folders in your system PATH environment variable.

It works for me under Windows 10 with Ghostscript 9.22. For Ghostscript 9.50+, it seems you have to add the -dNOSAFER option to the page-count command (see comments).

Enjoy.

mmj
  • +1 for getting the page count with GS, good job! If anyone wants to get the page count on linux/macOS, use `gs -q -dNODISPLAY -c "(../escaped\ file \name.pdf) (r) file runpdfbegin pdfpagecount = quit"` – Gus Neves Oct 04 '18 at 15:41
  • 1
    Very helpful. Does work with GS 9.22 but is somehow incompatible to (at least) 9.50 and 9.52. Somebody knows how to fix this? – tstone-1 Mar 20 '20 at 02:48
  • @user18258 I don't know how to fix this, but anyway I found it more convenient to use another command line tool to split PDF files on Windows, the `sejda` console. Here is a drag-and-drop batch: https://www.codepile.net/pile/6lWv3wzY – mmj Mar 20 '20 at 09:12
  • @mmj Thanks for the code based on `sejda`! I'm using GhostScript for a lot of 'shell:sendto' tasks and would still be interested in a 9.52 compatible solution - although I understand that you won't provide it. I found a small bug in your GS-based code above (which I'm still using with GS version 9.27!): I think that `gswin64c.exe ... "%1"` should be `gswin64c.exe ... %1`, or else there will be trouble when the path contains spaces. – tstone-1 Aug 19 '20 at 14:15
  • @tstone-1 It seems that for Ghostscript 9.50+ you have to add the `-dNOSAFER` option (together with `-dNODISPLAY`). See: https://stackoverflow.com/q/40156190 – mmj Oct 07 '20 at 12:21
2

Here's a simple python script which does it:

#!/usr/bin/python3

import os

number_of_pages = 68
input_pdf = "abstracts_rev09.pdf"

for i in range(1, number_of_pages +1):
    os.system("gs -q -dBATCH -dNOPAUSE -sOutputFile=page{page:04d}.pdf"
              " -dFirstPage={page} -dLastPage={page}"
              " -sDEVICE=pdfwrite {input_pdf}"
              .format(page=i, input_pdf=input_pdf))
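
The hard-coded `number_of_pages` can probably be avoided by asking Ghostscript itself, using the `pdfpagecount` one-liner that appears in the other answers. What follows is a sketch under assumptions: the helper names are mine, `-dNOSAFER` is included because a comment elsewhere on this page reports it is needed for 9.50+, and the behaviour may differ between Ghostscript versions.

```python
import subprocess


def page_count_command(input_pdf, gs="gs"):
    """Build the gs call that prints the number of pages of input_pdf."""
    # The filename goes into a PostScript string literal, so backslashes and
    # parentheses have to be escaped.
    escaped = (input_pdf.replace("\\", "\\\\")
                        .replace("(", "\\(")
                        .replace(")", "\\)"))
    program = f"({escaped}) (r) file runpdfbegin pdfpagecount = quit"
    return [gs, "-q", "-dNODISPLAY", "-dNOSAFER", "-c", program]


def parse_page_count(output):
    """Read the page count from what gs printed (the last non-empty line)."""
    lines = output.strip().splitlines()
    if not lines:
        raise ValueError("Ghostscript printed no page count")
    return int(lines[-1])


def split_commands(input_pdf, number_of_pages, gs="gs"):
    """One gs command per page, named page0001.pdf, page0002.pdf, ..."""
    return [
        [gs, "-q", "-dBATCH", "-dNOPAUSE",
         f"-sOutputFile=page{page:04d}.pdf",
         f"-dFirstPage={page}", f"-dLastPage={page}",
         "-sDEVICE=pdfwrite", input_pdf]
        for page in range(1, number_of_pages + 1)
    ]


def split_all(input_pdf):
    """Count the pages with gs, then extract each page to its own file."""
    result = subprocess.run(page_count_command(input_pdf), check=True,
                            capture_output=True, text=True)
    for command in split_commands(input_pdf, parse_page_count(result.stdout)):
        subprocess.run(command, check=True)
```

With this, `split_all("abstracts_rev09.pdf")` would replace the loop above. The commands are built as lists rather than one shell string, so the input name is not re-parsed by a shell.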
Chris Martin
Adobe
0

Updated answer which relies on pdftk.exe only, without invoking Ghostscript

The answer provided by user @mmj used to work fine for me, but it stopped working somewhere between GS versions 9.20 and 9.50. I'm also aware of the solution provided by @Adobe. However, I like to get recurring tasks done from Windows (10) Explorer by selecting one or more files and choosing right click → Send To. Here's a Python script (compatible with 3.8) that uses pdftk.exe (tested with 2.02) to count the total number of pages and extract each page to its own file. It should accept multiple PDFs as input. Make sure Python and pdftk.exe are on your PATH.

Name this extract-pdf-pages-py.cmd and put it in shell:sendto:

python "%APPDATA%\Microsoft\Windows\SendTo\extract-pdf-pages-py.py" %*

Put the following in extract-pdf-pages-py.py in the same folder:

#!/usr/bin/python3
# put as extract-pdf-pages-py.py to shell:sendto

import os
import subprocess
import re
import sys
import mimetypes


def is_tool(name):
    from shutil import which
    return which(name) is not None


if not is_tool('pdftk'):
    input('pdftk.exe not within PATH. Aborting...')
    raise SystemExit("pdftk.exe not within PATH.")

sys.argv.pop(0)

for j in range(len(sys.argv)):
    input_pdf = sys.argv[j]

    if 'application/pdf' not in mimetypes.guess_type(input_pdf):
        input(f"File {input_pdf} is not a PDF. Skipping...")
        continue

    # rstrip('.pdf') strips any trailing '.', 'p', 'd', 'f' characters, not the suffix
    savefile = os.path.splitext(input_pdf)[0]

    numpages = subprocess.Popen(f"pdftk \"{input_pdf}\" dump_data", shell=True, stdout=subprocess.PIPE)
    output1 = str(numpages.communicate()[0])
    output2 = re.search("NumberOfPages: ([0-9]*)", output1)
    number_of_pages = int(output2.group(1))

    for i in range(1, number_of_pages + 1):
        os.system(f"pdftk \"{input_pdf}\" cat {i} output \"{savefile}{i:04d}.pdf\"")
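
The page-count step can also be pulled into a small function that is easy to test without pdftk installed. This is a sketch rather than a drop-in replacement: `parse_number_of_pages` is a name I made up, and it expects the decoded text of `pdftk ... dump_data`, which contains a `NumberOfPages:` line as the regex in the script above assumes.

```python
import re


def parse_number_of_pages(dump_data_text):
    """Extract the page count from the text printed by `pdftk file.pdf dump_data`."""
    match = re.search(r"^NumberOfPages:\s*(\d+)\s*$", dump_data_text, re.MULTILINE)
    if match is None:
        raise ValueError("no NumberOfPages line found in pdftk output")
    return int(match.group(1))
```

Raising an explicit error here avoids the less obvious `AttributeError` you would get from calling `.group(1)` on `None` when pdftk fails or prints something unexpected.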

I've used code from this answer (script by @Adobe) and that one (is_tool).

tstone-1