Linux – Chop the pages of PDFs into multiple pages

ghostscript linux pdf

I've got a bunch of PDF files that contain two "real" pages to a single PDF page; I'd like to chop these in half and put each half on a separate page. Essentially, I need something that does the exact opposite of pdfnup (or psnup). How can this feat be achieved?

Platform is Linux, open source preferred. As I've got a great pile of these to do, something that can be scripted (as opposed to a GUI) would be nice, so I can just give it a list of them and have it chew away.

A pre-existing script isn't the only option, either; if there's sample code to manipulate PDFs in similar ways with a third-party library, I can probably hack it into doing what I want.

Best Answer

You can solve this with the help of Ghostscript. pdftk alone cannot do that (to the best of my knowledge). I'll give you the command-line steps to do this manually. It will be easy to script this as a procedure, also with different parameters for page sizes and page numbers. But you said that you can do that yourself ;-)

How to solve this with the help of Ghostscript...

...and for the fun of it, I've recently done it not with an input file featuring "double-up" pages, but one with "treble-ups". You can read the answer for this case here.

Your case is even simpler. You seem to have something similar to this:

+------------+------------+   ^
|            |            |   |
|      1     |      2     |   |
|            |            | 595 pt
|            |            |   |
|            |            |   |
|            |            |   |
+------------+------------+   v
             ^
            fold
             v
+------------+------------+   ^
|            |            |   |
|      3     |      4     |   |
|            |            | 595 pt
|            |            |   |
|            |            |   |
|            |            |   |
+------------+------------+   v
<---------- 842 pt -------->

You want to create one PDF with 4 pages, each of which has the size of 421 pt x 595 pt (half of 842 pt is 421 pt).
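The diagram assumes A4 landscape (842 x 595 pt). Your files may differ, so it is worth checking before picking the -g and PageOffset values. A minimal sketch, assuming pdfinfo from poppler-utils is installed; page_size_pts is a hypothetical helper name. It reads pdfinfo output on stdin and rounds the values on the "Page size:" line to whole points (pdfinfo reports only the first page unless told otherwise):

```shell
#!/bin/sh
# Hypothetical helper: print "WIDTH HEIGHT" in whole points, parsed from the
# "Page size:" line that pdfinfo (poppler-utils) writes for the first page.
# Fields: $1="Page" $2="size:" $3=width $4="x" $5=height $6="pts"
page_size_pts() {
    awk '/^Page size:/ { printf "%d %d\n", $3 + 0.5, $5 + 0.5; exit }'
}
```

Usage: pdfinfo double-page-input.pdf | page_size_pts. For an A4 landscape file this should print 842 595, whether pdfinfo reports the size as 842 x 595 or as 841.89 x 595.276.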

First Step

Let's first extract the left sections from each of the input pages:

gs \
    -o left-sections.pdf \
    -sDEVICE=pdfwrite \
    -g4210x5950 \
    -c "<</PageOffset [0 0]>> setpagedevice" \
    -f double-page-input.pdf

What did these parameters do?

First, know that in PDF, 1 inch = 72 points. The parameters are:

  • -o ...............: Names the output file. Implicitly also sets -dBATCH -dNOPAUSE. (-dSAFER is not implied by -o, but it is the default anyway since Ghostscript 9.50.)
  • -sDEVICE=pdfwrite : we want PDF as output format.
  • -g................: sets the output media size in device pixels. pdfwrite's default resolution is 720 dpi, i.e. 10 pixels per point, so multiply the point values by 10 (421 pt -> 4210, 595 pt -> 5950) to match the point units used by PageOffset. -g also fixes the media size, so content outside it is clipped rather than scaled.
  • -c "..............: asks Ghostscript to run the given PostScript code snippet before the main input file (which must follow, introduced by -f).
  • <</PageOffset ....: shifts the page image on the medium by [x y] points. (Of course, for the left sections the shift by [0 0] has no real effect.)
  • -f ...............: process this input file.
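The -g arithmetic can be captured once, so other page sizes only need different numbers: at 720 dpi, one point (1/72 inch) is 10 device pixels. A minimal sketch assuming integer point sizes; gs_geometry is a hypothetical helper name, and its third argument is the number of side-by-side sections (2 here, 3 for treble-ups):

```shell
#!/bin/sh
# Hypothetical helper: build the -g option for one section of an n-up page.
# Assumes pdfwrite's default 720 dpi, i.e. 720 / 72 = 10 pixels per point.
gs_geometry() {
    width_pt=$1      # width of the full input page in points, e.g. 842
    height_pt=$2     # height of the input page in points, e.g. 595
    sections=$3      # number of side-by-side sections (2 for double-ups)
    px_per_pt=10
    # Multiply before dividing so an odd width keeps its half point as 5 px.
    section_px=$(( width_pt * px_per_pt / sections ))
    height_px=$(( height_pt * px_per_pt ))
    # Format starts with %, so printf does not mistake "-g" for an option.
    printf '%s\n' "-g${section_px}x${height_px}"
}
```

For example, gs_geometry 842 595 2 yields the -g4210x5950 used in the commands here.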

What result did the last command achieve?

This one:

Output file: left-sections.pdf, page 1
+------------+  ^
|            |  |
|     1      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v

Output file: left-sections.pdf, page 2
+------------+  ^
|            |  |
|     3      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v
<-- 421 pt -->

Second Step

Next, the right sections:

gs \
    -o right-sections.pdf \
    -sDEVICE=pdfwrite \
    -g4210x5950 \
    -c "<</PageOffset [-421 0]>> setpagedevice" \
    -f double-page-input.pdf

Note the negative offset: we shift the page image 421 pt to the left while the viewing area stays stationary, so the right half lands in the window.
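The same reasoning gives the offset for any section: to bring section k (counting from 0 at the left edge) under the fixed window, shift the page left by k section widths. A minimal sketch; page_offset_ps is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the PostScript snippet passed via -c that shifts
# section INDEX (0 = leftmost) of an n-up page into the output window.
page_offset_ps() {
    index=$1          # 0 for the left half, 1 for the right half, ...
    section_pt=$2     # width of one section in points, e.g. 421
    # Negative x offset: the page moves left, the viewing area stays put.
    printf '<</PageOffset [%d 0]>> setpagedevice\n' $(( 0 - index * section_pt ))
}
```

Usage: gs -o right-sections.pdf -sDEVICE=pdfwrite -g4210x5950 -c "$(page_offset_ps 1 421)" -f double-page-input.pdf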

Result:

Output file: right-sections.pdf, page 1
+------------+  ^
|            |  |
|     2      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v

Output file: right-sections.pdf, page 2
+------------+  ^
|            |  |
|     4      |  |
|            |595 pt
|            |  |
|            |  |
|            |  |
+------------+  v
<-- 421 pt -->

Last Step

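Now we combine the pages into one file. We could do that with Ghostscript as well, but we'll use pdftk instead, because it's faster for this job. shuffle takes one page at a time from each range in the order given, so listing the left sections first yields 1, 2, 3, 4: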

pdftk \
  A=left-sections.pdf \
  B=right-sections.pdf \
  shuffle A B \
  output single-pages-output.pdf \
  verbose

Done. Here is the desired result: 4 separate pages, each sized 421 x 595 pt.

Result:

+------------+ +------------+ +------------+ +------------+   ^
|            | |            | |            | |            |   |
|     1      | |     2      | |     3      | |     4      |   |
|            | |            | |            | |            | 595 pt
|            | |            | |            | |            |   |
|            | |            | |            | |            |   |
|            | |            | |            | |            |   |
+------------+ +------------+ +------------+ +------------+   v
<-- 421 pt --> <-- 421 pt --> <-- 421 pt --> <-- 421 pt -->
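Since you have a great pile of these, here is a minimal sketch that wraps the three steps into one script. It assumes every page of every file is 842 x 595 pt, that gs and pdftk are on the PATH, and that input names end in .pdf; the function names and the -left/-right/-single file suffixes are made up for illustration:

```shell
#!/bin/sh
# Sketch: split every double-up PDF given on the command line into single
# pages, using the three steps above. Assumes all pages are 842 x 595 pt
# (A4 landscape) and that gs and pdftk are on the PATH.
set -eu

WIDTH_PT=842                    # full input page width (assumption)
HEIGHT_PT=595                   # input page height (assumption)
HALF_PT=$(( WIDTH_PT / 2 ))
# pdfwrite's default 720 dpi means 10 device pixels per point.
GEOMETRY="-g$(( HALF_PT * 10 ))x$(( HEIGHT_PT * 10 ))"

# Crop one half of every page: $1 = input, $2 = output, $3 = x offset in pt.
crop_half() {
    gs -q -o "$2" -sDEVICE=pdfwrite "$GEOMETRY" \
       -c "<</PageOffset [$3 0]>> setpagedevice" \
       -f "$1"
}

# Run all three steps for one input file and remove the temporary halves.
split_pdf() {
    in=$1
    base=${in%.pdf}
    crop_half "$in" "$base-left.pdf" 0
    crop_half "$in" "$base-right.pdf" "-$HALF_PT"
    # Interleave: left 1, right 1, left 2, right 2, ...
    pdftk A="$base-left.pdf" B="$base-right.pdf" shuffle A B \
          output "$base-single.pdf"
    rm -f "$base-left.pdf" "$base-right.pdf"
}

for f in "$@"; do
    split_pdf "$f"
done
```

Usage (script name hypothetical): sh split-double-pages.sh *.pdf. Running it a second time on *.pdf would also pick up the *-single.pdf outputs, so keep those in a separate directory or use a narrower glob.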