Tuesday, April 21, 2009

PDFTK - The PDF Toolkit

I recently found that I wanted to split a very large PDF document into two smaller documents, and copy the table of contents, or at least the parts relevant to the second half, into the second document, too. That's so I wouldn't have to go looking back and forth between the two documents. You can imagine similar scenarios for an index, too - you may want to copy this to the first document.

So, how does one do that? I started searching around for open source tools, and at first my keywords didn't turn up anything fruitful. Once I added "linux" to the search, voilà, I quickly came upon pdftk.

Splitting a file into two is a two-step process. First you write the first part by giving pdftk a page range. Let's say your doc is 500 pages and you want to split it into two 250-page documents.

pdftk orig.pdf cat 1-250 output part1.pdf

Then you do the second part this way:

pdftk orig.pdf cat 251-500 output part2.pdf
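If you don't want to work out the second range by hand, here is a small sketch that prints both commands for a given split page. The function name print_split_cmds is my own invention, not part of pdftk, and it only prints the commands rather than running them. It relies on pdftk's "end" keyword, which stands for the last page, so you don't need to know the total page count.

```shell
# Hypothetical helper: print the two pdftk commands that split a file
# after a given page. This is a dry run; nothing is written to disk.
print_split_cmds() {
  src=$1      # input PDF, e.g. orig.pdf
  split=$2    # last page of the first part, e.g. 250
  echo "pdftk $src cat 1-$split output part1.pdf"
  # "end" is pdftk's keyword for the final page of the document.
  echo "pdftk $src cat $((split + 1))-end output part2.pdf"
}

print_split_cmds orig.pdf 250
```

Once the printed commands look right, you can pipe the output to sh to actually run them.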

Now you have two documents. In my case, I wanted to add the contents to the second part, as well.

At the time I couldn't find a way to do that in one step - say by giving two page ranges - but I did accomplish it by writing a temp file. Say the pages I wanted to add to part2.pdf were pages 10-20 of the original doc. I would save those off this way:

pdftk orig.pdf cat 10-20 output contents.pdf

Then, I merged the contents.pdf and part2.pdf this way:

pdftk contents.pdf part2.pdf cat output final-part2.pdf

And I was done. Not bad, not bad at all.
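For what it's worth, pdftk's cat operation does accept more than one page range after the input file, so the temp file can probably be skipped. Here is a hedged sketch of that one-step variant, using the same file names and page numbers as above. The function one_step_cmd is a made-up name for illustration, and it only prints the command so you can check it before running it.

```shell
# Hypothetical helper: build a single pdftk command that pulls the
# contents pages and the second half straight from the original file.
# Dry run only: the command is printed, not executed.
one_step_cmd() {
  src=$1        # original PDF
  contents=$2   # page range holding the table of contents, e.g. 10-20
  body=$3       # page range for the second half, e.g. 251-500
  out=$4        # output file name
  echo "pdftk $src cat $contents $body output $out"
}

one_step_cmd orig.pdf 10-20 251-500 final-part2.pdf
```

The ranges are written out in the order given, so the contents pages land in front of the second half, same as the merged file above.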

