Saturday, October 19, 2013

PDF editing and PS hacking

I have a PDF book that is a good reference. It would be great to have it available on my tablet computer, but the PDF reader apps I have tried are rubbish at navigating. Because it is a reference book, I often need to jump straight to a specific chapter. That is slow when the PDF does not contain a sensible linked index. The only index it does have is in alphabetical order!

My initial thought was to use something like pdftk to split the book into a separate file for each chapter. I could then keep these in a folder and use a file manager to pick the chapter I needed. This would work, but it seems untidy. A better solution was required.

What I really wanted was to change the original PDF as little as possible. So I decided to add a single page to the start of the file with a linked list of the chapters. Doing this required a little PostScript.

The Index

Create a file called, say, index.ps, and add some PostScript commands to it to draw the index. First we set a title: the word "Index" is placed near the top of a 4.5 x 6.5 inch page.

/Times-BoldItalic findfont 20 scalefont setfont
100 430 moveto (Index) show

Next the text for the chapters needs to be added.

/Helvetica findfont 12 scalefont setfont

20 400 moveto (Chapter 1) show
20 380 moveto (Chapter 2) show
...

The first line selects the font and its size (12-point Helvetica), and the following lines position and draw the text. This is repeated for as many chapters as required.

Now we need to make the text clickable. This was done using pdfmarks.

[/Page 4 /View [/XYZ null null null] /Rect [8 393 52 413] /Subtype /Link /ANN pdfmark
[/Page 5 /View [/XYZ null null null] /Rect [8 373 52 393] /Subtype /Link /ANN pdfmark
...
showpage

The coordinates of each rectangle need to align with the relevant text above, and the page number has to be adjusted to point to the page in the PDF document where that chapter starts. This part was a bit long-winded, and I am sure it could be improved with some fancier PostScript tricks, but at least it works.
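
One way to take some of the tedium out of this, without any fancy PostScript, would be to generate index.ps from a short list of chapters. The following is only a sketch: the function name make_index and the "page title" input format are my own invention, and the layout numbers simply mirror the hand-written example above (text at x=20, first baseline at 400, 20 points between lines, link box from x=8 to x=52). It also assumes the titles contain no unbalanced parentheses or backslashes, which would need escaping in a PostScript string.

```shell
#!/bin/sh
# Sketch: write the contents of index.ps to stdout.
# Input on stdin: one chapter per line, as "<page> <title>".
# All layout numbers copy the hand-written example and are assumptions
# about what suits the page; adjust them as needed.

make_index() {
    y=400    # baseline of the first chapter line, in points
    printf '%s\n' '/Times-BoldItalic findfont 20 scalefont setfont'
    printf '%s\n' '100 430 moveto (Index) show'
    printf '%s\n' '/Helvetica findfont 12 scalefont setfont'
    while read -r page title; do
        [ -n "$page" ] || continue    # skip blank lines
        # Draw the chapter text.
        printf '20 %d moveto (%s) show\n' "$y" "$title"
        # Add a link box from 7 pt below to 13 pt above the baseline.
        printf '[/Page %d /View [/XYZ null null null] /Rect [8 %d 52 %d] /Subtype /Link /ANN pdfmark\n' \
            "$page" "$((y - 7))" "$((y + 13))"
        y=$((y - 20))    # move down one line
    done
    printf '%s\n' 'showpage'
}

# Example use (writes index.ps in the current directory):
#   printf '4 Chapter 1\n5 Chapter 2\n' | make_index > index.ps
```

For the two chapters in the example this produces the same text positions and link rectangles as the hand-written lines, just interleaved rather than grouped.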

Combine

The index was intended to go at the front of the PDF file, so that it is the first page seen when the document is opened. If I placed it there directly, I would have to adjust the page numbers above by one to account for the extra page taken by the index itself. That would work for my new links, but the existing index in the PDF would then be out by one page as well. So I decided to put the index on the last page of the PDF instead. That way it would still be easy to find without disturbing the rest of the document.

The command to combine my new index with the existing PDF looks like this:

gs -dBATCH -dNOPAUSE \
   -sDEVICE=pdfwrite \
   -dDEVICEWIDTHPOINTS=324 \
   -dDEVICEHEIGHTPOINTS=468 \
   -dAutoRotatePages=/None \
   -sOutputFile=temp.pdf \
   MyBook.pdf -f index.ps

This appends the index page to the end of MyBook.pdf and writes the result to temp.pdf.
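
The two DEVICE...POINTS values appear to be the 4.5 x 6.5 inch page size mentioned earlier, expressed in PostScript points (72 to the inch). As a quick check, here is a small sketch of the conversion; the helper name in_to_pt is hypothetical, and it relies on awk because plain sh arithmetic is integer-only.

```shell
#!/bin/sh
# Sketch: convert inches to PostScript points (72 points per inch),
# rounded to the nearest whole point.
in_to_pt() {
    awk -v inches="$1" 'BEGIN { printf "%d\n", inches * 72 + 0.5 }'
}

in_to_pt 4.5    # page width:  prints 324
in_to_pt 6.5    # page height: prints 468
```

For a book with a different page size, the same calculation would give the values to pass to gs.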

The final stage was discovered by accident. I found that when I used pdftk to rearrange the book, it updated the internal links so that they kept pointing to the correct pages. So I could use pdftk to move the index to the front of my PDF as I first intended, and the existing index would still point to the correct pages rather than being shifted by one because of my added page.

The command I used with pdftk is as follows:

pdftk A=temp.pdf cat A228 A1-227 \
    output MyBook_Index.pdf
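
The numbers 228 and 227 are specific to my book: 228 is the last page of temp.pdf, which is the newly added index. For a different book the page arguments could be worked out rather than typed by hand. The sketch below assumes a helper of my own naming, front_args, which only builds the argument string; the commented usage relies on pdftk's dump_data output containing a NumberOfPages line.

```shell
#!/bin/sh
# Sketch: print the pdftk "cat" page arguments that move the last page
# of input handle A to the front. $1 is the total page count.
front_args() {
    last=$1
    if [ "$last" -lt 2 ]; then
        echo "need at least two pages" >&2
        return 1
    fi
    printf 'A%d A1-%d\n' "$last" "$((last - 1))"
}

# Possible usage (needs pdftk installed; not run here):
#   pages=$(pdftk temp.pdf dump_data | awk '/^NumberOfPages:/ {print $2}')
#   pdftk A=temp.pdf cat $(front_args "$pages") output MyBook_Index.pdf
```

Called as front_args 228, it prints the same "A228 A1-227" used in the command above.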

I now have a quick index card at the front of my PDF book that can be used to navigate to the required chapter.