PDF deskew

horndude77
active poster
Posts: 293
Joined: Sun Apr 23, 2006 5:08 am
Location: Phoenix, AZ

PDF deskew

Post by horndude77 »

http://github.com/horndude77/image-scri ... ter/rotate

I've started working on a PDF deskewer (Linux only, though it might work in Cygwin). It depends on pagetools (http://pagetools.sourceforge.net/) to find the skew of each page and on netpbm (http://netpbm.sourceforge.net/) to read and write PBM files. I wrote my own PBM rotate program because the ImageMagick and netpbm rotate tools both did strange things to the output. I tested it with the Schubert symphony uploaded earlier today (http://imslp.org/wiki/Symphony_No.9,_D. ... rt,_Franz)), in which many pages are badly skewed. Here are the first ten pages of the resulting PDF: http://horndude77.googlepages.com/Schub ... 9_part.pdf. Let me know what you think.

P.S. Is the url tag broken or am I doing something wrong?

Post by horndude77 »

http://github.com/horndude77/image-scri ... r/pnm_java

I thought the dependencies made it a bit difficult for anyone to give it a try, so I've rewritten it in Java with fewer external dependencies (ant for building, pdfimages for extracting images from the PDF, Java for running, and bash and Ruby for some scripts). I also know how to parallelize Java code, so if you have more than one processor it should use them all for image rotation and skew detection. The deskew method is based on the Hough transform. It seems to work relatively well, though from what I can tell it is neither as fast nor as accurate as the pagetools Radon transform. I've only tested it on Linux, but in this form I think it should be easier to get working on Windows or OS X. If anyone else finds this useful, let me know.

Code:

$ cd pnm_java
$ time ./scripts/deskew_pdf Schubert_Symphony_7__D.729.pdf
real	15m5.383s
user	20m29.213s
sys	1m39.598s
$ #view pdf
$ okular Schubert_Symphony_7__D.729_out.pdf
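
For anyone curious what Hough-style skew detection looks like, here is a minimal sketch. It is not the code from the repository above: the class and method names (HoughSkewSketch, score, estimateSkewDegrees, syntheticPage) are made up for illustration, and scoring each angle by the sum of squared bin counts, rounding rho to whole pixels, and using a parallel stream over the candidate angles are all my own assumptions. The sign convention (positive means lines drop toward the right) is an assumption too.

```java
import java.util.stream.IntStream;

// Hypothetical sketch of Hough-style skew detection on a bilevel image.
// All names here are illustrative and not taken from pnm_java.
class HoughSkewSketch {

    // Score one candidate angle: every black pixel votes into a rho bin,
    // where rho = y*cos(theta) - x*sin(theta). Lines lying at this angle
    // pile their pixels into few bins, so the sum of squared counts is large.
    static long score(boolean[][] img, double degrees) {
        int h = img.length, w = img[0].length;
        double theta = Math.toRadians(degrees);
        double cos = Math.cos(theta), sin = Math.sin(theta);
        int offset = w + h;                    // keeps bin indices non-negative
        int[] bins = new int[2 * offset + 1];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (img[y][x]) {
                    int rho = (int) Math.round(y * cos - x * sin);
                    bins[rho + offset]++;
                }
            }
        }
        long sum = 0;
        for (int c : bins) sum += (long) c * c;
        return sum;
    }

    // Try angles from -maxDegrees to +maxDegrees in steps of stepDegrees and
    // return the best-scoring one. Angles are independent of each other, so
    // they are scored in parallel (an assumed design, not the author's).
    static double estimateSkewDegrees(boolean[][] img, double maxDegrees, double stepDegrees) {
        int n = (int) Math.floor(maxDegrees / stepDegrees);
        long[] scores = IntStream.rangeClosed(-n, n).parallel()
                .mapToLong(i -> score(img, i * stepDegrees))
                .toArray();
        int best = 0;
        for (int k = 1; k < scores.length; k++) {
            if (scores[k] > scores[best]) best = k;
        }
        return (best - n) * stepDegrees;
    }

    // Build a synthetic test page: a few one-pixel "staff lines" tilted by
    // skewDegrees. img[y][x] == true means a black pixel.
    static boolean[][] syntheticPage(int w, int h, double skewDegrees) {
        boolean[][] img = new boolean[h][w];
        double slope = Math.tan(Math.toRadians(skewDegrees));
        for (int y0 = 20; y0 < h - 20; y0 += 20) {
            for (int x = 0; x < w; x++) {
                int y = (int) Math.round(y0 + x * slope);
                if (y >= 0 && y < h) img[y][x] = true;
            }
        }
        return img;
    }

    public static void main(String[] args) {
        boolean[][] page = syntheticPage(200, 100, 2.0);
        double skew = estimateSkewDegrees(page, 5.0, 0.5);
        System.out.println("estimated skew: " + skew + " degrees");
    }
}
```

To deskew you would then rotate the page by the negative of the estimated angle. A real scan has far more black pixels than this toy page, so in practice you would probably downsample first or search coarse-to-fine rather than test every angle at full resolution.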
Carolus
Site Admin
Posts: 2249
Joined: Sun Dec 10, 2006 11:18 pm

Re: PDF deskew

Post by Carolus »

That's really quite good. Your example looks better than Acrobat Pro's built-in de-skew feature (part of the Optimize Scanned PDF menu command).

Re: PDF deskew

Post by horndude77 »

http://github.com/horndude77/leptonica- ... s/clean.rb

Thanks! I've since switched to Leptonica, though. It's much faster and gives similar results.