
PDF deskew

Posted: Mon Nov 24, 2008 1:36 am
by horndude77
http://github.com/horndude77/image-scri ... ter/rotate

I've started working on a PDF deskewer (Linux only, though it might work in Cygwin). It depends on pagetools (http://pagetools.sourceforge.net/) to find the skew of each page and on netpbm (http://netpbm.sourceforge.net/) to read and write PBM files. I wrote my own PBM rotate program because the rotate tools in both ImageMagick and netpbm did strange things to the output. I tested it with the Schubert symphony uploaded earlier today (http://imslp.org/wiki/Symphony_No.9,_D. ... rt,_Franz)), in which many pages are badly skewed. Here are the first ten pages of the resulting PDF: http://horndude77.googlepages.com/Schub ... 9_part.pdf. Let me know what you think.

P.S. Is the url tag broken or am I doing something wrong?

Posted: Mon Dec 01, 2008 5:21 am
by horndude77
http://github.com/horndude77/image-scri ... r/pnm_java

I thought the dependencies made it a bit difficult for anyone to give it a try, so I've rewritten it in Java with fewer external dependencies (ant for building, pdfimages for extracting images from the PDF, Java for running, and bash and Ruby for some scripts). I also know how to parallelize Java across multiple processors, so if you have more than one processor it should use them all for image rotation and skew detection. The deskew method is based on the Hough transform and seems to work relatively well, though from what I can tell it is neither as fast nor as accurate as the pagetools Radon transform. I've only tested it on Linux, but in this form I think it should be easier to get working on Windows or OS X. If anyone else finds this useful, let me know.

Code:

$ cd pnm_java
$ time ./scripts/deskew_pdf Schubert_Symphony_7__D.729.pdf
real	15m5.383s
user	20m29.213s
sys	1m39.598s
$ #view pdf
$ okular Schubert_Symphony_7__D.729_out.pdf
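
For anyone curious what Hough-style skew detection with parallel angle scoring can look like, here is a minimal sketch. It is not the code from the repository above: the class name HoughSkewSketch, its methods, and the boolean[][] page representation (true = black pixel, indexed [y][x]) are all assumptions made for illustration. Each candidate angle is scored by projecting the black pixels onto the normal of lines at that angle and summing the squared bin counts, so the angle at which staff lines or text rows pile into few bins wins. Candidate angles are scored in parallel with a parallel stream.

```java
import java.util.stream.IntStream;

// Illustrative sketch only; names and data layout are hypothetical.
class HoughSkewSketch {

    // Score one candidate angle. Every black pixel votes for the bin
    // rho = y*cos(theta) - x*sin(theta); a line tilted by theta puts all
    // of its pixels into one or two bins, which maximizes the sum of squares.
    // Assumes |theta| < 90 degrees so that rho stays within the bin array.
    static long score(boolean[][] img, double thetaRad) {
        int h = img.length, w = img[0].length;
        double c = Math.cos(thetaRad), s = Math.sin(thetaRad);
        int offset = w; // rho is greater than -w, so this keeps indices non-negative
        long[] bins = new long[h + 2 * w + 2];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (img[y][x]) {
                    bins[(int) Math.round(y * c - x * s) + offset]++;
                }
            }
        }
        long sum = 0;
        for (long b : bins) {
            sum += b * b;
        }
        return sum;
    }

    // Try angles from -maxDeg to +maxDeg in steps of stepDeg and return the
    // best-scoring one in degrees. Positive means lines run downward to the
    // right (y grows with x in image coordinates).
    static double estimateSkewDegrees(boolean[][] img, double maxDeg, double stepDeg) {
        int n = (int) Math.round(2 * maxDeg / stepDeg) + 1;
        long[] scores = new long[n];
        // Each index is written by exactly one task, so no locking is needed.
        IntStream.range(0, n).parallel().forEach(i ->
                scores[i] = score(img, Math.toRadians(-maxDeg + i * stepDeg)));
        int best = 0;
        for (int i = 1; i < n; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return -maxDeg + best * stepDeg;
    }

    // Synthetic test page: one-pixel-thick lines every 20 rows, tilted by deg.
    static boolean[][] syntheticPage(int w, int h, double deg) {
        boolean[][] img = new boolean[h][w];
        double slope = Math.tan(Math.toRadians(deg));
        for (int y0 = 40; y0 < h - 40; y0 += 20) {
            for (int x = 0; x < w; x++) {
                int y = (int) Math.round(y0 + x * slope);
                if (y >= 0 && y < h) {
                    img[y][x] = true;
                }
            }
        }
        return img;
    }

    public static void main(String[] args) {
        boolean[][] page = syntheticPage(400, 300, 2.0);
        System.out.println("estimated skew (degrees): "
                + estimateSkewDegrees(page, 5.0, 0.1));
    }
}
```

The search range and step (here 5 degrees and 0.1 degrees) trade speed for accuracy; a real scan would also need to be thresholded to black and white before scoring.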

Re: PDF deskew

Posted: Sun Jun 14, 2009 7:31 pm
by Carolus
That's really quite good. Your example looks better than Acrobat Pro's built-in de-skew feature (part of the Optimize Scanned PDF menu command).

Re: PDF deskew

Posted: Sun Jun 14, 2009 9:11 pm
by horndude77
http://github.com/horndude77/leptonica- ... s/clean.rb

Thanks! I've since switched to Leptonica, though. It's much faster and gives similar results.