ImageMagick Fun

February 28th 2010 05:20 pm

The “fun” in the title should be read in your most sarcastic tone of voice… Anyways, one of my professors mailed us a PDF of a scanned document to read (and print out) for the next class. Since it was scanned in (by what appeared to be the professor literally holding it above a scanner), there was a lot of excess black in the picture.

I don’t know about you, but printing two large blocks of solid black on each of 22 pages doesn’t sound like a wise investment of toner. But ah! Why don’t I just crop off the excess part of each page so that just the scanned-in text is visible, and print that out? This has to be easy, right?

Unfortunately it wasn’t as easy as I’d hoped (most of the picture editors that can handle PDFs at all can’t print each layer as a separate page, and there’s no way I’m doing the exact same operation 22 times). ImageMagick looked like the thing I needed, even if it would take some trial and error to figure out exactly how much to crop off.

Turned out it only took a couple of runs to figure out exactly how much I could get away with cropping. But I had a worse problem than having to do trial runs: The output looked horrible.

I tried reading the man page, going to the website, and the rest, and couldn’t figure out what to do. Using the -density option seemed to be the right idea, but alas I couldn’t get it to work.

I troubleshot further, even going as far as running gs manually to see whether Ghostscript or ImageMagick was the problem (it turned out to be me, I guess). Eventually I realized that Ghostscript was rendering the PDF for ImageMagick at a low resolution (72 DPI, ImageMagick’s default), but viewing the source in Okular made it obvious that much better was possible (I’d estimate the scan at 200 DPI, although I ended up using 300). So if I could figure out how to get ImageMagick to pass the right DPI to Ghostscript, the problem should be fixed.
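For anyone repeating the gs check, here is a minimal sketch of rendering every page directly with Ghostscript at a chosen resolution. The function name render_pages and the DRY_RUN switch are hypothetical conveniences, not part of Ghostscript; -r sets the output resolution in DPI, and the %02d in the output name is replaced with the page number.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rasterize every page of a PDF with Ghostscript.
# Usage: render_pages INPUT.pdf OUTPUT_PREFIX [DPI]
# Set DRY_RUN=1 to print the command instead of running it.
render_pages() {
  local input=$1 prefix=$2 dpi=${3:-300}
  # -dSAFER restricts file access; -dBATCH and -dNOPAUSE make gs exit without
  # prompting between pages; png16m writes 24-bit colour PNGs; -r sets the
  # rendering resolution in DPI; %02d becomes the page number (01, 02, ...).
  ${DRY_RUN:+echo} gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m \
    -r"$dpi" -sOutputFile="${prefix}-%02d.png" "$input"
}
```

For example, render_pages input.pdf page 300 should produce page-01.png, page-02.png, and so on; comparing those against a run at 72 makes the resolution difference easy to see.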

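More directed Google searching revealed I’d had the right flag the whole time, -density. I just had it in the wrong spot. ImageMagick applies -density when it reads the file that follows it, so the option has to come before the input PDF; placed after input.pdf, it arrives too late, because the pages have already been rasterized at the default 72 DPI. Something like this is right: convert -density 300x300 input.pdf -crop ... output.pdf. Instead I’d been using convert input.pdf -density 300x300 -crop ... output.pdf.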
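Wrapped up as a reusable sketch, with -density ahead of the input file, the working command and a quicker single-page trial run might look like this. The function names crop_pdf and preview_crop, the DRY_RUN switch, and the example geometry are hypothetical; the real crop values still have to come from trial runs. +repage discards the page offset that -crop leaves behind, and the [0] suffix tells ImageMagick to read only the first page, which keeps those trial runs fast.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around ImageMagick's convert (ImageMagick 6 syntax).
# Set DRY_RUN=1 to print the commands instead of running them.

# Crop every page of INPUT with GEOMETRY (WxH+X+Y) after reading it at DENSITY.
# Usage: crop_pdf INPUT.pdf OUTPUT.pdf GEOMETRY [DENSITY]
crop_pdf() {
  local input=$1 output=$2 geometry=$3 density=${4:-300x300}
  # -density must come before the input file so the PDF is rendered at that
  # resolution; +repage clears the page offset left behind by -crop.
  ${DRY_RUN:+echo} convert -density "$density" "$input" \
    -crop "$geometry" +repage "$output"
}

# Try a crop on the first page only ([0] selects page one) and save it as an image.
# Usage: preview_crop INPUT.pdf PREVIEW.png GEOMETRY [DENSITY]
preview_crop() {
  local input=$1 preview=$2 geometry=$3 density=${4:-300x300}
  ${DRY_RUN:+echo} convert -density "$density" "${input}[0]" \
    -crop "$geometry" +repage "$preview"
}
```

For example, preview_crop input.pdf page1.png 2300x3000+200+0 and, once the preview looks right, crop_pdf input.pdf output.pdf 2300x3000+200+0. The geometry numbers are placeholders measured in pixels at the chosen density, so changing the density means re-measuring. ImageMagick 7 installs the same tool as magick.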

I figured I’d put my experience out there in the great Internet Memory Machine in case others have similar troubles.

Posted by mpyne under Computing Troubles & Useful Tricks | 10 Comments »

10 Responses to “ImageMagick Fun”

  1. Baxeico responded on 28 Feb 2010 at 18:28

    I used pdfcrop (http://pdfcrop.sourceforge.net/) in the past for a similar task and it did a great job for me!

  2. mpyne responded on 28 Feb 2010 at 18:47

    Baxeico: I considered pdfcrop but it seemed that it’s designed more for removing the ridiculous amount of whitespace provided in the LaTeX-generated PDFs from articles and journals. If it had a feature to remove black borders from scanned documents then I must have missed it.

  3. RS responded on 28 Feb 2010 at 19:21

    Could you please post the full commands you ran to crop the PDFs? Thank you!

  4. muuloo responded on 28 Feb 2010 at 20:55

    Another trick would be to use pdfimages (part of poppler-utils) to extract all the embedded images 1:1 (without letting Ghostscript scale them). This will usually give you the best quality possible when you deal with PDFs containing embedded images.
    The tool is also very useful when you want to get some other images out of a PDF ;-)

  5. goffrie responded on 28 Feb 2010 at 21:15

    You could use unpaper to automatically crop the black borders for you, instead of using ImageMagick. (You still need to get an image as a pnm first, though.)

  6. Patrick responded on 01 Mar 2010 at 00:51

    I guess I would have used GIMP for cropping (and probably enhancing the contrast, which is necessary on most scanned documents I get), then created a PDF again (following https://patrick-nagel.net/blog/archives/199), which I would then have printed / redistributed.

  7. Links 1/3/2010: New Linux Benchmarks ARM Development Studio for Linux | Boycott Novell responded on 01 Mar 2010 at 20:05

  8. twitter responded on 02 Mar 2010 at 02:09

    I second the poppler-utils recommendation. Once you have the images out with pdfimages, you can start to hack away with ImageMagick. If your scanner was good enough to do text conversion, use pdftotext. pdftohtml is also nice.

  9. teebs responded on 04 Mar 2010 at 13:11

    I’ve imported documents into Inkscape, simply put white boxes/shapes over the area in question, and printed them. It only imports one page at a time, so it may be a touch tedious on larger documents.

  10. Destillat #11 | duetsch.info - Open Source, Wet-, Web-, Software responded on 08 Mar 2010 at 07:18

    [...] ImageMagick Fun [...]
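The pdfimages and unpaper suggestions from the comments chain together naturally, since pdfimages writes PNM files (PPM, or PBM for monochrome images) by default and unpaper reads PNM. A minimal sketch, assuming poppler-utils and unpaper are installed: pdfimages -list reports each embedded image’s pixel size and its x-ppi/y-ppi, which answers the “what DPI was this scanned at” question directly, and the loop hands each extracted image to unpaper with its default filters (which, as far as I know, include a filter for large black areas). The function names extract_scans and clean_scans, the -clean suffix, and the DRY_RUN switch are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; require poppler-utils (pdfimages) and unpaper.
# Set DRY_RUN=1 to print the commands instead of running them.

# List the embedded images (size, x-ppi/y-ppi), then extract them unscaled
# as PPM/PBM files named PREFIX-000.ppm, PREFIX-001.ppm, ...
# Usage: extract_scans INPUT.pdf PREFIX
extract_scans() {
  local input=$1 prefix=$2
  ${DRY_RUN:+echo} pdfimages -list "$input"
  ${DRY_RUN:+echo} pdfimages "$input" "$prefix"
}

# Run unpaper on each image, writing NAME-clean.EXT next to the input file.
# Usage: clean_scans FILE...
clean_scans() {
  local f
  for f in "$@"; do
    ${DRY_RUN:+echo} unpaper "$f" "${f%.*}-clean.${f##*.}"
  done
}
```

For example, extract_scans handout.pdf scan followed by clean_scans scan-[0-9][0-9][0-9].p?m; the narrow glob skips any -clean files left over from an earlier run. If a page was stored as more than one image, the extracted files will not map one-to-one onto pages, so it is worth checking the -list output first.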
