Schnee (schnee) wrote,

Converting raw CCITT image data?

Does anyone have an idea of how to best convert CCITT image data (extracted from a PDF file) to something usable?

My initial plan was to just extract them, open them in GIMP, and save them as B/W PNGs, but GIMP can't read the raw CCITT data (shame, shame). However, CCITT is one of the compression schemes used in TIFF files, so why not just take the raw data and stuff it into a TIFF container?

The most natural tool for that is likely fax2tiff from libtiff's tools. Unfortunately, it inexplicably fails with: input.ccitt: Not enough memory. Great.

A web search then turned up this blog post talking about the same matters and sharing a snippet of Java code (cue Indiana Jones: why did it have to be Java?) to do the conversion. But somehow code like this:

for(int i=0;i<headervalues.length;i++) bos.write(integer.parseint(headervalues[i],16));="" int="" tagcount="9;" appears="" to="" be="" minimum="" needed="" writeword="" and="" write="" tag="" are="" convenience="" methods="" add="" the="" values="" as="" bytes="" stream="" **="" *="" ifd="" image="" file="" directory="" writeword(string.valueof(tagcount),bos);="" num="" of="" entries="" writetag("256",="" "04",="" "01",="" string.valueof(w),="" bos);="" **width*="" writetag("257",="" string.valueof(h),="" **length*="" **bitspersample="" 258="" -="" b&w="" 1="" bit="" image*="" writetag("258",="" "03",="" "00010000h",="" if="" (k="=" 0){="" writetag("259",="" "00030000h",="" compression="" }else=""> 0)

does not inspire confidence. (In fact I dare say that's badly-formatted rubbish, and I have no real desire to untangle it.)

But the basic idea is sound, and I have a copy of the Encyclopedia of Graphics File Formats (the first edition) lying around, so why not just cast that crufty Java code aside and write my own? Not a bad idea, but TIFF is a baroque and complex format, and you'd still need to look up the PDF's various image parameters. It'd be possible, but it'd also be more work than I'm willing to do.
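For reference, the single-strip bilevel case needs only a small slice of TIFF. What follows is a minimal sketch in Python, not a port of the Java snippet above. The function name ccitt_to_tiff is made up for illustration, and width, height and k are assumed to come from the PDF image's /Columns, /Rows and /K decode parameters. Writing PhotometricInterpretation as 0 (white is zero) is also an assumption: depending on /BlackIs1 and /Decode, the result may need inverting.

```python
import struct

def ccitt_to_tiff(data: bytes, width: int, height: int, k: int = -1) -> bytes:
    """Wrap a raw CCITT stream in a minimal little-endian, single-strip TIFF.

    k follows the PDF /K convention (assumed input):
    k < 0 is Group 4, k == 0 is Group 3 1-D, k > 0 is Group 3 2-D.
    """
    SHORT, LONG = 3, 4                    # TIFF field types
    compression = 4 if k < 0 else 3       # TIFF Compression: 3 = T.4, 4 = T.6
    n_tags = 9 if compression == 3 else 8
    # Layout: 8-byte header, 2-byte entry count, 12 bytes per entry,
    # 4-byte "next IFD" offset, then the image data.
    data_offset = 8 + 2 + 12 * n_tags + 4

    tags = [                              # must be in ascending tag order
        (256, LONG, width),               # ImageWidth
        (257, LONG, height),              # ImageLength
        (258, SHORT, 1),                  # BitsPerSample: 1-bit image
        (259, SHORT, compression),        # Compression
        (262, SHORT, 0),                  # Photometric: 0 = WhiteIsZero (assumption)
        (273, LONG, data_offset),         # StripOffsets
        (278, LONG, height),              # RowsPerStrip: whole image in one strip
        (279, LONG, len(data)),           # StripByteCounts
    ]
    if compression == 3:
        tags.append((292, LONG, 1 if k > 0 else 0))  # T4Options: bit 0 = 2-D coding

    out = struct.pack("<2sHI", b"II", 42, 8)         # header, first IFD at offset 8
    out += struct.pack("<H", len(tags))
    for tag, field_type, value in tags:
        # One value per entry; in little-endian a SHORT packed as a 32-bit
        # integer is already left-justified in the value field.
        out += struct.pack("<HHII", tag, field_type, 1, value)
    out += struct.pack("<I", 0)                      # no further IFDs
    return out + data
```

Writing the returned bytes to a .tif file should give something GIMP or libtiff's tools can open, provided the parameters really match the stream. Anything beyond this case (multiple strips, /EncodedByteAlign, and so on) is not handled.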

There are other options still. Although GIMP can't read raw CCITT data, it can import PDFs; unfortunately, even without antialiasing you end up with a grayscale image that doesn't match the original B/W image. Why? I wish I knew.

The last thing I can think of would be screenshotting the (correctly-rendered) PDF and working with the screenshot in GIMP. Doable, but it feels like a rather impure solution. So, any ideas?

Tags: help needed, image editing, image processing, images, software
