Converting raw CCITT image data?

May. 2nd, 2016 | 11:03 am

Does anyone have an idea of how to best convert CCITT image data (extracted from a PDF file) to something usable?

My initial plan was to just extract them, open them in GIMP, and save them as B/W PNGs, but GIMP can't read the raw CCITT data (shame, shame). However, CCITT is one of the compression schemes used in TIFF files, so why not just take the raw data and stuff it into a TIFF container?

The most natural tool for that is likely fax2tiff from libtiff's tools. Unfortunately, it inexplicably fails with: input.ccitt: Not enough memory. Great.

Web-searching then found this blog post talking about the same matters and sharing a snippet of Java code (cue Indiana Jones: why did it have to be Java?) to do the conversion. But somehow code like this:

for(int i=0;i<headervalues.length;i++) bos.write(Integer.parseInt(headervalues[i],16)); int tagcount=9; appears to be minimum needed writeword and write tag are convenience methods add the values as bytes stream ** * ifd image file directory writeword(String.valueOf(tagcount),bos); num of entries writetag("256", "04", "01", String.valueOf(w), bos); **width* writetag("257", String.valueOf(h), **length* **bitspersample 258 - b&w 1 bit image* writetag("258", "03", "00010000h", if (k == 0){ writetag("259", "00030000h", compression }else > 0)

does not inspire confidence. (In fact I dare say that's badly-formatted rubbish, and I have no real desire to untangle it.)

But the basic idea is sound, and I have a copy of the Encyclopedia of Graphics File Formats (the first edition) lying around, so why not just cast that crufty Java code aside and write my own? Not a bad idea, but TIFF is a baroque and complex format, and you'd still need to look up the image's various parameters (width, height, encoding variant) in the PDF. It'd be possible, but it'd also be more work than I'm willing to do.
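For reference, the wrapping idea itself can be sketched fairly compactly. The following is a minimal, untested-against-real-PDFs sketch in Python rather than Java: it prepends a single-strip, little-endian TIFF header to the raw data. The function name ccitt_to_tiff and its parameters are made up for illustration; width, height and k are assumed to come from the PDF image dictionary (/Width, /Height and the /K entry of /DecodeParms). The photometric setting is a guess that may need flipping if the result comes out inverted, and options such as /EncodedByteAlign are not handled at all.

```python
import struct

SHORT, LONG = 3, 4  # TIFF field types


def ccitt_to_tiff(data: bytes, width: int, height: int, k: int = -1,
                  white_is_zero: bool = True) -> bytes:
    """Wrap one raw CCITT strip in a minimal little-endian TIFF (hypothetical helper)."""
    # PDF's K parameter: K < 0 is Group 4, K == 0 is Group 3 1-D, K > 0 is Group 3 2-D.
    if k < 0:
        compression, t4_options = 4, None
    else:
        compression, t4_options = 3, (1 if k > 0 else 0)

    n_tags = 8 if t4_options is None else 9
    # 8-byte file header + 2-byte entry count + 12 bytes per entry + 4-byte next-IFD offset
    data_offset = 8 + 2 + 12 * n_tags + 4

    tags = [  # must be sorted by tag number
        (256, LONG, width),                       # ImageWidth
        (257, LONG, height),                      # ImageLength
        (258, SHORT, 1),                          # BitsPerSample
        (259, SHORT, compression),                # Compression
        (262, SHORT, 0 if white_is_zero else 1),  # PhotometricInterpretation (assumption)
        (273, LONG, data_offset),                 # StripOffsets
        (278, LONG, height),                      # RowsPerStrip: whole image in one strip
        (279, LONG, len(data)),                   # StripByteCounts
    ]
    if t4_options is not None:
        tags.append((292, LONG, t4_options))      # T4Options: bit 0 set means 2-D coding

    out = struct.pack("<2sHI", b"II", 42, 8)      # byte order, magic number, offset of first IFD
    out += struct.pack("<H", len(tags))
    for tag, field_type, value in tags:
        # Each entry: tag, type, count (always 1 here), value stored inline
        out += struct.pack("<HHII", tag, field_type, 1, value)
    out += struct.pack("<I", 0)                   # no further IFDs
    return out + data
```

Writing the returned bytes to a .tif file should give something that TIFF-aware tools can at least attempt to open; whether a given viewer actually decodes it depends on its CCITT support.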

There are other options still. Although GIMP can't read raw CCITT data, it can import PDFs; unfortunately, even without antialiasing you end up with a grayscale image that doesn't match the original B/W image. Why? I wish I knew.

The last thing I can think of would be screenshotting the (correctly-rendered) PDF and working with the screenshot in GIMP. Doable, but it feels like a rather impure solution. So, any ideas?

Comments {7}

The Mystery of the Supranational Rabbit

(no subject)

from: porsupah
date: May. 2nd, 2016 02:38 pm (UTC)

I'd probably turn to ImageMagick, which seems able to cope with nearly as many formats as GraphicConverter (brilliant Mac-only app). I'd be somewhat surprised if you couldn't give it all the metadata and let it chow down on the file. ^_^


(no subject)

from: schnee
date: May. 2nd, 2016 02:53 pm (UTC)

Good idea! I've tried that, actually — but convert(1) said it couldn't read the file. :/

If this had been on DOS 20 or 25 years ago I'd have tried Image Alchemy, which was then able to cope brilliantly with a lot of obscure formats; maybe that would've done the trick. I think the software's still being sold (presumably there's still legacy users out there), but I'm not aware of there being a free/shareware version anymore, as there was back in the day.

Shame, really. It even handled JPEGs that used arithmetic coding, which is something I've not seen since. (Some patent troll claimed patents related to this, and as a result hardly anybody implemented it.)

EDIT: the site selling it is down; archive.org last saw it in September last year.

Edited at 2016-05-02 03:43 pm (UTC)
