Converting raw CCITT image data?

May. 2nd, 2016 | 11:03 am

Does anyone have an idea of how to best convert CCITT image data (extracted from a PDF file) to something usable?

My initial plan was to just extract them, open them in GIMP, and save them as B/W PNGs, but GIMP can't read the raw CCITT data (shame, shame). However, CCITT is one of the compression schemes used in TIFF files, so why not just take the raw data and stuff it into a TIFF container?

The most natural tool for that is likely fax2tiff from libtiff's tools. Unfortunately, it inexplicably fails with the error "input.ccitt: Not enough memory". Great.

A web search then turned up a blog post talking about the same problem and sharing a snippet of Java code (cue Indiana Jones: why did it have to be Java?) to do the conversion. But somehow code like this:

for(int i=0;i<headervalues.length;i++) bos.write(integer.parseint(headervalues[i],16));="" int="" tagcount="9;" appears="" to="" be="" minimum="" needed="" writeword="" and="" write="" tag="" are="" convenience="" methods="" add="" the="" values="" as="" bytes="" stream="" **="" *="" ifd="" image="" file="" directory="" writeword(string.valueof(tagcount),bos);="" num="" of="" entries="" writetag("256",="" "04",="" "01",="" string.valueof(w),="" bos);="" **width*="" writetag("257",="" string.valueof(h),="" **length*="" **bitspersample="" 258="" -="" b&w="" 1="" bit="" image*="" writetag("258",="" "03",="" "00010000h",="" if="" (k="=" 0){="" writetag("259",="" "00030000h",="" compression="" }else=""> 0)

does not inspire confidence. (In fact I dare say that's badly-formatted rubbish, and I have no real desire to untangle it.)

But the basic idea is sound, and I have a copy of the Encyclopedia of Graphics File Formats (the first edition) lying around, so why not just cast that crufty Java code aside and write my own? Not a bad idea, but TIFF is a baroque and complex format, and you'd still need to look at the PDF's various image parameters. It'd be possible, but it'd also be more work than I'm willing to do.
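
For illustration, the write-your-own route might look roughly like the following in Python. This is a minimal sketch under stated assumptions, not a tested converter: the function name ccitt_to_tiff is made up for this example, the width, height and K values are assumed to have been read from the PDF image dictionary (/Width, /Height, and /K in /DecodeParms), all data goes into a single strip, and the photometric value is a guess that may need flipping if the result comes out inverted. Mixed Group 3 data (K > 0), byte-aligned rows and the resolution tags that baseline TIFF asks for are left out, so some readers may complain about the result.

```python
import struct

SHORT, LONG = 3, 4  # TIFF field types: 16-bit and 32-bit unsigned


def ccitt_to_tiff(data: bytes, width: int, height: int,
                  k: int = -1, photometric: int = 0) -> bytes:
    """Wrap a raw CCITT fax stream in a minimal little-endian TIFF.

    k follows the PDF /K convention: < 0 is Group 4, 0 is Group 3 1-D.
    photometric: 0 = WhiteIsZero, 1 = BlackIsZero (assumption: try the
    other value if the image looks inverted).
    """
    if k > 0:
        # Mixed 1-D/2-D Group 3 needs extra options; not handled here.
        raise ValueError("K > 0 is not handled in this sketch")
    compression = 4 if k < 0 else 3

    n_tags = 8
    # header (8) + entry count (2) + entries (12 each) + next-IFD offset (4)
    data_offset = 8 + 2 + n_tags * 12 + 4

    tags = [  # must be written in ascending tag order
        (256, LONG, width),         # ImageWidth
        (257, LONG, height),        # ImageLength
        (258, SHORT, 1),            # BitsPerSample: 1 bit, B/W
        (259, SHORT, compression),  # Compression: 3 = Group 3, 4 = Group 4
        (262, SHORT, photometric),  # PhotometricInterpretation
        (273, LONG, data_offset),   # StripOffsets: where the data starts
        (278, LONG, height),        # RowsPerStrip: whole image in one strip
        (279, LONG, len(data)),     # StripByteCounts
    ]
    assert len(tags) == n_tags

    out = struct.pack("<2sHI", b"II", 42, 8)  # byte order, magic, IFD offset
    out += struct.pack("<H", len(tags))
    for tag, field_type, value in tags:
        # In little-endian files a single SHORT occupies the low bytes of
        # the 4-byte value field, so packing it as a 32-bit value works.
        out += struct.pack("<HHII", tag, field_type, 1, value)
    out += struct.pack("<I", 0)  # no further IFDs
    return out + data
```

Writing the returned bytes to a file, for instance with open("page.tif", "wb"), would give something to try opening in GIMP; whether a given reader accepts such a bare-bones file is not something this sketch guarantees.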

There are other options still. Although GIMP can't read raw CCITT data, it can import PDFs; unfortunately even without antialiasing you end up with a grayscale image that doesn't match the original B/W image. Why? I wish I knew.

The last thing I can think of would be screenshotting the (correctly-rendered) PDF and working with the screenshot in GIMP. Doable, but it feels like a rather impure solution. So, any ideas?

Comments {7}

allaboutweather

from: allaboutweather
date: May. 2nd, 2016 12:05 pm (UTC)

Java... I hate Java. I took a Java class this semester and no matter how hard I tried, I couldn't understand it at all. I just couldn't fix any errors whatsoever, as we were never shown how. :(

I'm more used to Python. I wish I could help you with this. :(

Schneelocke

from: schnee
date: May. 2nd, 2016 02:59 pm (UTC)

Ah, what was it you didn't understand about it? Are you familiar with object-oriented programming languages in principle — C++, say, or Smalltalk-80, or others? If you're coming from an imperative background then OO programming will feel strange and unnatural at first.

Notwithstanding my above remark, Java's an OK language all in all. It's got its fair share of issues (some of which are inherent in the language proper, some of which are with the standard library, some of which are with its development culture and community), but when it was created it was a huge step forward compared to C++. Mind, C++ has evolved since then as well, and Java in turn has been superseded by better languages that dealt with its problems, introduced (or uncovered) new ones in turn, and were themselves superseded.

That said, for this particular task I think Java is a poor choice; in fact any OO language would've been. I'm fairly agnostic when it comes to fundamental language paradigms (about the only thing I dislike is the smug superiority of some functional languages, or rather their designers/users), but for quick hacks like this I think imperative languages are best and most natural, and then you'll likely want something like Perl, or Ruby, or even Python. (I don't like Python a lot, all in all, but I'd have chosen it over Java.)

allaboutweather

from: allaboutweather
date: May. 2nd, 2016 03:31 pm (UTC)

I think it was because I was so used to Python, as I use it for GIS.

I only started taking programming classes this semester, which is my last semester of college, and finals are this week. In other words, I graduate on Saturday. :)

Schneelocke

from: schnee
date: May. 2nd, 2016 03:39 pm (UTC)

Ah, cool. Congrats!

allaboutweather

from: allaboutweather
date: May. 2nd, 2016 03:58 pm (UTC)

Thanks! :)

The Mystery of the Supranational Rabbit

from: porsupah
date: May. 2nd, 2016 02:38 pm (UTC)

I'd probably turn to ImageMagick, which seems able to cope with nearly as many formats as GraphicConverter (brilliant Mac-only app). I'd be somewhat surprised if you couldn't give it all the metadata and let it chow down on the file. ^_^

Schneelocke

from: schnee
date: May. 2nd, 2016 02:53 pm (UTC)

Good idea! I've tried that, actually — but convert(1) said it couldn't read the file. :/

If this had been on DOS 20 or 25 years ago I'd have tried Image Alchemy, which was then able to cope brilliantly with a lot of obscure formats; maybe that would've done the trick. I think the software's still being sold (presumably there's still legacy users out there), but I'm not aware of there being a free/shareware version anymore, as there was back in the day.

Shame, really. It even handled JPEGs that used arithmetic coding, which is something I've not seen since. (Some patent troll claimed patents related to this, and as a result hardly anybody implemented it.)

EDIT: the site selling it is down; archive.org last saw it in September last year.

Edited at 2016-05-02 03:43 pm (UTC)
