> Kaitai Struct is in a similar space, generating safe parsers for multiple target programming languages from one declarative specification. Again, Wuffs differs in that it is a complete (and performant) end-to-end implementation, not just for the structured parts of a file format. Repeating a point in the previous paragraph, the difficulty in decoding the GIF format isn't in the regularly-expressible part of the format, it's in the LZW compression. Kaitai's GIF parser returns the compressed LZW data as an opaque blob.
Taking PNG as an example, Kaitai will tell you the image's metadata (including width and height) and that the compressed pixels are in the such-and-such part of the file. But unlike Wuffs, Kaitai doesn't actually decode the compressed pixels.
---
Wuffs' generated C code also doesn't need any capabilities, including the ability to malloc or free. Its example/mzcat program (equivalent to /bin/bzcat or /bin/zcat, for decoding BZIP2 or GZIP) self-imposes a SECCOMP_MODE_STRICT sandbox, which is so restrictive (and secure!) that it prohibits any syscalls other than read, write, _exit and sigreturn.
I like my colleague Simon Morris's observation about software complexity:
> Software has a Peter Principle. If a piece of code is comprehensible, someone will extend it, so they can apply it to their own problem. If it’s incomprehensible, they’ll write their own code instead. Code tends to be extended to its level of incomprehensibility.
Not true. Even just within libjpeg, there are three different IDCT implementations (jidctflt.c, jidctfst.c, jidctint.c) and they produce different pixels (it's a classic speed vs quality trade-off). It's spec-compliant to choose any of those.
A few years ago, in libjpeg-turbo, they changed the smoothing kernel used for decoding (incomplete) progressive JPEGs from a 3x3 window to 5x5. This meant the decoder produced different pixels, but again, that's still valid.
Moritz, the author of that improvement, implemented the same for jpegli.
I believe the standard does not specify what the intermediate progressive renderings should look like.
I developed that interpolation mechanism originally for Pik, and Moritz was able to formulate it directly in the DCT space, so we don't need to go into pixels for the smoothing to happen: he computed it using only a few of the low-frequency DCT coefficients.
> I believe the standard does not specify what the intermediate progressive renderings should look like.
This is possibly getting too academic, but IIUC for a progressive JPEG, e.g. one encoded by cjpeg to have 10 0xDA Start Of Scan markers, it's actually legitimate to post-process the file, truncating it to fewer scans (but re-appending the 0xD9 End Of Image marker). The shorter file is still a valid JPEG, and so still relevant for discussing whether all decoders will render the same pixels.
I might be wrong about validity, though. It's been a while since I've studied the JPEG spec.
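To make that post-processing concrete, here's a rough sketch in C (my own illustration, untested against real decoders). It leans on JPEG's byte-stuffing rule: within entropy-coded data, a 0xFF byte is always followed by 0x00 or an RSTn marker, so a literal FF DA pair can only be a real Start Of Scan marker:

```c
// Sketch: truncate a progressive JPEG to its first `keep_scans` scans,
// in place, and re-append the End Of Image marker. Relies on byte
// stuffing: in entropy-coded data, 0xFF is always followed by 0x00 or
// an RSTn marker, so a literal FF DA pair always marks a Start Of Scan.
#include <stddef.h>

// Returns the (possibly shorter) new length of buf.
size_t truncate_progressive(unsigned char *buf, size_t len, int keep_scans) {
    int seen = 0;
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xFF && buf[i + 1] == 0xDA) {
            if (++seen > keep_scans) {
                // Overwrite the unwanted Start Of Scan with End Of Image.
                buf[i] = 0xFF;
                buf[i + 1] = 0xD9;
                return i + 2;
            }
        }
    }
    return len;  // fewer scans than keep_scans: leave the file unchanged
}
```

Running this with `keep_scans` = 3 on a 10-scan cjpeg output would, if my reading of the spec is right, yield a shorter but still valid JPEG.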
I was not aware of that; I thought that it was pretty deterministic.
Nonetheless, for this particular case, comparing JPEGs decoded into lossless formats is unnecessary -- you can simply compare the two JPEGs directly, using your browser's default renderer.
And nowadays, for subsampled images, libjpeg after classic version 6 insists on doing the chroma upscaling in the DCT domain where possible. So for classic 4:2:0 subsampled images (i.e. chroma resolution half the luma resolution both horizontally and vertically), each subsampled 8x8 chroma block is now upscaled individually to 16x16 for the final image, which can and does introduce additional artefacts at the boundaries between the 16x16 px blocks. But the current libjpeg maintainer insists on that new algorithm because it is mathematically more beautiful…
Granted, the introduced artefacts aren't massive, but under certain circumstances they are noticeable, which is how I stumbled across that topic in the first place.
Thankfully, most software that isn't still stuck on libjpeg 6 has switched to libjpeg-turbo or some other library that continues to use a more sensible algorithm for chroma upscaling.