You've rediscovered a state-of-the-art technique, currently used by JPEG XL, AV1, and the HEVC range extensions. It's called "chroma from luma" or "cross-component prediction".
This technique has a weakness: the most interesting and high-entropy data shared between the luma and chroma planes is their edge geometry. To suppress block artefacts near edges, you need to code an approximation of the edge contours. This is the purpose of your quadtree structure.
In a codec which compresses both luma and chroma, you can re-use the luma quadtree as a chroma quadtree, but the quadtree itself is not the main cost here. For each block touched by a particular edge, you're redundantly coding that edge's chroma slope value, `(chroma_inside - chroma_outside) / (luma_inside - luma_outside)`. Small blocks can tolerate a lower-precision slope, but it's a general rule that coding many imprecise values is more expensive than coding a few precise values, so this strategy costs a lot of bits.
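To make that concrete, here's a rough sketch of the per-block slope computation (plain Python/NumPy; the `edge_mask` is a hypothetical stand-in for whatever your quadtree gives you to separate the two sides of the edge):

```python
import numpy as np

def block_cfl_slope(luma, chroma, edge_mask):
    """Hypothetical per-block chroma-from-luma slope for a block crossed by
    one edge. `edge_mask` marks the pixels on the "inside" of the edge;
    its complement is the "outside"."""
    luma_in,   luma_out   = luma[edge_mask].mean(),   luma[~edge_mask].mean()
    chroma_in, chroma_out = chroma[edge_mask].mean(), chroma[~edge_mask].mean()
    denom = luma_in - luma_out
    if abs(denom) < 1e-6:      # flat luma: no usable cross-component correlation
        return 0.0
    return (chroma_in - chroma_out) / denom
```

Every block the edge touches ends up coding roughly this same number, which is where the redundancy comes from.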
JPEG XL compensates for this problem by representing the local chroma-from-luma slope as a low-resolution 2D image, which is then recursively compressed as a lossless JPEG XL image. This is similar to your idea of using PNG-like compression (delta prediction, followed by DEFLATE).
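If you do go the PNG-like route you describe, the slope-map stage could be as simple as the sketch below (the quantiser step and layout are placeholders of mine, not what JPEG XL actually does):

```python
import zlib
import numpy as np

def compress_slope_map(slope_map, scale=64.0):
    """Sketch: quantise a low-resolution 2D map of chroma-from-luma slopes,
    delta-predict each sample from its left neighbour (PNG "Sub"-style),
    then DEFLATE. `scale` is an arbitrary quantiser step for illustration."""
    q = np.clip(np.round(slope_map * scale), -128, 127).astype(np.int16)
    deltas = np.diff(q, axis=1, prepend=0)   # horizontal delta prediction
    return zlib.compress((deltas & 0xFF).astype(np.uint8).tobytes(), 9)
```

The decoder would inflate, undo the prefix sum mod 256, divide by the quantiser step, and upsample the map back to block resolution.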
Of course, since you're capable of rediscovering the state of the art, you're also capable of improving on it :-)
One idea would be to write a function which, given a block of luma pixels, can detect when the block contains two discrete luma shades (e.g. "30% of these pixels have a luminance value close to 0.8, 65% have a luminance value close to 0.5, and the remaining 5% seem to be anti-aliased edge pixels"). If you run an identical shade-detection algorithm in both the encoder and decoder, you can then code chroma information separately for each side of the edge. Because this would reduce edge artefacts, it might enable you to make your quadtree leaf nodes much larger, reducing your overall data rate.
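As a sketch of what I mean by shade detection, here is a tiny 2-means over the block's luma samples; the tolerance and the clustering choice are my own, not taken from any existing codec:

```python
import numpy as np

def detect_two_shades(luma_block, tol=0.05, max_iters=16):
    """Classify a block's luma samples into two dominant shades plus an
    "in-between" class for anti-aliased edge pixels. Returns
    (shade0, shade1, labels) or None if the block is effectively flat."""
    samples = luma_block.ravel().astype(np.float64)
    lo, hi = samples.min(), samples.max()
    if hi - lo < tol:                         # one shade only
        return None
    c0, c1 = lo, hi                           # initialise the two shade centres
    for _ in range(max_iters):
        assign = np.abs(samples - c1) < np.abs(samples - c0)
        if assign.all() or not assign.any():  # degenerate split, keep old centres
            break
        new_c0, new_c1 = samples[~assign].mean(), samples[assign].mean()
        if new_c0 == c0 and new_c1 == c1:
            break
        c0, c1 = new_c0, new_c1
    labels = np.full(samples.shape, -1, dtype=np.int8)
    labels[np.abs(samples - c0) <= tol] = 0   # close to the darker shade
    labels[np.abs(samples - c1) <= tol] = 1   # close to the brighter shade
    return c0, c1, labels.reshape(luma_block.shape)
```

Since the decoder can run the same function on the reconstructed luma, the bitstream would only need the per-side chroma values (or slopes), and the in-between pixels could blend between them.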
Thanks for the feedback, and the interesting ideas. It's good to know that I was on to something and not completely off :-)
I'm mostly doing this for learning purposes, but a hidden agenda is to create a low-latency codec that can be used in conjunction with other codecs that deal primarily with luma information. AV1 and friends are usually too heavy in those settings, so I try to keep things simple.