This looks very similar to what I came up with for pik, which was consequently used as the main default colorspace of JPEG XL, i.e., XYB.
Butteraugli's XYB also builds on similar ideas, but is slightly more expensive to calculate due to the biased logarithm in its compression function (instead of a cube root); on the other hand, it possibly scales better for HDR (say, above 200 nits).
JPEG XL's XYB includes more red and less green in its S-receptor model (for the blue-yellow axis). Looking at the literature on LMS receptor spectra, I wonder why there is so much green in Oklab. When I ran similar optimizations for XYB, they favored adding slightly more red than green to S.
S component in JPEG XL XYB before non-linearity:
0.24 * R + 0.20 * G + 0.56 * B
S component in Oklab before non-linearity:
0.05 * R + 0.26 * G + 0.63 * B
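To make the comparison concrete, here is a small sketch evaluating the two S formulas above (weights copied from the text; input assumed to be linear RGB in [0, 1]):

```python
# S (pre-nonlinearity) weights as quoted above; purely illustrative.

def s_xyb(r, g, b):
    # JPEG XL XYB: noticeably more red, less green
    return 0.24 * r + 0.20 * g + 0.56 * b

def s_oklab(r, g, b):
    # Oklab: very little red, more green
    return 0.05 * r + 0.26 * g + 0.63 * b

# A pure red stimulus makes the difference obvious:
print(s_xyb(1.0, 0.0, 0.0))    # 0.24
print(s_oklab(1.0, 0.0, 0.0))  # 0.05
```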
Given the similarity of Oklab and XYB, I suspect (but I'm not completely sure) that JPEG XL's format is powerful enough to model Oklab, too. It can very likely model the M1 matrix and the cube root perfectly. For M2 I believe some approximation may be needed: JPEG XL can produce local variations of M2 via its chroma-from-luma fields, but luma would likely need to be slightly different from Oklab's.
Another rather substantial difference is in the M2 matrix of Oklab. In my experiments I don't see S participation in colors with high spatial frequency. Because in image compression a lot of the information is at high spatial frequencies, compression works out favorably when the M2 matrix has no S contribution in luma. In JPEG XL we use just [b, b, 0; a, -a, 0; -0.5*c, -0.5*c, c] in the M2 phase. The two zeros there keep S reception out of the luma and redness-greenness observations.
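As a rough sketch of that structure (a, b and c here are illustrative placeholders, not the actual JPEG XL constants), note the zeros in the first two rows: S contributes to neither luma nor the red-green axis.

```python
import numpy as np

# Illustrative M2 of the shape described above; a, b, c are placeholders.
a, b, c = 1.0, 1.0, 1.0
M2 = np.array([
    [b,        b,        0.0],  # luma-like row: L + M only, no S
    [a,       -a,        0.0],  # red-green row: L - M only, no S
    [-0.5 * c, -0.5 * c, c  ],  # blue-yellow row: S minus the mean of L and M
])

lms_cbrt = np.cbrt([0.2, 0.3, 0.1])  # some compressed cone responses
luma, red_green, blue_yellow = M2 @ lms_cbrt
```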
This difference may come from Oklab being based on XYZ, which is based on 2-degree color samples. XYB is based on roughly 0.03-degree color samples. Perception seems to differ there -- to me it looks like S is not yet integrated into the luma experience at that resolution.
In butteraugli, color modeling is more complex: it is split into high-spatial-frequency and low-spatial-frequency parts. S is brought only into the low-spatial-frequency color transforms. (The frequency separation there is done with a Laplacian pyramid.)
One more interesting and substantial difference between Oklab and XYB is that XYB includes biases before the nonlinearity, i.e., one can consider the M1 matrix a homogeneous matrix. These biases make the receptor model (more) linear close to zero, with the non-linearity ramping up as intensity increases. The idea is to model spontaneous opsin isomerization in the receptors. I believe sRGB approximated this by gluing a small linear ramp onto its nonlinearity with if-then logic.
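A minimal sketch of the bias idea (the bias value here is a made-up placeholder, not the JPEG XL constant): adding a small positive bias before the cube root keeps the response roughly linear near zero, while a plain cube root has infinite slope at zero.

```python
# Hypothetical biased cube root; bias is an illustrative placeholder value.

def compressed(x, bias=0.0037):
    # Subtracting cbrt(bias) anchors the output at 0 for x = 0.
    return (x + bias) ** (1.0 / 3.0) - bias ** (1.0 / 3.0)

print(compressed(0.0))  # 0.0: response is anchored at zero
# Near zero the slope is finite (~(1/3) * bias**(-2/3)),
# whereas x ** (1/3) has unbounded slope as x -> 0.
```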
I'm not that familiar with XYB and its properties. Is there anywhere I can read more? I found some specifications, but nothing on its properties.
I think this might be a case where the requirements for image editing and image compression are different.
For image editing, especially when working with HDR images, I think it is better to just use a simple power function, since that makes fewer assumptions about the exact viewing conditions. E.g. a user might want to adjust exposure while editing an image, and if the predicted hue changed when the exposure is altered, that would be confusing (which happens if more complex non-linearities are used). When compressing final images, though, that wouldn't be an issue in the same way.
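A small sketch of the exposure argument: with a pure cube root, scaling linear cone responses by k scales every post-nonlinearity channel by k**(1/3), so opponent-channel ratios -- and hence the predicted hue angle -- do not move. The opponent matrix below is an arbitrary illustration, not Oklab's actual M2.

```python
import numpy as np

# Arbitrary opponent-style matrix mapping cube-rooted LMS to two chroma axes.
M_OPPONENT = np.array([[1.0, -1.0,  0.0],
                       [0.0,  0.5, -0.5]])

def hue_angle(lms, exposure):
    # Scale linear cone responses by the exposure factor, then compress.
    ab = M_OPPONENT @ np.cbrt(exposure * np.asarray(lms))
    return np.arctan2(ab[1], ab[0])

lms = [0.4, 0.2, 0.1]
# An 8x exposure change leaves the predicted hue unchanged:
print(np.isclose(hue_angle(lms, 1.0), hue_angle(lms, 8.0)))  # True
```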
Basically, the M1 matrix maps homogeneous linear sRGB [linearR, linearG, linearB, 1] to approximate cone responses:
(I think 1 means 250 nits in this normalization, but I'm not completely sure at this stage of the optimizations -- we changed the normalization recently.)
The LMS values after cubic root are coded by this matrix M2:
M2 = [[1, -1, 0], [1, 1, 0], [0, 0, 1]]
In practice, the Y->X and Y->B correlations are removed, so M2 looks more like this:
M2 = [[1+a, -1+a, 0], [1, 1, 0], [b, b, 1]]
After decorrelation, a is often around zero and b around -0.5.
The first dimension in this formulation is X (red-green), the second Y (luma), and the third B (blueness-yellowness).
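Putting the pieces above together as a sketch (a and b are set to the typical post-decorrelation figures mentioned; the input LMS values are arbitrary):

```python
import numpy as np

# Decorrelated M2 from the discussion above; a ~ 0 and b ~ -0.5 are the
# typical values quoted, not fixed constants.
a, b = 0.0, -0.5
M2 = np.array([
    [1 + a, -1 + a, 0.0],  # X: red-green
    [1.0,    1.0,   0.0],  # Y: luma
    [b,      b,     1.0],  # B: blueness-yellowness
])

lms_cbrt = np.cbrt([0.30, 0.25, 0.20])  # cube-rooted cone responses
x, y, blue_yellow = M2 @ lms_cbrt
```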
For quantization, the X, Y and B channels are multiplied by constants representing their psychovisual strength. The X and B channels (the chromaticity channels) matter less when quantization is light, and the X channel in particular increases in strength as quantization gets heavier.
The cube root is beautiful in the sense that it allows scaling the intensity without further considerations, but its near-black psychovisual performance is quite awful. That is why sRGB added a linear ramp, and I added biasing (a homogeneous transform instead of a 3x3 matrix).
Regarding: “The cube root is beautiful in the sense that it allows scaling the intensity without further considerations, but its near-black psychovisual performance is quite awful.”
Yeah, that is the tradeoff, and the same goes for dealing with HDR values. The idea with Oklab is to avoid having to know what luminance the eye is adapted to, by treating all colors as if they were within the normal color-vision range. That makes it simpler and more predictable to use, but makes predictions at the extreme ends worse than they would be if knowledge of the viewing conditions were taken into account (assuming you can do so accurately).
E.g. a linear ramp for near-black values would not be good if you are in a dark room, viewing only very dark values full-screen on a monitor (so there isn't anything bright around to adapt to).
BTW, just in case it didn't become clear from all the proposals I made: I adore your work on Oklab. Humanity would benefit a lot if more scientists and engineers were able to think like you -- from first principles and completely out of the cargo-cult box. What you propose with Oklab is practical and a huge improvement over current practice.
I would consider just continuing to use the CIELAB adjusted cube-root function, with a linear part near zero. It has been used widely for 45 years and people understand it pretty well. It is plenty fast to implement (it just takes one extra conditional move or the like, and one FMA).
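For reference, this is the standard CIELAB f(t) with its linear segment near zero; value and slope match at the knee, which is what the one conditional plus an FMA buys:

```python
# Standard CIELAB f(t): cube root above the knee, linear ramp below it.
DELTA = 6.0 / 29.0

def cielab_f(t):
    if t > DELTA ** 3:
        return t ** (1.0 / 3.0)
    # Linear below (6/29)^3; both value and slope are continuous at the knee:
    # f(delta^3) = delta and f'(delta^3) = 1 / (3 * delta^2) on both sides.
    return t / (3.0 * DELTA ** 2) + 4.0 / 29.0
```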
We don't need to use something just because it is old. CIELAB is based on 2-degree color samples. Colors work differently at smaller angles due to the different densities of receptors, particularly the larger size and lower density of S receptors. Pixels on most recent monitors are about 0.02 degrees: 100x smaller in angle, and 10'000x smaller in area, than what the old color research is based on.
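A quick sanity check of the angular arithmetic (the pixel pitch and viewing distance below are illustrative assumptions, not measurements):

```python
import math

# A pixel of about 0.25 mm viewed from about 70 cm subtends roughly
# 0.02 degrees of visual angle.
pixel_mm, distance_mm = 0.25, 700.0
angle_deg = math.degrees(2.0 * math.atan(pixel_mm / (2.0 * distance_mm)))
print(round(angle_deg, 3))  # ~0.02

# A 2-degree patch is then 100x larger in angle and 10'000x larger in area.
print(2.0 / 0.02, (2.0 / 0.02) ** 2)
```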
Oops. I believe I confused the linear XYZ that enters Oklab's M1 matrix with the linear sRGB that enters JPEG XL XYB's M1 matrix. When both are converted into the same space first, XYB and Oklab are likely even more similar to each other.