It's also complicated by the notion that raster performance doesn't directly translate to tensor performance. Apple and AMD both make excellent raster GPUs, but still lose in efficiency to the CUDA's architecture in rendering and compute.
I'd really like AMD and Apple to start from scratch with a compute-oriented GPU architecture, ideally standardized with Khronos. The NPU/tensor coprocessor architecture has already proven itself to be a bad idea.
That may be true, but assuming you meant "within 30% of the performance" ... can we just acknowledge that is a rather significant handicap, even ignoring CUDA.
(Replaced "with 30%" with "within 30%")