In each of these libraries, the mathematical operations need to be implemented separately for each backend (CPU, GPU, TPU, etc.). All of them support CUDA as their default GPU implementation because it's by far the largest in terms of market share. But Apple GPUs don't implement CUDA or a translation layer (they use Metal, Apple's graphics and compute acceleration library), so all of those operations had to be rewritten to target Metal before torch could even communicate usefully with an M1/M2 GPU. That doesn't even touch on the fact that different backends need work scheduled on them differently: some expect a whole graph of operations to be submitted up front, others take a series of asynchronous operations, etc.
That said, just wanted to point out that torch does support Apple GPUs now, via the MPS (Metal Performance Shaders) backend.
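For anyone curious what that looks like in practice, here's a minimal sketch of picking the MPS backend and running an op on it. This assumes PyTorch 1.12 or later (when `torch.backends.mps` landed); the API calls are real, but the fallback order is just one reasonable choice:

```python
import torch

# Pick the best available backend: MPS on Apple Silicon,
# CUDA on NVIDIA GPUs, otherwise fall back to the CPU.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

# The same tensor API dispatches to whichever backend the
# tensor lives on; on an M1/M2 this matmul runs via Metal kernels.
x = torch.randn(1024, 1024, device=device)
y = x @ x

print(device, y.shape)
```

The point of the device abstraction is exactly the dispatch problem described above: user code stays the same, and torch routes each operation to the Metal, CUDA, or CPU implementation behind the scenes.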