Yes...that's the problem. A problem that could be easily avoided by asking exist...

jononor · on June 21, 2024

Most ML engineers know that many people want more fine-grained control. But the straightforward way to train such models is incredibly data-demanding. The datasets used for whole-image generation consist of several billion images. I do not think anyone has compiled a dataset of DAW projects or stems that is anywhere close to this size. So that is a limiting factor right now. But we will find ways to get there, probably with a lot of progress over the next 5 years. Maybe even the next 2.