I think AI is currently much weaker for this use case, if you want to generalize it. There is far less assembly training data where bad coding patterns are paired with actual bug descriptions. Assembly is also more verbose, so it eats more of an LLM's context window. False positives are the biggest pain in this area. With LLMs it is also surprisingly difficult to test for the presence of a vulnerability in an unbiased way - often the prompt itself hints at the possible issue, nudging the model toward "finding" it. Do that at large scale and false positives are everywhere.
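To make the prompt-hint problem concrete, here is a minimal sketch (Python, with a hypothetical `ask_llm` helper standing in for whatever model client you actually use) contrasting a leading prompt with a neutral one. The hinted variant practically invites the model to confirm a buffer overflow whether or not one exists, which is how large-scale scans end up buried in false positives.

```python
# Minimal sketch of how prompt wording biases vulnerability "detection".
# ask_llm() is a hypothetical placeholder for a real chat-completion
# client; here it just echoes the prompt so the script runs as-is.

def ask_llm(prompt: str) -> str:
    return f"(model response for: {prompt[:60]}...)"

ASM_SNIPPET = """\
read_input:
    sub   rsp, 64          ; 64-byte stack buffer
    mov   rdi, rsp
    call  gets             ; unchecked read into the buffer
    add   rsp, 64
    ret
"""

# Leading prompt: the question already names the bug class, so the model
# is nudged toward confirming it even when the code is actually safe.
hinted = ask_llm(
    "Does this assembly contain a stack buffer overflow?\n" + ASM_SNIPPET
)

# Neutral prompt: no bug class is suggested; the model has to commit to
# a finding on its own, which is the fairer (and harder) test.
neutral = ask_llm(
    "Review this assembly and report any security issue you are confident "
    "about, or reply 'none found'.\n" + ASM_SNIPPET
)

print(hinted)
print(neutral)
```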