Hacker News

If you don't have a fundamentally serial workload (and usually either you don't, or you have many such workloads that you can parallelize across tasks) and you are willing to write bespoke CUDA code for that workload, Nvidia is telling the truth.

CUDA's sweet spot lies between embarrassingly parallel (for which ASICs and FPGAs rule the world because these are generally pure compute with low memory bandwidth overhead) and serial (for which CPUs are still best), a place I call "annoyingly parallel." There are a lot of workloads in this space in my experience.
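A minimal sketch of what "annoyingly parallel" looks like in practice (names and sizes are illustrative, not from the original comment): a block-level sum reduction. The work is parallel, but unlike a pure embarrassingly parallel map, threads must communicate through shared memory and synchronize at barriers, and the kernel is memory-bandwidth-bound rather than compute-bound.

```cuda
#include <cuda_runtime.h>

// Hypothetical example: sum-reduce each block's slice of `in` into
// one element of `out`. Launch with dynamic shared memory of
// blockDim.x * sizeof(float).
__global__ void blockSum(const float *in, float *out, int n) {
    extern __shared__ float sdata[];
    unsigned tid = threadIdx.x;
    unsigned i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Stage one element per thread into fast shared memory.
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: log2(blockDim.x) steps, each needing a
    // barrier. This cross-thread coordination is exactly what a
    // purely embarrassingly parallel workload never needs.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0) out[blockIdx.x] = sdata[0];
}
```

An ASIC or FPGA pipeline handles the pure-map part trivially; a CPU handles the serial tail; the barrier-and-bandwidth middle is where the GPU earns its keep.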

But if you don't satisfy both of the aforementioned requirements, and/or you insist on doing all of this from someone else's code interfaced through a weakly typed, garbage-collected language with a global interpreter lock, your mileage will vary greatly cough deep learning frameworks cough.

Finally, it doesn't matter who's doing it, marchitecturing(tm) drives me nuts too.



They're not lying. To parallelize, you have to code in a different style - though it doesn't have to be CUDA. However, it's easier to enforce that style in a parallel-specific language, and such a language can support the parallel idioms directly.
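One concrete instance of the style difference (a hedged sketch; the `saxpy` name and signatures are illustrative): the serial version walks the array with one loop, while the idiomatic CUDA version uses a grid-stride loop so each thread owns a strided subset of indices and the kernel stays correct for any launch configuration.

```cuda
// Serial style: one thread of control walks the whole array.
void saxpy_serial(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Parallel style: the loop index is derived from the thread's
// position in the grid, and the stride covers the whole grid,
// so correctness no longer depends on n matching the launch size.
__global__ void saxpy_parallel(int n, float a, const float *x, float *y) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += blockDim.x * gridDim.x) {
        y[i] = a * x[i] + y[i];
    }
}
```

The transformation is mechanical here, but enforcing it everywhere - no hidden loop-carried dependencies, no shared mutable state - is the discipline a parallel-specific language can impose that a general-purpose one cannot.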

Controlling the language certainly helps Nvidia's economic moat.



