Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

You can use non-temporal writes to avoid this, and some CPUs have an instruction that zeroes a cache line. It's not expensive to do this.


Nontemporal writes are substantially slower, e.g. with avx512 you can do 1 64 byte nontemporal write every 5 or so clock cycles. That puts you at >= 640 cycles for 8 KiB. https://uops.info/html-instr/VMOVNTPS_M512_ZMM.html


Well, the point of a non-temporal write kind of is that you don't care how fast it is. (Since if it was being read again anytime soon, you'd want it in the cache.)

But yes, it can be an over-optimization.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: