The right answer is probably to use atomics, but they have a performance cost.
I am still trying to find out whether atomics slow down under contention.
Ideally, an atomic increment would be as cheap as a non-atomic increment.
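But I think the cache coherence protocol is what makes it slow. On x86, a lock-prefixed read-modify-write drains the store buffer and must take exclusive ownership of the cache line before it completes. Modern CPUs lock the cache line rather than the whole bus, except for accesses split across two lines, and the result stays in cache instead of being written through to main memory. I don't know whether it behaves like a lock with a cheap uncontended fast path.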
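One way to answer the contention question is to measure it directly. Below is a minimal C++17 sketch, assuming x86-64 (where `fetch_add` typically compiles to a lock-prefixed add even with `memory_order_relaxed`) and a 64-byte cache line. The function names (`plain_single`, `atomic_single`, `atomic_shared`, `atomic_sharded`, `report`) are hypothetical helpers for illustration. The sketch compares a plain increment, an uncontended atomic, one atomic shared by several threads, and per-thread "sharded" atomics that sit on separate cache lines.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Time a callable and return the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& f) {
    auto start = Clock::now();
    f();
    return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}

// Plain increment; volatile stops the compiler from folding the loop into one addition.
std::uint64_t plain_single(std::uint64_t iters) {
    volatile std::uint64_t counter = 0;
    for (std::uint64_t i = 0; i < iters; ++i) counter = counter + 1;
    return counter;
}

// Uncontended atomic: only this thread touches the counter's cache line.
std::uint64_t atomic_single(std::uint64_t iters) {
    std::atomic<std::uint64_t> counter{0};
    for (std::uint64_t i = 0; i < iters; ++i) counter.fetch_add(1, std::memory_order_relaxed);
    return counter.load();
}

// Contended atomic: all threads hit one counter, so its cache line bounces between cores.
std::uint64_t atomic_shared(unsigned threads, std::uint64_t iters_per_thread) {
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (std::uint64_t i = 0; i < iters_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    return counter.load();
}

// One counter per thread, padded to an assumed 64-byte cache line to avoid false sharing.
struct alignas(64) Slot {
    std::atomic<std::uint64_t> value{0};
};

// Sharded atomics: each thread increments its own slot; the slots are summed at the end.
std::uint64_t atomic_sharded(unsigned threads, std::uint64_t iters_per_thread) {
    std::vector<Slot> slots(threads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&slots, t, iters_per_thread] {
            for (std::uint64_t i = 0; i < iters_per_thread; ++i)
                slots[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    std::uint64_t total = 0;
    for (auto& s : slots) total += s.value.load();
    return total;
}

// Run each variant, check that it counted exactly, and print its elapsed time.
void report(unsigned threads, std::uint64_t iters) {
    auto run = [](const char* label, std::uint64_t expected, auto&& fn) {
        std::uint64_t result = 0;
        double ms = time_ms([&] { result = fn(); });
        assert(result == expected);  // every variant must produce an exact count
        std::printf("%-26s %10.1f ms\n", label, ms);
    };
    run("plain, 1 thread", iters, [&] { return plain_single(iters); });
    run("atomic, 1 thread", iters, [&] { return atomic_single(iters); });
    run("atomic, shared counter", threads * iters, [&] { return atomic_shared(threads, iters); });
    run("atomic, sharded counters", threads * iters, [&] { return atomic_sharded(threads, iters); });
}
```

To try it, call something like `report(4, 10'000'000)` from `main` and build with optimizations and `-pthread`. The multi-threaded variants do `threads` times as much total work as the single-threaded ones, so compare them per thread. The usual expectation, not a measured result, is that the uncontended atomic costs more than the plain increment but far less than the shared counter. Per thread, the sharded version should land closer to the uncontended case. That would suggest most of the cost comes from cache-line ownership moving between cores rather than from the atomic instruction itself.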