Topic: std::atomic<int>::load implementation on Intel-x64

oleg@completedb.com

std::atomic<int>::load implementation on Intel-x64
« on: April 29, 2013, 06:22:54 PM »
We upgraded to Visual Studio 2012 and decided to look into using std::atomic for inter-thread synchronization.
Strangely enough, the Microsoft implementation uses _InterlockedOr (which compiles to lock cmpxchg DWORD PTR [rcx], edx) for std::atomic<int>::load across all memory orders (relaxed, consume, acquire and sequentially consistent).
I decided to buy the just::thread library and try it instead. just::thread uses a read of a volatile variable, which is what we have been using explicitly in our code up to now to implement acquire semantics.
Microsoft states explicitly that as long as the /volatile:ms compiler option is used (which is the default), volatile objects can be used for memory locks and releases.
http://msdn.microsoft.com/en-us/library/vstudio/12a04hfd.aspx

I would love to know why _InterlockedOr is used, since it can degrade performance by adding an unnecessary memory barrier for relaxed and acquire loads on Intel x64.
Am I missing something? Or is there a bug in our code and in the just::thread std::atomic<int>::load implementation?

Thank you in advance.

Anthony Williams

Re: std::atomic<int>::load implementation on Intel-x64
« Reply #1 on: April 29, 2013, 06:43:02 PM »
I believe that the VS2012 implementation is being overly conservative. If the appropriate synchronization is used on the store, then an atomic<int>::load need only be a MOV on x86.
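
As a minimal sketch of the pairing described above (a synchronizing store on one side, a plain load on the other), the following uses only standard std::atomic calls. The names payload, ready, publish, wait_and_read and run_once are hypothetical and chosen for illustration. The comment about the generated instruction reflects what an optimizing x86-64 compiler is expected to emit, not a guarantee for any particular toolchain.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration; none of them come from this thread.
static int payload = 0;             // plain, non-atomic data
static std::atomic<int> ready{0};   // flag that publishes the payload

// Writer: fill in the data, then publish it with a release store.
void publish(int value) {
    payload = value;
    ready.store(1, std::memory_order_release);
}

// Reader: spin on an acquire load. On x86-64 this load is expected to
// compile to an ordinary MOV, with no LOCK-prefixed instruction.
int wait_and_read() {
    while (ready.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    // Safe to read: the acquire load observed the release store.
    return payload;
}

// Runs one writer and one reader, and returns what the reader observed.
int run_once(int value) {
    payload = 0;
    ready.store(0, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&seen] { seen = wait_and_read(); });
    std::thread writer([value] { publish(value); });
    writer.join();
    reader.join();
    assert(seen == value);
    return seen;
}
```

Calling run_once(42) should return 42 on any conforming implementation. Inspecting the disassembly of wait_and_read is one way to check which instruction a given standard library emits for the load.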

Microsoft is trying to phase out the use of volatile for synchronization, and is encouraging people to use std::atomic instead, hence the compiler switch. It is a shame if their library generates suboptimal code in this case.

Just::Thread does not rely on the special semantics of volatile. Where our source code uses volatile, it is just to force the compiler to issue the load; the _ReadWriteBarrier() intrinsic is used to restrict reordering by the compiler.
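
A rough sketch of that pattern, not the actual Just::Thread source, might look like the following. The helper names compiler_barrier, load_acquire_x86, store_release_x86 and round_trip are hypothetical. Outside MSVC, std::atomic_signal_fence is swapped in as a stand-in compiler-only barrier, which is an assumption made so the sketch builds elsewhere. The approach leans on x86 hardware ordering and compiler-specific behaviour, so it falls outside the guarantees of the standard C++ memory model; std::atomic remains the portable choice.

```cpp
#include <atomic>
#include <cassert>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical helper: stops the compiler from moving memory accesses
// across this point. It emits no CPU instruction.
inline void compiler_barrier() {
#if defined(_MSC_VER)
    _ReadWriteBarrier();
#else
    // Assumption: portable stand-in for a compiler-only barrier.
    std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Hypothetical acquire-style load, intended for x86/x64 only.
inline int load_acquire_x86(const volatile int* p) {
    int value = *p;       // volatile forces the compiler to issue the load
    compiler_barrier();   // later accesses may not be hoisted above it
    return value;
}

// Hypothetical release-style store, intended for x86/x64 only.
inline void store_release_x86(volatile int* p, int value) {
    compiler_barrier();   // earlier accesses may not sink below the store
    *p = value;
}

// Single-threaded smoke test of the helpers above.
inline int round_trip(int value) {
    volatile int cell = 0;
    store_release_x86(&cell, value);
    int seen = load_acquire_x86(&cell);
    assert(seen == value);
    return seen;
}
```

The smoke test only checks that the helpers read back what was written; it says nothing about ordering between threads, which here comes from the x86 hardware model rather than from the code itself.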

oleg@completedb.com

Re: std::atomic<int>::load implementation on Intel-x64
« Reply #2 on: April 29, 2013, 06:53:24 PM »
Thank you.
We should probably warn the C++ world :) especially those who are implementing lock-free algorithms, where suboptimal code is not an option.