Messages - oleg@completedb.com

1
The problem is fixed.
Thank you. ;D

2
I will test it over the weekend and let you know.

Cheers,
Oleg.

3
Sorry, I forgot to mention: it works on Win32 and only fails on x64.

4
OS: Windows 8 Pro
Processors: 2 x Intel Xeon E5-2620
Logical Cores: 24
RAM: 64 GB
Microsoft Visual Studio Professional 2012: Version 11.0.60315.01 Update 2
Build: 17.00.60315.1

C++ Command Line:
/Yu"stdafx.h" /GS /GL /W3 /Gy /Zc:wchar_t /I"C:\Program Files (x86)\JustSoftwareSolutions\JustThread\include" /Zi /Gm- /O2 /sdl /Fd"x64\Release\vc110.pdb" /fp:precise /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /Gd /Oi /MD /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Fp"x64\Release\FutureWaitFailed.pch"

For some reason I cannot attach the file; I wanted to send you the Visual Studio solution with the source files. If you do not mind, I will email it. It's small.
If you run it, do not forget to change the include and lib paths for just::thread, as on my system it may be installed in a different place.

5
System has 24 logical processors
Compiled under Visual Studio 2012 x64

The code below runs fine using the native Microsoft implementation and fails most of the time when using just::thread.
The exact place where it fails is the call to ftr.wait().
Please advise.

   for (unsigned i = 0; i < 10; i++)
   {
      const unsigned THREAD_COUNT = thread::hardware_concurrency() - 1;
      const unsigned SECOND_COUNT = 1;
      volatile bool runThreads = true;
      vector<future<long long>> results;
      for (unsigned u = 0; u < THREAD_COUNT; u++)
      {
         results.emplace_back ( async( [&] ()
         {
            long long counter = 0;
            while (runThreads)
            {
               counter++;
            }
            return counter; 
         } ) );
      }
      this_thread::sleep_for(chrono::seconds(SECOND_COUNT));
      runThreads = false;
      long long total = 0;
      for (auto& ftr : results)
      {
         ftr.wait();
         total += ftr.get();
      }
      cout << total << endl;
   }
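
For comparison, here is a minimal sketch of the same loop body with the volatile flag replaced by std::atomic<bool> and with std::launch::async requested explicitly. Both changes are assumptions made for illustration and not a diagnosis of the failure; the helper name run_counters and its parameters are hypothetical and do not come from the original code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <vector>

// Hypothetical helper: starts `thread_count` counting tasks, lets them
// spin for `duration`, then stops them and returns the sum of the counters.
long long run_counters(unsigned thread_count, std::chrono::milliseconds duration)
{
   assert(thread_count > 0);
   std::atomic<bool> run_threads(true);   // stop flag, replaces the volatile bool
   std::vector<std::future<long long>> results;
   for (unsigned u = 0; u < thread_count; u++)
   {
      // std::launch::async asks for a real thread; the default policy
      // also permits deferred execution.
      results.emplace_back(std::async(std::launch::async, [&run_threads]()
      {
         long long counter = 0;
         while (run_threads.load(std::memory_order_relaxed))
         {
            counter++;
         }
         return counter;
      }));
   }
   std::this_thread::sleep_for(duration);
   run_threads.store(false, std::memory_order_relaxed);
   long long total = 0;
   for (auto& ftr : results)
   {
      total += ftr.get();   // get() blocks until the task has finished
   }
   return total;
}
```

Calling run_counters(thread::hardware_concurrency() - 1, chrono::seconds(1)) ten times would correspond to the original test.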

6
Thank you for correcting me and clarifying sequential ordering semantics.

7
According to the Intel documentation, “Loads May Be Reordered with Earlier Stores to Different Locations”.
Below is pseudocode.
Initial values:
std::atomic<int> x = 0;
std::atomic<int> y = 0;
So, when using the relaxed memory order:
//Thread running on Processor 0
x.store(1, std::memory_order_relaxed);
register int ry = y.load(std::memory_order_relaxed);

//Thread running Processor 1
y.store(1, std::memory_order_relaxed);
register int rx = x.load(std::memory_order_relaxed);

//rx == 0 and ry == 0 IS ALLOWED
(Copied from Intel's documentation:) At each processor, the load and the store are to different locations and hence may be reordered. Any interleaving of the operations is thus allowed. One such interleaving has the two loads occurring before the two stores. This would result in each load returning value 0.

Now, when using the sequentially consistent memory order for the loads:
// Thread running on Processor 0
x.store(1, std::memory_order_relaxed);
register int ry = y.load(std::memory_order_seq_cst);

// Thread running on Processor 1
y.store(1, std::memory_order_relaxed);
register int rx = x.load(std::memory_order_seq_cst);

//rx == 0 and ry == 0 SHOULD NOT BE ALLOWED BECAUSE load uses memory_order_seq_cst

If my understanding is correct (please correct me if I am wrong), then the just::thread load implementation should not use a simple mov, but should include an appropriate memory barrier so that stores cannot sink below loads.
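
For reference, here is a minimal sketch of the same two-thread test with memory_order_seq_cst on the stores as well as the loads. In that form the C++11 memory model places all four operations in a single total order, so rx == 0 and ry == 0 cannot both be observed. The function names are hypothetical, and the sketch says nothing about how any particular library implements the operations.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Runs the two-thread test once with seq_cst on all four operations
// and returns the observed pair (rx, ry).
std::pair<int, int> store_buffering_once()
{
   std::atomic<int> x(0);
   std::atomic<int> y(0);
   int rx = -1;
   int ry = -1;
   std::thread t0([&] {
      x.store(1, std::memory_order_seq_cst);
      ry = y.load(std::memory_order_seq_cst);
   });
   std::thread t1([&] {
      y.store(1, std::memory_order_seq_cst);
      rx = x.load(std::memory_order_seq_cst);
   });
   t0.join();   // join() makes the writes to rx and ry visible here
   t1.join();
   return std::make_pair(rx, ry);
}

// Repeats the test and reports whether rx == 0 and ry == 0 was ever seen.
bool both_zero_observed(int iterations)
{
   assert(iterations > 0);
   for (int i = 0; i < iterations; i++)
   {
      const std::pair<int, int> r = store_buffering_once();
      if (r.first == 0 && r.second == 0)
      {
         return true;
      }
   }
   return false;
}
```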

8
Thanks for the quick response.

9
I installed the recently downloaded justthread_full.msi.
I tried using std::thread but am getting a compiler error:

1>C:\Program Files (x86)\JustSoftwareSolutions\JustThread\include\thread(21): fatal error C1083: Cannot open include file: 'jss/forwarding_constructors.hpp': No such file or directory

I manually checked the jss directory; the forwarding_constructors.hpp file is not there.

Please advise.

10
Thank you.
We should probably warn the C++ world :) especially those who are implementing lock-free algorithms, where suboptimal code is not an option.


11
We upgraded to Visual Studio 2012 and decided to look into using std::atomic for inter-thread synchronization.
Strangely enough, the Microsoft implementation uses _InterlockedOr (which internally generates lock cmpxchg DWORD PTR [rcx], edx) for std::atomic<int>::load across all memory orders (relaxed, acquire, consume and sequentially consistent).
I decided to buy the just::thread library and try it instead. just::thread uses a read of a volatile variable, which is what we have been using explicitly in our code up to now to implement acquire semantics.
Microsoft states explicitly that as long as the /volatile:ms compiler option is used (which is the default), volatile objects can be used for memory locks and releases.
http://msdn.microsoft.com/en-us/library/vstudio/12a04hfd.aspx

I would love to know why _InterlockedOr is used, since it can potentially degrade performance with an unnecessary memory barrier for relaxed (!) and acquire semantics on Intel x64.
Am I missing something? Or is it a bug in our code and in the just::thread std::atomic<int>::load implementation?

Thank you in advance.
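
To make the acquire usage concrete, here is a minimal sketch of the publish/consume pattern that an acquire load is meant to support: a release store on a flag paired with an acquire load in the reader. The function name publish_and_read is hypothetical, and the sketch assumes nothing about the instructions that any particular implementation emits for the load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: the calling thread publishes `value` through a
// plain int, and a consumer thread reads it after an acquire load on the flag.
int publish_and_read(int value)
{
   int payload = 0;                   // ordinary, non-atomic data
   std::atomic<bool> ready(false);    // synchronization flag
   int seen = -1;
   std::thread consumer([&] {
      // The acquire load pairs with the release store below, so the
      // write to payload is visible once ready reads true.
      while (!ready.load(std::memory_order_acquire))
      {
         std::this_thread::yield();
      }
      seen = payload;
   });
   payload = value;
   ready.store(true, std::memory_order_release);
   consumer.join();
   assert(ready.load(std::memory_order_relaxed));
   return seen;
}
```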
