Comments (21)
You are right, there is an issue here. My apologies. I think the fix is pretty simple (no extra storage), just change:
if (!s_alloc_batch_sz[0])
to
if (!s_alloc_batch_sz[SPP_GROUP_SIZE - 1])
If you confirm the fix, I can push it.
Thank you for the report, much appreciated!
from sparsepp.
Actually I felt pretty confident about the fix, so I just pushed it.
from sparsepp.
I fail to see how that would fix the problem. The static storage will still be written and read from multiple threads concurrently, right?
This is the code I'm using to test it:
#include <spp.h>
#include <future>
static int run() {
spp::sparse_hash_map<unsigned, unsigned> m;
m[0] = 0;
return 1;
}
int main()
{
auto t1 = std::async(std::launch::async, &run);
auto t2 = std::async(std::launch::async, &run);
t1.get();
t2.get();
return 1;
}
If you compile this code, and run it with helgrind: valgrind --tool=helgrind ./main
You get the following output:
==80311== Possible data race during read of size 1 at 0x6091DF by thread #3
==80311== Locks held: none
==80311== at 0x402FFD: run() (spp.h:1088)
==80311== by 0x4035CC: std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<int>, std::__future_base::_Result_base::_Deleter>, std::_Bind_simple<int (*())()>, int> >::_M_invoke(std::_Any_data const&) (functional:1391)
==80311== by 0x403D1F: std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const (functional:2127)
==80311== by 0x403D3D: std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) (future:533)
==80311== by 0x4033C2: _ZZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ENUlvE0_4_FUNEv (functional:227)
==80311== by 0x3522A0CE02: pthread_once (in /lib64/libpthread-2.12.so)
==80311== by 0x403A6A: _ZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ (gthr-default.h:699)
==80311== by 0x405064: _ZNSt6thread11_State_implISt12_Bind_simpleIFZNSt13__future_base17_Async_state_implIS1_IFPFivEvEEiEC4EOS7_EUlvE_vEEE6_M_runEv (future:393)
==80311== by 0x4D00B4E: execute_native_thread_routine (thread.cc:83)
==80311== by 0x4A0C940: mythread_wrapper (hg_intercepts.c:233)
==80311== by 0x3522A07AA0: start_thread (in /lib64/libpthread-2.12.so)
==80311== by 0x3521EE8AAC: clone (in /lib64/libc-2.12.so)
==80311== This conflicts with a previous write of size 1 by thread #2
==80311== Locks held: none
==80311== at 0x403023: run() (spp.h:1103)
==80311== by 0x4035CC: std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<int>, std::__future_base::_Result_base::_Deleter>, std::_Bind_simple<int (*())()>, int> >::_M_invoke(std::_Any_data const&) (functional:1391)
==80311== by 0x403D1F: std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const (functional:2127)
==80311== by 0x403D3D: std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) (future:533)
==80311== by 0x4033C2: _ZZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ENUlvE0_4_FUNEv (functional:227)
==80311== by 0x3522A0CE02: pthread_once (in /lib64/libpthread-2.12.so)
==80311== by 0x403A6A: _ZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ (gthr-default.h:699)
==80311== by 0x405064: _ZNSt6thread11_State_implISt12_Bind_simpleIFZNSt13__future_base17_Async_state_implIS1_IFPFivEvEEiEC4EOS7_EUlvE_vEEE6_M_runEv (future:393)
==80311==
Where in my code, spp.h:1088 and spp.h:1103 are (with the suggested edit):
1087 static uint8_t s_alloc_batch_sz[SPP_GROUP_SIZE] = { 0 };
1088 if (!s_alloc_batch_sz[SPP_GROUP_SIZE-1])
1089 {
1090 // 32 bit bitmap
1091 // ........ .... .... .. .. .. .. . . . . . . . .
1092 // 8 12 16 18 20 22 24 25 26 ... 32
1093 // ------------------------------------------------------
1094 uint8_t group_sz = SPP_GROUP_SIZE / 4;
1095 uint8_t group_start_alloc = SPP_GROUP_SIZE / 8; //4;
1096 uint8_t alloc_sz = group_start_alloc;
1097 for (int i=0; i<4; ++i)
1098 {
1099 for (int j=0; j<group_sz; ++j)
1100 {
1101 if (j && j % group_start_alloc == 0)
1102 alloc_sz += group_start_alloc;
1103 s_alloc_batch_sz[i * group_sz + j] = alloc_sz;
1104 }
from sparsepp.
The static storage will still be written and read from multiple threads concurrently, right?
right, but they would write the exact same thing, so it shouldn't matter.
I believe that the issue was that the static array could be partially written, when the second thread would test the first element and thought it was ready to use.
By testing the last element, if it is non-zero, that means that the array is fully initialized.
Indeed it is possible that two threads will initialize it at the same time, but I don't think it would cause a problem.
from sparsepp.
Ah, I didn't look carefully at the code. I didn't realize it was just about initialization.
Still, the way the code is now, it's considered UB, so I would be reluctant to use it.
Here is a suggestion of how to do thread-safe static initialization (as of C++11):
struct AllocBatchSize
{
AllocBatchSize()
{
// 32 bit bitmap
// ........ .... .... .. .. .. .. . . . . . . . .
// 8 12 16 18 20 22 24 25 26 ... 32
// ------------------------------------------------------
uint8_t group_sz = SPP_GROUP_SIZE / 4;
uint8_t group_start_alloc = SPP_GROUP_SIZE / 8; //4;
uint8_t alloc_sz = group_start_alloc;
for (int i=0; i<4; ++i)
{
for (int j=0; j<group_sz; ++j)
{
if (j && j % group_start_alloc == 0)
alloc_sz += group_start_alloc;
m_data[i * group_sz + j] = alloc_sz;
}
if (group_start_alloc > 2)
group_start_alloc /= 2;
alloc_sz += group_start_alloc;
}
}
uint8_t m_data[SPP_GROUP_SIZE];
};
static const AllocBatchSize s_alloc_batch_sz;
return n ? static_cast<uint32_t>(s_alloc_batch_sz.m_data[n-1]) : 0; // more aggressive alloc at the beginning
That also makes it a bit clearer that this whole code is just for static initialization.
On the other hand, Helgrind disappointed me as it seems to not understand that this is safe. (http://valgrind.10908.n7.nabble.com/helgrind-and-threadsafe-statics-td49861.html)
from sparsepp.
But I must say, for that to work, there is probably an atomic flag check somewhere in there, for every call, to make sure the code is initialized. Which could be costly...
The best way would be to make this all constexpr, but I'm not sure about how many compilers would support that.
It's possible to metaprogram the way out of this by writing some recursive template that expands the final value of the array in compile time. But that will probably require a bit more time to get right.
from sparsepp.
Thanks for looking into this. I do believe that the code with my change is fine, but your version with the struct is better and clearer. Do you want to submit a pull request?
By the way, I don't know what "is considered UB" mean?
from sparsepp.
from sparsepp.
The solution only works if the compiler writes the code you expect in the order you expect.
The compiler is free to reorder code as long as the observable output is the same.
Though personally I don't believe it would go that far, it's perfectly possible for the compiler to unroll that for loop and reorder statements at will. So, in theory, it could initialize the last position first and walk backwards. If it chooses to do so, a second thread testing the last position could think everything is initialized.
I investigated my solution a bit more, and it turns out, there is no atomic check (at least not in x86_64).
So I will post that as a pull request.
What is really beautiful is that, if you add constexpr
to the constructor and change the standard to C++14, then the whole code is evaluated at compile time as is.
It would be nice to add some macroing to automatically detect C++14 is enabled and add the constexpr there.
I fixed my code to use the less aggressive flag for now, and I'm going to the beach today, but I can take this (and my other request (and I actually have one more)) next week :)
from sparsepp.
Perfect, have fun at the beach, I'll wait for your pull request next week.
Thanks for the explanation of the undefined behavior!
if you add constexpr to the constructor
Please use the macro SPP_CONSTEXPR (defined in spp_util.h) which expands to constexpr on compilers that support it.
from sparsepp.
constexpr keyword was introduced in C++11 but its applicability was expanded in C++14. For this use, it only works with C++14. Do you know if there is a macro to differentiate that?
from sparsepp.
What does it matter? constexpr will be present for the compilers who support the keyword. Whether the code is generated at compile time or when statics are initialized will depend on the compiler, but omitting constexpr when the code is not generated at compile time would not help (or am I missing something?).
from sparsepp.
It's just that on C++11, constexpr
is supported for simple functions while adding it to a class constructor is an error: https://godbolt.org/g/DSPSCC
from sparsepp.
Oh, I see. We could probably define a macro for the various compilers to check that support, but I don't think it is worthwhile in this case. We are talking about just initializing 32 bytes, and it won't make a difference if it is done at compilation time.
from sparsepp.
It could make a difference because if done in compile time, the compiler will likely inline this function. If it's not a hot path or something, it won't matter anyway.
Anyway, let's start simple and leave it as a TODO to makei tbetter
from sparsepp.
Hum, I don't understand why the fact that the static is initialized at compile time or when the program loads would have any effect on inlining. But anyways I'm pretty sure it wouldn't make a difference.
from sparsepp.
Well, when the code is statically initialized, it looks like this:
int foo(int index)
{
if (!initialized)
{
//do a bunch of calculations
//do a bunch of calculations
//do a bunch of calculations
//do a bunch of calculations
}
return the_array[index];
}
So the amount of instructions in that function is probably too large to make it worth inlining.
While, if the values are computed at compile time, the function looks like:
int foo(int index)
{
return the_array[index];
}
That one looks like a good candidate for inlining.
from sparsepp.
Nice!
from sparsepp.
Thanks, I understand now. I didn't know that the compiler generated that inline code in the function to initialize static variables, but now that I think of it of course it makes sense. I also understand why you feel constexpr would be better.
I just added the definition for SPP_CXX14_CONSTEXPR
in spp_util.h, which you should be able to use in the constructor. Let me know if you'd like me to make the change, I don't mind either way.
from sparsepp.
I think I rather do it myself so I can learn the flow for contributing in github :)
from sparsepp.
Sure, thanks again.
from sparsepp.
Related Issues (20)
- Support operator[] with rvalue HOT 1
- When hashing on vector HOT 13
- Help me to serialize/deserialize sparse_hash_map<int , vector< std::string > > HOT 1
- missing return value in function spp_sptr& operator=(spp_sptr &&o) ? HOT 1
- SIGSEGV on using move-assigned-from hash set HOT 3
- Unable to create map for value type with deleted copy constructor HOT 2
- Can't insert std::tuple: spp uses deleted constructor/destructor HOT 1
- Lookup non-present keys becomes very slow. HOT 1
- Do you support C++11 stateful allocators? HOT 1
- Default allocator doesn't throw and caller doesn't check for NULL HOT 1
- SEGV after moving hash_map HOT 2
- conan package HOT 2
- sparse_hash_map<uint32_t, uint64_t> takes more memory than sparse_hash_map<uint64_t, uint64_t> HOT 1
- memcpy can happen with null "source" parameter HOT 6
- class-memaccess warning when compiling with GCC 8 or later HOT 1
- Problem when compiling on macOS catalina HOT 6
- Can sparse_hash_map support unique_ptr?
- Compiler warning on realloc call within sparsepp. HOT 1
- Error when compile: is not a special member function which can be defaulted HOT 2
- Missing return value. HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sparsepp.