Code Monkey home page Code Monkey logo

Comments (21)

greg7mdp avatar greg7mdp commented on July 23, 2024

You are right, there is an issue here. My apologies. I think the fix is pretty simple (no extra storage), just change:

if (!s_alloc_batch_sz[0])
to
if (!s_alloc_batch_sz[SPP_GROUP_SIZE - 1])

If you confirm the fix, I can push it.

Thank you for the report, much appreciated!

from sparsepp.

greg7mdp avatar greg7mdp commented on July 23, 2024

Actually I felt pretty confident about the fix, so I just pushed it.

from sparsepp.

brenoguim avatar brenoguim commented on July 23, 2024

I fail to see how that would fix the problem. The static storage will still be written and read from multiple threads concurrently, right?

This is the code I'm using to test it:

#include <spp.h>                                                                              
#include <future>                                                                                                       
                                                                                                                                                                                 
static int run() {                                                                                                         
    spp::sparse_hash_map<unsigned, unsigned> m;                                                                            
    m[0] = 0;                                                                                                              
    return 1;                                                                                                              
}                                                                                                                          
                                                                                                                           
int main()                                                                                                              
{                                                                                                                       
    auto t1 = std::async(std::launch::async, &run);                                                                     
    auto t2 = std::async(std::launch::async, &run);                                                                     
                                                                                                                        
    t1.get();                                                                                                           
    t2.get();                                                                                                           
                                                                                                                        
    return 1;                                                                                                           
}              

If you compile this code, and run it with helgrind: valgrind --tool=helgrind ./main

You get the following output:


==80311== Possible data race during read of size 1 at 0x6091DF by thread #3                                             
==80311== Locks held: none                                                                                              
==80311==    at 0x402FFD: run() (spp.h:1088)                                                                            
==80311==    by 0x4035CC: std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<int>, std::__future_base::_Result_base::_Deleter>, std::_Bind_simple<int (*())()>, int> >::_M_invoke(std::_Any_data const&) (functional:1391)                                                                                                                                                                      
==80311==    by 0x403D1F: std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const (functional:2127)
==80311==    by 0x403D3D: std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) (future:533)
==80311==    by 0x4033C2: _ZZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ENUlvE0_4_FUNEv (functional:227)
==80311==    by 0x3522A0CE02: pthread_once (in /lib64/libpthread-2.12.so)                                               
==80311==    by 0x403A6A: _ZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ (gthr-default.h:699)
==80311==    by 0x405064: _ZNSt6thread11_State_implISt12_Bind_simpleIFZNSt13__future_base17_Async_state_implIS1_IFPFivEvEEiEC4EOS7_EUlvE_vEEE6_M_runEv (future:393)
==80311==    by 0x4D00B4E: execute_native_thread_routine (thread.cc:83)                                                 
==80311==    by 0x4A0C940: mythread_wrapper (hg_intercepts.c:233)                                                       
==80311==    by 0x3522A07AA0: start_thread (in /lib64/libpthread-2.12.so)                                               
==80311==    by 0x3521EE8AAC: clone (in /lib64/libc-2.12.so)          


==80311== This conflicts with a previous write of size 1 by thread #2                                                   
==80311== Locks held: none                                                                                              
==80311==    at 0x403023: run() (spp.h:1103)                                                                            
==80311==    by 0x4035CC: std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<int>, std::__future_base::_Result_base::_Deleter>, std::_Bind_simple<int (*())()>, int> >::_M_invoke(std::_Any_data const&) (functional:1391)
==80311==    by 0x403D1F: std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const (functional:2127)
==80311==    by 0x403D3D: std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) (future:533)
==80311==    by 0x4033C2: _ZZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ENUlvE0_4_FUNEv (functional:227)
==80311==    by 0x3522A0CE02: pthread_once (in /lib64/libpthread-2.12.so)                                               
==80311==    by 0x403A6A: _ZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ (gthr-default.h:699)
==80311==    by 0x405064: _ZNSt6thread11_State_implISt12_Bind_simpleIFZNSt13__future_base17_Async_state_implIS1_IFPFivEvEEiEC4EOS7_EUlvE_vEEE6_M_runEv (future:393)
==80311==                                                        

Where in my code, spp.h:1088 and spp.h:1103 are (with the suggested edit):

1087         static uint8_t s_alloc_batch_sz[SPP_GROUP_SIZE] = { 0 };                                                        
1088         if (!s_alloc_batch_sz[SPP_GROUP_SIZE-1])                                                                        
1089         {                                                                                                                                                                   
1090             // 32 bit bitmap                                                                                            
1091             // ........ .... .... .. .. .. .. .  .  .  .  .  .  .  .                                                    
1092             //     8     12   16  18 20 22 24 25 26   ...          32                                                   
1093             // ------------------------------------------------------                                                   
1094             uint8_t group_sz          = SPP_GROUP_SIZE / 4;                                                             
1095             uint8_t group_start_alloc = SPP_GROUP_SIZE / 8; //4;                                                        
1096             uint8_t alloc_sz          = group_start_alloc;                                                              
1097             for (int i=0; i<4; ++i)                                                                                     
1098             { 
1099                 for (int j=0; j<group_sz; ++j)                                                                          
1100                 {                                                                                                       
1101                     if (j && j % group_start_alloc == 0)                                                                
1102                         alloc_sz += group_start_alloc;                                                                  
1103                     s_alloc_batch_sz[i * group_sz + j] = alloc_sz;                                                                                                          
1104                 } 

from sparsepp.

greg7mdp avatar greg7mdp commented on July 23, 2024

The static storage will still be written and read from multiple threads concurrently, right?

right, but they would write the exact same thing, so it shouldn't matter.

I believe that the issue was that the static array could be partially written, when the second thread would test the first element and thought it was ready to use.

By testing the last element, if it is non-zero, that means that the array is fully initialized.
Indeed it is possible that two threads will initialize it at the same time, but I don't think it would cause a problem.

from sparsepp.

brenoguim avatar brenoguim commented on July 23, 2024

Ah, I didn't look carefully at the code. I didn't realize it was just about initialization.
Still, the way the code is now, it's considered UB, so I would be reluctant to use it.

Here is a suggestion of how to do thread-safe static initialization (as of C++11):

        struct AllocBatchSize                                                                                               
        {                                                                                                                   
            AllocBatchSize()                                                                                                
            {                                                                                                               
                // 32 bit bitmap                                                                                            
                // ........ .... .... .. .. .. .. .  .  .  .  .  .  .  .                                                    
                //     8     12   16  18 20 22 24 25 26   ...          32                                                   
                // ------------------------------------------------------                                                   
                uint8_t group_sz          = SPP_GROUP_SIZE / 4;                                                             
                uint8_t group_start_alloc = SPP_GROUP_SIZE / 8; //4;                                                        
                uint8_t alloc_sz          = group_start_alloc;                                                              
                for (int i=0; i<4; ++i)                                                                                     
                {                                                                                                           
                    for (int j=0; j<group_sz; ++j)                                                                          
                    {                                                                                                       
                        if (j && j % group_start_alloc == 0)                                                                
                            alloc_sz += group_start_alloc;                                                                  
                        m_data[i * group_sz + j] = alloc_sz;                                                                
                    }                                                                                                       
                    if (group_start_alloc > 2)                                                                              
                        group_start_alloc /= 2;                                                                             
                    alloc_sz += group_start_alloc;                                                                          
                }                                                                                                           
            }                                                                                                               
            uint8_t m_data[SPP_GROUP_SIZE];                                                                                 
        };                                                                                                                  
                                                                                                                            
        static const AllocBatchSize s_alloc_batch_sz;                                                                                                                            
                                                                                                                            
        return n ? static_cast<uint32_t>(s_alloc_batch_sz.m_data[n-1]) : 0; // more aggressive alloc at the beginning   

That also makes it a bit clearer that this whole code is just for static initialization.

On the other hand, Helgrind disappointed me as it seems to not understand that this is safe. (http://valgrind.10908.n7.nabble.com/helgrind-and-threadsafe-statics-td49861.html)

from sparsepp.

brenoguim avatar brenoguim commented on July 23, 2024

But I must say, for that to work, there is probably an atomic flag check somewhere in there, for every call, to make sure the code is initialized. Which could be costly...

The best way would be to make this all constexpr, but I'm not sure about how many compilers would support that.

It's possible to metaprogram the way out of this by writing some recursive template that expands the final value of the array in compile time. But that will probably require a bit more time to get right.

from sparsepp.

greg7mdp avatar greg7mdp commented on July 23, 2024

Thanks for looking into this. I do believe that the code with my change is fine, but your version with the struct is better and clearer. Do you want to submit a pull request?
By the way, I don't know what "is considered UB" mean?

from sparsepp.

anchpop avatar anchpop commented on July 23, 2024

from sparsepp.

brenoguim avatar brenoguim commented on July 23, 2024

The solution only works if the compiler writes the code you expect in the order you expect.
The compiler is free to reorder code as long as the observable output is the same.

Though personally I don't believe it would go that far, it's perfectly possible for the compiler to unroll that for loop and reorder statements at will. So, in theory, it could initialize the last position first and walk backwards. If it chooses to do so, a second thread testing the last position could think everything is initialized.

I investigated my solution a bit more, and it turns out, there is no atomic check (at least not in x86_64).

https://godbolt.org/g/NNhnEs

So I will post that as a pull request.

What is really beautiful is that, if you add constexpr to the constructor and change the standard to C++14, then the whole code is evaluated at compile time as is.
It would be nice to add some macroing to automatically detect C++14 is enabled and add the constexpr there.

I fixed my code to use the less aggressive flag for now, and I'm going to the beach today, but I can take this (and my other request (and I actually have one more)) next week :)

from sparsepp.

greg7mdp avatar greg7mdp commented on July 23, 2024

Perfect, have fun at the beach, I'll wait for your pull request next week.
Thanks for the explanation of the undefined behavior!

if you add constexpr to the constructor

Please use the macro SPP_CONSTEXPR (defined in spp_util.h) which expands to constexpr on compilers that support it.

from sparsepp.

brenoguim avatar brenoguim commented on July 23, 2024

constexpr keyword was introduced in C++11 but its applicability was expanded in C++14. For this use, it only works with C++14. Do you know if there is a macro to differentiate that?

from sparsepp.

greg7mdp avatar greg7mdp commented on July 23, 2024

What does it matter? constexpr will be present for the compilers who support the keyword. Whether the code is generated at compile time or when statics are initialized will depend on the compiler, but omitting constexpr when the code is not generated at compile time would not help (or am I missing something?).

from sparsepp.

brenoguim avatar brenoguim commented on July 23, 2024

It's just that on C++11, constexpr is supported for simple functions while adding it to a class constructor is an error: https://godbolt.org/g/DSPSCC

from sparsepp.

greg7mdp avatar greg7mdp commented on July 23, 2024

Oh, I see. We could probably define a macro for the various compilers to check that support, but I don't think it is worthwhile in this case. We are talking about just initializing 32 bytes, and it won't make a difference if it is done at compilation time.

from sparsepp.

brenoguim avatar brenoguim commented on July 23, 2024

It could make a difference because if done in compile time, the compiler will likely inline this function. If it's not a hot path or something, it won't matter anyway.
Anyway, let's start simple and leave it as a TODO to makei tbetter

from sparsepp.

greg7mdp avatar greg7mdp commented on July 23, 2024

Hum, I don't understand why the fact that the static is initialized at compile time or when the program loads would have any effect on inlining. But anyways I'm pretty sure it wouldn't make a difference.

from sparsepp.

brenoguim avatar brenoguim commented on July 23, 2024

Well, when the code is statically initialized, it looks like this:

int foo(int index)
{
    if (!initialized)
    {
        //do a bunch of calculations
        //do a bunch of calculations
        //do a bunch of calculations
        //do a bunch of calculations
    }
    return the_array[index];
}

So the amount of instructions in that function is probably too large to make it worth inlining.
While, if the values are computed at compile time, the function looks like:

int foo(int index)
{
    return the_array[index];
}

That one looks like a good candidate for inlining.

from sparsepp.

brenoguim avatar brenoguim commented on July 23, 2024

Nice!

from sparsepp.

greg7mdp avatar greg7mdp commented on July 23, 2024

Thanks, I understand now. I didn't know that the compiler generated that inline code in the function to initialize static variables, but now that I think of it of course it makes sense. I also understand why you feel constexpr would be better.

I just added the definition for SPP_CXX14_CONSTEXPR in spp_util.h, which you should be able to use in the constructor. Let me know if you'd like me to make the change, I don't mind either way.

from sparsepp.

brenoguim avatar brenoguim commented on July 23, 2024

I think I rather do it myself so I can learn the flow for contributing in github :)

from sparsepp.

greg7mdp avatar greg7mdp commented on July 23, 2024

Sure, thanks again.

from sparsepp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.