Code Monkey home page Code Monkey logo

Comments (21)

musicinmybrain avatar musicinmybrain commented on May 18, 2024 2

I successfully built 7013a8b in Fedora.

from atomic_queue.

musicinmybrain avatar musicinmybrain commented on May 18, 2024 1

Ok, thanks for looking into this at all. I’m happy to test any further changes you come up with.

from atomic_queue.

musicinmybrain avatar musicinmybrain commented on May 18, 2024 1

It falls back to default page size. However, it uses MAP_LOCKED and EAGAIN says that it cannot lock the pages. That's ulimit -l which may need to be set to a higher value. On my systems I have it set to unlimited.

That makes sense. I should have looked at it more closely.

The default is 64 on most systems, which at 4k pages should be 128kiB, and the benchmarks want 32MiB.

I’m able to run the benchmarks locally with the default resource limits by patching out the MAP_LOCKED flag, which could affect benchmark accuracy but at least shows that they can build and start to run. (I’m still waiting for them to complete.)

On the other hand, there are various problems that keep the benchmarks from building on anything but x86_64, most obviously:

../../xenium/xenium/utils.hpp:80:4: error: #error "Unsupported compiler"
   80 |   #error "Unsupported compiler"
      |    ^~~~~

and

../../xenium/xenium/utils.hpp: In function ‘uint64_t xenium::utils::random()’:
../../xenium/xenium/utils.hpp:84:12: error: ‘getticks’ was not declared in this scope
   84 |     return getticks() >> 4;
      |            ^~~~~~~~

and

../src/benchmarks.cc:49:27: error: ‘__builtin_ia32_rdtsc’ was not declared in this scope; did you mean ‘__builtin_va_list’?
   49 | using cycles_t = decltype(__builtin_ia32_rdtsc());
      |                           ^~~~~~~~~~~~~~~~~~~~
      |                           __builtin_va_list

I think that indicates I should be satisfied with the tests and example building and running successfully.


Thanks again for adding these implementations! I will go ahead and close the issue, but feel free to reopen it if there are any more details you want to discuss, or ideas you would like me to test.

from atomic_queue.

max0x7ba avatar max0x7ba commented on May 18, 2024

I think supporting those architectures should be fairly trivial - just add spin_loop_pause function that uses the CPU specific pause instruction, if the CPU has one. It should work even if spin_loop_pause is defined as an empty function.

If I had access to such machines I would be able to test it.

from atomic_queue.

musicinmybrain avatar musicinmybrain commented on May 18, 2024

If I had access to such machines I would be able to test it.

While I don’t have direct interactive shell access on these architectures, I can run the atomic_queue tests on them by doing RPM “scratch builds” on the Fedora builders. Does that help?

from atomic_queue.

max0x7ba avatar max0x7ba commented on May 18, 2024

Building helps. Ideally, test and/or benchmark executables must be built and run. They both do consistency checks to verify that the queue operates as expected.

I have never cross-compiled, so am not sure how I would go about making changes to support these architectures and trying to compile them.

from atomic_queue.

max0x7ba avatar max0x7ba commented on May 18, 2024

Made some changes for you: ee3d91c

from atomic_queue.

musicinmybrain avatar musicinmybrain commented on May 18, 2024

Thanks. I’ll run the tests right away, and I’ll investigate whether I can get the benchmarks (with their extra dependencies) to build as a one-time hack.

from atomic_queue.

musicinmybrain avatar musicinmybrain commented on May 18, 2024

It’s my understanding that cache lines are 128 bytes on powerpc64, and 256 bytes on s390x. You can detect s390x with the __s390x__ preprocessor macro (https://sourceforge.net/p/predef/wiki/Architectures/).

from atomic_queue.

musicinmybrain avatar musicinmybrain commented on May 18, 2024

The tests passed. (The RPMs for that build will expire after a while, and some or all of the logs might too.)

I’ll work on trying to build and run the benchmarks later.

from atomic_queue.

max0x7ba avatar max0x7ba commented on May 18, 2024

It’s my understanding that cache lines are 128 bytes on powerpc64, and 256 bytes on s390x. You can detect s390x with the __s390x__ preprocessor macro (https://sourceforge.net/p/predef/wiki/Architectures/).

f2a36a3

Fell free to submit pull requests, I will approve promptly.

from atomic_queue.

max0x7ba avatar max0x7ba commented on May 18, 2024

The tests passed. (The RPMs for that build will expire after a while, and some or all of the logs might too.)

I’ll work on trying to build and run the benchmarks later.

benchmark may be too CPU-intense to run in continious integrations systems.

"Unit tests with gcc" in GitHub CI runs make -rj2 TOOLSET=gcc example run_tests, it is intended to be invoked by continious integrations systems.

from atomic_queue.

musicinmybrain avatar musicinmybrain commented on May 18, 2024

"Unit tests with gcc" in GitHub CI runs make -rj2 TOOLSET=gcc example run_tests, it is intended to be invoked by continious integrations systems.

That should be the same as what I’m running via meson test, I think. I started running the example too, and that’s also fine for ee3d91c: https://koji.fedoraproject.org/koji/taskinfo?taskID=77575442

I did try to build and run the benchmarks (again, as a one-off, since even if the runtime isn’t too long, I don’t want to package three other queue implementations). It seems like I can’t run them at all without the hugeadm comands recommended in README.md.

For f2a36a3, however, I’m getting a compiler error on the two new platforms: https://koji.fedoraproject.org/koji/taskinfo?taskID=77575138

g++ -Itests.p -I. -I.. -I../include -I/usr/include -fdiagnostics-color=always -DNDEBUG -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wnon-virtual-dtor -std=gnu++14 -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mcpu=power8 -mtune=power8 -fasynchronous-unwind-tables -fstack-clash-protection -DBOOST_TEST_DYN_LINK=1 -DBOOST_ALL_NO_LIB -pthread -MD -MQ tests.p/src_tests.cc.o -MF tests.p/src_tests.cc.o.d -o tests.p/src_tests.cc.o -c ../src/tests.cc
In file included from ../src/tests.cc:8:
../include/atomic_queue/atomic_queue.h: In instantiation of ‘class atomic_queue::AtomicQueueB2<std::unique_ptr<int> >’:
../src/tests.cc:136:42:   required from here
../include/atomic_queue/atomic_queue.h:518:19: error: static assertion failed: Unexpected SHUFFLE_BITS.
  518 |     static_assert(SHUFFLE_BITS, "Unexpected SHUFFLE_BITS.");
      |                   ^~~~~~~~~~~~
../include/atomic_queue/atomic_queue.h:518:19: note: ‘atomic_queue::AtomicQueueB2<std::unique_ptr<int> >::SHUFFLE_BITS’ evaluates to false
../include/atomic_queue/atomic_queue.h: In instantiation of ‘class atomic_queue::AtomicQueueB2<float, std::allocator<float>, true, true, true>’:
../src/tests.cc:160:21:   required from here
../include/atomic_queue/atomic_queue.h:518:19: error: static assertion failed: Unexpected SHUFFLE_BITS.
../include/atomic_queue/atomic_queue.h:518:19: note: ‘atomic_queue::AtomicQueueB2<float, std::allocator<float>, true, true, true>::SHUFFLE_BITS’ evaluates to false
../include/atomic_queue/atomic_queue.h: In instantiation of ‘class atomic_queue::AtomicQueueB2<float>’:
../include/atomic_queue/atomic_queue.h:593:8:   required from ‘struct atomic_queue::RetryDecorator<atomic_queue::AtomicQueueB2<float> >’
../src/tests.cc:179:71:   required from here
../include/atomic_queue/atomic_queue.h:518:19: error: static assertion failed: Unexpected SHUFFLE_BITS.
../include/atomic_queue/atomic_queue.h:518:19: note: ‘atomic_queue::AtomicQueueB2<float>::SHUFFLE_BITS’ evaluates to false

from atomic_queue.

max0x7ba avatar max0x7ba commented on May 18, 2024

It looks like the larger cache sizes weren't anticipated in other parts of the code. I will look into that in the next few days.

from atomic_queue.

max0x7ba avatar max0x7ba commented on May 18, 2024

e50e567 adds support for larger cache sizes.

from atomic_queue.

max0x7ba avatar max0x7ba commented on May 18, 2024

"Unit tests with gcc" in GitHub CI runs make -rj2 TOOLSET=gcc example run_tests, it is intended to be invoked by continious integrations systems.

That should be the same as what I’m running via meson test, I think.

That's right.

I did try to build and run the benchmarks (again, as a one-off, since even if the runtime isn’t too long, I don’t want to package three other queue implementations). It seems like I can’t run them at all without the hugeadm comands recommended in README.md.

It should run even if no hudeadm is installed and only issue a warning.

from atomic_queue.

musicinmybrain avatar musicinmybrain commented on May 18, 2024

e50e567 adds support for larger cache sizes.

This seems to work on ppc64le, but I see a test failure on s390x (which did work with the cache line size set to 64 bytes):

+ /usr/bin/meson test -C redhat-linux-build --num-processes 3 --print-errorlogs --verbose
ninja: Entering directory `/builddir/build/BUILD/atomic_queue-e50e567ff86b029c6680e696d509c4b7bed13095/redhat-linux-build'
ninja: no work to do.
1/1 tests RUNNING       
>>> MALLOC_PERTURB_=23 /builddir/build/BUILD/atomic_queue-e50e567ff86b029c6680e696d509c4b7bed13095/redhat-linux-build/tests
1/1 tests FAIL            0.95s   (exit status 201 or signal 73 SIGinvalid)
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stdout:
Running 10 test cases...
../src/tests.cc(67): �[1;31;49merror: in "stress_BlockingAtomicQueue": check r > (expected_result / CONSUMERS) / 10 has failed [871616 <= 16666683333]�[0;39;49m
stderr:
�[1;31;49m*** 1 failure is detected in the test module "atomic_queue"
�[0;39;49m
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
Summary of Failures:
1/1 tests FAIL            0.95s   (exit status 201 or signal 73 SIGinvalid)
Ok:                 0   
Expected Fail:      0   
Fail:               1   
Unexpected Pass:    0   
Skipped:            0   
Timeout:            0   
Full log written to /builddir/build/BUILD/atomic_queue-e50e567ff86b029c6680e696d509c4b7bed13095/redhat-linux-build/meson-logs/testlog.txt

from atomic_queue.

musicinmybrain avatar musicinmybrain commented on May 18, 2024

It should run even if no hudeadm is installed and only issue a warning.

I thought that was the intent, but it doesn’t seem to be true in practice:

+ redhat-linux-build/benchmarks
Warning: Failed to allocate 1GB huge pages. Run "sudo hugeadm --pool-pages-min 1GB:1 --pool-pages-max 1GB:1".
Warning: Failed to allocate 2MB huge pages. Run "sudo hugeadm --pool-pages-min 2MB:16 --pool-pages-max 2MB:16".
terminate called after throwing an instance of 'std::system_error'
  what():  mmap: Resource temporarily unavailable
/var/tmp/rpm-tmp.ZcvJSj: line 35:   356 Aborted                 (core dumped) redhat-linux-build/benchmarks

Okay, the warnings are just warnings, but looking further at the system error:

$ gdb ./benchmarks
[blah blah blah]
(gdb) run
Starting program: /home/ben/fedora/mine/atomic-queue/atomic_queue-f2a36a392e02cc2643b07292e639c4a2ae7d4a35/x86_64-redhat-linux-gnu/benchmarks 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Warning: Failed to allocate 1GB huge pages. Run "sudo hugeadm --pool-pages-min 1GB:1 --pool-pages-max 1GB:1".
Warning: Failed to allocate 2MB huge pages. Run "sudo hugeadm --pool-pages-min 2MB:16 --pool-pages-max 2MB:16".
terminate called after throwing an instance of 'std::system_error'
  what():  mmap: Resource temporarily unavailable

Program received signal SIGABRT, Aborted.
0x00007ffff7b732a2 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.33-20.fc34.x86_64 libgcc-11.2.1-1.fc34.x86_64 libstdc++-11.2.1-1.fc34.x86_64 tbb-2020.3-7.fc34.x86_64
(gdb) bt
#0  0x00007ffff7b732a2 in raise () from /lib64/libc.so.6
#1  0x00007ffff7b5c8a4 in abort () from /lib64/libc.so.6
#2  0x00007ffff7de2a46 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
#3  0x00007ffff7dee29c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#4  0x00007ffff7dee307 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007ffff7dee5a9 in __cxa_throw () from /lib64/libstdc++.so.6
#6  0x0000555555560801 in atomic_queue::HugePages::HugePages (t=<optimized out>, size=33554432, this=0x7fffffffd730)
    at ../src/huge_pages.cc:55
#7  main () at ../src/benchmarks.cc:564

from atomic_queue.

max0x7ba avatar max0x7ba commented on May 18, 2024

e50e567 adds support for larger cache sizes.

This seems to work on ppc64le, but I see a test failure on s390x (which did work with the cache line size set to 64 bytes):

../src/tests.cc(67): �[1;31;49merror: in "stress_BlockingAtomicQueue": check r > (expected_result / CONSUMERS) / 10 has failed [871616 <= 16666683333]�[0;39;49m
stderr:
�[1;31;49m*** 1 failure is detected in the test module "atomic_queue"
�[0;39;49m

False positives are possible in this particular test. It requires 6 participating threads to be running simultenously uninterrupted.

I made this check a warning now. 7013a8b

from atomic_queue.

max0x7ba avatar max0x7ba commented on May 18, 2024

It should run even if no hudeadm is installed and only issue a warning.

I thought that was the intent, but it doesn’t seem to be true in practice:

terminate called after throwing an instance of 'std::system_error'
  what():  mmap: Resource temporarily unavailable

Program received signal SIGABRT, Aborted.
0x00007ffff7b732a2 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.33-20.fc34.x86_64 libgcc-11.2.1-1.fc34.x86_64 libstdc++-11.2.1-1.fc34.x86_64 tbb-2020.3-7.fc34.x86_64
(gdb) bt
#0  0x00007ffff7b732a2 in raise () from /lib64/libc.so.6
#1  0x00007ffff7b5c8a4 in abort () from /lib64/libc.so.6
#2  0x00007ffff7de2a46 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
#3  0x00007ffff7dee29c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#4  0x00007ffff7dee307 in std::terminate() () from /lib64/libstdc++.so.6
#5  0x00007ffff7dee5a9 in __cxa_throw () from /lib64/libstdc++.so.6
#6  0x0000555555560801 in atomic_queue::HugePages::HugePages (t=<optimized out>, size=33554432, this=0x7fffffffd730)
    at ../src/huge_pages.cc:55
#7  main () at ../src/benchmarks.cc:564

It falls back to default page size. However, it uses MAP_LOCKED and EAGAIN says that it cannot lock the pages. That's ulimit -l which may need to be set to a higher value. On my systems I have it set to unlimited.

from atomic_queue.

musicinmybrain avatar musicinmybrain commented on May 18, 2024

False positives are possible in this particular test. It requires 6 participating threads to be running simultenously uninterrupted.

I made this check a warning now. 7013a8b

This makes sense, especially as the available s390x hardware has only three physical cores.

Some of the other builders in the Fedora infrastructure have fewer than six cores available—five is typical for both 32-bit and 64-bit ARM—but that seems not to have been a problem so far.

from atomic_queue.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.