Comments (10)
I did not observe the same issue on 64-bit Raspberry Pi OS, but I suspect that's because aarch64 support was only added to ORC in version 0.4.33, while Raspberry Pi OS has version 0.4.32.
from volk.
I suspect the current test suite would not catch this bug, since kernels are tested with random data. Perhaps some special-case values (0, 1, -1, std::numeric_limits<float>::max, std::numeric_limits<float>::min, std::numeric_limits<float>::epsilon, etc.) should be included as well.
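As a rough illustration of the special-value idea, here is a minimal sketch that compares a square-root kernel against std::sqrt on a fixed list of edge cases. It does not use VOLK's actual test harness: the SqrtKernel signature is only modeled on volk_32f_sqrt_32f, and reference_sqrt, nan_at_zero_sqrt, special_values and count_special_value_mismatches are hypothetical names made up for this example. nan_at_zero_sqrt is a stand-in that mimics the reported symptom (sqrt(0) = NaN); it is not the ORC code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical kernel signature, modeled on volk_32f_sqrt_32f(out, in, num_points).
using SqrtKernel = void (*)(float* out, const float* in, unsigned int num_points);

// Reference implementation used as ground truth.
void reference_sqrt(float* out, const float* in, unsigned int num_points)
{
    for (unsigned int i = 0; i < num_points; ++i) {
        out[i] = std::sqrt(in[i]);
    }
}

// Stand-in for a broken kernel: returns NaN for a zero input, as reported
// in this issue. This is NOT the ORC implementation, only the same symptom.
void nan_at_zero_sqrt(float* out, const float* in, unsigned int num_points)
{
    for (unsigned int i = 0; i < num_points; ++i) {
        out[i] = (in[i] == 0.0f) ? std::numeric_limits<float>::quiet_NaN()
                                 : std::sqrt(in[i]);
    }
}

// Non-negative edge cases. -1 is left out here because sqrt(-1) is NaN by
// definition and would need a separate NaN-aware comparison.
std::vector<float> special_values()
{
    using lim = std::numeric_limits<float>;
    return { 0.0f, 1.0f, lim::max(), lim::min(), lim::epsilon() };
}

// Counts the special-case inputs for which the kernel disagrees with the
// reference by more than a relative tolerance.
std::size_t count_special_value_mismatches(SqrtKernel kernel, float rel_tol = 1e-6f)
{
    const std::vector<float> in = special_values();
    std::vector<float> expected(in.size());
    std::vector<float> actual(in.size());
    const unsigned int n = static_cast<unsigned int>(in.size());

    reference_sqrt(expected.data(), in.data(), n);
    kernel(actual.data(), in.data(), n);
    assert(expected.size() == actual.size());

    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float err = std::fabs(actual[i] - expected[i]);
        // If the kernel returned NaN, err is NaN and the comparison is false,
        // so the element is counted as a mismatch.
        if (!(err <= rel_tol * std::fabs(expected[i]))) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A test built this way would flag the zero input deterministically, whereas random test vectors are unlikely to contain an exact 0.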
That's a very difficult issue to solve. Basically, we'd need to work around the way ORC is intended to work. Or does this issue arise because of something else? The comment in orctest.c implies this happens because of the specific implementation that is used in ORC on arm.
> The comment in orctest.c implies this happens because of the specific implementation that is used in ORC on arm.
That's correct. Only the ARM implementation is broken. I'd consider it a serious bug that sqrt(0) = NaN, but it seems ORC doesn't since they added an exception in their test suite: https://gitlab.freedesktop.org/gstreamer/orc/-/merge_requests/66
Perhaps a suitable workaround would be to disable the four affected VOLK kernels on ARM. That change could be reverted if ORC someday fixes their ARM sqrt implementation.
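One way such a workaround might look is a preprocessor guard around the ORC protokernels. The sketch below assumes that LV_HAVE_ORC is the macro VOLK defines when ORC support is built in and that the compiler predefines __arm__ or __aarch64__ on ARM targets (GCC and Clang do). The EXAMPLE_* macros and orc_sqrt_usable are hypothetical names for illustration, not something that exists in VOLK.

```cpp
// __arm__ / __aarch64__ are predefined by GCC and Clang on 32-bit / 64-bit
// ARM targets respectively.
#if defined(__arm__) || defined(__aarch64__)
#define EXAMPLE_TARGET_IS_ARM 1
#else
#define EXAMPLE_TARGET_IS_ARM 0
#endif

// Hypothetical guard: only allow the ORC sqrt-based kernels when ORC is
// available AND the target is not ARM.
#if defined(LV_HAVE_ORC) && !EXAMPLE_TARGET_IS_ARM
#define EXAMPLE_ORC_SQRT_USABLE 1
#else
#define EXAMPLE_ORC_SQRT_USABLE 0
#endif

#if EXAMPLE_ORC_SQRT_USABLE
// The u_orc protokernel of an affected kernel would be declared here.
#endif

// Compile-time query, handy for checking the guard in a test.
constexpr bool orc_sqrt_usable() { return EXAMPLE_ORC_SQRT_USABLE != 0; }
```

Keeping the condition in one macro would make the change easy to revert if ORC fixes its ARM sqrt implementation.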
Since the GQRX issues imply that the ORC implementation is slower than the alternatives, it doesn't hurt to disable these ORC kernels.
On a Raspberry Pi 3B+ running 64-bit Raspberry Pi OS:
RUN_VOLK_TESTS: volk_32fc_magnitude_32f(131071,1987)
generic completed in 6106.05 ms
a_generic completed in 6125.19 ms
neon completed in 1808.37 ms
neon_fancy_sweet completed in 2292.84 ms
u_orc completed in 11231.1 ms
Best aligned arch: neon
Best unaligned arch: neon
RUN_VOLK_TESTS: volk_32f_sqrt_32f(131071,1987)
neon completed in 932.665 ms
generic completed in 12600.2 ms
u_orc completed in 13221.3 ms
Best aligned arch: neon
Best unaligned arch: neon
ORC is slower than generic in both cases (11231.1 ms vs. 6106.05 ms, and 13221.3 ms vs. 12600.2 ms), and much slower than neon (roughly 6x and 14x).
For volk_16ic_magnitude_16i and volk_16ic_magnitude_32f, the ORC kernels are already disabled on all platforms:
volk/kernels/volk/volk_16ic_magnitude_16i.h
Lines 281 to 294 in 73c2580
volk/kernels/volk/volk_16ic_s32f_magnitude_32f.h
Lines 261 to 276 in 73c2580
damn, fancysweet isn't faster 👎
Regarding the benchmarks, #203 points out that ORC benchmarks include some one-time overhead. We might need to test this.
On the other hand, the ORC kernel test runs 1987 iterations and takes more than 13 s, while the NEON kernel finishes in less than 1 s. I really hope this one-time ORC overhead is not included...
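One way to check whether one-time overhead explains the gap is to time the first call separately from the remaining iterations. The following is a minimal sketch of that idea, not part of volk_profile: KernelTiming and time_first_vs_rest are hypothetical names, and the kernel is passed in as an arbitrary callable.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

struct KernelTiming {
    double first_call_ms;   // first call, including any one-time setup cost
    double steady_state_ms; // total for all remaining calls
};

// Hypothetical helper: runs `kernel` `iterations` times and reports the
// first call separately from the rest.
KernelTiming time_first_vs_rest(const std::function<void()>& kernel,
                                unsigned int iterations)
{
    assert(iterations >= 1);
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    const auto t0 = clock::now();
    kernel(); // first call: any lazy setup would land here
    const auto t1 = clock::now();
    for (unsigned int i = 1; i < iterations; ++i) {
        kernel();
    }
    const auto t2 = clock::now();

    return { ms(t1 - t0).count(), ms(t2 - t1).count() };
}
```

For scale, the numbers above work out to about 6.65 ms per iteration for u_orc (13221.3 ms / 1987) versus about 0.47 ms for neon (932.665 ms / 1987). If the first call accounted for most of the ORC total, the overhead would be one-time; if the remaining 1986 calls dominate, it is not.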
This problem is now worse, because Debian 12 (and the latest Raspberry Pi OS) include ORC 0.4.33, which adds support for 64-bit ARM. As a result, these kernels are now broken on both 32-bit and 64-bit ARM.