Comments (10)

argilo commented on August 19, 2024

I did not observe the same issue on 64-bit Raspberry Pi OS, but I suspect that's because aarch64 support was only added to ORC in version 0.4.33, while Raspberry Pi OS ships version 0.4.32.

from volk.

argilo commented on August 19, 2024

I suspect the current test suite would not catch this bug, since kernels are tested with random data. Perhaps some special-case values (0, 1, -1, std::numeric_limits<float>::max(), std::numeric_limits<float>::min(), std::numeric_limits<float>::epsilon(), etc.) should be included as well.

jdemel commented on August 19, 2024

That's a very difficult issue to solve. Basically, we'd need to work around the way ORC is intended to work. Or does this issue arise because of something else? The comment in orctest.c implies this happens because of the specific implementation that is used in ORC on arm.

argilo commented on August 19, 2024

> The comment in orctest.c implies this happens because of the specific implementation that is used in ORC on arm.

That's correct. Only the ARM implementation is broken. I'd consider sqrt(0) = NaN a serious bug, but the ORC developers apparently don't, since they added an exception for it in their test suite: https://gitlab.freedesktop.org/gstreamer/orc/-/merge_requests/66

Perhaps a suitable workaround would be to disable the four affected VOLK kernels on ARM. That change could be reverted if ORC someday fixes their ARM sqrt implementation.
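The proposed workaround could be expressed as a preprocessor condition around the ORC kernel declarations, alongside the `LV_HAVE_ORC` feature macro that VOLK's ORC kernels are normally guarded by. This is a minimal sketch under assumptions: `ORC_SQRT_KERNELS_ENABLED` and `orc_sqrt_kernels_enabled()` are hypothetical names, and a real patch would more likely put the condition directly on each affected kernel's `u_orc` block. `__arm__` (32-bit) and `__aarch64__` (64-bit) are the macros GCC and Clang predefine for ARM targets.

```c
/* Offer the affected ORC kernels only when ORC is available AND the target
 * is not ARM, where ORC's sqrt returns NaN for 0. Reverting the workaround
 * later would mean dropping the two !defined(...) terms. */
#if defined(LV_HAVE_ORC) && !defined(__arm__) && !defined(__aarch64__)
#define ORC_SQRT_KERNELS_ENABLED 1
#else
#define ORC_SQRT_KERNELS_ENABLED 0
#endif

/* Returns nonzero if the sqrt-based ORC kernels may be dispatched. */
static int orc_sqrt_kernels_enabled(void)
{
    return ORC_SQRT_KERNELS_ENABLED;
}
```

In a build without `LV_HAVE_ORC` this yields 0 on every architecture; with it defined, 1 only on non-ARM targets, so the generic and NEON kernels remain the only candidates on ARM.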

jdemel commented on August 19, 2024

Since the GQRX issues indicate that the ORC implementation is slower than the alternatives, it doesn't hurt to disable these ORC kernels.

argilo commented on August 19, 2024

On a Raspberry Pi 3B+ running 64-bit Raspberry Pi OS:

```
RUN_VOLK_TESTS: volk_32fc_magnitude_32f(131071,1987)
generic completed in 6106.05 ms
a_generic completed in 6125.19 ms
neon completed in 1808.37 ms
neon_fancy_sweet completed in 2292.84 ms
u_orc completed in 11231.1 ms
Best aligned arch: neon
Best unaligned arch: neon
RUN_VOLK_TESTS: volk_32f_sqrt_32f(131071,1987)
neon completed in 932.665 ms
generic completed in 12600.2 ms
u_orc completed in 13221.3 ms
Best aligned arch: neon
Best unaligned arch: neon
```

ORC is slower than generic in both cases, and much slower than neon.
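To put numbers on that, here are the slowdown ratios computed directly from the timings above (same 131071-point, 1987-iteration runs):

```latex
\begin{aligned}
\texttt{volk\_32fc\_magnitude\_32f}:&\quad \frac{t_{\mathrm{u\_orc}}}{t_{\mathrm{neon}}} = \frac{11231.1}{1808.37} \approx 6.2,
&\quad \frac{t_{\mathrm{u\_orc}}}{t_{\mathrm{generic}}} = \frac{11231.1}{6106.05} \approx 1.84 \\
\texttt{volk\_32f\_sqrt\_32f}:&\quad \frac{t_{\mathrm{u\_orc}}}{t_{\mathrm{neon}}} = \frac{13221.3}{932.665} \approx 14.2,
&\quad \frac{t_{\mathrm{u\_orc}}}{t_{\mathrm{generic}}} = \frac{13221.3}{12600.2} \approx 1.05
\end{aligned}
```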

argilo commented on August 19, 2024

For volk_16ic_magnitude_16i and volk_16ic_s32f_magnitude_32f, the ORC kernels are already disabled on all platforms, since they are guarded by LV_HAVE_ORC_DISABLED rather than LV_HAVE_ORC:

```c
#ifdef LV_HAVE_ORC_DISABLED
extern void volk_16ic_magnitude_16i_a_orc_impl(int16_t* magnitudeVector,
                                               const lv_16sc_t* complexVector,
                                               float scalar,
                                               unsigned int num_points);
static inline void volk_16ic_magnitude_16i_u_orc(int16_t* magnitudeVector,
                                                 const lv_16sc_t* complexVector,
                                                 unsigned int num_points)
{
    volk_16ic_magnitude_16i_a_orc_impl(
        magnitudeVector, complexVector, SHRT_MAX, num_points);
}
#endif /* LV_HAVE_ORC */

#ifdef LV_HAVE_ORC_DISABLED
extern void volk_16ic_s32f_magnitude_32f_a_orc_impl(float* magnitudeVector,
                                                    const lv_16sc_t* complexVector,
                                                    const float scalar,
                                                    unsigned int num_points);
static inline void volk_16ic_s32f_magnitude_32f_u_orc(float* magnitudeVector,
                                                      const lv_16sc_t* complexVector,
                                                      const float scalar,
                                                      unsigned int num_points)
{
    volk_16ic_s32f_magnitude_32f_a_orc_impl(
        magnitudeVector, complexVector, scalar, num_points);
}
#endif /* LV_HAVE_ORC */
```

n-west commented on August 19, 2024

damn, fancysweet isn't faster 👎

jdemel commented on August 19, 2024

Regarding the benchmarks, #203 points out that the ORC benchmarks include some one-time overhead. We might need to test this.
On the other hand, the ORC kernel test runs 1987 iterations and takes more than 13 s, while the NEON kernel finishes in less than 1 s. I really hope this one-time ORC overhead is not included...

argilo commented on August 19, 2024

This problem is now worse, because Debian 12 (and the latest Raspberry Pi OS) include ORC 0.4.33, which adds support for 64-bit ARM. As a result, these kernels are now broken on both 32-bit and 64-bit ARM.
