jan-wassenberg commented on August 15, 2024

Hi,

Can I take just one simple SIMD routine and convert it to hwy leaving the rest as is?

Yes, that works. You can even temporarily mix Highway and __m128 in the same function: on x86, the native vector type is accessible as hwy_vec.raw. Note that this will not work on all platforms (e.g. RVV), but it may be helpful while porting.

I set the -mavx2

I'd strongly recommend avoiding that once you've included Hwy. That flag allows the compiler to generate AVX2 code elsewhere (even where you didn't ask for it via intrinsics or Highway), which can crash on CPUs that lack AVX2.
Instead, HWY_BEFORE_NAMESPACE gives subsequent code permission to use AVX2 as if the flag had been set, but that stops at HWY_AFTER_NAMESPACE. Alternatively, you can prefix functions with HWY_ATTR for the same effect, but that has to be done for each caller into which the function is inlined.

I went through the docs, but I couldn't find a simple enough example.

Does the "Loops, memory" slide help?
In Highway, it translates into LoadU, Mul() (or operator*), and StoreU.

from highway.

boxerab commented on August 15, 2024

@jan-wassenberg thanks! I got my first simple function working. It's a much, much nicer experience than what I was using previously, which was a set of macros wrapping x86 intrinsics.

boxerab commented on August 15, 2024

This is my lambda to do RGB -> YUV conversion:

```cpp
HWY_BEFORE_NAMESPACE();
auto compressor = [index, chunkSize, chan0, chan1, chan2]() {
	uint64_t begin = (uint64_t)index * chunkSize;
	const HWY_FULL(int32_t) d;
	for(auto j = begin; j < begin + chunkSize; j += Lanes(d))
	{
		auto r = Load(d,chan0+j);
		auto g = Load(d,chan1+j);
		auto b = Load(d,chan2+j);
		auto y = ShiftRight<2>((g+g) + b + r);
		auto u = b-g;
		auto v = r-g;
		Store(y,d,chan0+j);
		Store(u,d,chan1+j);
		Store(v,d,chan2+j);
	}
	return 0;
};
HWY_AFTER_NAMESPACE();
```
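The arithmetic in this lambda matches JPEG 2000's reversible color transform (RCT): y = floor((r + 2g + b) / 4), u = b - g, v = r - g. When porting, a scalar reference plus its exact inverse is handy for checking both the SIMD output and the lossless round-trip. The following is a sketch; the names `ForwardRct` and `InverseRct` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar reference for the lambda: same formulas, one element at a time.
// Like Highway's ShiftRight on int32, >> is assumed to be an arithmetic shift
// (guaranteed since C++20; GCC, Clang and MSVC already behave this way).
void ForwardRct(int32_t* c0, int32_t* c1, int32_t* c2, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    const int32_t r = c0[i], g = c1[i], b = c2[i];
    c0[i] = (r + 2 * g + b) >> 2;  // y
    c1[i] = b - g;                 // u
    c2[i] = r - g;                 // v
  }
}

// Exact inverse: r + 2g + b == (u + v) + 4g, so y == g + ((u + v) >> 2).
void InverseRct(int32_t* c0, int32_t* c1, int32_t* c2, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    const int32_t y = c0[i], u = c1[i], v = c2[i];
    const int32_t g = y - ((u + v) >> 2);
    c0[i] = v + g;  // r
    c1[i] = g;      // g
    c2[i] = u + g;  // b
  }
}
```

Running `ForwardRct` on a copy of the input and comparing against the SIMD result catches porting mistakes. Running `InverseRct` afterwards must reproduce the original channels exactly.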

jan-wassenberg commented on August 15, 2024

@boxerab glad to hear it, thanks for sharing!

Oh, interesting that HWY_BEFORE_NAMESPACE works in a function; it's intended to be used at namespace scope. For lambdas, you could instead put HWY_ATTR between the "()" and the "{".

boxerab commented on August 15, 2024

@jan-wassenberg thanks! Unfortunately, it doesn't look like AVX2 is being enabled: when I printf the number of lanes, I get 1.

Btw, Highway is linked to my main project as a static library. Side question: how does Highway tell the compiler to use AVX2 instructions for a specific method?

Here is my modified method:

```cpp
#include <cstdio>
#include <future>
#include <vector>

#undef HWY_TARGET_INCLUDE
#define HWY_TARGET_INCLUDE "mct.cpp"
#include <hwy/foreach_target.h>
#include <hwy/highway.h>

HWY_BEFORE_NAMESPACE();
namespace grk {
namespace HWY_NAMESPACE {
void hwy_compress_rev(int32_t* GRK_RESTRICT chan0, int32_t* GRK_RESTRICT chan1,
					   int32_t* GRK_RESTRICT chan2, uint64_t n)
{
	size_t i = 0;
	size_t num_threads = ThreadPool::get()->num_threads();
	size_t chunkSize = n / num_threads;
	const HWY_FULL(int32_t) d;
	printf("Lanes: %zu\n", Lanes(d)); // Lanes() returns size_t
	chunkSize = (chunkSize / Lanes(d)) * Lanes(d);
	if(chunkSize > Lanes(d))
	{
		std::vector<std::future<int>> results;
		for(uint64_t tr = 0; tr < num_threads; ++tr)
		{
			uint64_t index = tr;
			// HWY_ATTR gives the lambda body the current target's attribute
			auto compressor = [index, chunkSize, chan0, chan1, chan2]() HWY_ATTR {
				uint64_t begin = (uint64_t)index * chunkSize;
				const HWY_FULL(int32_t) d;
				for(auto j = begin; j < begin + chunkSize; j += Lanes(d))
				{
					auto r = Load(d,chan0+j);
					auto g = Load(d,chan1+j);
					auto b = Load(d,chan2+j);
					auto y = ShiftRight<2>((g+g) + b + r);
					auto u = b-g;
					auto v = r-g;
					Store(y,d,chan0+j);
					Store(u,d,chan1+j);
					Store(v,d,chan2+j);
				}
				return 0;
			};
			if(num_threads > 1)
				results.emplace_back(ThreadPool::get()->enqueue(compressor));
			else
				compressor();
		}
		for(auto& result : results)
		{
			result.get();
		}
		i = chunkSize * num_threads;
	}
	// scalar tail: elements not covered by the SIMD chunks
	for(; i < n; ++i)
	{
		int32_t r = chan0[i];
		int32_t g = chan1[i];
		int32_t b = chan2[i];
		int32_t y = (r + (g * 2) + b) >> 2;
		int32_t u = b - g;
		int32_t v = r - g;
		chan0[i] = y;
		chan1[i] = u;
		chan2[i] = v;
	}
}
} // namespace HWY_NAMESPACE
} // namespace grk
HWY_AFTER_NAMESPACE();
```
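The work split in this function can be checked in isolation with plain arithmetic. The sketch below mirrors the code: each thread's share, `n / num_threads`, is rounded down to whole vectors. SIMD is used only when that share exceeds one vector, and the scalar loop finishes from `chunk_size * num_threads` to `n`. The names `Partition` and `PartitionFor` are hypothetical, and `num_threads` and `lanes` are assumed to be nonzero.

```cpp
#include <cstdint>

// Result of splitting n elements across threads, as in hwy_compress_rev.
struct Partition {
  uint64_t chunk_size;  // elements per thread (multiple of lanes); 0 if SIMD is skipped
  uint64_t simd_end;    // first index handled by the scalar tail loop
};

Partition PartitionFor(uint64_t n, uint64_t num_threads, uint64_t lanes) {
  uint64_t chunk = n / num_threads;
  chunk = (chunk / lanes) * lanes;  // round down to whole vectors
  if (chunk > lanes) return {chunk, chunk * num_threads};
  return {0, 0};  // share too small: the scalar loop does everything
}
```

For example, 1000 elements on 3 threads with 8 lanes gives chunks of 328 (vectorized indices 0 to 983) and a 16-element scalar tail. Note that a share of exactly one vector is not vectorized, because the code tests `chunkSize > Lanes(d)`.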

jan-wassenberg commented on August 15, 2024

The code looks good. It should be compiled for multiple targets, including AVX2.

The decision as to which one to call (grk::N_SCALAR::hwy_compress_rev, grk::N_AVX2::hwy_compress_rev, etc.) is made at the call site (HWY_DYNAMIC_DISPATCH) and is actually determined in targets.cc. It would be helpful to know the values of bits and supported_mask_ at the end of SupportedTargets.

Also, are you building for a 32-bit target? AVX2 is disabled via HWY_BROKEN_TARGETS there because we saw some issues.

Side question: how does Highway tell the compiler to use AVX2 instructions for a specific method?

The Store() function in N_AVX2:: calls an AVX2 intrinsic. Either HWY_BEFORE_NAMESPACE or HWY_ATTR sets a 'target-specific attribute' that gives the compiler permission to use AVX2, as if -mavx2 had been specified for the file. The difference is fine-grained control, so we do not risk the compiler generating AVX2 code outside of our runtime CPU checks.
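The underlying compiler mechanism can be shown without Highway. This is a sketch of the general technique, not Highway's actual macro expansion. GCC/Clang's `__attribute__((target("avx2")))` lets a single function use AVX2 while the rest of the file keeps the baseline ISA, and a runtime check picks the path. Here `__builtin_cpu_supports` stands in for that check; Highway performs its own detection (in targets.cc). `AddArrays` and its helpers are hypothetical, and the `#if` keeps the code compiling on non-x86 or non-GCC/Clang builds.

```cpp
#include <cstddef>
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1

// Only this function may contain AVX2 instructions. Everything else in the
// file is still compiled for the baseline ISA, since no -mavx2 flag is used.
__attribute__((target("avx2")))
static void AddArraysAvx2(const int32_t* a, const int32_t* b, int32_t* out, size_t n) {
  size_t i = 0;
  for (; i + 8 <= n; i += 8) {  // 8 x int32 per 256-bit vector
    const __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
    const __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), _mm256_add_epi32(va, vb));
  }
  for (; i < n; ++i) out[i] = a[i] + b[i];
}
#endif

static void AddArraysScalar(const int32_t* a, const int32_t* b, int32_t* out, size_t n) {
  for (size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Runtime dispatch: the AVX2 version runs only if the CPU reports AVX2.
void AddArrays(const int32_t* a, const int32_t* b, int32_t* out, size_t n) {
#ifdef HAVE_AVX2_PATH
  if (__builtin_cpu_supports("avx2")) {
    AddArraysAvx2(a, b, out, n);
    return;
  }
#endif
  AddArraysScalar(a, b, out, n);
}
```

Compiling this file with -mavx2 instead would let the compiler auto-vectorize `AddArraysScalar` with AVX2 too, and the fallback would then crash on older CPUs. That is the risk described earlier in this thread.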

boxerab commented on August 15, 2024

Thanks, I am not building for 32-bit. Do I need to add HWY_DYNAMIC_DISPATCH when I actually call my method?

Currently I simply call:

```cpp
HWY_NAMESPACE::hwy_compress_rev(chan0, chan1, chan2, n);
```

boxerab commented on August 15, 2024

Yep, that was it:

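```cpp
// Must be compiled only once (e.g. inside #if HWY_ONCE), in the namespace
// that encloses HWY_NAMESPACE.
HWY_EXPORT(hwy_compress_rev);
void mct::compress_rev(int32_t* GRK_RESTRICT chan0, int32_t* GRK_RESTRICT chan1,
					   int32_t* GRK_RESTRICT chan2, uint64_t n)
{
	// Calls the best hwy_compress_rev supported by the running CPU.
	HWY_DYNAMIC_DISPATCH(hwy_compress_rev)(chan0, chan1, chan2, n);
}
```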
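Putting the pieces of this thread together, here is a single-threaded sketch of the whole-file layout. It assumes a recent Highway release and that the file is saved as `mct.cpp`, because HWY_TARGET_INCLUDE must name the file itself (foreach_target.h re-includes it once per target). Several things are simplifications, not the author's code: `compress_rev` as a free function, the omission of the thread pool, and the use of `LoadU`/`StoreU` and `ScalableTag`. The `#if HWY_ONCE` section is compiled only once, and HWY_EXPORT sits in the namespace that encloses HWY_NAMESPACE (`grk` here).

```cpp
// mct.cpp (hypothetical layout)
#include <cstddef>
#include <cstdint>

#undef HWY_TARGET_INCLUDE
#define HWY_TARGET_INCLUDE "mct.cpp"  // this file's own path
#include <hwy/foreach_target.h>       // re-includes this file once per target
#include <hwy/highway.h>

HWY_BEFORE_NAMESPACE();
namespace grk {
namespace HWY_NAMESPACE {  // becomes N_SSE4, N_AVX2, ... in each pass
namespace hn = hwy::HWY_NAMESPACE;

void hwy_compress_rev(int32_t* HWY_RESTRICT chan0, int32_t* HWY_RESTRICT chan1,
                      int32_t* HWY_RESTRICT chan2, uint64_t n) {
  const hn::ScalableTag<int32_t> d;
  const size_t N = hn::Lanes(d);
  uint64_t i = 0;
  for (; i + N <= n; i += N) {
    const auto r = hn::LoadU(d, chan0 + i);
    const auto g = hn::LoadU(d, chan1 + i);
    const auto b = hn::LoadU(d, chan2 + i);
    hn::StoreU(hn::ShiftRight<2>(hn::Add(hn::Add(g, g), hn::Add(b, r))), d, chan0 + i);
    hn::StoreU(hn::Sub(b, g), d, chan1 + i);
    hn::StoreU(hn::Sub(r, g), d, chan2 + i);
  }
  for (; i < n; ++i) {  // scalar tail
    const int32_t r = chan0[i], g = chan1[i], b = chan2[i];
    chan0[i] = (r + 2 * g + b) >> 2;
    chan1[i] = b - g;
    chan2[i] = r - g;
  }
}

}  // namespace HWY_NAMESPACE
}  // namespace grk
HWY_AFTER_NAMESPACE();

#if HWY_ONCE
namespace grk {
// Table of every per-target hwy_compress_rev compiled above.
HWY_EXPORT(hwy_compress_rev);

// Hypothetical entry point; picks a target at runtime based on the CPU.
void compress_rev(int32_t* chan0, int32_t* chan1, int32_t* chan2, uint64_t n) {
  HWY_DYNAMIC_DISPATCH(hwy_compress_rev)(chan0, chan1, chan2, n);
}
}  // namespace grk
#endif  // HWY_ONCE
```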
