mitsuba-renderer / enoki

Enoki: structured vectorization and differentiation on modern processor architectures

License: Other

CMake 1.58% C++ 91.01% Shell 0.19% Python 0.90% Cuda 6.05% C 0.27%

enoki's Issues

How to dereference?

Hi,

I would like to dereference a packet of pointers, something like this:

using PtrP   = enoki::Packet<uintptr_t, 8>;
using Uint16P   = enoki::Packet<uint16_t, 8>;
PtrP p = fun();
Uint16P i = *p;

But this code doesn't compile.
Second, I thought enoki::gather<>() might fit this situation:

Uint16P i = enoki::gather<Uint16P>(p,0);

But it doesn't return the desired result.
My guess is that gather<> expects a single base pointer rather than a packet of pointers.

Any solution?

Thanks.
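Conceptually, dereferencing a packet of pointers and calling gather are different operations: the first reads from a separate address in each lane, while gather reads from one shared base pointer plus a per-lane index. The plain C++ sketch below shows both with scalar loops over std::array; the function names are hypothetical and this is not Enoki code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Per-lane dereference: lane i reads from its own address ptrs[i].
template <typename T, std::size_t N>
std::array<T, N> deref_lanes(const std::array<const T *, N> &ptrs) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = *ptrs[i];
    return out;
}

// Gather: one shared base pointer; lane i reads base[idx[i]] (offsets counted in elements).
template <typename T, std::size_t N, typename Index>
std::array<T, N> gather_lanes(const T *base, const std::array<Index, N> &idx) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = base[idx[i]];
    return out;
}
```

When all pointers point into the same buffer, the first form can be rewritten as the second by converting each pointer into an element offset from the buffer start.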

Aligned memory allocation functions missing.

The documentation references these functions (https://enoki.readthedocs.io/en/master/reference.html#memory-allocation), but I can't find them in the code.
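As a stand-in while those functions are missing, C++17's aligned operator new provides portable aligned allocation (unlike std::aligned_alloc, it is also available with MSVC). This is a minimal sketch with hypothetical helper names, not the functions the Enoki documentation describes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Allocate `size` bytes aligned to `alignment` (a power of two) using C++17 aligned new.
void *alloc_aligned(std::size_t size, std::size_t alignment) {
    return ::operator new(size, std::align_val_t(alignment));
}

// Memory from alloc_aligned must be released with the matching aligned delete.
void free_aligned(void *ptr, std::size_t alignment) {
    ::operator delete(ptr, std::align_val_t(alignment));
}
```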

Unexpected `select` result

I came across a potential mask and/or select bug in the Mitsuba 2 codebase. Here is a minimal verifiable example (MVE):

#include <iostream>
#include <enoki/array.h>

using namespace enoki;

namespace {
using Float    = float;
constexpr size_t PacketSize = enoki::max_packet_size / sizeof(Float);

using Point4f = Packet<Float, 4>;
using MyMask = mask_t<Point4f>;

template <typename T>
void print(const T &val) {
    std::cout << val << std::endl;
}
}  // namespace

int main() {
    MyMask m1(true);
    MyMask m2(true | true);

    print("---  These two masks look identical:");
    print(m1);
    print(m2);
    print("---  But maybe they aren't?");
    print(m1 & m2);
    print(all(eq(m1, m2)));
    print("---  Using scalars (for reference):");
    print(select(true, 1.0f, 0.0f));
    print(select(true | true, 1.0f, 0.0f));
    print("---  Now, using packets:");
    print(select(m1, Point4f(1.0f), Point4f(0.0f)));
    print(select(m2, Point4f(1.0f), Point4f(0.0f)));  // Unexpected result here.
}

Running it outputs:

---  These two masks look identical:
[1, 1, 1, 1]
[1, 1, 1, 1]
---  But maybe they aren't?
[1, 1, 1, 1]
0
---  Using scalars (for reference):
1
1
---  Now, using packets:
[1, 1, 1, 1]
[0, 0, 0, 0]

Note how the last line's result is unexpected.

Inspecting the masks in LLDB, they indeed look different:

(lldb) p m1
((anonymous namespace)::MyMask) $0 = {
  enoki::StaticMaskImpl<float, 4, true, enoki::RoundingMode::Default, enoki::PacketMask<float, 4, true, enoki::RoundingMode::Default>, void> = {
    enoki::StaticArrayImpl<float, 4, true, enoki::RoundingMode::Default, enoki::PacketMask<float, 4, true, enoki::RoundingMode::Default> > = {
      m = (NaN, NaN, NaN, NaN)
    }
  }
}
(lldb) p m2
((anonymous namespace)::MyMask) $1 = {
  enoki::StaticMaskImpl<float, 4, true, enoki::RoundingMode::Default, enoki::PacketMask<float, 4, true, enoki::RoundingMode::Default>, void> = {
    enoki::StaticArrayImpl<float, 4, true, enoki::RoundingMode::Default, enoki::PacketMask<float, 4, true, enoki::RoundingMode::Default> > = {
      m = (0.00000000000000000000000000000000000000000000140129846, 0.00000000000000000000000000000000000000000000140129846, 0.00000000000000000000000000000000000000000000140129846, 0.00000000000000000000000000000000000000000000140129846)
    }
  }
}

I think LLDB's printers don't display the mask entries correctly anyway, but at least we can confirm that the two masks differ.
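A plausible explanation, stated as an assumption rather than a confirmed diagnosis: `true | true` is not a bool but the int 1, so m2 may be constructed from the integer 1 and store the bit pattern 0x00000001 in each lane. That pattern is exactly the denormal 1.4e-45 shown by LLDB, whereas a true mask lane has all bits set (shown as NaN). Blend instructions such as _mm_blendv_ps only test the most significant bit of each mask lane, so a lane holding 1 would select the false branch. The plain C++ sketch below emulates this; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Bitwise OR on two bools promotes both operands to int, so the result is the int 1.
static_assert(std::is_same_v<decltype(true | true), int>, "true | true is an int");

// Reinterpret a 32-bit pattern as a float lane, as a mask stored in a float register would be.
float lane_from_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Emulated blendv-style select: only the most significant bit of the mask lane matters.
float select_msb(std::uint32_t mask_bits, float if_true, float if_false) {
    return (mask_bits & 0x80000000u) ? if_true : if_false;
}
```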

Conversion from FloatD to FloatC?

Is there a way to do something like this?
FloatD cuda_diff = ...;
FloatC cuda = cuda_diff.val();
or
Vector3fD vec_diff = ...;
Vector3fC vec = vec_diff.val();

I really like Enoki's template design, which can target multiple platforms and supports autodiff. I want to use it as the basis of my fluid simulation code, so I ran a small efficiency test:

#include <array>
#include <cmath>

std::array<float, 3> srgb_gamma(std::array<float, 3> x) {
    std::array<float, 3> result;
    for (int i = 0; i < 3; i++) {
        if (x[i] <= 0.0031308f)
            result[i] = x[i] * 12.92f;
        else
            result[i] = 1.055f * std::pow(x[i], 1.f / 2.4f) - 0.055f;
    }
    return result;
}

I hand-wrote this function and compared it with the code given in the tutorial. Looping 10000 times, I find that my version is 100x faster than Enoki without -msse4 and 20x faster with -msse4. I can't figure out why. Am I missing a compiler flag?
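One likely contributing factor, offered as a hedged guess rather than a diagnosis: a select-based formulation like the tutorial's computes both branches, including the comparatively expensive pow(), for every element, whereas the hand-written loop only calls pow() above the threshold. The plain C++ sketch below (hypothetical function names, no Enoki types) writes both variants side by side using the standard sRGB curve 1.055 * x^(1/2.4) - 0.055.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Branchy reference: pow() only runs for inputs above the linear threshold.
float srgb_branchy(float x) {
    if (x <= 0.0031308f)
        return x * 12.92f;
    return 1.055f * std::pow(x, 1.f / 2.4f) - 0.055f;
}

// Select-style: both branches are evaluated for every element, then one is picked.
// This mirrors how a vectorized select() works and shows where the extra pow() cost comes from.
std::array<float, 3> srgb_select(const std::array<float, 3> &x) {
    std::array<float, 3> out{};
    for (int i = 0; i < 3; ++i) {
        float lin = x[i] * 12.92f;
        float gam = 1.055f * std::pow(x[i], 1.f / 2.4f) - 0.055f;
        out[i] = x[i] <= 0.0031308f ? lin : gam;
    }
    return out;
}
```

For a fair comparison, both versions should also be timed on large batches of inputs rather than on a single 3-element vector, where call overhead can dominate.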

UB issues with bool_array_t initialization.

I get an UndefinedBehaviorSanitizer (UBSan) error from Google's sanitizers (https://github.com/google/sanitizers) when resizing a dynamic array of a struct containing bool values to a number of slices that is not a multiple of the packet size.

Taking the example from here https://enoki.readthedocs.io/en/master/dynamic.html?highlight=bool_array_t#custom-dynamic-data-structures, if you do something like

using FloatP = Packet<float, 4>;
using FloatX = DynamicArray<FloatP>;
using GPSCoord2fX = GPSCoord2<FloatX>;
GPSCoord2fX coord;
set_slices(coord, 1001);

UBSAN will fire saying:

enoki/array_fallbacks.h:495:16: runtime error: load of value 190, which is not a valid value for type 'const bool'

I dug into this a little and traced it down to the clean_trailing_() function in dynamic.h, specifically this line:

store(addr, load<Packet>(addr) & mask);

I think load<Packet>(addr) reads uninitialized bool values (presumably in the padding past the last slice; a byte such as 190 is not a valid bool), which are then used in the & expression at array_fallbacks.h:495. A workaround is to change the Bool type in the struct to:

using Bool = enoki::replace_scalar_t<Value, uint8_t>;

This avoids the UB and functions as you'd expect.
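For context: with 4-wide packets, 1001 slices occupy 251 packets (1004 lanes), so three trailing lanes are padding that may hold arbitrary bytes such as 190. Reading such a byte as bool is undefined behavior, while reading it as uint8_t is not, which is consistent with the workaround above. A minimal plain C++ sketch of the arithmetic and of keeping padded storage as uint8_t (hypothetical helper names, not Enoki's clean_trailing_()):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of packets needed for `slices` entries with `width` lanes per packet.
std::size_t packet_count(std::size_t slices, std::size_t width) {
    return (slices + width - 1) / width;
}

// Padded mask storage kept as uint8_t, so any byte value can be read without UB.
std::vector<std::uint8_t> make_mask_storage(std::size_t slices, std::size_t width) {
    // 0xBE (190) simulates uninitialized contents, like the value reported by UBSan.
    std::vector<std::uint8_t> storage(packet_count(slices, width) * width, 0xBE);
    for (std::size_t i = 0; i < storage.size(); ++i)
        storage[i] = i < slices ? 1 : 0;  // active slices set to true, trailing padding cleared
    return storage;
}
```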

A typo

README.md:

$ git clone --rescursive https://github.com/mitsuba-renderer/enoki

rescursive -> recursive

`ENOKI_STRUCT_DYNAMIC` with `mask` member

Slicing a dynamically-sized mask (mask_t<FloatX>) returns a float & (see Example 1). This makes sense given that masks are stored using their underlying type's registers and slice needs to return a reference.

But then, in the slicing operator defined by ENOKI_STRUCT_DYNAMIC,

template <typename T>                                                  \
static ENOKI_INLINE auto slice(T &&value, size_t index) {              \
    constexpr static bool co_ = std::is_const<                         \
        std::remove_reference_t<T>>::value;                            \
    using Value = Struct<decltype(enoki::slice(std::declval<           \
        std::conditional_t<co_, const Args &, Args &>>(), index))...>; \
    return Value{ ENOKI_MAP_EXPR_F2(enoki::slice, value, index,        \
                                   __VA_ARGS__) };                     \
}

the following becomes problematic:

MyStruct(
    slice(value.arg1, index), ..., 
    // Trying to initialize a `mask_t<Float &>` with a `Float &`
    slice(value.some_mask, index)
)

Would there be a way to initialize the mask with a reference to the underlying storage directly?
This problem occurs in Mitsuba 2, see Example 2.

Example 1

#include <iostream>
#include <typeinfo>
#include <vector>
#include <enoki/array.h>

using namespace enoki;

namespace {
constexpr size_t PacketSize = enoki::max_packet_size / sizeof(float);
using Float    = float;
using FloatP   = Packet<Float, 4>;
using FloatX   = DynamicArray<FloatP>;

template <typename T>
ENOKI_NOINLINE void print(const T &val) {
    std::cout << val << std::endl;
}
}  // namespace

int main() {

    mask_t<FloatX> masks;
    set_slices(masks, 4);
    masks = false; masks[1] = true;

    auto mask = slice(masks, 1);

    print(mask);
    print(typeid(mask).name());

    print(masks.coeff(0));
    print(masks.coeff(1));
    print(typeid(masks.coeff(1)).name());

    return 0;
}

Result:

1
f
0
1
f

Example 2

Usage in Mitsuba 2 that triggers this issue:

// records.h
ENOKI_STRUCT_DYNAMIC(mitsuba::PositionSample, ...)

// Example usage that would trigger compilation error
Position3fX pos;
auto p = slice(pos, 1);

// Actual usage: python/records.cpp
bind_slicing_operators<PositionSample<Point3fX>>();

Nothing gets built by CMake

I clone the repository recursively, as suggested by the documentation:

$ git clone --recursive https://github.com/mitsuba-renderer/enoki
Cloning into 'enoki'...
...
Cloning into '/home/bram/src/enoki/ext/cub'...
...
Cloning into '/home/bram/src/enoki/ext/pybind11'...
...
Cloning into '/home/bram/src/enoki/ext/pybind11/tools/clang'...
...

I then run CMake:

$ cd enoki
$ mkdir build
$ cd build
$ CXX=clang++-8 CC=clang-8 cmake ../
-- The CXX compiler identification is Clang 8.0.0
-- Check for working CXX compiler: /usr/bin/clang++-8
-- Check for working CXX compiler: /usr/bin/clang++-8 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Enoki: using libc++.
-- Found Sphinx: /usr/bin/sphinx-build  
-- Configuring done
-- Generating done
-- Build files have been written to: /home/bram/src/enoki/build

When I then run make, nothing gets built:

$ make
$

I expect at least the tests to get built with that.

This is on Ubuntu 18.04.4 LTS

Consider making enoki (cmake) installable

Hello,

I have started using Enoki in my project. Right now, I have basically cloned the entire repository into a subfolder and added it to my include paths (the old CMake way).

It would be really nice to make it installable via CMake. (Internally we use Conan as a package manager, and writing a Conan recipe for a project that is CMake-installable is straightforward.)

Essentially it would be nice if we can do this:

mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=/my/path/ -G Ninja ..
ninja
ninja install

I don't know enough CMake to do this myself, though :/
PS: Some info: http://mariobadr.com/creating-a-header-only-library-with-cmake.html

How to reshape/cut a CUDA array, or add two arrays of different sizes?

In C++:
FloatC data = (enoki::arange(8)) * 0.f;
FloatC data2 = (enoki::arange(8 * 2)) * 1.f;

How can I do something like:
data = data + data2;
or
data = data + data2[0:8] + data2[8:16];  // Python-style slices, for illustration only
Thanks!
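The second form amounts to index arithmetic: element i of the result adds data2[i] and data2[i + 8]. In Enoki this could presumably be expressed with gather() and index arrays built from arange(), but that is an assumption; the plain C++ sketch below (hypothetical helper add_folded, std::vector instead of FloatC) only shows the indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dst[i] += src[i] + src[i + n] + src[i + 2n] + ..., where n = dst.size().
// With src of length 2n this matches data = data + data2[0:8] + data2[8:16].
void add_folded(std::vector<float> &dst, const std::vector<float> &src) {
    const std::size_t n = dst.size();
    assert(n > 0 && src.size() % n == 0);
    for (std::size_t j = 0; j < src.size(); j += n)  // start of each length-n block
        for (std::size_t i = 0; i < n; ++i)
            dst[i] += src[j + i];
}
```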

How to merge a large CUDA array?

The question might be a little confusing, but it is related to rendering an image on the GPU. Is there any suggestion on how to do that?
For example, if I want to render a 600*800 image with 32 spp, is it a wise idea to create a CUDA array of size 600*800*32 (assuming the GPU can handle that) and then average it down into an array of size 600*800? Is there a function to do that?
Also, the CUDA array has a gradient.
Thanks
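If the samples of each pixel are stored contiguously (index pixel * spp + s), the averaging is a reduction over blocks of spp entries. On the GPU this could presumably be done with a scatter-add into a pixel-sized buffer followed by a division; the sketch below is plain C++ with a hypothetical helper name and says nothing about how gradients propagate through the reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Average `spp` consecutive samples per pixel; sample k belongs to pixel k / spp.
std::vector<float> average_samples(const std::vector<float> &samples, std::size_t spp) {
    assert(spp > 0 && samples.size() % spp == 0);
    std::vector<float> pixels(samples.size() / spp, 0.0f);
    for (std::size_t k = 0; k < samples.size(); ++k)
        pixels[k / spp] += samples[k];  // accumulate into the owning pixel
    for (float &p : pixels)
        p /= static_cast<float>(spp);
    return pixels;
}
```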

memcpy from one blob to another using enoki

Consider this simple code:

void bar(const char* src, int src_size, char* dst, int dst_size) { 
  assert(src_size == dst_size);

  for (int i = 0; i < src_size; ++i) { 
    *dst++ = *src++;
  } 
}

This code generates the following assembly (only the loop part is shown):

  40d7c8:       c5 fe 6f 04 07          vmovdqu ymm0,YMMWORD PTR [rdi+rax*1]
  40d7cd:       c5 fe 7f 04 02          vmovdqu YMMWORD PTR [rdx+rax*1],ymm0
  40d7d2:       48 83 c0 20             add    rax,0x20
  40d7d6:       48 39 c8                cmp    rax,rcx
  40d7d9:       75 ed                   jne    40d7c8 <bar(char const*, int, char*, int)+0x28>

GCC is smart enough to vectorize this loop and copy chunks of 32 bytes.

Now consider this code written with enoki:

void foo(const char* src, int src_size, char* dst, int dst_size) {
  using Array = enoki::Array<int, 8>;

  auto es = enoki::DynamicArray<Array>::map(src, src_size);
  auto ed = enoki::DynamicArray<Array>::map(dst, dst_size);
  for (int i = 0; i < (int)es.packets(); ++i) {
    const auto& pkt = es.packet(i);
    auto& dst_pkt = ed.packet(i);
    dst_pkt = pkt;
  }
}

This code generated this assembly:

  40d850:       c5 fd 6f 04 07          vmovdqa ymm0,YMMWORD PTR [rdi+rax*1]
  40d855:       c5 fd 7f 04 02          vmovdqa YMMWORD PTR [rdx+rax*1],ymm0
  40d85a:       48 83 c0 20             add    rax,0x20
  40d85e:       48 39 c8                cmp    rax,rcx
  40d861:       75 ed                   jne    40d850 <foo(char const*, int, char*, int)+0x20>

So this is almost the same code, except that it uses aligned loads and stores (vmovdqa instead of vmovdqu).

Next, I wanted to unroll this loop further so that it uses two ymm registers per iteration, so I changed the Array in the code above to

using Array = enoki::Array<int, 16>;

The assembly generated with the 16-element (64-byte) array is this:

  40d850:       c5 f9 6f 04 07          vmovdqa xmm0,XMMWORD PTR [rdi+rax*1]
  40d855:       c5 f8 29 04 02          vmovaps XMMWORD PTR [rdx+rax*1],xmm0
  40d85a:       c5 f9 6f 4c 07 10       vmovdqa xmm1,XMMWORD PTR [rdi+rax*1+0x10]
  40d860:       c5 f8 29 4c 02 10       vmovaps XMMWORD PTR [rdx+rax*1+0x10],xmm1
  40d866:       c5 f9 6f 54 07 20       vmovdqa xmm2,XMMWORD PTR [rdi+rax*1+0x20]
  40d86c:       c5 f8 29 54 02 20       vmovaps XMMWORD PTR [rdx+rax*1+0x20],xmm2
  40d872:       c5 f9 6f 5c 07 30       vmovdqa xmm3,XMMWORD PTR [rdi+rax*1+0x30]
  40d878:       c5 f8 29 5c 02 30       vmovaps XMMWORD PTR [rdx+rax*1+0x30],xmm3
  40d87e:       48 83 c0 40             add    rax,0x40
  40d882:       48 39 c8                cmp    rax,rcx
  40d885:       75 c9                   jne    40d850 <foo(char const*, int, char*, int)+0x20>

So instead of using two ymm registers, it uses four xmm registers. I find this quite odd. Do you have any idea why Enoki does that?

Question: Thread safety

Is it possible to seamlessly use a normal CPU threading library together with this library?
Or should array objects basically be treated as sequential-only, thread-private objects?

Treating a 3D array as a 4D array

Hello,

As the documentation says, 3D arrays are treated as 4D arrays to make better use of intrinsics, but this raises an interesting problem.

Consider the following code:

  using Array = enoki::Array<float, 3>;

  Array numerator{2, 4, 8};
  Array denominator{1, 1, 1};

  auto result = numerator / denominator;
  if (std::fetestexcept(FE_INVALID)) {
    throw std::runtime_error("domain error");
  }

This throws the exception because the last (padding) entry of the register is initialized to 0, so the division computes 0/0. Note that this is not limited to division: any operation on the last entry (things like min and max) also triggers that exception.

We do a bunch of floating point computation and like to keep the floating point exception check to verify we didn't mess up.

What would you suggest for handling cases like this?
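The mechanism can be reproduced in plain C++ without Enoki: 0/0 in the padding lane raises FE_INVALID, while padding both operands with 1 would not. The sketch below (hypothetical function name, four-element buffers standing in for a padded 3D array) also clears the flags with std::feclearexcept() before the operation, which the check needs in order to be meaningful. It does not assume that Enoki lets you choose the padding value.

```cpp
#include <cassert>
#include <cfenv>

// Divide two 3-vectors stored in 4-element buffers, like a padded 3D array,
// with `pad` in the unused fourth lane. Returns true if FE_INVALID was raised.
bool padded_divide_raises_invalid(float pad) {
    volatile float num[4] = {2.f, 4.f, 8.f, pad};  // volatile keeps the divisions at run time
    volatile float den[4] = {1.f, 1.f, 1.f, pad};
    volatile float out[4];
    std::feclearexcept(FE_ALL_EXCEPT);
    for (int i = 0; i < 4; ++i)
        out[i] = num[i] / den[i];
    return std::fetestexcept(FE_INVALID) != 0;
}
```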

Using a mask to filter an array

For example:
FloatC arr = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f};
MaskC msk = {0, 0, 1, 1, 0};
FloatC filtered_arr = do_something(arr, msk);
// filtered_arr is {0.0f, 0.0f, 2.0f, 3.0f, 0.0f}
or
FloatC filtered_arr = do_something2(5.0f, arr, msk);
// filtered_arr is {5.0f, 5.0f, 2.0f, 3.0f, 5.0f}
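Both cases are an element-wise select between the array and a fill value (for do_something the fill value is 0). Enoki's select(mask, a, b), used in another issue on this page, has these semantics; the plain C++ sketch below spells them out with the hypothetical name select_or.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise select: keep arr[i] where mask[i] is set, otherwise use `fill`.
std::vector<float> select_or(const std::vector<bool> &mask,
                             const std::vector<float> &arr, float fill) {
    std::vector<float> out(arr.size());
    for (std::size_t i = 0; i < arr.size(); ++i)
        out[i] = mask[i] ? arr[i] : fill;
    return out;
}
```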

Any fast way to copy a GPU array to the CPU?

Is there a good method to copy a FloatC into a CPU array?
I am currently using PyTorch's .cpu(), but it seems it can be very slow when the computation graph is complex.
Is there a better way, either in C++ or Python?

Link error when trying to use DynamicArray for DiffArray

I am trying to use autodiff with the DiffArray<FloatX> defined below, instead of the usual DiffArray<CUDAArray<float>>:

#include <iostream>
#include <enoki/dynamic.h>
#include <enoki/autodiff.h>

using namespace enoki;

using Float  = float;

// not working:
using FloatP = Packet<Float>;
using FloatX = DynamicArray<FloatP>;
using FloatD = DiffArray<FloatX>;

// working:
// using FloatD = DiffArray<Float>;

int main()
{
    FloatD x = 1.f;
    set_requires_gradient(x);
    FloatD y = 10.f * x;
    backward(y);
    std::cout << y << std::endl;
    std::cout << gradient(x) << std::endl;
}

Compiled with clang++-9 on Ubuntu 18.04 and linked against libenoki-autodiff.so, libenoki-cuda.so and cuda.so (the last two may not be needed; I list them just for reference). This gives:

/tests/enoki/CMakeFiles/test_examples.dir/examples.cpp.o: In function `main':
examples.cpp:(.text+0x30): undefined reference to `enoki::Tape<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::get()'
examples.cpp:(.text+0x3f): undefined reference to `enoki::Tape<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::append_leaf(unsigned long)'
examples.cpp:(.text+0x7f): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::mul_(enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > > const&) const'
examples.cpp:(.text+0x89): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::~DiffArray()'
examples.cpp:(.text+0x9a): undefined reference to `enoki::Tape<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::backward(unsigned int, bool)'
examples.cpp:(.text+0x104): undefined reference to `enoki::Tape<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::gradient(unsigned int)'
examples.cpp:(.text+0x166): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::~DiffArray()'
examples.cpp:(.text+0x16e): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::~DiffArray()'
examples.cpp:(.text+0x1cd): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::~DiffArray()'
examples.cpp:(.text+0x1d5): undefined reference to `enoki::DiffArray<enoki::DynamicArray<enoki::Packet<float, 1ul, true, (enoki::RoundingMode)4> > >::~DiffArray()'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

With using FloatD = DiffArray<Float>; instead, it works. How can I fix this link error for the DiffArray<FloatX> defined above?

Consider adding assert with ENOKI_ASSUME_ALIGNED

Hello,

First, thanks for making this public.

While reading the documentation, I noticed this:

Performing an aligned load from an unaligned memory address will cause a general protection fault that immediately terminates the application.

and indeed, doing an AVX-512 load from unaligned memory causes the application to segfault. Would it be possible to add assert((uintptr_t) ptr % n == 0) wherever ENOKI_ASSUME_ALIGNED is used?

For example, this overload:

      static ENOKI_INLINE Derived load_(const void *ptr) {
          return _mm512_load_ps((const Value *) ENOKI_ASSUME_ALIGNED(ptr, 64));
      }

would become:

      static ENOKI_INLINE Derived load_(const void *ptr) {
          assert((uintptr_t) ptr % 64 == 0);
          return _mm512_load_ps((const Value *) ENOKI_ASSUME_ALIGNED(ptr, 64));
      }

This would catch the problem in debug builds.
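The check itself only needs pointer arithmetic. A minimal standalone helper that such an assert could call (hypothetical name is_aligned, not part of Enoki):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if `ptr` is a multiple of `alignment` bytes. `alignment` must be a power of two,
// which lets the modulo be replaced by a bit mask.
inline bool is_aligned(const void *ptr, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

Placing assert(is_aligned(ptr, 64)); before the aligned load would then fire in debug builds and compile away when NDEBUG is defined.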

error: lvalue required as left operand of assignment

Trying to do this:

FloatD arr = {1.f,3.f,2.f,5.f,6.f,2.f,6.f};
FloatD t = 0.0f;
set_requires_gradient(t);

arr[2] = arr[2] + t;  // error: lvalue required as left operand of assignment

Dynamic Array of a fixed sized Matrix

Could you help me understand this behavior?

using Mat22f = enoki::Matrix< f32, 2 >;
using Mat22fBuffer = enoki::DynamicArray< enoki::Packet< Mat22f, 2 > >; // With 2
Mat22fBuffer x, y;
y = x;

// Compile time error:
array_generic.h(464,59): error C2440: '<function-style-cast>': cannot convert from 'enoki::Array<eMV::f32,2>' to 'enoki::Matrix<eMV::f32,2>'
array_generic.h(414,1): message : No constructor could take the source type, or constructor overload resolution was ambiguous
array_generic.h(344): message : see reference to function template instantiation 'void enoki::StaticArrayImpl<Value_,2,false,enoki::Packet<Value_,2>,int>::assign_<eMV::Mat22f&,0,1>(T,std::integer_sequence<size_t,0,1>)' being compiled
1>        with
1>        [
1>            Value_=eMV::Mat22f,
1>            T=eMV::Mat22f &
1>        ]

With:

using Mat22f = enoki::Matrix< f32, 2 >;
using Mat22fBuffer = enoki::DynamicArray< enoki::Packet< Mat22f, 1 > >; // With 1
Mat22fBuffer x, y;
y = x;

it compiles without any problem.

IntC to int*?

I asked before and learned that IntC::copy can copy an int* into an IntC on the GPU.
Is there a method to copy an IntC back into an int*?

Masked access behaves differently when compiling with -mavx2

Consider the following code snippet. When compiled "out of the box", the program prints [10, 20, 30, 40, 5, 6, 7, 8] four times. However, as soon as I specify -mavx2, -march=native or similar,
the output is [1, 2, 3, 4, 5, 6, 7, 8] for the first three prints. The fourth one still works as expected.

#include <iostream>
#include <enoki/array.h>
using namespace enoki;
int main() {
  auto print = [](auto x) { std::cout << x << '\n'; };
  using Arr = Array<int, 8>;
  using M = mask_t<Arr>;
  M m{1,1,1,1,0,0,0,0};

  {
    Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
    masked(a, m) *= 10;
    std::cout << a << std::endl;  // <- Wrong: should print [10, 20, 30, 40, 5, 6, 7, 8]
  }
  {
    Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
    a = enoki::select(m, a * 10, a);
    std::cout << a << std::endl; // <- Wrong: should print [10, 20, 30, 40, 5, 6, 7, 8]
  }
  {
    Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
    a[m] *= 10;
    std::cout << a << std::endl; // <- Wrong: should print [10, 20, 30, 40, 5, 6, 7, 8]
  }
  {
    Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
    a[m > 0] *= 10;
    std::cout << a << std::endl; // <- OK: prints [10, 20, 30, 40, 5, 6, 7, 8]
  }
  return 0;
}

I've tested gcc-7.4, clang-7 and clang-9 on ubuntu 18.04.

Here's the CMakeLists.txt I'm using:

cmake_minimum_required(VERSION 3.15)
project(enoki_test)

set(CMAKE_CXX_STANDARD 17)

add_executable(enoki_test main.cpp)

target_include_directories(enoki_test PRIVATE ../enoki/include)

set(CMAKE_CXX_FLAGS "-mavx2")

Any idea how to fix this?
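This may be the same root cause as the select issue above; the following is an unconfirmed explanation. M m{1,1,1,1,0,0,0,0} might store the integer 1 in each active lane, whereas AVX2 blend instructions such as _mm256_blendv_epi8 only test the most significant bit of each byte, so a lane holding 1 behaves like false. A comparison such as m > 0 yields a canonical all-ones mask, which would explain why the fourth variant works. The plain C++ sketch below emulates one 32-bit lane; the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Canonical SIMD mask lane: nonzero -> all bits set (-1), zero -> 0 (what a comparison returns).
std::int32_t to_full_mask(std::int32_t v) { return v != 0 ? -1 : 0; }

// Emulated blend for one 32-bit lane: only the most significant bit of the mask is inspected.
std::int32_t blend_msb(std::int32_t mask, std::int32_t if_true, std::int32_t if_false) {
    return (static_cast<std::uint32_t>(mask) >> 31) ? if_true : if_false;
}
```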

Allow DynamicArray to map over const range

Hello,

I have a use case in which I iterate over a const range, apply a transformation, and store the result in another range. The original range is const float.

I modelled this as:

  using FloatArray = enoki::Array<float, 8, true, enoki::RoundingMode::Default>;

  enoki::DynamicArray<const FloatArray> input;
  enoki::DynamicArray<FloatArray> destination;

and I use DynamicArray::map over my float ranges like this:

void foo(const float* data, size_t s) {
  input = enoki::DynamicArray<const FloatArray>::map(data, s);
}

However, map signature is:

static Derived map(void *ptr, size_t size) {

and I get a compiler error because passing my pointer casts away its constness.

If I change this to:

    template <typename T>
    static Derived map(T *ptr, size_t size) {

everything works out since now template type T has constness in it.

Would you be open to accept this change as a PR?

Cannot compile a simple example with enoki

Hi, I'm trying to use Enoki, but I cannot compile the following example with CMake.

hello_enoki.cpp:

#include <enoki/array.h>
#include <string>
#include <iostream>

using namespace enoki;

using StrArray = Array<std::string, 2>;

int main(int argc, char **argv) {
    StrArray x("Hello ", "How are "), y("world!", "you?");
    std::cout << x + y << std::endl;

    return 0;
}

CMakeLists.txt:

cmake_minimum_required(VERSION 2.8.12)
project(mytest)

# C++17
include(CheckCXXCompilerFlag)
if (CMAKE_CXX_COMPILER_ID MATCHES "^(GNU|Clang|Emscripten|Intel)$")
  CHECK_CXX_COMPILER_FLAG("-std=c++17" HAS_CPP17_FLAG)

  if (HAS_CPP17_FLAG)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17")
  else()
    CHECK_CXX_COMPILER_FLAG("-std=c++1z" HAS_CPP1Z_FLAG)
    if (HAS_CPP1Z_FLAG)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++1z")
    else()
      message(FATAL_ERROR "Unsupported compiler -- nanogui requires C++17 support!")
    endif()
  endif()
elseif(MSVC)
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /std:c++17")
endif()

# Enoki
add_subdirectory(enoki)
enoki_set_compile_flags()
enoki_set_native_flags()
include_directories(enoki/include)

add_executable(mytest hello_enoki.cpp)

Any advice? Thanks!

Recommended pattern for memory mapped arrays

Hi,

I have a 100GB file of floats that I have memory mapped, what would be the recommended pattern for doing things like finding the min and max value or computing a histogram? It seems like DynamicArray is the thing to use but it assumes ownership of the array. I could loop over fixed size chunks and load<>() them into an Array but then I need to deal with the boundary condition at the end if the dataset isn't a multiple of the Array size. What would be your suggestions for this scenario?
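One common pattern, sketched here in plain C++ without Enoki (hypothetical function name; W stands in for the packet width): work directly on the mapped pointer, process full W-wide chunks with per-lane accumulators, then handle the last n % W elements with a scalar tail loop. This sidesteps ownership entirely; whether DynamicArray::map would be preferable for Enoki types is left open.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>

// Min/max over a (possibly memory-mapped) float range without taking ownership.
// For n == 0 the result is {+inf, -inf}.
template <std::size_t W = 8>
std::pair<float, float> minmax_chunked(const float *data, std::size_t n) {
    std::array<float, W> lo, hi;
    lo.fill(std::numeric_limits<float>::infinity());
    hi.fill(-std::numeric_limits<float>::infinity());
    const std::size_t full = n - n % W;
    for (std::size_t i = 0; i < full; i += W)
        for (std::size_t k = 0; k < W; ++k) {  // inner loop corresponds to one W-wide packet
            lo[k] = std::min(lo[k], data[i + k]);
            hi[k] = std::max(hi[k], data[i + k]);
        }
    float mn = *std::min_element(lo.begin(), lo.end());
    float mx = *std::max_element(hi.begin(), hi.end());
    for (std::size_t i = full; i < n; ++i) {   // scalar tail: the last n % W elements
        mn = std::min(mn, data[i]);
        mx = std::max(mx, data[i]);
    }
    return {mn, mx};
}
```

A histogram follows the same structure, with per-lane bin counts merged at the end.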

Does Enoki support array indexing?

For example:
A = [1, 2, 3, 4, 5]
I = [0, 1, 1, 2]
then A[I] = [A[0], A[1], A[1], A[2]] = [1, 2, 2, 3]

    FloatD some = {2.3f, 3.4f, 4.5f, 5.6f, 6.7f, 7.8f};
    IntC index = {1,1,3,3,5};

    FloatD check = some[index];

    value of check should be: [3.4f, 3.4f, 5.6f, 5.6f, 7.8f]

Neighbours on a grid

Could Enoki be used to apply a function to a grid when the function requires access to data points and their neighbours (typically when computing a numerical scheme)? And what would be the preferred method?

Should I extract the neighbours manually with a loop (building one array per neighbour position) and then apply the function?

Or is there a better way? Maybe using a precomputed array of neighbour indices?

An example would be great as, if it is efficiently doable, that would be a great use case for Enoki.
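The precomputed-index approach can be sketched in plain C++ (hypothetical function names, std::vector instead of Enoki arrays, clamped borders as an assumed boundary condition): build one index array per neighbour offset, then each stencil term becomes a gather of the field through that index array. With Enoki the per-element loop would presumably map onto gather calls, which this sketch does not show.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// For every cell of a w x h grid, the flat index of the neighbour at offset (dx, dy),
// clamped at the borders. One such array is built per neighbour position.
std::vector<std::size_t> neighbour_index(int w, int h, int dx, int dy) {
    std::vector<std::size_t> idx(static_cast<std::size_t>(w) * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int nx = std::clamp(x + dx, 0, w - 1), ny = std::clamp(y + dy, 0, h - 1);
            idx[static_cast<std::size_t>(y) * w + x] = static_cast<std::size_t>(ny) * w + nx;
        }
    return idx;
}

// 5-point Laplacian: each neighbour term reads u through a precomputed index array.
std::vector<float> laplacian(const std::vector<float> &u, int w, int h) {
    auto left = neighbour_index(w, h, -1, 0), right = neighbour_index(w, h, 1, 0);
    auto below = neighbour_index(w, h, 0, -1), above = neighbour_index(w, h, 0, 1);
    std::vector<float> out(u.size());
    for (std::size_t i = 0; i < u.size(); ++i)
        out[i] = u[left[i]] + u[right[i]] + u[below[i]] + u[above[i]] - 4.0f * u[i];
    return out;
}
```

The index arrays depend only on the grid shape, so they can be computed once and reused across time steps.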

Compiling with -mavx2 causes compiler error (clang8) regarding feature 'fma'

clang++-8 -I ../../src/ThreadTracer -I ../../src/enoki/include -mavx2 -O2 -g -std=c++17 try.cpp ../../src/ThreadTracer/threadtracer.o -o try
In file included from try.cpp:2:
In file included from ../../src/enoki/include/enoki/array.h:47:
../../src/enoki/include/enoki/array_avx.h:384:35: error: always_inline function '_mm256_fnmadd_ps' requires target
      feature 'fma', but would be inlined into function 'rsqrt_' that is compiled without support for 'fma'
                r = _mm256_mul_ps(_mm256_fnmadd_ps(t1, r, c1), t0);

$ clang++-8 --version
clang version 8.0.0-3~ubuntu18.04.2 (tags/RELEASE_800/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

NOTE: the error goes away either by not passing the -mavx2 flag to clang, or by not calling enoki::rsqrt().
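In GCC and Clang, -mavx2 does not imply -mfma, and the error message shows that the AVX code path of rsqrt uses an FMA intrinsic. Adding -mfma (or using -march=native on an FMA-capable CPU) is the usual remedy, though it has not been verified against this exact setup. A small compile-time guard that flags the problematic combination early (hypothetical helper; it relies on the __AVX2__ and __FMA__ macros that GCC and Clang predefine):

```cpp
#include <cassert>

// True unless AVX2 code generation is enabled without FMA, the combination that
// triggers "requires target feature 'fma'" when FMA intrinsics are inlined.
constexpr bool fma_flags_consistent() {
#if defined(__AVX2__) && !defined(__FMA__)
    return false;
#else
    return true;
#endif
}

static_assert(fma_flags_consistent(), "AVX2 is enabled without FMA: also pass -mfma");
```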

Simple instructions to build and run a trivial auto differentiation example?

It would be great to have some instructions (using CMake) to build and run a trivial C++ example to get started. Are there any?
For example, a simple scalar CPU example of automatic differentiation as described in https://enoki.readthedocs.io/en/master/demo.html

(without requiring a complicated test framework or other dependencies)

I tried running CMake, but only got a 'mkdoc' project (aside from ALL_BUILD and ZERO_CHECK), which fails with
(1>ImportError: No module named 'guzzle_sphinx_theme'), and I'm not interested in building the docs.

Enabling 'ENOKI_TEST' didn't create a new build target.

Thanks!

How to pybind FloatC and FloatD

I am rendering a gradient image using Enoki CUDA arrays. Is there any suggestion on how to pass the C++ CUDA arrays FloatC and FloatD (or vectors of them) to Python, so that I can call backward in Python for optimization? I didn't see a binding for that in enoki/python.h.

Confusion regarding mask types

I wrote a simple test to try Enoki. However, I am unable to perform simple comparison operations due to type differences. The documentation states that the return type of operator< and neq is mask_t<Array>. However, the types of the result1 and result2 variables in the following code are different.

import enoki as ek

def myfunc(arr1, arr2):
  result1 = ek.dot(arr1, arr1) < 0
  result2 = ek.neq(arr2, 0)
  print(type(result1), type(result2))
  return result1, result2

def test_scalar():
  from enoki import scalar
  arr1 = scalar.Vector1f([1])
  arr2 = scalar.Vector1f([2])
  res = myfunc(arr1, arr2)


def test_cuda():
  from enoki import cuda
  arr1 = cuda.Vector1f([1])
  arr2 = cuda.Vector1f([2])
  res = myfunc(arr1, arr2)


if __name__ == '__main__':
  test_scalar()
  test_cuda()

The output in scalar mode is shown below. As I understand it, this is because the output of the dot operation is converted to a py::float. Is there a way to perform the comparison without explicitly casting to bool in this case?

<class 'bool'> <class 'enoki.scalar.Vector1m'>

The output in CUDA mode is shown below. The difference between these types is unclear to me. Could you give more details?

<class 'enoki.cuda.Mask'> <class 'enoki.cuda.Vector1m'>

Question: comparison with Halide

Could you please briefly explain how enoki compares with Halide?

[Enhancement]: Conventions

This library seems to use the following conventions:

column-vectors (*);
row-major storage order for matrices (*);
right-handed coordinate system.

(*) Ensures fast matrix-column-vector multiplications.

Is there any possibility of also supporting the exact opposites of these:

row-vectors (**);
column-major storage order for matrices (**);
left-handed coordinate system.

(**) Ensures fast row-vector-matrix multiplications.
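For context on why the two vector conventions are interchangeable in principle: multiplying a row vector by M gives the same numbers as multiplying the transpose of M by a column vector. The plain C++ sketch below (2x2 std::array matrices indexed m[row][col], hypothetical function names, not Enoki's Matrix type) checks this identity; it makes no claim about Enoki's internal storage order.

```cpp
#include <array>
#include <cassert>

using Mat2 = std::array<std::array<float, 2>, 2>;  // m[row][col]
using Vec2 = std::array<float, 2>;

// Column-vector convention: y = M x.
Vec2 mul_col(const Mat2 &m, const Vec2 &x) {
    return {m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]};
}

// Row-vector convention: y = x M.
Vec2 mul_row(const Vec2 &x, const Mat2 &m) {
    return {x[0] * m[0][0] + x[1] * m[1][0], x[0] * m[0][1] + x[1] * m[1][1]};
}

Mat2 transpose(const Mat2 &m) {
    return {{{m[0][0], m[1][0]}, {m[0][1], m[1][1]}}};
}
```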

Cast std::array to enoki::array

I tried:

float arr[4] = { 10.f, 20.f, 30.f, 40.f };
float* ptr;
ptr = arr;
FloatC arr_enoki = load<float>(ptr);
std::cout << "see if this is working" << std::endl;
std::cout << arr_enoki << std::endl; // prints [10]: only the first element ends up in the array

<from enoki import *> imports only few names

After following the Enoki documentation on GPU arrays (https://enoki.readthedocs.io/en/master/gpu.html#):

cd <path-to-enoki>
mkdir build
cd build
cmake -DENOKI_CUDA=ON -DENOKI_AUTODIFF=ON -DENOKI_PYTHON=ON ..
make

In Python:

>>> from enoki import *

I find that only a few names are imported; FloatC and cuda_set_log_level are missing:

['CPUBuffer',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 'allclose',
 'arange',
 'core',
 'e',
 'empty',
 'full',
 'inf',
 'linspace',
 'nan',
 'pi',
 'zero']

Did I miss something, or is that code only for demonstration?
Thanks!

Define an empty mask or array with a given size?

Hi,
May I ask how to define a mask or array of a given size?
For example, instead of:
MaskC c{true, true, true, true, true};
can I write something like:
MaskC c = MaskC(value = true, size = 5);

Incorrect behaviour for select with scalar inputs

In Python, ek.select(True, 0.1, 0.2) returns [1] of type scalar.Vector1m instead of 0.1.

Adding the following binding code in src/python/scalar.cpp solves the bug.

m.def("select", [](bool a, Float b, Float c) {
        return enoki::select(a, b, c);
});

Please check whether this is the correct way to add a scalar binding for the select function.

Switching from PCG to something else

The claims made about the speed and quality of Melissa O'Neil's PCG family of pseudo-random number generators are dubious, and multiple reviews call them into question (for example, here). It appears that xorshift-derived generators (like xoshiro) are better in every regard. Perhaps it would be worth offering more choice in random.h, or even removing PCG from it.
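For reference, a scalar sketch of xoshiro128** (the 32-bit-output member of the xoshiro family by Blackman and Vigna), following their public-domain reference implementation. This is not code from random.h and makes no claim about relative quality; a vectorized version would presumably keep one 128-bit state per lane. The state must never be all zeros.

```cpp
#include <cassert>
#include <cstdint>

// xoshiro128**: 128-bit state, 32-bit outputs.
struct Xoshiro128ss {
    std::uint32_t s[4];

    static std::uint32_t rotl(std::uint32_t x, int k) { return (x << k) | (x >> (32 - k)); }

    std::uint32_t next() {
        const std::uint32_t result = rotl(s[1] * 5, 7) * 9;  // "**" scrambler
        const std::uint32_t t = s[1] << 9;
        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = rotl(s[3], 11);
        return result;
    }
};
```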

Is there any suggestion on how to auto diff a Vector3fC?

It seems I cannot call backward on a Vector3fC, only on a FloatC. I tried differentiating each component of the Vector3fC separately, but I'm hoping there is a better way.

Enoki can't be used with MSVC 2019

I can't use Enoki (release 0.1) with Visual C++ 2019.
Here is a minimal reproduction:

enoki::Packet<float, 8> x(1.0f);
enoki::pow(x, x);

Thanks.

Do you have benchmarks available?

I really like your design! Do you have any benchmarks available for example problems like the graphs in Figure 8 of the Stan math paper?

https://arxiv.org/pdf/1509.07164.pdf

Question: Other Backend (OpenCL, DirectCompute, ...) [Nice to have]

It would be nice to have other backends: DirectX HLSL DirectCompute, OpenCL, OpenGL compute shaders, ...
Could you explain what is needed to implement another GPUArray (CLArray, DxArray, ...)? Or a GPUArray that can target CUDA, OpenCL, ... at compile time?

Missing example for compiling to multiple implementations

if desired, it can compile the same source code to multiple different implementations

The above line from the README suggests this, but I couldn't find any examples, tutorials, or other references describing how to do so. Would it be possible to add more information about this?

I would also like to generate multiple versions (scalar, AVX-512, AVX2, SSE, CUDA) to benchmark them, since AVX sometimes lowers the clock speed and can actually hurt performance in multi-threaded applications. Would it be possible to do this in the same binary?
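The usual pattern behind "same source, multiple implementations" is to write the kernel once against a generic Value type and instantiate it for each target type. The sketch below uses float and a hypothetical 4-wide Vec4 struct standing in for a packet type such as enoki::Packet<float, 4>; the names Vec4 and muladd are illustrative only.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-wide packet standing in for a SIMD array type;
// it implements only the operators the kernel below needs.
struct Vec4 {
    std::array<float, 4> v;
};

Vec4 operator*(const Vec4 &a, const Vec4 &b) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

Vec4 operator+(const Vec4 &a, const Vec4 &b) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// The kernel is written once against a generic Value type...
template <typename Value>
Value muladd(const Value &a, const Value &b, const Value &c) {
    return a * b + c;
}

// ...and explicitly instantiated for each "backend" in the same binary.
template float muladd<float>(const float &, const float &, const float &);
template Vec4 muladd<Vec4>(const Vec4 &, const Vec4 &, const Vec4 &);
```

Keeping variants built with different instruction-set flags (e.g. AVX2 vs AVX-512) in one binary needs extra care, for example separate translation units plus runtime dispatch, and is not shown here.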

Enoki PTX linker error

Hey all!

First of all thank you very much for publishing/releasing mitsuba2!
I wanted to start experimenting with inverse rendering and tried multiple platforms (Google Colab and my own hardware), but I keep facing the exact same issue everywhere:

import mitsuba                                                                              
mitsuba.set_variant('gpu_autodiff_rgb') 

# The C++ type associated with 'Float' is enoki::DiffArray<enoki::CUDAArray<float>> 
from mitsuba.core import Float 
import enoki as ek 

# Initialize a dynamic CUDA floating point array with some values 
x = Float([1, 2, 3])                                                                        
# Tell Enoki that we'll later be interested in gradients of 
# an as-of-yet unspecified objective function with respect to 'x' 
ek.set_requires_gradient(x) 

# Example objective function: sum of squares 
y = ek.hsum(x * x)

PTX linker error:
ptxas fatal : SM version specified by .target is higher than default SM version assumed
cuda_check(): driver API error = 0400 "CUDA_ERROR_INVALID_HANDLE" in ../ext/enoki/src/cuda/jit.cu:253.

I've tried different GPUs, and the results are:

GPU            Driver version   CUDA version   Result   Compute capability
GeForce 940M   440.64           10.0.130       Fails    5.0
K80            418.67           10.0.130       Fails    3.7
Tesla P4       418.67           10.0.130       Works    6.1
P100           418.67           10.0.130       Fails    6.0

-> Oddly, the issue does not occur on the Tesla P4, but it does on all the others.
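One hypothesis consistent with both the ptxas message ("SM version specified by .target is higher than default SM version assumed") and the table is that the generated PTX targets sm_61, so only devices with compute capability 6.1 or higher can load it. That target is an assumption, not something confirmed in this thread; the sketch below just checks the hypothesis against the four rows above.

```cpp
#include <string>
#include <vector>

// Rows copied from the table above.
struct GpuResult {
    std::string name;
    int cc_major, cc_minor;  // compute capability
    bool works;
};

const std::vector<GpuResult> results = {
    {"GeForce 940M", 5, 0, false},
    {"K80", 3, 7, false},
    {"Tesla P4", 6, 1, true},
    {"P100", 6, 0, false},
};

// True if the device's compute capability is at least major.minor.
bool supports_target(const GpuResult &g, int major, int minor) {
    return g.cc_major > major || (g.cc_major == major && g.cc_minor >= minor);
}

// Assumed PTX target sm_61: does "works iff capability >= 6.1" match every row?
bool hypothesis_matches_table() {
    for (const auto &g : results)
        if (supports_target(g, 6, 1) != g.works)
            return false;
    return true;
}
```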

Does anyone have an idea what can cause this and how I can fix it?

Thanks a lot! Pieterjan

Question about multi-core and multi-GPU support

I am importing Enoki in Python and using dynamic arrays. I find that it only uses one CPU core.
Should I use multithreading myself, or does Enoki support multi-core parallelism?
I have multiple GPUs; can I specify which GPU Enoki uses?

Thanks for your help!
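If the work has to be parallelized manually, a generic pattern (not an Enoki API) is to split the index range into contiguous chunks and process each chunk on its own thread. The helper name parallel_for below is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: split [0, n) into contiguous chunks and call
// func(begin, end) for each chunk on its own std::thread.
template <typename Func>
void parallel_for(std::size_t n, std::size_t num_threads, Func func) {
    if (num_threads == 0) num_threads = 1;
    std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // fewer chunks than threads
        workers.emplace_back([=] { func(begin, end); });
    }
    for (auto &w : workers) w.join();  // wait for all chunks to finish
}
```

Each chunk should only write to its own slice of the output to avoid data races. For GPU selection, independently of Enoki, the CUDA_VISIBLE_DEVICES environment variable restricts which GPUs a process can see.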

Dynamic Complex Arrays

What's the correct way to work with dynamic arrays of complex numbers?

AFAICS there are two possible ways: Complex<DynamicArray<FloatP>> and DynamicArray<Complex<FloatP>>. The first seems to work somehow, but unfortunately it is not possible to use the map function with it. The second version, on the other hand, allows me to map existing memory but fails with most other functions.

Thanks in advance!
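The two types differ in memory layout. As I understand it, Complex<DynamicArray<FloatP>> stores one array of real parts and one of imaginary parts (structure of arrays), whereas existing complex data is usually interleaved (re, im, re, im, ...), which is why mapping only lines up with the second form. One workaround is a single copy that splits interleaved data into separate real and imaginary buffers. The names ComplexSoA and deinterleave are hypothetical, and the sketch uses plain C++ containers rather than Enoki types.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Structure-of-arrays storage: one contiguous buffer per component.
struct ComplexSoA {
    std::vector<float> real, imag;
};

// Split interleaved complex data (e.g. a std::complex<float> buffer)
// into separate real and imaginary buffers in one pass.
ComplexSoA deinterleave(const std::complex<float> *data, std::size_t n) {
    ComplexSoA out;
    out.real.resize(n);
    out.imag.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.real[i] = data[i].real();
        out.imag[i] = data[i].imag();
    }
    return out;
}
```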

Report: compile error with RGB Gamma example (Visual Studio 2017)

Hello,

All tests have passed, but I can't compile the following code using Visual Studio 2017 with AVX2:

template <typename Value> 
Value srgb_gamma(Value x) {
    return enoki::select(
        x <= 0.0031308f,
        x * 12.92f,
        enoki::pow(x * 1.055f, 1.f / 2.4f) - 0.055f
    );
}

using ColorP = enoki::Array<float, 16>;
ColorP input = /* ... */;
ColorP output = srgb_gamma(input);

I get this error:

1>  c:\code\vsprojects\enoki\include\enoki\array_math.h(962): error C2672: 'enoki::low': no matching overloaded function found
1>  c:\code\vsprojects\enoki_test\enoki_test\main.cpp(94): note: see reference to function template instantiation 'auto enoki::pow<false,Derived_,float>(const T1 &,const T2 &)' being compiled
1>          with
1>          [
1>              Derived_=enoki::Array<float,16,true,enoki::RoundingMode::Default>,
1>              T1=enoki::Array<float,16,true,enoki::RoundingMode::Default>,
1>              T2=float
1>          ]
1>  c:\code\vsprojects\enoki_test\enoki_test\main.cpp(165): note: see reference to function template instantiation 'Value srgb_gamma<T>(Value)' being compiled
1>          with
1>          [
1>              Value=ColorP,
1>              T=ColorP
1>          ]
1>  c:\code\vsprojects\enoki\include\enoki\array_traits.h(151): error C2783: 'auto enoki::low(const Array &)': could not deduce template argument for '__formal'
...
...

Here, if enoki::Array<float, 16> is replaced by enoki::Array<float, 8>, there is no error.

On the other hand, the below version of code is successfully compiled (explicitly creating Value(1.f / 2.4f)):

template <typename Value> 
Value srgb_gamma(Value x) {
    return enoki::select(
        x <= 0.0031308f,
        x * 12.92f,
        enoki::pow(x * 1.055f, Value(1.f / 2.4f)) - 0.055f
    );
}

using ColorP = enoki::Array<float, 16>;
ColorP input = /* ... */;
ColorP output = srgb_gamma(input);

This behavior may be because the MSVC compiler can't resolve this kind of function overload in the current code.
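For reference, a scalar sketch of the standard sRGB encoding curve. Note that the standard formula applies the 1.055 factor after the power, 1.055 * x^(1/2.4) - 0.055, rather than inside it as in the snippets above. Constants are wrapped in Value(...), mirroring the workaround above; with Enoki arrays the ternary would be replaced by enoki::select. The name srgb_gamma_ref is hypothetical.

```cpp
#include <cmath>

// Scalar reference for the sRGB encoding curve:
//   x <= 0.0031308 ? 12.92 * x : 1.055 * x^(1/2.4) - 0.055
template <typename Value>
Value srgb_gamma_ref(Value x) {
    using std::pow;
    return x <= Value(0.0031308)
               ? x * Value(12.92)
               : Value(1.055) * pow(x, Value(1) / Value(2.4)) - Value(0.055);
}
```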

Prefix sum example?

After reading through Enoki's documentation, I am still somewhat confused about how one would implement a prefix sum or create a summed area table. What is the most idiomatic way to perform a prefix sum?
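As a sequential reference for what either operation should compute (not Enoki code), an inclusive prefix sum is std::partial_sum, and a summed area table is a prefix sum along every row followed by one along every column. A data-parallel version would instead use a scan algorithm such as Blelloch's. The function names are hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Inclusive prefix sum: out[i] = in[0] + ... + in[i].
std::vector<float> prefix_sum(const std::vector<float> &in) {
    std::vector<float> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// Summed area table of a row-major width x height image:
// sat(y, x) = sum of img over rows 0..y and columns 0..x.
std::vector<float> summed_area_table(const std::vector<float> &img,
                                     std::size_t width, std::size_t height) {
    std::vector<float> sat(img);
    // Prefix sum along each row.
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 1; x < width; ++x)
            sat[y * width + x] += sat[y * width + x - 1];
    // Prefix sum along each column.
    for (std::size_t y = 1; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            sat[y * width + x] += sat[(y - 1) * width + x];
    return sat;
}
```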

How to cast an int* to FloatC?

Suppose I have an int*, i.e. a plain int array.
How can I efficiently convert that int array into a FloatC/FloatD, which is a CUDA array in Enoki? I don't think calling scatter_add for each element of the int array is a good idea.
Thanks
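One option, sketched under assumptions, is to convert the whole buffer to float on the host in a single pass and then upload it with one bulk host-to-device copy instead of per-element scatter_add calls. The upload call itself is omitted because its exact name and signature should be checked in Enoki's CUDA headers; the helper ints_to_floats is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Convert an int buffer to a contiguous float buffer in one pass.
// The iterator-range constructor converts each int to float.
std::vector<float> ints_to_floats(const int *data, std::size_t n) {
    return std::vector<float>(data, data + n);
}
```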
