rocm / rocprim
ROCm Parallel Primitives
Home Page: https://rocm.docs.amd.com/projects/rocPRIM/
License: MIT License
The complex workaround branch does not work for the complex reduction kernels in TensorFlow. Using it in the TF build system and enabling either reduction_ops_gpu_complex64 or complex128 fails with compilation errors like
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:754:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:870:5: note: in instantiation of function template specialization 'tensorflow::functor::Launch3DXZReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
Launch3DXZReduction(ctx, out, in, in_dim0, in_dim1, in_dim2, op, init,
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:896:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:110:17: error: no matching constructor for initialization of 'output_type [4]'
output_type values[ItemsPerThread];
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel_impl<256, 4, true, std::complex<float>, std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
block_reduce_kernel_impl<BlockSize, ItemsPerThread, WithInitialValue, ResultType>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:181:37: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel<256, 4, true, std::complex<float>, std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(detail::block_reduce_kernel<
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:302:20: note: in instantiation of function template specialization 'rocprim::detail::reduce_impl<256, 4, true, std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::reduce_impl<block_size, items_per_thread, true>(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_reduce.hpp:91:27: note: in instantiation of function template specialization 'rocprim::reduce<std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return ::rocprim::reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:556:43: note: in instantiation of function template specialization 'hipcub::DeviceReduce::Reduce<std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:858:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchScalarReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *>' requested here
LaunchScalarReduction(ctx, out, in, in_size, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:931:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, TransformOutputIterator<T, T, DividesBy<T>>, T*,
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:111:17: error: no matching constructor for initialization of 'output_type' (aka 'std::complex<float>')
output_type output_value;
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:596:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:861:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchRowReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *>' requested here
LaunchRowReduction(ctx, out, in, in_dim0, in_dim1, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:931:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, TransformOutputIterator<T, T, DividesBy<T>>, T*,
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:754:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:870:5: note: in instantiation of function template specialization 'tensorflow::functor::Launch3DXZReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *>' requested here
Launch3DXZReduction(ctx, out, in, in_dim0, in_dim1, in_dim2, op, init,
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:931:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, TransformOutputIterator<T, T, DividesBy<T>>, T*,
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:110:17: error: no matching constructor for initialization of 'output_type [4]'
output_type values[ItemsPerThread];
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel_impl<256, 4, false, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
block_reduce_kernel_impl<BlockSize, ItemsPerThread, WithInitialValue, ResultType>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:147:37: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel<256, 4, false, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
HIP_KERNEL_NAME(detail::block_reduce_kernel<
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:302:20: note: in instantiation of function template specialization 'rocprim::detail::reduce_impl<256, 4, true, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return detail::reduce_impl<block_size, items_per_thread, true>(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_reduce.hpp:91:27: note: in instantiation of function template specialization 'rocprim::reduce<std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return ::rocprim::reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:556:43: note: in instantiation of function template specialization 'hipcub::DeviceReduce::Reduce<std::complex<float> *, std::complex<float> *, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:858:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchScalarReduction<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
LaunchScalarReduction(ctx, out, in, in_size, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:1036:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Prod<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:111:17: error: no matching constructor for initialization of 'output_type' (aka 'std::complex<float>')
output_type output_value;
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:110:17: error: no matching constructor for initialization of 'output_type [4]'
output_type values[ItemsPerThread];
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel_impl<256, 4, true, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
block_reduce_kernel_impl<BlockSize, ItemsPerThread, WithInitialValue, ResultType>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:181:37: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel<256, 4, true, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
HIP_KERNEL_NAME(detail::block_reduce_kernel<
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:302:20: note: in instantiation of function template specialization 'rocprim::detail::reduce_impl<256, 4, true, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return detail::reduce_impl<block_size, items_per_thread, true>(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_reduce.hpp:91:27: note: in instantiation of function template specialization 'rocprim::reduce<std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return ::rocprim::reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:556:43: note: in instantiation of function template specialization 'hipcub::DeviceReduce::Reduce<std::complex<float> *, std::complex<float> *, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:858:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchScalarReduction<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
LaunchScalarReduction(ctx, out, in, in_size, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:1036:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Prod<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:111:17: error: no matching constructor for initialization of 'output_type' (aka 'std::complex<float>')
output_type output_value;
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:596:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:861:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchRowReduction<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
LaunchRowReduction(ctx, out, in, in_dim0, in_dim1, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:1036:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Prod<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:754:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:870:5: note: in instantiation of function template specialization 'tensorflow::functor::Launch3DXZReduction<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
Launch3DXZReduction(ctx, out, in, in_dim0, in_dim1, in_dim2, op, init,
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:1036:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Prod<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
1 warning and 16 errors generated.
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:20:
In file included from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/Tensor:14:
In file included from external/eigen_archive/unsupported/Eigen/CXX11/../../../Eigen/Core:69:
In file included from /opt/rocm/include/hip/math_functions.h:32:
In file included from /opt/rocm/include/hip/hcc_detail/math_functions.h:31:
In file included from /opt/rocm/include/hip/hip_runtime.h:53:
/opt/rocm/include/hip/hip_common.h:30:9: warning: '__HIP_PLATFORM_HCC__' macro redefined [-Wmacro-redefined]
#define __HIP_PLATFORM_HCC__
^
<command line>:20:9: note: previous definition is here
#define __HIP_PLATFORM_HCC__ 1
^
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:110:17: error: no matching constructor for initialization of 'output_type [4]'
output_type values[ItemsPerThread];
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel_impl<256, 4, false, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
block_reduce_kernel_impl<BlockSize, ItemsPerThread, WithInitialValue, ResultType>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:147:37: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel<256, 4, false, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(detail::block_reduce_kernel<
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:302:20: note: in instantiation of function template specialization 'rocprim::detail::reduce_impl<256, 4, true, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::reduce_impl<block_size, items_per_thread, true>(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_reduce.hpp:91:27: note: in instantiation of function template specialization 'rocprim::reduce<std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return ::rocprim::reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:556:43: note: in instantiation of function template specialization 'hipcub::DeviceReduce::Reduce<std::complex<float> *, std::complex<float> *, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:858:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchScalarReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
LaunchScalarReduction(ctx, out, in, in_size, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:896:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:111:17: error: no matching constructor for initialization of 'output_type' (aka 'std::complex<float>')
output_type output_value;
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:110:17: error: no matching constructor for initialization of 'output_type [4]'
output_type values[ItemsPerThread];
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel_impl<256, 4, true, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
block_reduce_kernel_impl<BlockSize, ItemsPerThread, WithInitialValue, ResultType>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:181:37: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel<256, 4, true, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(detail::block_reduce_kernel<
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:302:20: note: in instantiation of function template specialization 'rocprim::detail::reduce_impl<256, 4, true, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::reduce_impl<block_size, items_per_thread, true>(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_reduce.hpp:91:27: note: in instantiation of function template specialization 'rocprim::reduce<std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return ::rocprim::reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:556:43: note: in instantiation of function template specialization 'hipcub::DeviceReduce::Reduce<std::complex<float> *, std::complex<float> *, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:858:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchScalarReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
LaunchScalarReduction(ctx, out, in, in_size, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:896:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:111:17: error: no matching constructor for initialization of 'output_type' (aka 'std::complex<float>')
output_type output_value;
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:596:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:861:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchRowReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
LaunchRowReduction(ctx, out, in, in_dim0, in_dim1, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:896:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:754:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:870:5: note: in instantiation of function template specialization 'tensorflow::functor::Launch3DXZReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
Launch3DXZReduction(ctx, out, in, in_dim0, in_dim1, in_dim2, op, init,
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:896:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:110:17: error: no matching constructor for initialization of 'output_type [4]'
output_type values[ItemsPerThread];
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel_impl<256, 4, true, std::complex<float>, std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
block_reduce_kernel_impl<BlockSize, ItemsPerThread, WithInitialValue, ResultType>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:181:37: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel<256, 4, true, std::complex<float>, std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(detail::block_reduce_kernel<
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:302:20: note: in instantiation of function template specialization 'rocprim::detail::reduce_impl<256, 4, true, std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::reduce_impl<block_size, items_per_thread, true>(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_reduce.hpp:91:27: note: in instantiation of function template specialization 'rocprim::reduce<std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return ::rocprim::reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:556:43: note: in instantiation of function template specialization 'hipcub::DeviceReduce::Reduce<std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:858:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchScalarReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *>' requested here
LaunchScalarReduction(ctx, out, in, in_size, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:931:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, TransformOutputIterator<T, T, DividesBy<T>>, T*,
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:111:17: error: no matching constructor for initialization of 'output_type' (aka 'std::complex<float>')
output_type output_value;
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:596:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<std::complex<float> *, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:861:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchRowReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *>' requested here
LaunchRowReduction(ctx, out, in, in_dim0, in_dim1, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:931:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, TransformOutputIterator<T, T, DividesBy<T>>, T*,
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Sum<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:754:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Sum<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:870:5: note: in instantiation of function template specialization 'tensorflow::functor::Launch3DXZReduction<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *>' requested here
Launch3DXZReduction(ctx, out, in, in_dim0, in_dim1, in_dim2, op, init,
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:931:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Sum<std::complex<float> >, tensorflow::TransformOutputIterator<std::complex<float>, std::complex<float>, tensorflow::functor::DividesBy<std::complex<float>, std::complex<float> >, long>, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Sum<T>, TransformOutputIterator<T, T, DividesBy<T>>, T*,
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:110:17: error: no matching constructor for initialization of 'output_type [4]'
output_type values[ItemsPerThread];
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel_impl<256, 4, false, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
block_reduce_kernel_impl<BlockSize, ItemsPerThread, WithInitialValue, ResultType>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:147:37: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel<256, 4, false, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
HIP_KERNEL_NAME(detail::block_reduce_kernel<
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:302:20: note: in instantiation of function template specialization 'rocprim::detail::reduce_impl<256, 4, true, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return detail::reduce_impl<block_size, items_per_thread, true>(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_reduce.hpp:91:27: note: in instantiation of function template specialization 'rocprim::reduce<std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return ::rocprim::reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:556:43: note: in instantiation of function template specialization 'hipcub::DeviceReduce::Reduce<std::complex<float> *, std::complex<float> *, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:858:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchScalarReduction<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
LaunchScalarReduction(ctx, out, in, in_size, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:1036:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Prod<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:111:17: error: no matching constructor for initialization of 'output_type' (aka 'std::complex<float>')
output_type output_value;
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:110:17: error: no matching constructor for initialization of 'output_type [4]'
output_type values[ItemsPerThread];
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel_impl<256, 4, true, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
block_reduce_kernel_impl<BlockSize, ItemsPerThread, WithInitialValue, ResultType>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:181:37: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_kernel<256, 4, true, std::complex<float>, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
HIP_KERNEL_NAME(detail::block_reduce_kernel<
^
external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:302:20: note: in instantiation of function template specialization 'rocprim::detail::reduce_impl<256, 4, true, std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return detail::reduce_impl<block_size, items_per_thread, true>(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_reduce.hpp:91:27: note: in instantiation of function template specialization 'rocprim::reduce<std::complex<float> *, std::complex<float> *, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return ::rocprim::reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:556:43: note: in instantiation of function template specialization 'hipcub::DeviceReduce::Reduce<std::complex<float> *, std::complex<float> *, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:858:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchScalarReduction<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
LaunchScalarReduction(ctx, out, in, in_size, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:1036:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Prod<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:75:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_reduce.hpp:111:17: error: no matching constructor for initialization of 'output_type' (aka 'std::complex<float>')
output_type output_value;
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:596:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<std::complex<float> *, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:861:5: note: in instantiation of function template specialization 'tensorflow::functor::LaunchRowReduction<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
LaunchRowReduction(ctx, out, in, in_dim0, in_dim1, op, init, cu_stream);
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:1036:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Prod<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
In file included from tensorflow/core/kernels/reduction_ops_gpu_complex64.cu.cc:20:
In file included from ./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/hipcub.hpp:29:
In file included from ./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/../../config.hpp:38:
In file included from external/rocprim_archive/rocprim/include/rocprim/rocprim.hpp:80:
In file included from external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:31:
external/rocprim_archive/rocprim/include/rocprim/device/detail/device_segmented_reduce.hpp:81:16: error: no matching constructor for initialization of 'std::complex<float>'
ResultType result;
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
segmented_reduce<BlockSize, ItemsPerThread>(
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:121:25: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_kernel<256, 8, tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
HIP_KERNEL_NAME(segmented_reduce_kernel<block_size, items_per_thread>),
^
external/rocprim_archive/rocprim/include/rocprim/device/device_segmented_reduce_hip.hpp:248:20: note: in instantiation of function template specialization 'rocprim::detail::segmented_reduce_impl<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, std::complex<float>, tensorflow::functor::Prod<std::complex<float> > >' requested here
return detail::segmented_reduce_impl(
^
./external/rocprim_archive/hipcub/include/hipcub/rocprim/device/device_segmented_reduce.hpp:55:27: note: in instantiation of function template specialization 'rocprim::segmented_reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
return ::rocprim::segmented_reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:754:52: note: in instantiation of function template specialization 'hipcub::DeviceSegmentedReduce::Reduce<tensorflow::PermutationInputIterator<std::complex<float>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::GatherOp, int>, long>, std::complex<float> *, rocprim::transform_iterator<rocprim::counting_iterator<int, long>, tensorflow::functor::RowOffset, int>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> >' requested here
auto success = gpuprim::DeviceSegmentedReduce::Reduce(
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:870:5: note: in instantiation of function template specialization 'tensorflow::functor::Launch3DXZReduction<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *>' requested here
Launch3DXZReduction(ctx, out, in, in_dim0, in_dim1, in_dim2, op, init,
^
./tensorflow/core/kernels/reduction_gpu_kernels.cu.h:1036:5: note: in instantiation of function template specialization 'tensorflow::functor::ReduceImpl<std::complex<float>, tensorflow::functor::Prod<std::complex<float> >, std::complex<float> *, std::complex<float> *, Eigen::array<long, 1> >' requested here
ReduceImpl<T, Prod<T>, T*, T*, ReductionAxes>(
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
struct complex<float>
^
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit move constructor) not viable: requires 1 argument, but 0 were provided
/usr/lib/gcc/x86_64-linux-gnu/7.3.0/../../../../include/c++/7.3.0/complex:1053:12: note: candidate constructor (the implicit copy constructor) not viable: requires 1 argument, but 0 were provided
1 warning and 16 errors generated.
Died at /opt/rocm/bin/hipcc line 496.
Describe the bug
When building PyTorch the following error is observed:
[3680/4568] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/operators/hip/torch_hip_generated_batch_sparse_to_dense_op.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/operators/hip/torch_hip_generated_batch_sparse_to_dense_op.hip.o
cd /home/erkki/Downloads/rocm2/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/operators/hip && /usr/bin/cmake -E make_directory /home/erkki/Downloads/rocm2/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/operators/hip/. && /usr/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/home/erkki/Downloads/rocm2/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/operators/hip/./torch_hip_generated_batch_sparse_to_dense_op.hip.o -P /home/erkki/Downloads/rocm2/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/operators/hip/torch_hip_generated_batch_sparse_to_dense_op.hip.o.cmake
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr23 = V_MOV_B32_dpp killed $vgpr23(tied-def 0), $vgpr13, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr47 = V_MOV_B32_dpp killed $vgpr47(tied-def 0), $vgpr14, 322, 15, 15, 0, implicit $exec
I traced this to rocPRIM library:
/opt/rocm-3.7.0/rocprim/include/rocprim/intrinsics/warp_shuffle.hpp
59:int __amdgcn_update_dpp(int old, int src, int dpp_ctrl, int row_mask, int bank_mask, bool bound_ctrl)
60: __asm("llvm.amdgcn.update.dpp.i32");
62:template<class T, int dpp_ctrl, int row_mask = 0xf, int bank_mask = 0xf, bool bound_ctrl = false>
64:T warp_move_dpp(T input)
74: words[i] = __amdgcn_update_dpp(
76: dpp_ctrl, row_mask, bank_mask, bound_ctrl
/opt/rocm-3.7.0/rocprim/include/rocprim/warp/detail/warp_scan_dpp.hpp
41:class warp_scan_dpp
59: T t = scan_op(warp_move_dpp<T, 0x111>(output), output); // row_shr:1
64: T t = scan_op(warp_move_dpp<T, 0x112>(output), output); // row_shr:2
69: T t = scan_op(warp_move_dpp<T, 0x114>(output), output); // row_shr:4
74: T t = scan_op(warp_move_dpp<T, 0x118>(output), output); // row_shr:8
79: T t = scan_op(warp_move_dpp<T, 0x142>(output), output); // row_bcast:15
84: T t = scan_op(warp_move_dpp<T, 0x143>(output), output); // row_bcast:31
/opt/rocm-3.7.0/rocprim/include/rocprim/warp/detail/warp_reduce_dpp.hpp
43:class warp_reduce_dpp
59: output = reduce_op(warp_move_dpp<T, 0xb1>(output), output);
64: output = reduce_op(warp_move_dpp<T, 0x4e>(output), output);
69: output = reduce_op(warp_move_dpp<T, 0x114>(output), output);
74: output = reduce_op(warp_move_dpp<T, 0x118>(output), output);
79: output = reduce_op(warp_move_dpp<T, 0x142>(output), output);
84: output = reduce_op(warp_move_dpp<T, 0x143>(output), output);
If I comment out the following lines in warp_shuffle.hpp:
74: words[i] = __amdgcn_update_dpp(
76: dpp_ctrl, row_mask, bank_mask, bound_ctrl
The PyTorch build progresses much further, but still fails due to a different code generation issue.
To Reproduce
Build PyTorch with ROCm, see ROCm/pytorch#718
Expected behavior
rocPRIM should work with GFX10+ devices.
Hi @jszuppe, I need some help building rocPRIM. I am running into this:
root@29f5659f5f83:~/rocPRIM/build# cmake -DBUILD_BENCHMARK=ON ../.
CMake Error at cmake/Dependencies.cmake:138 (find_package):
Could not find a package configuration file provided by "ROCM" with any of
the following names:
ROCMConfig.cmake
rocm-config.cmake
Add the installation prefix of "ROCM" to CMAKE_PREFIX_PATH or set
"ROCM_DIR" to a directory containing one of the above files. If "ROCM"
provides a separate development package or SDK, be sure it has been
installed.
Call Stack (most recent call first):
CMakeLists.txt:63 (include)
-- Configuring incomplete, errors occurred!
See also "/root/rocPRIM/build/CMakeFiles/CMakeOutput.log".
See also "/root/rocPRIM/build/CMakeFiles/CMakeError.log".
I have the rocm-dkms package installed from https://github.com/RadeonOpenCompute/ROCm and have verified that rocminfo and the HIP samples run. I am not sure what else rocPRIM needs.
Describe the bug
The compiler crashes when compiling benchmark_warp_scan for specific combinations of AMDGPU targets.
To Reproduce
Steps to reproduce the behavior:
Check out develop at this commit.
Set the HCC_AMDGPU_TARGET env var to match CMake AMDGPU_TARGETS, as per this issue (otherwise you get unwanted targets in the binary too).
Build benchmark_warp_scan.
Build.log
PowerShell script (port at your convenience) which enumerates all possible combinations:
$configs = @(
@('gfx803','gfx900','gfx906','gfx908'),
@('gfx803','gfx900','gfx906'),
@('gfx803','gfx900','gfx908'),
@('gfx803','gfx906','gfx908'),
@('gfx900','gfx906','gfx908'),
@('gfx803','gfx900'),
@('gfx803','gfx908'),
@('gfx906','gfx908'),
@('gfx900','gfx908'),
@('gfx803','gfx906'),
@('gfx803'),
@('gfx900'),
@('gfx906'),
@('gfx908')
);
$source_dir = "where you cloned your repo";
$build_root = "an existing empty folder";
foreach ($config in $configs) {
$build_dir = ("$build_root/" + ($config -join "_"));
New-Item -Type Directory $build_dir | Out-Null;
$env:HCC_AMDGPU_TARGET = ($config -join ","); # hipcc expects comma delimited list
cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc -DCMAKE_BUILD_TYPE=Release -DBUILD_TEST=OFF -DBUILD_BENCHMARK=ON ("-DAMDGPU_TARGETS=" + ($config -join ";")) -S $source_dir -B $build_dir 2>&1 | Out-File $build_dir/Configure.log; # CMake expects semi-colon delimited list, need to guard from shell with quotes
cmake --build $build_dir --target benchmark_warp_scan -- VERBOSE=1 2>&1 | Out-File $build_dir/Build.log;
}
These are the configurations that compile successfully:
# List folders that recursively contain the executable somewhere
(Get-ChildItem $build_root -Recurse -filter benchmark_warp_scan).Directory.Parent.BaseName
gfx900
gfx803
gfx908
gfx803_gfx908
gfx803_gfx900_gfx908
gfx803_gfx900_gfx906
gfx803_gfx906_gfx908
These are the configurations that didn't compile successfully:
# Do the same, but this time do an extra `ls` of $build_root and exclude the folders that contain the executable.
(Get-ChildItem $build_root -Exclude (gci $build_root -Recurse -filter benchmark_warp_scan).Directory.Parent.BaseName).BaseName
gfx803_gfx900
gfx803_gfx900_gfx906_gfx908
gfx803_gfx906
gfx900_gfx906_gfx908
gfx900_gfx908
gfx906
gfx906_gfx908
Note that after moving to the latest develop commit, which removed XNACK_FLAGS in favor of more complex target names, the very same analysis can be done with a slightly altered script that keeps the funky target names out of the paths:
$configs = @(
@('gfx900:xnack-','gfx906:xnack-','gfx908:xnack-'),
@('gfx900:xnack-','gfx906:xnack-'),
@('gfx900:xnack-','gfx908:xnack-'),
@('gfx906:xnack-','gfx908:xnack-'),
@('gfx900:xnack-'),
@('gfx906:xnack-'),
@('gfx908:xnack-')
)
foreach ($config in $configs) {
$build_dir = ("$build_root/" + ($config -join "_").Replace(":xnack-",""));
...
}
Successful builds:
gfx900
gfx906
gfx908
gfx900_gfx908
gfx900_gfx906
Failing builds:
gfx900_gfx906_gfx908
gfx906_gfx908
Expected behavior
Compiler emitting a descriptive error if it was asked to do something impossible, or compile correctly otherwise.
Log-files
This is the output of one such failure.
The funky arch names produce the same crash with a few extra hipcc and preprocessor warnings:
Warning: The specified HIP target: gfx906:xnack- is unknown. Correct compilation is not guaranteed.
Warning: The specified HIP target: gfx908:xnack- is unknown. Correct compilation is not guaranteed.
In file included from <built-in>:752:
<command line>:3:26: warning: ISO C99 requires whitespace after the macro name [-Wc99-extensions]
#define __HIP_ARCH_GFX906:XNACK-__ 1
^
<command line>:4:26: warning: ISO C99 requires whitespace after the macro name [-Wc99-extensions]
#define __HIP_ARCH_GFX908:XNACK-__ 1
^
Environment
environment.txt
Additional context
Describe the bug
My simple reduction kernel that uses block_reduce_int() produces incorrect results with rocPRIM release branches 4.4+. Release branch 4.3 works fine. The same code compiles and runs fine with HIP on V100- and A100-based NVIDIA systems (the CSC Puhti and Mahti supercomputers). I'm not 100% sure the issue is on the rocPRIM side, but it looks like it.
To Reproduce
Compiling and running https://github.com/hokkanen/rocprim_issue/blob/master/hipcub_demo.cpp with rocPRIM release branches 4.4+ (see lines 14 and 17 to include the rocPRIM header) on the LUMI supercomputer reproduces the issue.
Expected behavior
I expect to see the following:
The results calculated by GPU = 499500 and CPU = 499500 match!
Instead, I see this:
The results calculated by GPU = 460320 and CPU = 499500 do not match!
Environment
environment.txt
Hello,
I've built an alternative version of the documentation that is a bit more visually appealing than the current one.
Here is a preview: https://v01dxyz.github.io/rocprim-documentation
Are you interested in this?
I am running into a GPU memory access fault when I do not allocate a separate temporary buffer for each rocPRIM function call that requires one.
What works:
hipcub::DeviceRadixSort::SortPairs(nullptr, size, ...);
hipMalloc(&buffer, size);
hipcub::DeviceRadixSort::SortPairs(buffer, size, ...);
hipFree(buffer);
hipcub::DeviceRunLengthEncode::Encode(nullptr, size, ...);
hipMalloc(&buffer, size);
hipcub::DeviceRunLengthEncode::Encode(buffer, size, ...);
hipFree(buffer);
hipcub::DeviceScan::ExclusiveSum(nullptr, size, ...);
hipMalloc(&buffer, size);
hipcub::DeviceScan::ExclusiveSum(buffer, size, ...);
hipFree(buffer);
hipcub::DeviceSegmentedRadixSort::SortPairs(nullptr, size, ...);
hipMalloc(&buffer, size);
hipcub::DeviceSegmentedRadixSort::SortPairs(buffer, size, ...);
hipFree(buffer);
What does not work:
size_t total_size = 0;
hipcub::DeviceRadixSort::SortPairs(nullptr, size, ...);
total_size = std::max(total_size, size);
hipcub::DeviceRunLengthEncode::Encode(nullptr, size, ...);
total_size = std::max(total_size, size);
hipcub::DeviceScan::ExclusiveSum(nullptr, size, ...);
total_size = std::max(total_size, size);
hipcub::DeviceSegmentedRadixSort::SortPairs(nullptr, size, ...);
total_size = std::max(total_size, size);
total_size += sizeof(int) * some_other_size * 3;
void* buffer;
hipMalloc(&buffer, total_size);
// do some other stuff...
char* ptr = reinterpret_cast<char*>(buffer);
int* work1 = reinterpret_cast<int*>(ptr);
ptr += sizeof(int) * some_other_size;
int* work2 = reinterpret_cast<int*>(ptr);
ptr += sizeof(int) * some_other_size;
int* work3 = reinterpret_cast<int*>(ptr);
ptr += sizeof(int) * some_other_size;
void* hipcub_buffer = reinterpret_cast<void*>(ptr);
hipcub::DeviceRadixSort::SortPairs(hipcub_buffer, size, ...);
hipcub::DeviceRunLengthEncode::Encode(hipcub_buffer, size, ...);
hipcub::DeviceScan::ExclusiveSum(hipcub_buffer, size, ...);
hipcub::DeviceSegmentedRadixSort::SortPairs(hipcub_buffer, size, ...);
To package rocPRIM, we compile on GPU-less hosts. Currently, the build system defaults to NVIDIA in that case. Additionally, it would be great if there were an option to build and package only the bare minimum required (the headers), to avoid needing the HIP stack at compile time.
Hi, I am trying to build and test rocPRIM locally, and hit the following error during the CMake configure step:
[ 22%] Performing download step (git clone) for 'googletest-download'
CMake Error at googletest-download/googletest-download-prefix/src/googletest-download-stamp/googletest-download-download-.cmake:16 (message):
Command failed: 1
'/usr/local/bin/cmake' '-P' '/root/rocPRIM/build/googletest-download/googletest-download-prefix/tmp/googletest-download-gitclone.cmake'
See also
/root/rocPRIM/build/googletest-download/googletest-download-prefix/src/googletest-download-stamp/googletest-download-download-*.log
CMakeFiles/googletest-download.dir/build.make:89: recipe for target 'googletest-download-prefix/src/googletest-download-stamp/googletest-download-download' failed
make[2]: *** [googletest-download-prefix/src/googletest-download-stamp/googletest-download-download] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/googletest-download.dir/all' failed
make[1]: *** [CMakeFiles/googletest-download.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
CMake Error at cmake/DownloadProject.cmake:167 (message):
Build step for googletest failed: 2
Call Stack (most recent call first):
cmake/Dependencies.cmake:64 (download_project)
CMakeLists.txt:61 (include)
-- Configuring incomplete, errors occurred!
See also "/root/rocPRIM/build/CMakeFiles/CMakeOutput.log".
Please note that I had to upgrade CMake to build the tests, because the default version on Ubuntu 16.04 does not work; however, I did not find any documentation that mentions this requirement.
Here's what I have:
~/rocPRIM/build# cmake --version
cmake version 3.11.0
It would be helpful if you could host a Dockerfile for testing rocPRIM.
Thanks :-)
Describe the bug
block_reduce seems to output incorrect values when hipBlockDim > 64 on my Vega64. There seems to be a bug in distributing the output to the other warps. The attached example.cpp provides an easy demonstration of the bug.
The #define for NUM_THREADS provides an easy set of tests for the values 64, 128, 192, and 256. NUM_THREADS == 64 outputs all correct answers. However, 128 produces the correct answer in only two threads (Thread0 and Thread1), 192 produces the correct answer in only one lane (Thread0), and 256 produces the correct answer in only four lanes.
The block_reduce code appears to be missing the "broadcast" step in some cases, where Thread0 broadcasts the value back to all threads in the block.
To Reproduce
Compile the attached "example.cpp" file with hipcc example.cpp -I/opt/rocm/rocprim/include/. Play with the #define NUM_THREADS value as you see fit.
Expected behavior
In the case of 192 threads, I expect 18336 (0+1+2+3+4...+191) to be the output in all 192 lanes.
Log-files
The following output is from a NUM_THREADS of 192. The value of 18336 in thread0 is correct, but has failed to be distributed to other lanes.
18336
16320
10208
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
2016
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
6112
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
10208
Describe the bug
A clear and concise description of what the bug is.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Log-files
Add full logfiles to help explain your problem.
Environment
Make sure that ROCm is correctly installed and run the following command:
printf '=== environment\n' > environment.txt &&
printf '\n\n=== date\n' >> environment.txt && date >> environment.txt &&
printf '\n\n=== Linux Kernel\n' >> environment.txt && uname -a >> environment.txt &&
printf '\n\n=== rocm-smi' >> environment.txt && rocm-smi >> environment.txt &&
printf '\n\n' >> environment.txt && hipconfig >> environment.txt &&
printf '\n\n=== rocminfo\n' >> environment.txt && rocminfo >> environment.txt &&
printf '\n\n=== lspci VGA\n' >> environment.txt && lspci | grep -i vga >> environment.txt
Attach environment.txt
Additional context
Add any other context about the problem here.
Describe the bug
I'm trying to build PyTorch with ROCm support, but I'm getting an error related to rocPRIM, specifically in the file device_scan.hpp:
In file included from /media/nvme/scratch/yay/python-pytorch-rocm/src/pytorch-1.10.2-rocm/aten/src/ATen/native/hip/IndexKernel.hip:13:
In file included from /media/nvme/scratch/yay/python-pytorch-rocm/src/pytorch-1.10.2-rocm/aten/src/ATen/hip/cub.cuh:26:
In file included from /opt/rocm/include/hipcub/hipcub.hpp:36:
In file included from /opt/rocm/include/hipcub/backend/rocprim/hipcub.hpp:77:
In file included from /opt/rocm/include/hipcub/backend/rocprim/device/device_run_length_encode.hpp:35:
In file included from /opt/rocm/include/rocprim/device/device_run_length_encode.hpp:37:
In file included from /opt/rocm/include/rocprim/device/device_select.hpp:33:
/opt/rocm/include/rocprim/device/device_scan.hpp:531:27: error: invalid operands to binary expression ('at::cuda::cub::impl::chained_iterator<long, unsigned char *>' and 'size_t' (aka 'unsigned long'))
input + offset, output + offset, current_size, initial_value,
~~~~~ ^ ~~~~~~
/opt/rocm/hip/include/hip/amd_detail/amd_hip_runtime.h:270:87: note: expanded from macro 'hipLaunchKernelGGL'
#define hipLaunchKernelGGL(kernelName, ...) hipLaunchKernelGGLInternal((kernelName), __VA_ARGS__)
^~~~~~~~~~~
/opt/rocm/hip/include/hip/amd_detail/amd_hip_runtime.h:267:78: note: expanded from macro 'hipLaunchKernelGGLInternal'
kernelName<<<(numBlocks), (numThreads), (memPerBlock), (streamId)>>>(__VA_ARGS__); \
^~~~~~~~~~~
To Reproduce
Building PyTorch with ROCm support using the variable PYTORCH_ROCM_ARCH=gfx1030 should trigger this issue.
Expected behavior
PyTorch should build without any errors.
Environment
environment.txt is attached.
Thanks.
environment.txt
I have tried to write some custom float16 type code by following the example in rocPRIM/test/rocprim/test_utils_custom_test_types.hpp. But after including the rocPRIM header files as follows:
#include <rocprim/type_traits.hpp>
#include <rocprim/detail/radix_sort.hpp>
the build shows the following compilation error:
...
/opt/rocm/include/rocprim/types.hpp:164:21: error: ‘_Float16’ does not name a type; did you mean ‘bfloat16’?
using native_half = _Float16;
^~~~~~~~
bfloat16
...
What could be missing? Is a parameter configured incorrectly, or has rocPRIM been installed incorrectly? What could be the cause? Please help! Thanks in advance!
@VincentSC;@sbalint98;@Maetveis;@neon60;@mfep;@nolmoonen;@vince-streamhpc
We are going to enforce two-factor authentication in the ROCmSoftwarePlatform organization (https://github.com/ROCmSoftwarePlatform/) on 29th April, 2022.
Since we identified you as an outside collaborator of the ROCmSoftwarePlatform organization, you need to enable two-factor authentication in your GitHub account; otherwise you will be removed from the organization after the enforcement.
Please skip if already done.
To set up two factor authentication, please go through the steps in below link:
Please email "[email protected]" for queries
Hi Team,
I am observing unit test failures on Fiji, but all tests pass on Vega10.
Could you please look into these?
The Fiji log is attached.
Tested on ROCm 1.8.2 <1.8-173>. Need resolution.
Hi, what's the recommended command to install rocPRIM?
The last step failed:
~/rocPRIM/build# make install
make: *** No rule to make target 'install'. Stop.
I'm running into some implicit conversion diagnostics, which are normally warnings, but with -Werror set they show up as errors. Could you help fix these issues?
/work/rocPRIM/test/rocprim/test_hip_block_histogram.cpp:167:73: error: implicit conversion from
'unsigned long' to 'unsigned char' changes value from 1023 to 255 [-Werror,-Wconstant-conversion]
std::vector<T> output = test_utils::get_random_data<T>(size, 0, bin - 1);
~~~~~~~~~~ ~~~~^~~
/work/rocPRIM/build/gtest/include/gtest/internal/gtest-internal.h:475:44: note: in instantiation of member
function 'RocprimBlockHistogramInputArrayTests_Histogram_Test<params<unsigned char, unsigned char,
1024, 1, 1024, rocprim::block_histogram_algorithm::using_sort> >::TestBody' requested here
Test* CreateTest() override { return new TestClass; }
^
/work/rocPRIM/build/gtest/include/gtest/internal/gtest-internal.h:726:13: note: in instantiation of member
function
'testing::internal::TestFactoryImpl<RocprimBlockHistogramInputArrayTests_Histogram_Test<params<unsigned
char, unsigned char, 1024, 1, 1024, rocprim::block_histogram_algorithm::using_sort> > >::CreateTest'
requested here
new TestFactoryImpl<TestClass>);
^
/work/rocPRIM/test/rocprim/test_hip_device_partition.cpp:274:48: error: implicit conversion from 'int' to
'unsigned char' changes value from 345 to 89 [-Werror,-Wconstant-conversion]
rocprim::make_constant_iterator<T>(345),
~~~~~~~ ^~~
/work/rocPRIM/test/rocprim/test_hip_device_partition.cpp:37:60: note: expanded from macro 'HIP_CHECK'
#define HIP_CHECK(error) ASSERT_EQ(static_cast<hipError_t>(error),hipSuccess)
^~~~~
/work/rocPRIM/build/gtest/include/gtest/gtest.h:2078:48: note: expanded from macro 'ASSERT_EQ'
# define ASSERT_EQ(val1, val2) GTEST_ASSERT_EQ(val1, val2)
^~~~
/work/rocPRIM/build/gtest/include/gtest/gtest.h:2062:23: note: expanded from macro 'GTEST_ASSERT_EQ'
val1, val2)
^~~~
/work/rocPRIM/build/gtest/include/gtest/gtest_pred_impl.h:168:36: note: expanded from macro
'ASSERT_PRED_FORMAT2'
GTEST_PRED_FORMAT2_(pred_format, v1, v2, GTEST_FATAL_FAILURE_)
^~
/work/rocPRIM/build/gtest/include/gtest/gtest_pred_impl.h:149:39: note: expanded from macro
'GTEST_PRED_FORMAT2_'
GTEST_ASSERT_(pred_format(#v1, #v2, v1, v2), \
^~
/work/rocPRIM/build/gtest/include/gtest/gtest_pred_impl.h:77:52: note: expanded from macro 'GTEST_ASSERT_'
if (const ::testing::AssertionResult gtest_ar = (expression)) \
^~~~~~~~~~
/work/rocPRIM/build/gtest/include/gtest/internal/gtest-internal.h:475:44: note: in instantiation of member
function 'RocprimDevicePartitionTests_PredicateEmptyInput_Test<DevicePartitionParams<unsigned char,
float, unsigned int, false> >::TestBody' requested here
Test* CreateTest() override { return new TestClass; }
^
/work/rocPRIM/build/gtest/include/gtest/internal/gtest-internal.h:726:13: note: in instantiation of member
function
'testing::internal::TestFactoryImpl<RocprimDevicePartitionTests_PredicateEmptyInput_Test<DevicePartitionParams<unsigned
char, float, unsigned int, false> > >::CreateTest' requested here
new TestFactoryImpl<TestClass>);
^
These show up when building the rocPRIM tests.
The tests do not build anymore.
Building results in:
TYPED_TEST_CASE is deprecated, please use TYPED_TEST_SUITE
googletest has deprecated this macro. Should googletest be pinned to an older version, or should the tests be fixed?
https://github.com/google/googletest/blob/9a502a5b14b4a6160103c1f2c64331772878d86a/googletest/include/gtest/gtest-typed-test.h#L230
I am getting this error when building with HCC and HIP on the roc-1.8.x branches:
[ 65%] Linking CXX executable test_hc_device_segmented_radix_sort
[ 65%] Linking CXX executable test_hipcub_device_radix_sort
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `RocprimDeviceSegmentedRadixSort_SortKeys_Test<params<__half, int, true, 0u, 16u, 2000u, 10000u> >::TestBody()':
(.text+0x37fed): undefined reference to `operator==(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `void std::__inplace_stable_sort<__gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__ops::_Iter_comp_iter<key_comparator<__half, true, 0u, 16u> > >(__gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__ops::_Iter_comp_iter<key_comparator<__half, true, 0u, 16u> >)':
(.text+0x3931a): undefined reference to `operator<(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `void std::__inplace_stable_sort<__gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__ops::_Iter_comp_iter<key_comparator<__half, true, 0u, 16u> > >(__gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__ops::_Iter_comp_iter<key_comparator<__half, true, 0u, 16u> >)':
(.text+0x39370): undefined reference to `operator<(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `void std::__inplace_stable_sort<__gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__ops::_Iter_comp_iter<key_comparator<__half, true, 0u, 16u> > >(__gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__ops::_Iter_comp_iter<key_comparator<__half, true, 0u, 16u> >)':
(.text+0x39393): undefined reference to `operator<(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `void std::__merge_without_buffer<__gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, long, __gnu_cxx::__ops::_Iter_comp_iter<key_comparator<__half, true, 0u, 16u> > >(__gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, long, long, __gnu_cxx::__ops::_Iter_comp_iter<key_comparator<__half, true, 0u, 16u> >)':
(.text+0x3955b): undefined reference to `operator<(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `void std::__merge_without_buffer<__gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, long, __gnu_cxx::__ops::_Iter_comp_iter<key_comparator<__half, true, 0u, 16u> > >(__gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, __gnu_cxx::__normal_iterator<__half*, std::vector<__half, std::allocator<__half> > >, long, long, __gnu_cxx::__ops::_Iter_comp_iter<key_comparator<__half, true, 0u, 16u> >)':
(.text+0x395cb): undefined reference to `operator<(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o:(.text+0x39682): more undefined references to `operator<(__half const&, __half const&)' follow
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `RocprimDeviceSegmentedRadixSort_SortPairs_Test<params<__half, int, true, 0u, 16u, 2000u, 10000u> >::TestBody()':
(.text+0x87d6e): undefined reference to `operator==(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `void std::__insertion_sort<__gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<key_value_comparator<__half, int, true, 0u, 16u> > >(__gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<key_value_comparator<__half, int, true, 0u, 16u> >)':
(.text+0x89987): undefined reference to `operator<(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `void std::__insertion_sort<__gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<key_value_comparator<__half, int, true, 0u, 16u> > >(__gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<key_value_comparator<__half, int, true, 0u, 16u> >)':
(.text+0x899ff): undefined reference to `operator<(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `void std::__insertion_sort<__gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<key_value_comparator<__half, int, true, 0u, 16u> > >(__gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__ops::_Iter_comp_iter<key_value_comparator<__half, int, true, 0u, 16u> >)':
(.text+0x89a29): undefined reference to `operator<(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `void std::__merge_without_buffer<__gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, long, __gnu_cxx::__ops::_Iter_comp_iter<key_value_comparator<__half, int, true, 0u, 16u> > >(__gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, long, long, __gnu_cxx::__ops::_Iter_comp_iter<key_value_comparator<__half, int, true, 0u, 16u> >)':
(.text+0x89b0b): undefined reference to `operator<(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `void std::__merge_without_buffer<__gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, long, __gnu_cxx::__ops::_Iter_comp_iter<key_value_comparator<__half, int, true, 0u, 16u> > >(__gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, __gnu_cxx::__normal_iterator<std::pair<__half, int>*, std::vector<std::pair<__half, int>, std::allocator<std::pair<__half, int> > > >, long, long, __gnu_cxx::__ops::_Iter_comp_iter<key_value_comparator<__half, int, true, 0u, 16u> >)':
(.text+0x89b7b): undefined reference to `operator<(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o:(.text+0x89c28): more undefined references to `operator<(__half const&, __half const&)' follow
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `RocprimDeviceSegmentedRadixSort_SortKeysDoubleBuffer_Test<params<__half, int, true, 0u, 16u, 2000u, 10000u> >::TestBody()':
(.text+0xd0d14): undefined reference to `operator==(__half const&, __half const&)'
/tmp/tmp.1u2OOtEjNp/test_hip_device_segmented_radix_sort.cpp.host.o: In function `RocprimDeviceSegmentedRadixSort_SortPairsDoubleBuffer_Test<params<__half, int, true, 0u, 16u, 2000u, 10000u> >::TestBody()':
(.text+0x1032ea): undefined reference to `operator==(__half const&, __half const&)'
clang-7.0: error: linker command failed with exit code 1 (use -v to see invocation)
test/rocprim/CMakeFiles/test_hip_device_segmented_radix_sort.dir/build.make:90: recipe for target 'test/rocprim/test_hip_device_segmented_radix_sort' failed
make[2]: *** [test/rocprim/test_hip_device_segmented_radix_sort] Error 1
CMakeFiles/Makefile2:502: recipe for target 'test/rocprim/CMakeFiles/test_hip_device_segmented_radix_sort.dir/all' failed
make[1]: *** [test/rocprim/CMakeFiles/test_hip_device_segmented_radix_sort.dir/all] Error 2
I'm trying to run cmake on the rocPRIM develop branch, and I encounter this issue:
[ 22%] Performing configure step for 'googlebenchmark-download'
CMake Error at /root/rocPRIM/build/googlebenchmark-download/googlebenchmark-download-prefix/src/googlebenchmark-download-stamp/googlebenchmark-download-configure-.cmake:16 (message):
Command failed: 1
'/usr/bin/cmake' '-DCMAKE_BUILD_TYPE=RELEASE' '-DBENCHMARK_ENABLE_TESTING=OFF' '-DBUILD_SHARED_LIBS=ON' '-DCMAKE_INSTALL_PREFIX=/root/rocPRIM/build/googlebenchmark' '-GUnix Makefiles' '/root/rocPRIM/build/googlebenchmark-src'
See also
/root/rocPRIM/build/googlebenchmark-download/googlebenchmark-download-prefix/src/googlebenchmark-download-stamp/googlebenchmark-download-configure-*.log
CMakeFiles/googlebenchmark-download.dir/build.make:101: recipe for target 'googlebenchmark-download-prefix/src/googlebenchmark-download-stamp/googlebenchmark-download-configure' failed
make[2]: *** [googlebenchmark-download-prefix/src/googlebenchmark-download-stamp/googlebenchmark-download-configure] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/googlebenchmark-download.dir/all' failed
make[1]: *** [CMakeFiles/googlebenchmark-download.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
CMake Error at cmake/DownloadProject.cmake:168 (message):
Build step for googlebenchmark failed: 2
Call Stack (most recent call first):
cmake/Dependencies.cmake:102 (download_project)
CMakeLists.txt:63 (include)
-- Configuring incomplete, errors occurred!
See also "/root/rocPRIM/build/CMakeFiles/CMakeOutput.log".
Is this due to some other issue with downloading google benchmark?
Hi, is there a special reason for the cmake 3.10 requirement? Ubuntu 16.04 ships only version 3.5. Also, cmake 3.10 fails to find the ROCm package, whereas 3.5 works fine for me.
Does the device-wide merge_sort support in-place sorting (keys_input == keys_output and values_input == values_output)?
What about radix sort? I guess it does not support in-place sorting, but I could not find anything about it in the docs.
Describe the bug
error: reference to host function 'inclusive_scan<rocprim::default_config, double *, double *, thrust::plus>' in host device function
To Reproduce
Steps to reproduce the behavior:
I implemented the device function below, and the compiler error above appears with ROCm 3.5.1:
template <typename T>
__global__ void GetCumulativeProbs(T* norm_probs_data,
                                   int64_t num_distributions,
                                   int64_t num_categories,
                                   T* cumulative_probs) {
  for (int id = blockIdx.x; id < num_distributions; id += gridDim.x) {
    thrust::inclusive_scan(thrust::device,
                           norm_probs_data + id * num_categories,
                           norm_probs_data + (id + 1) * num_categories,
                           cumulative_probs + id * num_categories);
  }
}
Log-files
In file included from /workspace/Github-qili93/Paddle/paddle/fluid/operators/multinomial_op.cu:15:
In file included from /opt/rocm/include/thrust/execution_policy.h:31:
In file included from /opt/rocm/include/thrust/system/cpp/execution_policy.h:64:
In file included from /opt/rocm/include/thrust/system/cpp/detail/sort.h:22:
In file included from /opt/rocm/include/thrust/system/detail/sequential/sort.h:63:
In file included from /opt/rocm/include/thrust/system/detail/sequential/sort.inl:23:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_primitive_sort.h:55:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_primitive_sort.inl:21:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_radix_sort.h:55:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_radix_sort.inl:20:
In file included from /opt/rocm/include/thrust/copy.h:513:
In file included from /opt/rocm/include/thrust/detail/copy_if.h:74:
In file included from /opt/rocm/include/thrust/detail/copy_if.inl:20:
In file included from /opt/rocm/include/thrust/system/detail/generic/copy_if.h:63:
In file included from /opt/rocm/include/thrust/system/detail/generic/copy_if.inl:31:
In file included from /opt/rocm/include/thrust/scan.h:1563:
In file included from /opt/rocm/include/thrust/detail/scan.inl:28:
In file included from /opt/rocm/include/thrust/system/detail/adl/scan.h:44:
/opt/rocm/include/thrust/system/hip/detail/scan.h:193:19: error: reference to __host__ function 'inclusive_scan<rocprim::default_config, float *, float *, thrust::plus<float>>' in __host__ __device__ function
(rocprim::inclusive_scan<rocprim::default_config, InputIt, OutputIt, ScanOp>)
^
/opt/rocm/include/rocprim/device/device_scan.hpp:529:12: note: 'inclusive_scan<rocprim::default_config, float *, float *, thrust::plus<float>>' declared here
hipError_t inclusive_scan(void * temporary_storage,
^
3 warnings and 2 errors generated when compiling for gfx906.
CMake Error at multinomial_op_generated_multinomial_op.cu.o.cmake:192 (message):
Error generating file
/workspace/Github-qili93/Paddle/build_rocm_nccl/paddle/fluid/operators/CMakeFiles/multinomial_op.dir//./multinomial_op_generated_multinomial_op.cu.o
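The log shows that thrust::inclusive_scan with thrust::device ends up referencing rocprim::inclusive_scan, which is declared as a host function, so it cannot be called from inside a kernel. One possible workaround, sketched here under assumptions, is to compute each row's cumulative sum with a plain loop. The code below is host-only C++ so it can be checked without a GPU; the helper names inclusive_scan_row and cumulative_probs_reference are hypothetical and are not part of Thrust or rocPRIM. The inner loop is the part that could be moved into the kernel body.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: inclusive prefix sum over one row of length n.
// The same loop could run inside a kernel, one row per block or thread.
template <typename T>
void inclusive_scan_row(const T* in, std::int64_t n, T* out)
{
    assert(n >= 0);
    T running = T(0);
    for (std::int64_t i = 0; i < n; ++i)
    {
        running += in[i];
        out[i] = running;
    }
}

// Hypothetical host reference: scans every distribution (row) independently,
// mirroring the indexing used in GetCumulativeProbs.
template <typename T>
std::vector<T> cumulative_probs_reference(const std::vector<T>& norm_probs,
                                          std::int64_t num_distributions,
                                          std::int64_t num_categories)
{
    assert(norm_probs.size()
           == static_cast<std::size_t>(num_distributions * num_categories));
    std::vector<T> out(norm_probs.size());
    for (std::int64_t id = 0; id < num_distributions; ++id)
    {
        inclusive_scan_row(norm_probs.data() + id * num_categories,
                           num_categories,
                           out.data() + id * num_categories);
    }
    return out;
}
```

A sequential loop per row gives up the parallel scan, so this only makes sense when num_categories is small or when a host reference is needed to validate another implementation.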
I added NAVI22 (gfx1031) as a device. As a start, I simply copied the sections for gfx1030, compiled, and ran the tests.
Only test 14 failed:
[----------] Global test environment tear-down
[==========] 280 tests from 112 test suites ran. (781 ms total)
[ PASSED ] 272 tests.
[ FAILED ] 8 tests, listed below:
[ FAILED ] RocprimBlockReduceSingleValueTestsFloating/16.ReduceMultiplies, where TypeParam = block_params<__half, __half, 32u>
[ FAILED ] RocprimBlockReduceSingleValueTestsFloating/17.ReduceMultiplies, where TypeParam = block_params<__half, __half, 64u>
[ FAILED ] RocprimBlockReduceSingleValueTestsFloating/18.ReduceMultiplies, where TypeParam = block_params<__half, __half, 128u>
[ FAILED ] RocprimBlockReduceSingleValueTestsFloating/19.ReduceMultiplies, where TypeParam = block_params<__half, __half, 192u>
[ FAILED ] RocprimBlockReduceSingleValueTestsFloating/20.ReduceMultiplies, where TypeParam = block_params<__half, __half, 256u>
[ FAILED ] RocprimBlockReduceSingleValueTestsFloating/21.ReduceMultiplies, where TypeParam = block_params<__half, __half, 129u>
[ FAILED ] RocprimBlockReduceSingleValueTestsFloating/22.ReduceMultiplies, where TypeParam = block_params<__half, __half, 162u>
[ FAILED ] RocprimBlockReduceSingleValueTestsFloating/23.ReduceMultiplies, where TypeParam = block_params<__half, __half, 255u>
8 FAILED TESTS of Test 14 - rocprim.block_reduce (Failed)
98% tests passed, 1 tests failed out of 51
Label Time Summary:
hip = 500.81 sec*proc (49 tests)
Total Test time (real) = 502.70 sec
The following tests FAILED:
14 - rocprim.block_reduce (Failed)
Errors while running CTest
Three questions:
My problem is that the IDs (node numbers) which I set using: export HIP_VISIBLE_DEVICES=0; export ROCR_VISIBLE_DEVICES=0 before running the tests don't match rocminfo. I compiled rocPRIM specifically for gfx1031 using: CXX=/opt/rocm/hip/bin/hipcc cmake -DAMDGPU_TARGETS=gfx1031 ../. but I am still not sure it runs on the dedicated GPU.
I tried building rocPRIM and running ctest on my ROCm 1.8 system with Vega, and I see test#31: rocprim.hc.warp_sort failing.
Running the test separately with ./test/rocprim/test_hc_warp_sort gives:
[==========] 12 tests from 6 test cases ran. (127 ms total)
[ PASSED ] 6 tests.
[ FAILED ] 6 tests, listed below:
[ FAILED ] RocprimWarpSortShuffleBasedTests/0.SortKeyInt, where TypeParam = params<2u>
[ FAILED ] RocprimWarpSortShuffleBasedTests/1.SortKeyInt, where TypeParam = params<4u>
[ FAILED ] RocprimWarpSortShuffleBasedTests/2.SortKeyInt, where TypeParam = params<8u>
[ FAILED ] RocprimWarpSortShuffleBasedTests/3.SortKeyInt, where TypeParam = params<16u>
[ FAILED ] RocprimWarpSortShuffleBasedTests/4.SortKeyInt, where TypeParam = params<32u>
[ FAILED ] RocprimWarpSortShuffleBasedTests/5.SortKeyInt, where TypeParam = params<64u>
6 FAILED TESTS
That is the only test that seems to fail out of 81 cases.
With the same tree on the same hardware under ROCm 1.7, all the tests pass.
Can you please add gfx906 to support Vega20 in CMakeLists.txt at https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/master/CMakeLists.txt#L70
I'm running into an error with the latest rocPRIM develop branch on HIP-Clang. The error is due to the function half_to_native in test_utils.hpp, which has no host version but is called from a host function.
In file included from /work/rocprim/test/rocprim/test_hip_device_scan.cpp:35:
/work/rocprim/test/rocprim/test_utils.hpp:606:19: error: no matching function for call to 'half_to_native'
ASSERT_EQ(half_to_native(result[i]), half_to_native(expected[i])) << "where index = " << i;
^~~~~~~~~~~~~~
/work/rocprim/build/gtest/include/gtest/gtest.h:2078:48: note: expanded from macro 'ASSERT_EQ'
# define ASSERT_EQ(val1, val2) GTEST_ASSERT_EQ(val1, val2)
^~~~
/work/rocprim/build/gtest/include/gtest/gtest.h:2061:55: note: expanded from macro 'GTEST_ASSERT_EQ'
EqHelper<GTEST_IS_NULL_LITERAL_(val1)>::Compare, \
^~~~
/work/rocprim/build/gtest/include/gtest/internal/gtest-internal.h:155:7: note: expanded from macro 'GTEST_IS_NULL_LITERAL_'
x, \
^
/work/rocprim/build/gtest/include/gtest/gtest_pred_impl.h:168:23: note: expanded from macro 'ASSERT_PRED_FORMAT2'
GTEST_PRED_FORMAT2_(pred_format, v1, v2, GTEST_FATAL_FAILURE_)
^~~~~~~~~~~
/work/rocprim/build/gtest/include/gtest/gtest_pred_impl.h:149:17: note: expanded from macro 'GTEST_PRED_FORMAT2_'
GTEST_ASSERT_(pred_format(#v1, #v2, v1, v2), \
^~~~~~~~~~~
/work/rocprim/build/gtest/include/gtest/gtest_pred_impl.h:77:52: note: expanded from macro 'GTEST_ASSERT_'
if (const ::testing::AssertionResult gtest_ar = (expression)) \
^~~~~~~~~~
/work/rocprim/test/rocprim/test_utils.hpp:57:15: note: candidate function not viable: call to __device__ function from __host__ function
rocprim::half half_to_native(const rocprim::half& x)
^
As you can see in the final line, the host version of half_to_native is removed by the #else clause. HIP-Clang is very strict here (the same as CUDA): host functions cannot call functions that have only the device attribute. Is it possible to use ROCPRIM_HOST_DEVICE instead on lines 56 and 62 of test_utils.hpp?
AMD Internal Bug:
rocPRIM defines in file /opt/rocm/rocprim/include/rocprim/intrinsics/thread.hpp:282:36:
extern "C" ROCPRIM_DEVICE void __atomic_work_item_fence(unsigned int, unsigned int, unsigned int);
Original definition in /opt/rocm/include/hip/hcc_detail/device_library_decls.h:115:1 is:
__atomic_work_item_fence(__cl_mem_fence_flags, __memory_order, __memory_scope);
This causes compile time errors for PyTorch.
It looks like a forward declaration gone wrong.
The following error occurs on the ROCm 2.4 preview kernel. Please use the instructions below to investigate.
2. Copy the tar file from http://rocm-ci.amd.com/job/compute-roc-master/10209/ (provided in vault).
3. Extract it, install aria2 with "apt-get install aria2", run "aria2c deb.meta4", and go to the "utils" folder.
4. Install the driver using "./install.sh" and reboot.
[ 29%] Linking CXX executable test_hip_device_reduce
error: Explicit call type does not match pointee type of callee operand (Producer: 'LLVM9.0.0svn' Reader: 'LLVM 9.0.0svn')
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
test/rocprim/CMakeFiles/test_hc_device_merge_sort.dir/build.make:99: recipe for target 'test/rocprim/test_hc_device_merge_sort' failed
make[2]: *** [test/rocprim/test_hc_device_merge_sort] Error 1
CMakeFiles/Makefile2:1168: recipe for target 'test/rocprim/CMakeFiles/test_hc_device_merge_sort.dir/all' failed
make[1]: *** [test/rocprim/CMakeFiles/test_hc_device_merge_sort.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 29%] Built target test_hip_block_radix_sort
Steps to build rocprim
git clone -b master https://github.com/ROCmSoftwarePlatform/rocprim.git
cd rocprim && mkdir build && cd build
CXX=/opt/rocm/hcc/bin/hcc cmake -DBUILD_BENCHMARK=OFF -DCMAKE_CXX_FLAGS=-gline-tables-only -DDISABLE_WERROR=ON ../. | tee rocPRIM_build-gline.log
make -j16 | tee -a rocPRIM_build-gline.log
Please see file below for output
rocPRIM_build-gline.log
I am trying to compare performance between an MI50 and an NVIDIA V100. On the AMD system, the build and tests are successful, but on the NVIDIA system, with HIP_PLATFORM=nvcc set, CMake fails:
CXX=hipcc cmake -DBUILD_BENCHMARK=ON ..
-- The CXX compiler identification is GNU 7.3.1
-- Check for working CXX compiler: /opt/rocm/bin/hipcc
-- Check for working CXX compiler: /opt/rocm/bin/hipcc -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
CMake Warning (dev) at CMakeLists.txt:44 (set):
implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found HIP: /opt/rocm (found suitable version "3.0.19493-36529b16", minimum required is "1.5.18263")
CMake Error at cmake/VerifyCompiler.cmake:61 (message):
HIP_PLATFORM must be 'hcc' (AMD ROCm platform)
Call Stack (most recent call first):
CMakeLists.txt:47 (include)
-- Configuring incomplete, errors occurred!
See also "/home/alice/test/rocPRIM/build/CMakeFiles/CMakeOutput.log".
[alice@prj47-rack-96 build]$ CXX=hcc cmake -DBUILD_BENCHMARK=ON ..
CMake Warning (dev) at CMakeLists.txt:44 (set):
implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Error at cmake/VerifyCompiler.cmake:61 (message):
HIP_PLATFORM must be 'hcc' (AMD ROCm platform)
Call Stack (most recent call first):
CMakeLists.txt:47 (include)
-- Configuring incomplete, errors occurred!
See also "/home/alice/test/rocPRIM/build/CMakeFiles/CMakeOutput.log".
[alice@prj47-rack-96 build]$ CXX=hipcc cmake -DBUILD_BENCHMARK=ON ..
CMake Warning (dev) at CMakeLists.txt:44 (set):
implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Error at cmake/VerifyCompiler.cmake:61 (message):
HIP_PLATFORM must be 'hcc' (AMD ROCm platform)
Call Stack (most recent call first):
CMakeLists.txt:47 (include)
-- Configuring incomplete, errors occurred!
See also "/home/alice/test/rocPRIM/build/CMakeFiles/CMakeOutput.log".
Describe the bug
On Radeon VII, rocPRIM 5.4.3 failed one test suite: device_adjacent_difference:
[ RUN ] RocprimDeviceAdjacentDifferenceLargeTests/0.LargeIndices
/fast/portage/sci-libs/rocPRIM-5.4.3/work/rocPRIM-rocm-5.4.3/test/rocprim/test_device_adjacent_difference.cpp:556: Failure
Expected equality of these values:
incorrect_flag
Which is: 1
0
Google Test trace:
/fast/portage/sci-libs/rocPRIM-5.4.3/work/rocPRIM-rocm-5.4.3/test/rocprim/test_device_adjacent_difference.cpp:495: with size = 3860949257
/fast/portage/sci-libs/rocPRIM-5.4.3/work/rocPRIM-rocm-5.4.3/test/rocprim/test_device_adjacent_difference.cpp:489: with seed= 1649760492
/fast/portage/sci-libs/rocPRIM-5.4.3/work/rocPRIM-rocm-5.4.3/test/rocprim/test_device_adjacent_difference.cpp:481: is_left = true, is_in_place = false
/fast/portage/sci-libs/rocPRIM-5.4.3/work/rocPRIM-rocm-5.4.3/test/rocprim/test_device_adjacent_difference.cpp:469: with device_id= 0
[ FAILED ] RocprimDeviceAdjacentDifferenceLargeTests/0.LargeIndices, where TypeParam = DeviceAdjacentDifferenceLargeParams<true, false> (67 ms)
Log-files
The full build log:
build.log.gz
Log of tests:
LastTest.log.gz
Environment
environment.txt
I am still observing issue #10.
I also tried the fix__half_tests_on_new_hip branch, and I observe the issue with that branch as well.
rocPRIM_MasterBranch_issue.log
I have attached the complete logs for both the master and fix__half_tests_on_new_hip branches.
fix_half_log.txt
I see that the radix sort implementation uses 5 iterations: 7 + 7 + 6 + 6 + 6 = 32 bits.
Usually we use a radix-4 implementation. How much performance do you get with radix 6 and 7?
As I understand it, increasing the radix increases the histogram size: for radix 7 it is 2^7 = 128 items per thread. I see that AMD GPUs have only 64 KB of LDS. If we use 256 threads, the histogram size is 128 * 256 = 32768 integers. This would greatly reduce performance, as only a few wavefronts would be able to run on each CU due to the lack of LDS space.
Correct me if I am wrong.
Can you shed some light on the implementation with radix 6 and 7?
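The arithmetic in this question can be checked with a small sketch. It assumes the five passes run least significant digit first with widths 7, 7, 6, 6, 6, and it assumes the layout the question describes, namely one private histogram per thread with 4-byte counters. Neither assumption is confirmed here as what rocPRIM actually does; the names kPassBits, radix_digit and per_thread_histogram_bytes are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Digit widths quoted in the question: 7 + 7 + 6 + 6 + 6 = 32 bits.
// Assumption: passes are applied from the least significant bits upward.
constexpr std::array<unsigned, 5> kPassBits = {{7, 7, 6, 6, 6}};

// Extracts the digit that pass number `pass` would sort on.
inline std::uint32_t radix_digit(std::uint32_t key, std::size_t pass)
{
    assert(pass < kPassBits.size());
    unsigned shift = 0;
    for (std::size_t p = 0; p < pass; ++p)
    {
        shift += kPassBits[p];
    }
    const std::uint32_t mask = (1u << kPassBits[pass]) - 1u;
    return (key >> shift) & mask;
}

// Bytes needed if every thread of a block kept its own histogram
// (the layout assumed in the question, not necessarily the real one).
constexpr std::size_t per_thread_histogram_bytes(unsigned radix_bits,
                                                 unsigned block_size,
                                                 std::size_t counter_bytes)
{
    return (std::size_t{1} << radix_bits) * block_size * counter_bytes;
}
```

Under these assumptions, radix 7 with 256 threads needs 128 * 256 = 32768 counters, which is 131072 bytes with 4-byte counters and therefore more than 64 KB. A per-thread layout would not fit at all, so a practical implementation would have to share or pack the counters differently.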
Issue
Build issue in mxnet related to rocPRIM's exclusive_scan when integrating rocPRIM in place of cub-hip.
I am running into this issue and need some help.
/opt/rocm/hipcub/include/hipcub/rocprim/block/block_scan.hpp:193:20: error: no matching member function for call to 'exclusive_scan'
base_type::exclusive_scan(input, output, temp_storage_);
~~~~~~~~~~~^~~~~~~~~~~~~~
src/operator/tensor/./cast_storage-inl.cuh:445:33: note: in instantiation of member function 'hipcub::BlockScan<long, 256, hipcub::BLOCK_SCAN_RAKING, 1, 1, 1>::ExclusiveSum' requested here
BlockScan(temp_storage).ExclusiveSum(nnz, nnz);
The error is that no matching member function is found for the exclusive_scan call in block_scan.hpp: https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/master/hipcub/include/hipcub/rocprim/block/block_scan.hpp#L193
Is there documentation on intrinsics such as __mbcnt_hi and __mbcnt_lo? It is not clear how some of the intrinsics can be emulated using OpenCL.
Thanks
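As far as I understand the GCN ISA description of v_mbcnt_lo and v_mbcnt_hi, each one counts the bits of a 32-bit mask that belong to lanes below the current lane (lanes 0..31 for lo, lanes 32..63 for hi) and adds that count to a second operand. A minimal host-side sketch of that behaviour follows, with the lane id passed in explicitly. The function names are hypothetical, and how the lane id would be obtained in OpenCL is left open here.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count, no compiler builtins required.
inline std::uint32_t popcount32(std::uint32_t x)
{
    std::uint32_t n = 0;
    while (x != 0)
    {
        x &= x - 1; // clears the lowest set bit
        ++n;
    }
    return n;
}

// Emulated mbcnt_lo: bits of `mask` owned by lanes below `lane`,
// restricted to lanes 0..31, plus `base`.
inline std::uint32_t emulated_mbcnt_lo(std::uint32_t mask, std::uint32_t base, unsigned lane)
{
    assert(lane < 64);
    const std::uint32_t below = lane >= 32 ? 0xFFFFFFFFu : ((1u << lane) - 1u);
    return base + popcount32(mask & below);
}

// Emulated mbcnt_hi: same idea for lanes 32..63.
inline std::uint32_t emulated_mbcnt_hi(std::uint32_t mask, std::uint32_t base, unsigned lane)
{
    assert(lane < 64);
    const std::uint32_t below = lane <= 32 ? 0u : ((1u << (lane - 32)) - 1u);
    return base + popcount32(mask & below);
}

// Typical combination: rank of `lane` among the lanes set in a 64-bit mask.
inline std::uint32_t emulated_lane_rank(std::uint64_t active_mask, unsigned lane)
{
    const std::uint32_t lo = static_cast<std::uint32_t>(active_mask & 0xFFFFFFFFull);
    const std::uint32_t hi = static_cast<std::uint32_t>(active_mask >> 32);
    return emulated_mbcnt_hi(hi, emulated_mbcnt_lo(lo, 0u, lane), lane);
}
```

With a full mask, emulated_lane_rank simply returns the lane id, which is the usual way the two intrinsics are chained to obtain a lane index inside a 64-wide wavefront.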
I am running into a memory access fault when linking to a dynamic library that uses hipcub if I use hipcub also in the main program (that links to the package that uses hipcub).
Please see the following minimal example: hipcub_test.tar.gz
Edit: This does not happen when using rocprim or cub instead of hipcub.
Support for NAVI22 and NAVI23 (gfx1031 and gfx1032) is needed for HIP programming and software development using rocThrust on notebooks, as all AMD mobile GPUs are NAVI22 or NAVI23 (gfx1031 or gfx1032). rocPRIM is a rocThrust dependency.
The target operating system is RHEL 9 or an equivalent (AlmaLinux 9, Rocky Linux, ...).
My approach is to develop, adapt, and test engineering and scientific software on my notebook before investing in dedicated HPC hardware.
#99 discusses adding compatibility code for hip-clang and C++14. This code should be removed when the hip-clang project adds the functionality.
I am using the header file "device_vector.h" from the rocThrust library to run code on an AMD GPU. When I compile the code using hipcc, the following error appears from rocPRIM: /opt/rocm/rocprim/include/rocprim/../../../include/rocprim/device/device_transform_config.hpp:50:9: error: no matching function for call to 'ceiling_div'
::rocprim::detail::ceiling_div(sizeof(Value), sizeof(int));
Here is the full output message:
In file included from /opt/rocm/hip/include/thrust/device_vector.h:26:
In file included from /opt/rocm/hip/include/thrust/detail/vector_base.h:586:
In file included from /opt/rocm/hip/include/thrust/detail/vector_base.inl:26:
In file included from /opt/rocm/hip/include/thrust/equal.h:235:
In file included from /opt/rocm/hip/include/thrust/detail/equal.inl:26:
In file included from /opt/rocm/hip/include/thrust/system/detail/generic/equal.h:46:
In file included from /opt/rocm/hip/include/thrust/system/detail/generic/equal.inl:21:
In file included from /opt/rocm/hip/include/thrust/mismatch.h:257:
In file included from /opt/rocm/hip/include/thrust/detail/mismatch.inl:27:
In file included from /opt/rocm/hip/include/thrust/system/detail/generic/mismatch.h:56:
In file included from /opt/rocm/hip/include/thrust/system/detail/generic/mismatch.inl:21:
In file included from /opt/rocm/hip/include/thrust/find.h:381:
In file included from /opt/rocm/hip/include/thrust/detail/find.inl:25:
In file included from /opt/rocm/hip/include/thrust/system/detail/generic/find.h:61:
In file included from /opt/rocm/hip/include/thrust/system/detail/generic/find.inl:19:
In file included from /opt/rocm/hip/include/thrust/reduce.h:781:
In file included from /opt/rocm/hip/include/thrust/detail/reduce.inl:28:
In file included from /opt/rocm/hip/include/thrust/system/detail/generic/reduce_by_key.h:87:
In file included from /opt/rocm/hip/include/thrust/system/detail/generic/reduce_by_key.inl:37:
In file included from /opt/rocm/hip/include/thrust/scan.h:1656:
In file included from /opt/rocm/hip/include/thrust/detail/scan.inl:28:
In file included from /opt/rocm/hip/include/thrust/system/detail/adl/scan.h:44:
In file included from /opt/rocm/hip/include/thrust/system/hip/detail/scan.h:48:
In file included from /opt/rocm/rocprim/include/rocprim/rocprim.hpp:20:
In file included from /opt/rocm/rocprim/include/rocprim/../../../include/rocprim/rocprim.hpp:52:
In file included from /opt/rocm/rocprim/include/rocprim/../../../include/rocprim/device/device_adjacent_difference.hpp:29:
In file included from /opt/rocm/rocprim/include/rocprim/../../../include/rocprim/device/device_transform.hpp:35:
/opt/rocm/rocprim/include/rocprim/../../../include/rocprim/device/device_transform_config.hpp:50:9: error: no matching function for call to 'ceiling_div'
::rocprim::detail::ceiling_div(sizeof(Value), sizeof(int));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm/rocprim/include/rocprim/../../../include/rocprim/device/device_transform_config.hpp:59:9: error: no matching function for call to 'ceiling_div'
::rocprim::detail::ceiling_div(sizeof(Value), sizeof(int));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm/rocprim/include/rocprim/../../../include/rocprim/device/device_transform_config.hpp:68:9: error: no matching function for call to 'ceiling_div'
::rocprim::detail::ceiling_div(sizeof(Value), sizeof(int));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm/rocprim/include/rocprim/../../../include/rocprim/device/device_transform_config.hpp:77:9: error: no matching function for call to 'ceiling_div'
::rocprim::detail::ceiling_div(sizeof(Value), sizeof(int));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
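For context, the failing expression computes how many int-sized words are needed to hold one Value, i.e. a division that rounds up. The sketch below is a hypothetical stand-in for such a helper and is not rocPRIM's actual definition; it only illustrates what the call is expected to return. The names ceiling_div_sketch and items_in_ints are made up for this example.

```cpp
#include <cstddef>

// Hypothetical stand-in, not rocPRIM's definition:
// integer division of a by b, rounded up. Assumes b > 0 and that
// a + b - 1 does not overflow T.
template <typename T>
constexpr T ceiling_div_sketch(T a, T b)
{
    return (a + b - 1) / b;
}

// Number of int-sized words needed to store one Value, mirroring the
// expression ceiling_div(sizeof(Value), sizeof(int)) from the error message.
template <typename Value>
constexpr std::size_t items_in_ints()
{
    return ceiling_div_sketch(sizeof(Value), sizeof(int));
}
```

Both arguments in the failing call are sizeof expressions, so they have the same type (std::size_t), and a single-parameter template like this one would accept them.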
I am facing performance issues with the segmented radix sort in rocPRIM. My test case is an array of integer segments with a fixed segment size of 32, so e.g. for an array size of 4M I have 125k segments with 32 integers each. Sorting all segments sequentially on the host (std::sort) takes roughly 55 ms, sorting the segments using OpenMP takes about 15 ms, and sorting the segments using a Vega10 card and rocPRIM takes 65 ms.
Decreasing the segment size further increases performance on the host and decreases performance with rocPRIM. I need a segment size of at least 64 to outperform sequential sorting on the host with rocPRIM, and a segment size of 256 to beat the OpenMP version.
Nico
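The sequential host baseline described above can be sketched as follows. This is only a guess at the shape of the test code, since the original benchmark is not shown; the function name sort_fixed_segments is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts every fixed-size segment of `data` independently, in place.
// Assumption: the array length is a multiple of the segment size,
// as in the 4M / 32 example.
inline void sort_fixed_segments(std::vector<int>& data, std::size_t segment_size)
{
    assert(segment_size > 0 && data.size() % segment_size == 0);
    for (std::size_t begin = 0; begin < data.size(); begin += segment_size)
    {
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
        const auto last  = first + static_cast<std::ptrdiff_t>(segment_size);
        std::sort(first, last);
    }
}
```

The loop iterations touch disjoint ranges, so they are independent and could be distributed over threads, which is presumably what the OpenMP variant does.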
When I build a CUDA program using cub and rocPRIM, I get the following error:
In file included from ../src/acc/hip/hip_utils_cub.hpp:32:
In file included from /opt/rocm/rocPRIM/include/hipcub/hipcub.hpp:34:
In file included from /opt/rocm/rocPRIM/include/hipcub/rocprim/../rocprim/hipcub.hpp:33:
/opt/rocm/rocPRIM/include/hipcub/rocprim/iterator/tex_obj_input_iterator.hpp:32:40: error: no template named 'texture_cache_iterator'
in namespace 'rocprim'
using TexObjInputIterator = ::rocprim::texture_cache_iterator<T, OffsetT>;
~~~~~~~~~~~^
The included file (hip_utils_cub.hpp) is as follows (line numbers added):
22 #include <rocprim/rocprim.hpp>
23 #include <rocprim/rocprim_hip.hpp>
24
25 #include <rocprim/device/device_radix_sort_hip.hpp>
26 #include <rocprim/device/device_reduce_hip.hpp>
27 #include <rocprim/device/device_scan_hip.hpp>
28 #include <rocprim/device/device_select_hip.hpp>
29
30
31 #include <hipcub/rocprim/util_type.hpp>
32 #include <hipcub/hipcub.hpp> // error line
Hi rocPRIM team, I'm working on porting rocPRIM to be compiled with HIP-Clang. I'm running into some issues that I'm not sure how to resolve since I don't understand the nature of the test. Here is what it looks like:
/root/rocPRIM/test/rocprim/test_hip_block_histogram.cpp:167:73: error: implicit conversion from
'unsigned long' to 'unsigned char' changes value from 1023 to 255 [-Werror,-Wconstant-conversion]
std::vector<T> output = test_utils::get_random_data<T>(size, 0, bin - 1);
~~~~~~~~~~ ~~~~^~~
/root/rocPRIM/build/gtest/include/gtest/internal/gtest-internal.h:468:43: note: in instantiation of member
function 'RocprimBlockHistogramInputArrayTests_Histogram_Test<params<unsigned char, unsigned char,
1024, 1, 1024, rocprim::block_histogram_algorithm::using_sort> >::TestBody' requested here
virtual Test* CreateTest() { return new TestClass; }
^
/root/rocPRIM/build/gtest/include/gtest/internal/gtest-internal.h:634:13: note: in instantiation of member
function
'testing::internal::TestFactoryImpl<RocprimBlockHistogramInputArrayTests_Histogram_Test<params<unsigned
char, unsigned char, 1024, 1, 1024, rocprim::block_histogram_algorithm::using_sort> > >::CreateTest'
requested here
new TestFactoryImpl<TestClass>);
^
I've had this same error for a few other tests as well. I think the HIP-Clang compiler is a little stricter than HCC. Let me know what I should do to fix this. Thanks!
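The diagnostic above comes from passing `bin - 1` (1023) where an `unsigned char` is expected. One possible way to silence it, sketched here under assumptions, is to clamp the upper bound to the range of `T` and convert explicitly. The helper `clamped_max_bin` is hypothetical and not part of rocPRIM or its test utilities, and whether clamping matches the intent of the test is for the maintainers to decide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// Hypothetical helper: returns the largest bin index (bin_count - 1),
// clamped to the maximum value representable in T, with an explicit
// conversion so no implicit narrowing takes place.
template<class T>
T clamped_max_bin(std::size_t bin_count)
{
    static_assert(std::is_integral<T>::value, "sketch covers integral T only");
    assert(bin_count > 0);
    const std::size_t type_max = static_cast<std::size_t>(std::numeric_limits<T>::max());
    const std::size_t last_bin = bin_count - 1;
    return static_cast<T>(std::min(last_bin, type_max));
}
```

For `T = unsigned char` and 1024 bins this yields 255, the same value the implicit conversion produced, but the truncation is now explicit in the code.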
Are the times reported in the benchmark results correct?
Take one radix sort entry as an example:
sort_pairs<int, float>/iterations:1/manual_time 12 ms 1 ms 1 6.73374GB/s 861.919M items/s
If items_per_second = 8.16E+08 and, as Baidu needs, each sort covers 1E+06 items, it can finish about 810 sorts per second.
1/810 = 0.001234 second = 1.234 ms
So is the roughly 1.2 ms shown in the console output the correct time, rather than the 12 ms?
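The arithmetic in the question can be checked with a small helper. This is only a sketch of the calculation; the function name `time_per_sort_ms` is made up for illustration and is not part of the rocPRIM benchmarks.

```cpp
#include <cassert>

// Hypothetical helper: time for one sort in milliseconds, given the number
// of items per sort and the measured throughput in items per second.
double time_per_sort_ms(double items_per_sort, double items_per_second)
{
    assert(items_per_second > 0.0);
    return items_per_sort / items_per_second * 1000.0;
}
```

Calling `time_per_sort_ms(1e6, 8.16e8)` gives roughly 1.23 ms, in line with the estimate above that one sort of 1E+06 items takes about 1.2 ms at the reported throughput.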
I have attached a testcase which is a truncated version of a TF unit testcase.
(rename file to drop the .txt extension, that was added so I can attach the file to this issue)
Run it as shown below (within a container with ROCm tensorflow 1.8)
"python3 histogram_ops_test.py"
and you should see the crash. The crash corresponds to this call in rocprim
The testcase passes if I hardcode the rocprim code to take the "else" block
(i.e. call "histogram_global_kernel" instead of "histogram_shared_kernel").
Please investigate and fix this bug.
Thanks
deven
When I try to compile rocThrust, I get the following error several times:
...
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/detail/scan_by_key.h:36:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/execution_policy.h:81:
/home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/detail/set_operations.h:956:61: error: no member named 'init_offset_scan_state_kernel' in namespace 'rocprim::detail'; did you mean 'init_lookback_scan_state_kernel'?
hipLaunchKernelGGL(HIP_KERNEL_NAME(rocprim::detail::init_offset_scan_state_kernel),
~~~~~~~~~~~~~~~~~^
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:199:30: note: expanded from macro 'HIP_KERNEL_NAME'
#define HIP_KERNEL_NAME(...) __VA_ARGS__
^~~~~~~~~~~
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:251:74: note: expanded from macro 'hipLaunchKernelGGL'
#define hipLaunchKernelGGL(kernelName, ...) hipLaunchKernelGGLInternal((kernelName), __VA_ARGS__)
^~~~~~~~~~
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:248:9: note: expanded from macro 'hipLaunchKernelGGLInternal'
kernelName<<<(numBlocks), (numThreads), (memPerBlock), (streamId)>>>(__VA_ARGS__); \
^~~~~~~~~~
/opt/rocm-5.4.0/include/rocprim/device/detail/device_scan_common.hpp:76:60: note: 'init_lookback_scan_state_kernel' declared here
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/test/test_zip_iterator_sort.cpp:19:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/sort.h:1358:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/detail/sort.inl:26
__launch_bounds__(ROCPRIM_DEFAULT_MAX_BLOCK_SIZE) void init_lookback_scan_state_kernel(:
...
It would be great to have tags for ROCm releases. The last tag is from August 29, 2018.
Describe the bug
When building with Clang (ROCm 4.1.0), the following warnings are emitted:
/rocthrust/rocprim/intrinsics/thread.hpp:45:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
hipGetDevice(&default_hip_device);
^~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
/rocthrust/rocprim/intrinsics/thread.hpp:47:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
hipGetDeviceProperties(&device_prop,default_hip_device);
To Reproduce
Build with Clang from:
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/archive/rocm-4.1.0.tar.gz
https://github.com/RadeonOpenCompute/llvm-project/archive/rocm-4.1.0.tar.gz
Expected behavior
No warnings
Log-files
/rocthrust/rocprim/intrinsics/thread.hpp:45:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
hipGetDevice(&default_hip_device);
^~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
/rocthrust/rocprim/intrinsics/thread.hpp:47:5: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
hipGetDeviceProperties(&device_prop,default_hip_device);
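The warnings appear because the return values of calls marked 'nodiscard' are dropped. A general pattern for avoiding this is to route every status through a checking function, sketched below with stand-ins. The `status` enum, `query_device`, `check`, and `current_device` are all hypothetical and are NOT the HIP API; in real code the status type would be `hipError_t` as returned by `hipGetDevice` and `hipGetDeviceProperties`, and this is not necessarily how rocPRIM resolved the issue.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for a runtime status code (hypothetical, not hipError_t).
enum class status { success, failure };

// Stand-in for a status-returning runtime query (hypothetical, not hipGetDevice).
[[nodiscard]] status query_device(int* device)
{
    if(device == nullptr)
        return status::failure;
    *device = 0;
    return status::success;
}

// Consumes the status instead of discarding it, and reports failures.
void check(status s, const char* what)
{
    assert(what != nullptr);
    if(s != status::success)
        throw std::runtime_error(std::string(what) + " failed");
}

int current_device()
{
    int device = -1;
    // The return value is passed to check(), so no -Wunused-result warning.
    check(query_device(&device), "query_device");
    return device;
}
```

Where a failure can safely be ignored, an explicit `(void)` cast on the call is another common way to acknowledge the discarded result.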