
Comments (6)

Maetveis commented on June 1, 2024

Arch Linux is not an officially supported platform; please open an issue at https://github.com/rocm-arch/rocm-arch/issues.

But I think https://github.com/rocm-arch/rocm-arch/blob/master/rocm-core/PKGBUILD is the culprit because it is still at 4.5.2.

from rocprim.

throwm8 commented on June 1, 2024

I manually modified the rocm-core PKGBUILD and changed the version to 5.0.0 before trying to compile pytorch. It does seem to detect rocm successfully, so there might be another cause.

***** ROCm version from /opt/rocm/.info/version-dev ****

ROCM_VERSION_DEV: 5.0.0}
ROCM_VERSION_DEV_MAJOR: 5
ROCM_VERSION_DEV_MINOR: 0
ROCM_VERSION_DEV_PATCH: 0}
ROCM_VERSION_DEV_INT:   50000
HIP_VERSION_MAJOR: 5
HIP_VERSION_MINOR: 0
TORCH_HIP_VERSION: 500

***** Library versions from dpkg *****


***** Library versions from cmake find_package *****

-- hip::amdhip64 is SHARED_LIBRARY
hip VERSION: 5.0.22066
hsa-runtime64 VERSION: 1.5.0
amd_comgr VERSION: 2.4.0
rocrand VERSION: 2.10.9
hiprand VERSION: 2.10.9
-- hip::amdhip64 is SHARED_LIBRARY
rocblas VERSION: 2.42.0
-- hip::amdhip64 is SHARED_LIBRARY
miopen VERSION: 2.14.0
-- hip::amdhip64 is SHARED_LIBRARY
hipfft VERSION: 1.0.5
-- hip::amdhip64 is SHARED_LIBRARY
hipsparse VERSION: 1.11.2
-- hip::amdhip64 is SHARED_LIBRARY
rccl VERSION: 2.10.3
-- hip::amdhip64 is SHARED_LIBRARY
rocprim VERSION: 2.10.9
-- hip::amdhip64 is SHARED_LIBRARY
hipcub VERSION: 2.10.12
-- hip::amdhip64 is SHARED_LIBRARY
rocthrust VERSION: 2.10.9
ROCm version >= 4.1; enabling asserts
HIP library name: amdhip64
ROCm is enabled.

I know Arch is not officially supported but is it normal for GCC to complain about a library file like this?
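
The version integers in the log above look consistent with a simple positional encoding of the dotted version. The sketch below is inferred only from the values printed in that log (5.0.0 giving 50000, and HIP 5.0 giving 500); the function names are hypothetical and this is not taken from the PyTorch build scripts.

```python
def parse_version(text):
    """Split a dotted version such as '5.0.0' into (major, minor, patch) ints."""
    parts = [int(p) for p in text.strip().split(".")]
    # Pad with zeros so that '5.0' is treated like '5.0.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def rocm_version_int(major, minor, patch):
    """Encoding assumed from the log: 5.0.0 is reported as 50000."""
    return major * 10000 + minor * 100 + patch


def torch_hip_version(major, minor):
    """Encoding assumed from the log: HIP 5.0 is reported as 500."""
    return major * 100 + minor


major, minor, patch = parse_version("5.0.0")
print(rocm_version_int(major, minor, patch))
print(torch_hip_version(major, minor))
```

Under the same assumption, the 4.5.2 version mentioned earlier in the thread would encode to 40502, which is below 50000. A stale version file could therefore plausibly flip a version check even with newer libraries installed.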


Maetveis commented on June 1, 2024

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cuda/cub.cuh#L195-L203

The error that the compiler gives is coming from a path that is supposed to be disabled on ROCM above 5.0.
I can't say if the version detection or something else is going astray, but you can try debugging why the preprocessor is not happy.

> I know Arch is not officially supported but is it normal for GCC to complain about a library file like this?

I don't know exactly what you mean here, but to compile HIP source code you must use hipcc or clang from /opt/rocm/llvm/bin.


throwm8 commented on June 1, 2024

> The error that the compiler gives is coming from a path that is supposed to be disabled on ROCM above 5.0.

I see, then it probably has something to do with my system being misconfigured in some way, like you suggested. This is the cmake configure output before the error.

-- ******** Summary ********
-- General:
--   CMake version         : 3.22.2
--   CMake command         : /usr/bin/cmake
--   System                : Linux
--   C++ compiler          : /usr/bin/c++
--   C++ compiler id       : GNU
--   C++ compiler version  : 11.2.1
--   Using ccache if found : ON
--   Found ccache          : CCACHE_PROGRAM-NOTFOUND
--   CXX flags             : -march=znver2 -mtune=znver2 -O2 -pipe -fno-plt -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow
--   Build type            : Release
--   Compile definitions   : TH_BLAS_MKL;ROCM_VERSION=50000;TORCH_HIP_VERSION=500;ONNX_ML=1;ONNXIFI_ENABLE_EXT=1;ONNX_NAMESPACE=onnx_torch;IDEEP_USE_MKL;HAVE_MMAP=1;_FILE_OFFSET_BITS=64;HAVE_SHM_OPEN=1;HAVE_SHM_UNLINK=1;HAVE_MALLOC_USABLE_SIZE=1;USE_EXTERNAL_MZCRC;MINIZ_DISABLE_ZIP_READER_CRC32_CHECKS
--   CMAKE_PREFIX_PATH     : /usr/lib/python3.10/site-packages
--   CMAKE_INSTALL_PREFIX  : /media/nvme/scratch/yay/python-pytorch-rocm/src/pytorch-1.10.2-rocm/torch
--   USE_GOLD_LINKER       : OFF
-- 
--   TORCH_VERSION         : 1.10.2
--   CAFFE2_VERSION        : 1.10.2
--   BUILD_CAFFE2          : ON
--   BUILD_CAFFE2_OPS      : ON
--   BUILD_CAFFE2_MOBILE   : OFF
--   BUILD_STATIC_RUNTIME_BENCHMARK: OFF
--   BUILD_TENSOREXPR_BENCHMARK: OFF
--   BUILD_BINARY          : ON
--   BUILD_CUSTOM_PROTOBUF : OFF
--     Protobuf compiler   : /usr/bin/protoc
--     Protobuf includes   : /usr/include
--     Protobuf libraries  : /usr/lib/libprotobuf.so
--   BUILD_DOCS            : OFF
--   BUILD_PYTHON          : True
--     Python version      : 3.10.2
--     Python executable   : /usr/bin/python
--     Pythonlibs version  : 3.10.2
--     Python library      : /usr/lib/libpython3.10.so.1.0
--     Python includes     : /usr/include/python3.10
--     Python site-packages: lib/python3.10/site-packages
--   BUILD_SHARED_LIBS     : ON
--   CAFFE2_USE_MSVC_STATIC_RUNTIME     : OFF
--   BUILD_TEST            : True
--   BUILD_JNI             : OFF
--   BUILD_MOBILE_AUTOGRAD : OFF
--   BUILD_LITE_INTERPRETER: OFF
--   INTERN_BUILD_MOBILE   : 
--   USE_BLAS              : 1
--     BLAS                : mkl
--   USE_LAPACK            : 1
--     LAPACK              : mkl
--   USE_ASAN              : OFF
--   USE_CPP_CODE_COVERAGE : OFF
--   USE_CUDA              : 0
--   USE_ROCM              : ON
--   USE_EIGEN_FOR_BLAS    : 
--   USE_FBGEMM            : ON
--     USE_FAKELOWP          : OFF
--   USE_KINETO            : ON
--   USE_FFMPEG            : ON
--   USE_GFLAGS            : ON
--   USE_GLOG              : ON
--   USE_LEVELDB           : OFF
--   USE_LITE_PROTO        : OFF
--   USE_LMDB              : OFF
--   USE_METAL             : OFF
--   USE_PYTORCH_METAL     : OFF
--   USE_PYTORCH_METAL_EXPORT     : OFF
--   USE_FFTW              : OFF
--   USE_MKL               : ON
--   USE_MKLDNN            : ON
--   USE_MKLDNN_ACL        : OFF
--   USE_MKLDNN_CBLAS      : OFF
--   USE_NCCL              : ON
--     USE_SYSTEM_NCCL     : ON
--   USE_NNPACK            : ON
--   USE_NUMPY             : ON
--   USE_OBSERVERS         : ON
--   USE_OPENCL            : OFF
--   USE_OPENCV            : ON
--     OpenCV version      : 4.5.5
--   USE_OPENMP            : ON
--   USE_TBB               : OFF
--   USE_VULKAN            : OFF
--   USE_PROF              : OFF
--   USE_QNNPACK           : ON
--   USE_PYTORCH_QNNPACK   : ON
--   USE_REDIS             : OFF
--   USE_ROCKSDB           : OFF
--   USE_ZMQ               : OFF
--   USE_DISTRIBUTED       : ON
--     USE_MPI               : ON
--     USE_GLOO              : ON
--     USE_GLOO_WITH_OPENSSL : OFF
--     USE_TENSORPIPE        : ON
--   USE_DEPLOY           : OFF
--   USE_BREAKPAD         : ON
--   Public Dependencies  : Threads::Threads;caffe2::mkl;glog::glog;caffe2::mkldnn
--   Private Dependencies : pthreadpool;cpuinfo;qnnpack;pytorch_qnnpack;nnpack;XNNPACK;fbgemm;/usr/lib/libnuma.so;opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs;opencv_optflow;opencv_videoio;opencv_video;/usr/lib/libavcodec.so;/usr/lib/libavformat.so;/usr/lib/libavutil.so;/usr/lib/libswscale.so;/usr/lib/libswresample.so;fp16;/usr/lib/openmpi/libmpi_cxx.so;/usr/lib/openmpi/libmpi.so;gloo;tensorpipe;aten_op_header_gen;foxi_loader;rt;fmt::fmt-header-only;kineto;gcc_s;gcc;dl
--   USE_COREML_DELEGATE     : OFF
-- Configuring done
-- Generating done

USE_ROCM is set and ROCM_VERSION equals 50000, so I don't really have an idea what is going on. Admittedly I'm not experienced at all with rocm or pytorch in general, but is it normal that cmake is using G++ instead of clang in the log above? Is there anything you can recommend that can help debug the issue?

> I don't know exactly what you mean here, but to compile HIP source code you must use hipcc or clang from /opt/rocm/llvm/bin.

I was trying to say that the compiler is complaining about a function in a header file provided by rocprim, and not about how it's being used in pytorch (that's what I understood), which is why I thought that the problem might be related to rocprim.


Maetveis commented on June 1, 2024

> is it normal that cmake is using G++ instead of clang in the log above?

I'm not familiar with pytorch but I think it should be hipcc or clang (from the amd llvm repo). You should try setting CXX in the environment to hipcc or /opt/rocm/bin/hipcc if hipcc is not on the PATH.

> Is there anything you can recommend that can help debug the issue?

Other than changing the compiler, no, maybe post this to pytorch.
EDIT: Are you compiling the latest release or are you building from master? What I said about the code path being disabled above ROCM 5 only applies to master, it is not in the v1.10.2 Release.
This is the pull request that added it: pytorch/pytorch#68487.

> the compiler is complaining about a function in a header file provided by rocprim and not how it's being used in pytorch

Unfortunately this is quite common in templated C++ code. You should try looking for messages like "Note: required from ..." and "Note: instantiated by: ..." to find where the function was called from.
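
As a rough illustration of that advice, the sketch below filters a build log for the lines compilers emit when they report a template instantiation chain. The phrases matched ("In instantiation of" and "required from" for GCC, "requested here" for Clang) are the usual wording of those diagnostics; the sample log, file names, and function name are hypothetical and only stand in for real build output.

```python
import re

# Phrases GCC and Clang typically use when tracing a template instantiation.
NOTE_PATTERN = re.compile(
    r"in instantiation of|required from|requested here", re.IGNORECASE
)


def instantiation_notes(log_text):
    """Return the log lines that describe where a template was instantiated."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if NOTE_PATTERN.search(line)
    ]


# Hypothetical excerpt; real output would come from the failing build.
sample_log = """\
/opt/rocm/include/example_header.hpp: In instantiation of 'void example_fn(T) [with T = int]':
caller_file.cu:42:5:   required from here
/opt/rocm/include/example_header.hpp:10:3: error: placeholder error text
"""

for note in instantiation_notes(sample_log):
    print(note)
```

Reading the surviving lines from bottom to top usually leads from the library header back to the call site in the project's own source.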


throwm8 commented on June 1, 2024

Thanks for the hints. I tried setting CXX to hipcc but that caused a build failure as well. From what I can tell, GCC is used for the CPU-related parts and hipcc/rocm's clang is used for the GPU-related parts, so that probably wasn't the problem.

Anyway I did some digging and the piece of code you mentioned was missing in the files I have. I was trying to get rocm-arch's PKGBUILD for pytorch to work and it clones the repository but with the argument #tag=1.10.2. I cloned pytorch locally to check the commit's tag and for some reason it didn't have one. I then built pytorch manually with rocm support enabled and it didn't have any errors. Thank you very much for your help, I wouldn't have been able to solve this issue of mine without it.

Since this problem doesn't seem to be caused by rocPRIM I'll close the issue.

