
hip's Issues

Hipifying a CUDA file that has a call to a reduction function

Hi!

I'm using CUDA code that calls a function from the NVIDIA CUB library.

This is the code:

    // Determine temporary device storage requirements
    void     *d_temp_storage = NULL;
    size_t   temp_storage_bytes = 0;

    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, in_d, sum_d, N);

    // Allocate temporary storage
    CHECK(cudaMalloc(&d_temp_storage, temp_storage_bytes));

    // Run sum-reduction
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, in_d, sum_d, N);

'in_d' is the input array with the values to be reduced, 'sum_d' is a single-element array that receives the result of the complete reduction, and 'N' is the number of elements to be reduced.

Is there any equivalent HIP library function for Sum reduction that I can use on an AMD Fiji card (AMD R9 Nano)?
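
One hedged answer: the hipCUB library, a HIP wrapper that mirrors CUB's interface (its availability on a given ROCm install is an assumption here), would keep the two-phase pattern almost verbatim:

    #include <hipcub/hipcub.hpp>  // assumes hipCUB is installed

    // Sketch: same size-query / allocate / run pattern as CUB.
    void   *d_temp_storage = NULL;
    size_t  temp_storage_bytes = 0;

    // First call only computes the required temporary storage size.
    hipcub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, in_d, sum_d, N);

    // Allocate temporary storage.
    CHECK(hipMalloc(&d_temp_storage, temp_storage_bytes));

    // Second call performs the sum-reduction.
    hipcub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, in_d, sum_d, N);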

hipblas.h not found

Hi, I just wanted to play around with the hip samples that come with rocm 1.4. I went into samples/7_Advanced/hipblas_saxpy and called make and received this:

saxpy.hipblasref.cpp:9:10: fatal error: 'hipblas.h' file not found
#include <hipblas.h>
         ^
1 error generated.
Died at /opt/rocm/hip/bin/hipcc line 378.

Related to this, I was wondering how to use hipBlas, hipFFT, and hipRNG?

Cannot call __device__ class member from kernel function on HCC platform

I am not sure if this issue is a genuine HIP issue or if it originates in HCC. Because I am only using HCC through HIP, I am reporting this here.

$ hipcc --version

HCC clang version 3.5.0 (based on HCC 0.10.16186-d14f969-7461349 LLVM 3.5.0svn)
Target: x86_64-unknown-linux-gnu
Thread model: posix

The following code does not compile:

#include "hip/hip_runtime.h"

struct Foo
{
    int m_bar = 5;

    __device__ int bar(int a)
    {
        return a+m_bar;
    }
};

__global__ void kernel(hipLaunchParm lp, int a)
{
    Foo foo;
    foo.bar(a);
}

int main()
{
    hipLaunchKernel(HIP_KERNEL_NAME(kernel), dim3(1), dim3(1), 0, 0, 5);

    return 0;
}

test_direct.cpp:16:6: error: 'Foo::bar': no overloaded function has restriction specifiers that are compatible with the ambient context 'kernel'
So it looks like the compiler either thinks that foo.bar is not a __device__ function, or it mistakes the kernel for a host function.

Strangely, calling a non-member __device__ function that in turn calls the member works. So the following code compiles:

#include "hip/hip_runtime.h"

struct Foo
{
    int m_bar = 5;

    __device__ int bar(int a)
    {
        return a+m_bar;
    }
};

__device__ int bar_wrapper(Foo& foo, int a)
{
    return foo.bar(a);
}

__global__ void kernel(hipLaunchParm lp, int a)
{
    Foo foo;
    bar_wrapper(foo, a);
}

int main()
{
    hipLaunchKernel(HIP_KERNEL_NAME(kernel), dim3(1), dim3(1), 0, 0, 5);

    return 0;
}

function "hipDeviceGetPCIBusId" has already been defined

Error building HIP/samples/2_Cookbook/0_MatrixTranspose:
# make
results in
/opt/rocm/hip/include/hip/nvcc_detail/hip_runtime_api.h(778): error: function "hipDeviceGetPCIBusId" has already been defined

Dockerfile for developer image:

FROM nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04

RUN \
	apt-get update && \
	apt-get install -y wget

RUN \
	wget -qO - http://packages.amd.com/rocm/apt/debian/rocm.gpg.key | apt-key add - && \
	sh -c 'echo deb [arch=amd64] http://packages.amd.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list' && \
	apt-get update && \
	DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y hip_nvcc
ENV PATH /opt/rocm/bin/:$PATH

hipStreamAddCallback compile error

Hi,

I got the following error when trying to compile the sample square code:

/opt/rocm/hip/bin/hipcc square.hipref.cpp -o square.hip.out
/opt/rocm/hip/include/hip/nvcc_detail/hip_runtime_api.h(588): error: type name is not allowed

/opt/rocm/hip/include/hip/nvcc_detail/hip_runtime_api.h(588): error: expected a ")"

/opt/rocm/hip/include/hip/nvcc_detail/hip_runtime_api.h(588): error: too few arguments in function call

/opt/rocm/hip/include/hip/nvcc_detail/hip_runtime_api.h(589): error: type name is not allowed

/opt/rocm/hip/include/hip/nvcc_detail/hip_runtime_api.h(589): error: expected a ")"

/opt/rocm/hip/include/hip/nvcc_detail/hip_runtime_api.h(589): error: too many arguments in function call

/opt/rocm/hip/include/hip/nvcc_detail/hip_runtime_api.h(589): error: expected a ";"

7 errors detected in the compilation of "/tmp/tmpxft_0000396e_00000000-9_square.hipref.cpp1.ii".
Died at /opt/rocm/hip/bin/hipcc line 365.
make: *** [square.hip.out] Error 2

Lines 588 and 589 of hip_runtime_api.h are:

inline static hipError_t hipStreamAddCallback(hipStream_t stream, hipStreamCallback_t callback, void *userData, unsigned int flags)
{
    return *hipCUDAErrorTohipError(cudaStreamAddCallback(cudaStream_t stream,
        cudaStreamCallback_t callback, void userData, unsigned int flags));
}

Shouldn't brackets be added to each type name here?

Thanks,
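
For comparison, a corrected forwarder would presumably pass the arguments rather than re-declaring their types; a sketch, not the shipped header:

inline static hipError_t hipStreamAddCallback(hipStream_t stream, hipStreamCallback_t callback, void *userData, unsigned int flags)
{
    // Pass the argument names; the quoted version pastes full parameter
    // declarations into the call, which is what the compiler rejects.
    return hipCUDAErrorTohipError(
        cudaStreamAddCallback(stream, (cudaStreamCallback_t)callback, userData, flags));
}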

Support VISIBLE_DEVICES environment var

Use an environment variable to control which devices are made visible to the application.

  • Support a comma-separated list of device IDs.
  • The order of device IDs in the env var controls their order in the application, i.e., 2,1,0 will cause HW device 2 to be listed as device 0 in the app.

Implementation:

  • modify ihipInit in src/hip_hcc.cpp to use the env variable to control which of the detected devices get added to the "g_devices" list, and also to control the order in which they are added.
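
A minimal sketch of that parsing step (assuming the variable is named HIP_VISIBLE_DEVICES, mirroring CUDA_VISIBLE_DEVICES; the helper name is hypothetical):

#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Turn "2,1,0" from the environment into an ordered list of hardware device
// IDs; the first entry becomes device 0 inside the application.
std::vector<int> parseVisibleDevices() {
    std::vector<int> ids;
    const char* env = std::getenv("HIP_VISIBLE_DEVICES");
    if (!env) return ids;  // unset: leave all devices visible
    std::stringstream ss(env);
    std::string tok;
    while (std::getline(ss, tok, ','))
        ids.push_back(std::stoi(tok));
    return ids;
}

ihipInit would then append the detected devices to g_devices in exactly this order.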

build error of fp16 in developer-preview branch

I am building the "developer-preview branch" of HIP. My hcc compiler is "HCC clang version 3.5.0 (based on HCC 0.10.16501-81f0a2f-02246a0 LLVM 3.5.0svn)" based on rocm-1.4.

I see the compiler error:

    /home/tim/HIP/src/hip_fp16.cpp:37:12: error: invalid operands to binary expression ('__half' and '__half')
        return a + b;
               ~ ^ ~

hipMemcpyAsync works incorrectly

It seems like yesterday's update introduced some incorrect behavior in hipMemcpyAsync and streams.

System configuration:

HIP: 276ca7c4
HCC:  HCC 0.10.16102-064fdb1-f002ffd 
ROCK: 2126ace1317 (dev)
ROCR: 85ad07b (dev)

The following test code will fail with mismatched results.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include "hip_runtime.h"

#define HIP_ASSERT(x) (assert((x)==hipSuccess))


#define WIDTH     1024
#define HEIGHT    1024

#define NUM       (WIDTH*HEIGHT)

#define THREADS_PER_BLOCK_X  16
#define THREADS_PER_BLOCK_Y  16
#define THREADS_PER_BLOCK_Z  1

int main() {

  int *hostA;
  int *hostB;

  int *deviceA;
  int *deviceB;

  int i;
  int errors;

  hostA = (int *)malloc(NUM * sizeof(int));
  hostB = (int *)malloc(NUM * sizeof(int));

  // initialize the input data
  for (i = 0; i < NUM; i++) {
    hostB[i] = i;
  }

  HIP_ASSERT(hipMalloc((void**)&deviceA, NUM * sizeof(int)));
  HIP_ASSERT(hipMalloc((void**)&deviceB, NUM * sizeof(int)));

  hipStream_t s;
  hipStreamCreate(&s);


  // hostB -> deviceB -> hostA
#define ASYNC 1
#if ASYNC
  HIP_ASSERT(hipMemcpyAsync(deviceB, hostB, NUM*sizeof(int), hipMemcpyHostToDevice, s));
  HIP_ASSERT(hipMemcpyAsync(hostA, deviceB, NUM*sizeof(int), hipMemcpyDeviceToHost, s));
#else
  HIP_ASSERT(hipMemcpy(deviceB, hostB, NUM*sizeof(int), hipMemcpyHostToDevice));
  HIP_ASSERT(hipMemcpy(hostA, deviceB, NUM*sizeof(int), hipMemcpyDeviceToHost));
#endif

  hipStreamSynchronize(s);
  hipDeviceSynchronize();

  // verify the results
  errors = 0;
  for (i = 0; i < NUM; i++) {
    if (hostA[i] != (hostB[i])) {
      errors++;
    }
  }
  if (errors!=0) {
    printf("FAILED: %d errors\n",errors);
  } else {
    printf ("PASSED!\n");
  }

  hipStreamDestroy(s);

  HIP_ASSERT(hipFree(deviceA));
  HIP_ASSERT(hipFree(deviceB));

  free(hostA);
  free(hostB);

  //hipResetDefaultAccelerator();

  return errors;
}

List of supported GPUs

It seems that in the initial release only the Fury series GPUs were supported.

Is there an updated list of supported GPUs?

Use logical defined-or in the hipcc and hipconfig scripts for better readability

Hi,

In hipcc and hipconfig script, I see a lot of unless defined idioms used for setting default values of flags, such as the following:

    $HSA_PATH=$ENV{'HSA_PATH'};
    $HSA_PATH="/opt/rocm/hsa" unless defined $HSA_PATH;

This can be replaced by Perl's logical defined-or operator //, which to Perl programmers literally means 'set the default value'.

    $HSA_PATH = $ENV{'HSA_PATH'} // "/opt/rocm/hsa";

Reference:

http://perldoc.perl.org/perlop.html#Logical-Defined-Or
http://perlmaven.com/how-to-set-default-values-in-perl

Unions passed as kernel parameters have incorrect values

Here is my test case:
structs-test.cpp.txt

hipcc structs-test.cpp -o structs-test
./structs-test

Output:

...
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 0.000000
5 0.000000
6 0.000000
7 0.000000
8 0.000000
9 0.000000
...

Expected output:

...
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 5.000000
6 6.000000
7 7.000000
8 8.000000
9 9.000000
...

I pass 3 structs: V1 is simple; V2 and V3 contain unions. V1 works as expected, but fields of V2 and V3 contain incorrect values (in my case, at least, they contain the value of another parameter, p0).

These structs are based on this type: https://github.com/g-truc/glm/blob/master/glm/detail/type_vec3.hpp#L43
It works correctly on CUDA (I mean plain CUDA, not HIP-nvcc), and IIRC it worked on ROCm 1.4 with a few modifications to glm.
Perhaps my test case will work on 1.4 too, but I can't check right now.

If this is not a bug (in case HIP/HCC do not support unions as kernel params), then perhaps a warning should be shown.
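
Since the attachment isn't inlined above, here is a hypothetical reduction of the test case, modeled on the linked glm type (the struct and kernel names are mine, not the attachment's):

#include "hip/hip_runtime.h"

// glm-style vector: each component is an anonymous union of aliases.
struct V2 {
    union { float x, r; };
    union { float y, g; };
    union { float z, b; };
};

__global__ void readV2(hipLaunchParm lp, float p0, V2 v, float* out) {
    // Reported symptom: v.x comes back holding p0's value instead of
    // the value the host passed in.
    out[0] = v.x;
}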

High-impact application of HIP

Google has contributed to a beta version of Eigen (3.3) that uses CUDA to implement their tensor operations in Tensorflow. If I understand the HIP marketing literature, it allows one to port a CUDA codebase to run on AMD GPUs whilst keeping a mostly CUDA-type codebase.

There is an effort underway to port TensorFlow to an OpenCL back-end, but it appears to have mostly stalled out, at least from what I can tell.

Some of us tensorflowers have found that the fp16 support in Maxwell and Pascal (GP102) GPUs (maybe not GP100, but who has $120K lying around?) is crippled to some extent; it runs slower than fp32. fp16, even if it's not faster, allows larger models to fit in memory. I think Vega and Polaris support native fp16. It might be a big win for those of us who want to run fp16 to have a hipified Eigen backend in TensorFlow.

Sorry, I'm not offering to help. But in case this is news to anyone here, I thought I would pass along the thought.

Forcing HIP runtime initialization before starting time counter

When using CUDA and measuring the execution time of a kernel, I'm used to calling 'cudaFree(0)' before starting to count time.
The call to 'cudaFree(0)' forces the CUDA runtime to initialize before the time counter starts.

E.g.:

// So that the CUDA runtime is initialized before starting measuring time
cudaFree(0);

struct timespec t_start, t_stop;
clock_gettime( CLOCK_MONOTONIC, &t_start);
deviceReduceKernel <<<numBlocks_red, threadsPerBlock_red>>> (distmat_d, sum_perblock_d, totalNumElementsDistmat);
clock_gettime( CLOCK_MONOTONIC, &t_stop);

How could I achieve the same (force runtime initialization) when using HIP?

Using 'hipFree(0)' instead of 'cudaFree(0)' does not result in the same behavior, as the elapsed time reported by the timer remains the same as when not calling 'hipFree(0)'.
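
One workaround sketch: judging by the hipLaunchKernel report further down this list, a small allocation appears to initialize the HIP runtime, so a tiny alloc/free pair before the timer might serve as a warm-up (an assumption, not documented behavior):

// Hypothetical warm-up: touch the device once before starting the timer.
void* warmup = NULL;
hipMalloc(&warmup, 1024);
hipFree(warmup);
hipDeviceSynchronize();  // ensure initialization work has finished

// ...then start clock_gettime() and launch the kernel as above.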

Implement hipGetErrorString

[thanks! updated to hipGetErrorString]
Currently hipGetErrorString returns the name of the error (ie "hipErrorInvalidDevice") but really should return a full message string explaining the error.

Reference:
See src/hip_hcc.cpp for the implementation of the hipGetErrorString function.
See include/hip_runtime_api.h for the doxygen-format comments that explain each error; these could form the text of the error-string message set by this function.
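
A sketch of the requested behavior, with hypothetical message text (the real strings would come from the doxygen comments mentioned above):

const char* hipGetErrorString(hipError_t e) {
    switch (e) {
        case hipSuccess:            return "no error";
        case hipErrorInvalidDevice: return "invalid device ordinal";
        // ... one descriptive string per error code ...
        default:                    return "unknown error";
    }
}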

kernel oops when running hip kernel with dev branch ROCR/ROCK

I was able to do the tutorial on gpuopen.com but found that hipGetDeviceCount was only returning 1, so the examples would only run on my primary GPU, a GTX 980 Ti. I also have an R9 Nano and an R9 Fury. The kfd driver exports 3 nodes under topology, so the runtime should let me talk to them. I'm running Ubuntu 15. I was hoping to instrument hip_hcc.cpp to see what it was doing right here:

/*
  * Build a table of valid compute devices.
  */
 auto accs = hc::accelerator::get_all();
 int deviceCnt = 0;
 for (int i=0; i<accs.size(); i++) {
     if (! accs[i].get_is_emulated()) {
         deviceCnt++;
     }
 };
 +    printf("actual device count is %d\n", deviceCnt);
 // Make sure the hip visible devices are within the deviceCnt range
 for (int i = 0; i < g_hip_visible_devices.size(); i++) {
     if(g_hip_visible_devices[i] >= deviceCnt){
         // Make sure any DeviceID after invalid DeviceID will be erased.
         g_hip_visible_devices.resize(i);
         break;
     }
 }

But I can't even get it to compile:
~/devel/HIP2$ make
./bin/hipcc -I/opt/hcc/include -std=c++11 -I/opt/hsa/include src/hip_hcc.cpp -c -O3 -o src/hip_hcc.o
src/hip_hcc.cpp:52:2: error: #error (USE_AM_TRACKER requries HCC version of 16074 or newer)
#error (USE_AM_TRACKER requries HCC version of 16074 or newer)
^
Died at ./bin/hipcc line 208.
Makefile:20: recipe for target 'src/hip_hcc.o' failed
make: *** [src/hip_hcc.o] Error 1

I made the following change to the Makefile in response to complaints, but it's still not working. It also looks like it's trying to compile the code with nvcc:
mmacy@pandemonium:~/devel/HIP2$ hipcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

Compilation problem

Hello! I'm trying to compile Caffe 1.0.0-rc3 with HIP. There are some errors like:

/home/beamoflight/caffe/caffe/src/caffe/util/im2col.cu(57): error: no instance of function template "caffe::im2col_gpu_kernel" matches the argument list
argument types are: (int, int, const float *, const int, const int, const int, const int, const int, const int, const int, const int, const int, const int, int, int, float *)
detected during instantiation of "void caffe::im2col_gpu(const Dtype *, int, int, int, int, int, int, int, int, int, int, int, Dtype *) [with Dtype=float]"
(65): here
Can you help me?

P.S. See files in attachment
im2col_cu.txt
im2col_cu_diff.txt

Finding good flags for the 'hipcc' compiler

What parameters should I pass to 'hipcc' so that it will most likely result in the fastest possible device code being generated?

For instance, with 'nvcc' (NVIDIA's compiler) I use '--use_fast_math', which sets '--prec-div=false' and '--prec-sqrt=false', so that the resulting compiled device code is faster at the cost of precision.

Does the 'hipcc' '-ffast-math' flag accomplish the same thing?

no matching function for call to 'hipHostGetDevicePointer'

Hello,

Is hipHostGetDevicePointer supported in hipcc?

Here is my experiment.
The 'progress' variable is allocated on the host using pinned memory.

CUDA_ASSERT(hipHostMalloc((void **)&progress, sizeof(int), hipHostMallocMapped))

However, when I try to get the device pointer, the following error shows.

mcx_core.cu:1301:18: error: no matching function for call to 'hipHostGetDevicePointer'
     CUDA_ASSERT(hipHostGetDevicePointer((int **)&gprogress, (int *)progress, 0));

What are the requirements for using `shfl` operations on AMD GPU?

There are requirements to use shfl operations on nVidia GPU: https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/tree/master/samples/2_Cookbook/4_shfl#requirement-for-nvidia

requirement for nvidia

please make sure you have a device with compute capability 3.0 or higher in order to use warp shfl operations, and add the -gencode arch=compute_30,code=sm_30 nvcc flag in the Makefile while using this application.

Also note that HIP supports shfl for the 64-wide wavefront (warp size) on AMD: https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/blob/master/docs/markdown/hip_faq.md#why-use-hip-rather-than-supporting-cuda-directly

In addition, HIP defines portable mechanisms to query architectural features, and supports a larger 64-bit wavesize which expands the return type for cross-lane functions like ballot and shuffle from 32-bit ints to 64-bit ints.

But which AMD GPUs support the shfl functions? Or does every AMD GPU support shfl, given that on AMD GPUs it is implemented using local memory rather than a hardware register-to-register instruction?
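
For concreteness, a hedged sketch of a wavefront-wide reduction using the cross-lane intrinsics (whether this runs register-to-register or is emulated via local memory on a given AMD GPU is exactly the question):

#include "hip/hip_runtime.h"

// Sum across a 64-lane AMD wavefront with __shfl_down; the first offset is
// 32 because the wavefront is 64 lanes wide (it would be 16 on a 32-lane warp).
__global__ void waveSum(hipLaunchParm lp, int* out) {
    int val = hipThreadIdx_x;
    for (int offset = 32; offset > 0; offset >>= 1)
        val += __shfl_down(val, offset);
    if (hipThreadIdx_x == 0)
        *out = val;  // lane 0 holds the total (2016 for lanes 0..63)
}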

Add performance comparison with OpenCL

I've been looking for a performance comparison of using HCC (through HIP) vs OpenCL, but I haven't been able to find any 😟

I see there is an issue for porting the PARBOIL benchmark suite, but that is not completed yet.

The Rodinia benchmark suite seems to be ported (as mentioned in the README), but I can't find any performance graphs. The same goes for mixbench and GPU-Stream mentioned in the HIP-Examples repo.

It would be great to have performance graphs readily available. It could also be fun to have a server run such benchmarks every week or so, as Rust does: http://perf.rust-lang.org/

Keep up the good work 👍

Compilation error for oceanFFT + HIP

Hello! I get a compilation error after hipifying the oceanFFT sample ( /usr/local/cuda/samples/5_Simulations/oceanFFT ):

../../common/inc/helper_cuda.h:990:81: error: no matching function for call to '_cudaGetErrorEnum(hipError_t&)'
file, line, static_cast<unsigned int>(result), _cudaGetErrorEnum(result), func);

What am I doing wrong, and how can I fix it manually?

Implement stream-level control over blocking/active sync

Support this flag: hipDeviceScheduleBlockingSync

Add flags definitions to HCC and NVCC paths.

On the HCC side:
save the flag in the ihipDevice_t structure (in _flags) defined in include/hcc_detail.h.

Inside streamSynchronize, use hc::hcWaitModeBlocked or hc::hcWaitModeActive based on the state of the parent device associated with this stream.
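
A rough sketch of that dispatch (the helper and the flag's numeric value are assumptions; hc::hcWaitModeBlocked/hc::hcWaitModeActive come from the description above):

#include <hc.hpp>

// Map the flags saved on the parent device to an HCC wait mode;
// streamSynchronize would pass the result to completion_future::wait().
inline hc::hcWaitMode waitModeFor(unsigned deviceFlags) {
    const unsigned blockingSync = 0x4;  // assumed value of hipDeviceScheduleBlockingSync
    return (deviceFlags & blockingSync) ? hc::hcWaitModeBlocked
                                        : hc::hcWaitModeActive;
}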

Kernel with no parameters

A kernel with no parameters (from cudahandbook/concurrency/nullKernelSync.cu, for example) generates the appended compile error.

Adding a dummy int parameter fixes the problem.

/opt/rocm/hip/bin/hipcc -stdlib=libc++ -Wnull-character -I ../chLib -D NO_CUDA nullKernelSync.cpp -o nullKernelSync
nullKernelSync.cpp:67:9: error: expected expression
hipLaunchKernel(HIP_KERNEL_NAME(NullKernel), dim3(1), dim3(1), 0, 0, );
^
/opt/rocm/hip/include/hip/hcc_detail/hip_runtime.h:541:33: note: expanded from
macro 'hipLaunchKernel'
_kernelName (lp, ##__VA_ARGS__);
^
1 error generated.
Died at /opt/rocm/hip/bin/hipcc line 269.

Hip error

I just cloned this repo. Previous versions were working, but now I get an error that it requires a different version of HCC. However, the HCC link given on the GitHub page for HIP has deb/tar packages that were last updated in January. Could anyone please point me to a recent package of HCC? The ones there are outdated and I cannot find a newer version. As a result my HIP is not running. Please help.

Mismatched API hipMemsetAsync between hcc and nvcc

The nvcc version doesn't take the trailing stream argument.

// nvcc/detail/hip_runtime_api.h
inline static hipError_t hipMemsetAsync(void* devPtr,int value, size_t count)

// hcc_detail/hip_runtime_api.h
#if __cplusplus
hipError_t hipMemsetAsync(void* dst, int  value, size_t sizeBytes, hipStream_t = 0 );
#else
hipError_t hipMemsetAsync(void* dst, int value, size_t sizeBytes, hipStream_t stream);
#endif

https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/blob/master/include/hcc_detail/hip_runtime_api.h#L903

https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/blob/master/include/nvcc_detail/hip_runtime_api.h#L222
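
A sketch of an aligned nvcc-path signature (assuming it forwards to cudaMemsetAsync, which does accept a stream, and reuses the header's hipCUDAErrorTohipError translation helper):

// nvcc_detail/hip_runtime_api.h -- hypothetical aligned version
inline static hipError_t hipMemsetAsync(void* devPtr, int value, size_t count,
                                        hipStream_t stream = 0)
{
    return hipCUDAErrorTohipError(cudaMemsetAsync(devPtr, value, count, stream));
}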

Converting CUDA sample that uses OpenCV to HIP

I'm trying to convert the Udacity parallel programming class problem sets to HIP. They use OpenCV, and it looks like it is not compatible with hcc. I reported the problem to the hcc project and was told that the HIP team may have a solution. Is there a way to use hcc/HIP to compile/link OpenCV projects?
See ROCm/hcc#47 for more information.

The issue seems to be that OpenCV requires libstdc++ and hcc requires libc++.

Thanks

__HIPCC__ not defined in HIP

The __HIPCC__ predefined macro seems not to be defined during compilation. Here is a test case with the following source:

#include <stdio.h>

int main(int argc, char* argv[]) {
#ifdef __HIPCC__
    printf("__HIPCC__ is defined\n");
#endif
#ifdef __CUDACC__
    printf("__CUDACC__ is defined\n");
#endif
#ifdef __HCC__
    printf("__HCC__ is defined\n");
#endif
    return 0;
}

After compiling with hipcc (/opt/rocm/bin/hipcc -o test-macro -O2 -I/opt/rocm/hip/include test-macro.cpp) and running, it returns just the following output:

__HCC__ is defined
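
Until that's fixed, a portable guard grounded in the output above might be (the macro name on the last line is mine):

// Hypothetical workaround: accept any of the three compiler macros,
// since hipcc on the HCC path currently defines only __HCC__.
#if defined(__HIPCC__) || defined(__HCC__) || defined(__CUDACC__)
#define COMPILING_WITH_GPU_COMPILER 1
#endif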

sqrtf and sqrt in device code cannot be compiled

To reproduce, replace line 46 (https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/blob/master/samples/0_Intro/square/square.hipref.cpp#L46)
with

        C_d[i] = sqrtf(A_d[i]);

hipcc square.hipref.cpp -o square.hip.out fails with

Referencing function in another module!
  %34 = call float @sqrtf(float %33) #5
LLVM ERROR: Broken function found, compilation aborted!
Generating HSAIL BRIG kernel failed

Full log: hipcc-sqrt-log.txt

I don't know about other functions, but sqrt(double) fails with the same error.
But, for example, this can be compiled:

        C_d[i] = __fsqrt_rn(A_d[i]);

Checked on:

HIP version: 1.0.16503
HCC clang version 3.5.0  (based on HCC 0.10.16501-81f0a2f-02246a0 LLVM 3.5.0svn)

Incorrect behavior of hipCreateChannelDesc and hipMallocArray

The current HIP-HCC path doesn't support a channelDesc with compound types like float2 or float4 [1]. Consequently, the memory allocation and initialization of a hipArray bound to a float4 texture fails.

Consider the following code:

#include <hip_runtime.h>
#include <stdio.h>

template<typename T>
void f(void) {
  hipChannelFormatDesc desc = hipCreateChannelDesc<T>();
  printf("%d ", desc.x);
  printf("%d ", desc.y);
  printf("%d ", desc.z);
  printf("%d ", desc.w);
  printf("%d \n", desc.f);
}

int main () {
  f<float>();
  f<float2>();
  f<float4>();
  return 0;
}

On cuda platform, it prints

32 0 0 0 2 
32 32 0 0 2 
32 32 32 32 2

On HSA platform, it prints

32 0 0 0 2 
0 0 0 0 3  
0 0 0 0 3

In hipMallocArray, only the hipChannelFormatKind of the array is considered, not the dimensions of the channelDesc; this might be incorrect [2].

[1] https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/blob/developer-preview/include/hcc_detail/hip_texture.h#L226
[2] https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP/blob/developer-preview/src/hip_memory.cpp#L227
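
For reference, a hypothetical specialization matching the CUDA output above (32 32 32 32 2 for float4) could look like this; the real fix would belong in hip_texture.h [1]:

// Expected behavior for float4, mirroring the CUDA platform's output,
// built on the five-argument hipCreateChannelDesc overload.
template<>
inline hipChannelFormatDesc hipCreateChannelDesc<float4>() {
    return hipCreateChannelDesc(32, 32, 32, 32, hipChannelFormatKindFloat);
}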

missing hipArray APIs in nvcc path

hipArray APIs, such as hipArray, hipMallocArray, etc., are missing in the nvcc path. They need to be mapped in include/nvcc_detail/hip_runtime_api.h.

Java bindings

Hi, I am cross-posting this request in the JCUDA repo as well. (jcuda/jcuda#5)

JCUDA is a set of Java bindings for the CUDA API (both the Driver API and parts of the Runtime API) and associated libraries (cuBLAS, cuDNN, etc.). It's simple, straightforward, and works very well with no performance impact.

I'd love to jump into HIP by converting my existing JCUDA code. Is there any chance you could collaborate with the JCUDA project or create an analogous library?

hipify tensorflow?

Hi.

I'm curious if anyone has tried doing this, or is working on doing so...

If not, how, generally, could I accomplish this?

error: expected top-level entity

Hello,
I am trying to compile Monte Carlo photon migration in CUDA (mcx) on ROCm.

I have cleared all the warnings, but am confronted by the following error. 

Any suggestions?
leiming@lowfreq:~/mcx_no_symbol/src$ /opt/rocm/hip/bin/hipcc -I/opt/rocm/hip/include  -std=c++11  -DUSE_ATOMIC  -DSAVE_DETECTORS -DUSE_CACHEBOX  -DMCX_TARGET_NAME='"Fermi MCX"'  mcx_core.cpp  -o mcx_core.o 
/opt/rocm/hcc-lc/compiler/bin/llvm-as: /tmp/tmp.U4yAIWtXHr/mcx_core-c8f8d4.o.host_redirect.ll:1:1: error: expected top-level entity
/opt/rocm/hcc-lc/compiler/bin/opt: <stdin>:5768:9: error: stored value and pointer type do not match
^
clang-3.5: error: no such file or directory: '/tmp/tmp.U4yAIWtXHr/mcx_core-c8f8d4.o.host_redirect.bc'
clang-3.5: error: no input files
objdump: 'mcx_core.o': No such file
objdump: '/tmp/mcx_core-c8f8d4.o': No such file
ld: cannot find /tmp/mcx_core-c8f8d4.o: No such file or directory
clang-3.5: error: linker command failed with exit code 1 (use -v to see invocation)
Died at /opt/rocm/hip/bin/hipcc line 378.

cuda math intrinsic functions support

Hello,

I found that the CUDA math intrinsics, such as __float2int_rn, are not supported by hipcc. Can you confirm this? If not, how can I make it work?

Best,
Leiming

p.s. Here is a snippet of the error message I got by running the following command with rocm.

$ /opt/rocm/hip/bin/hipcc  -I/opt/rocm/hip/include  -std=c++11 -fopenmp -DUSE_ATOMIC  -DSAVE_DETECTORS -DUSE_CACHEBOX  -DMCX_TARGET_NAME='"Fermi MCX"' -o mcx_core.o  mcx_core.cu
mcx_core.cu:182:30: error: use of undeclared identifier '__float2int_rn'
      xi[0] = mcx_nextafterf(__float2int_rn(htime[0]), (v->x > 0.f)-(v->x < 0.f));

hipFree returns error code when 0/null-pointer is passed.

I don't know how common the practice is, but the Udacity homework code calls cudaFree(0) before doing anything else, and the hipified version returns an error code and causes the program to exit with an error. If this practice is common among CUDA software, hipFree should return a success code and behave more like cudaFree. In C, calling free with a null pointer does not cause an error either.

See line 27 of https://github.com/udacity/cs344/blob/master/Problem%20Sets/Problem%20Set%201/HW1.cpp
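
Until then, a user-side shim is straightforward (the wrapper name is mine):

#include "hip/hip_runtime.h"

// Treat a null pointer as a no-op, mirroring free(NULL) in C.
inline hipError_t safeHipFree(void* ptr) {
    return ptr ? hipFree(ptr) : hipSuccess;
}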

BLAS and other Maths library support?

HIP is a really nice effort to have transparent C++ code for both NVIDIA and AMD graphics cards. One important feature for GPGPU, however, is BLAS routines. I saw that there is an hcBLAS implementation. I was wondering whether you intend to support a hipBLAS that dispatches BLAS calls to the appropriate hardware by calling the corresponding library, or is that not the case?

HIP with NVIDIA

When compiling on a machine with an AMD card I do the following and it works:
gcc -c main.c misc.c -DTYPE=float
hipcc -O2 Kernel.hip.cpp main.o misc.o -DTYPE=float -o kernel_hip_float

But when compiling on a machine with an NVIDIA card (and with the CUDA toolkit installed) I get lots of 'warning: null character(s) ignored' and 'error: unrecognized token' errors (reported for the 'main.o' object file) when executing the second command.

What is the correct way of compiling and linking multiple files, where only a single file includes the HIP runtime library (in this example 'Kernel.hip.cpp')?
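
A hedged guess at the fix: compile the HIP translation unit to an object file too, and let hipcc drive only the final link, so nvcc is never handed a .o file as if it were source (which would explain the 'null character' errors):

gcc -c main.c misc.c -DTYPE=float
hipcc -O2 -c Kernel.hip.cpp -DTYPE=float -o Kernel.hip.o
hipcc main.o misc.o Kernel.hip.o -o kernel_hip_float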

Is run-time compilation supported/planned?

Basically I'm wondering if HIP has an alternative to NVRTC for compiling at runtime? Although NVRTC is somewhat hacky, it is an immensely useful tool at times, and an alternative for HIP would be great for targeting AMD / portable code.

hipEventDestroy(NULL) crashes

The CUDA runtime returns invalid-resource-handle. (It does not crash.)

(In retrospect, as Tommy Thorn pointed out back in the day, we screwed up the interfaces for the CUDA runtime. free() returns void and free(NULL) is valid and has no side effects. Really we should have echoed those API semantics in the CUDA runtime.)

hipLaunchKernel alone does not work

The following simple test fails with a no-device-found assertion:
a.out: /srv/git/HIP/src/hip_hcc.cpp:1019: ihipDevice_t *ihipGetTlsDefaultDevice(): Assertion `ihipIsValidDevice(tls_defaultDevice)' failed.
Aborted (core dumped)

Adding some other code, for example:
void* mem;
hipMalloc(&mem, 1024);
to the beginning makes it work. The problem seems to be that hipLaunchKernel does not initialize the HIP runtime.

include "hip_runtime.h"

global void empty_hip_kernel(hipLaunchParm lp, int param)
{
}

int main(int argc, const char** argv)
{
hipLaunchKernel(HIP_KERNEL_NAME(empty_hip_kernel), dim3(1), dim3(1), 0, 0, 0);
hipDeviceSynchronize();
return 0;
}

can not build hip from source

This is the rocm-1.5 master branch.

/home/tim/HIP/src/device_util.cpp:1219:10: error: call to 'sqrt' is ambiguous
return hc::fast_math::sqrt(x);
^~~~~~~~~~~~~~~~~~~
/opt/rocm/include/kalmar_math.h:654:16: note: candidate function
__fp16 sqrt(__fp16 x) { return __hc_sqrt_native_half(x); }
^
/opt/rocm/include/kalmar_math.h:657:15: note: candidate function
float sqrt(float x) { return fast_math::sqrtf(x); }
^
/usr/include/x86_64-linux-gnu/bits/mathcalls.h:156:13: note: candidate function
__MATHCALL (sqrt,, (Mdouble __x));
^
/usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/cmath:482:3: note: candidate function
sqrt(float __x)
^
/usr/lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/cmath:486:3: note: candidate function
sqrt(long double __x)
