juliagpu / GPUArrays.jl
Reusable array functionality for Julia's various GPU backends.
License: MIT License
I tried running the example but get:
julia> c = a * b # calls to BLAS
ERROR: MethodError: no method matching blas_module(::GPUArrays.CLBackend.CLContext)
Closest candidates are:
blas_module(::GPUArrays.JLBackend.JLContext) at /Users/solver/.julia/v0.6/GPUArrays/src/backends/julia/julia.jl:81
blas_module(::Union{GPUArrays.AbstractAccArray{T,1}, GPUArrays.AbstractAccArray{T,2}} where T) at /Users/solver/.julia/v0.6/GPUArrays/src/backends/blas.jl:4
Stacktrace:
[1] gemm! at /Users/solver/.julia/v0.6/GPUArrays/src/backends/blas.jl:20 [inlined]
[2] gemm_wrapper!(::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}, ::Char, ::Char, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}) at ./linalg/matmul.jl:369
[3] *(::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}) at ./linalg/matmul.jl:148
julia> versioninfo()
Julia Version 0.6.0-pre.beta.9
Commit ceb06740e3 (2017-04-02 01:04 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin16.4.0)
CPU: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
WORD_SIZE: 64
BLAS: libgfortblas
LAPACK: liblapack
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)
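Until blas_module is implemented for the CLBackend, one workaround is to fall back to the pure-Julia threaded backend, whose blas_module(::JLContext) method exists per the error message above. This is only a sketch against the Julia 0.6-era API shown in this thread (JLBackend.init() is taken from the benchmarks mentioned below, not verified here):

```julia
using GPUArrays
JLBackend.init()                       # threaded CPU backend; always available
a = GPUArray(rand(Float32, 32, 32))
b = GPUArray(rand(Float32, 32, 32))
c = a * b                              # dispatches through blas_module(::JLContext)
```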
So I'm running Julia 0.6, and the example with the ReverseDiff package complains about conversion to pointer in GPUArray. Is there a way to fix it?
LoadError: conversion to pointer not defined for GPUArrays.GPUArray{Float32,1,Array{Float32,1},GPUArrays.JLBackend.JLContext}
while loading C:\Users\PunkR.julia\v0.6\GPUArrays\examples\logreg.jl, in expression starting on line 23
in gradient at ReverseDiff\src\api\gradients.jl:22
in gradient at ReverseDiff\src\api\gradients.jl:24
in seeded_reverse_pass! at ReverseDiff\src\api\tape.jl:47
in seeded_reverse_pass! at ReverseDiff\src\api\utils.jl:31
in reverse_pass! at ReverseDiff\src\tape.jl:87
in reverse_exec! at ReverseDiff\src\tape.jl:93
in special_reverse_exec! at ReverseDiff\src\derivatives/linalg\arithmetic.jl:224
in A_mul_Bc! at base\linalg\matmul.jl:223
in A_mul_Bt! at base\linalg\matmul.jl:193
in gemm_wrapper! at base\linalg\matmul.jl:369
in gemm! at base\linalg\blas.jl:1027
And a second question: is there any documentation for the package? At least some kind of list of available functions?
And a third: some of the benchmarks use NT = Base.Threads.nthreads() and JLBackend.init(), but I can't figure out how to set more threads; it always finds only 1 available:
JLContext Intel(R) Xeon(R) CPU E5-1660 0 @ 3.30GHz with 1 threads
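The thread count can only be set via the environment before Julia starts (as the GPUArrays build output quoted later in this thread also suggests); JLBackend.init() then picks it up. A minimal shell sketch:

```shell
# Base.Threads.nthreads() is fixed when Julia starts: set JULIA_NUM_THREADS
# in the environment *before* launching; it cannot be raised inside a session.
export JULIA_NUM_THREADS=8
echo "launching Julia with $JULIA_NUM_THREADS threads"
# julia -O3        # then JLBackend.init() should report all 8 threads
```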
This works:
using CUDAdrv
using CUBLAS
A = CuArray(rand(Float32, 100, 100))
B = CuArray(rand(Float32, 100, 100))
C = CuArray(rand(Float32, 100, 100))
A_mul_B!(C, A, B)
but this:
using GPUArrays
A = GPUArray(rand(Float32, 100, 100))
B = GPUArray(rand(Float32, 100, 100))
C = GPUArray(rand(Float32, 100, 100))
A_mul_B!(C, A, B)
gives me:
WARNING: CUBLAS error triggered from:
Stacktrace:
[1] statuscheck(::UInt32) at /home/dfdx/.julia/v0.6/CUBLAS/src/CUBLAS.jl:71
[2] gemm!(::Char, ::Char, ::Float32, ::CUDAdrv.CuArray{Float32,2}, ::CUDAdrv.CuArray{Float32,2}, ::Float32, ::CUDAdrv.CuArray{Float32,2}) at /home/dfdx/.julia/v0.6/CUBLAS/src/blas.jl:915
[3] gemm! at /home/dfdx/.julia/v0.6/GPUArrays/src/backends/blas.jl:21 [inlined]
[4] gemm_wrapper!(::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::Char, ::Char, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}) at ./linalg/matmul.jl:367
[5] A_mul_B!(::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}) at ./linalg/matmul.jl:148
[6] eval(::Module, ::Any) at ./boot.jl:235
[7] eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./REPL.jl:66
[8] macro expansion at ./REPL.jl:97 [inlined]
[9] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
ERROR: "cublas internal error"
Stacktrace:
[1] statuscheck(::UInt32) at /home/dfdx/.julia/v0.6/CUBLAS/src/CUBLAS.jl:73
[2] gemm!(::Char, ::Char, ::Float32, ::CUDAdrv.CuArray{Float32,2}, ::CUDAdrv.CuArray{Float32,2}, ::Float32, ::CUDAdrv.CuArray{Float32,2}) at /home/dfdx/.julia/v0.6/CUBLAS/src/blas.jl:915
[3] gemm! at /home/dfdx/.julia/v0.6/GPUArrays/src/backends/blas.jl:21 [inlined]
[4] gemm_wrapper!(::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::Char, ::Char, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}) at ./linalg/matmul.jl:367
[5] A_mul_B!(::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}) at ./linalg/matmul.jl:148
GeForce GTX 960M
Ubuntu 16.04
Do you think it would be worthwhile to add ArrayFire.jl as a backend? How much work would it be? Would it be possible to interoperate between arrays from different backends that are backed by the same device?
I'm interested in implementing support for cuBLAS, but I haven't followed the discussions lately, so I'm a bit out of context and have a couple of questions.
From the README:
CUDAnative could support these easily as well, but we currently run into problems with the interactions of CUDAdrv and CUDArt
I see the relevant commented-out code, but what exactly is wrong with it?
Do we want cuBLAS to be integrated into CUDAnative.jl or just GPUArrays.jl?
It would be nice to have fast GPU generation of random numbers. Those algorithms are very good, so wrapping https://github.com/JuliaGPU/CURAND.jl would be very useful, along with some way to use it to generate GPUArrays.
Vector reductions (sum(A, 1), maximum(A, (2,3)), etc.) and concatenation operations are not currently supported, and I think the error message is not as informative as it could be: it just points out that indexing is not supported, but this has nothing to do with indexing.
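Until reductions are supported on the device, one slow but working fallback is to copy to the host and reduce there. A sketch for the Julia 0.6-era API; I'm assuming Array(A) performs the device-to-host conversion:

```julia
A = GPUArray(rand(Float32, 32, 32))
h = Array(A)            # device -> host copy (assumed conversion)
colsums = sum(h, 1)     # Julia 0.6 reduction-over-dims syntax
rowmax  = maximum(h, 2)
```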
Thanks.
While testing the example on the README, I got the following error. I have CUDAnative already working and tested, so what's going on?
julia> a = GPUArray(rand(Float32, 32, 32))
ERROR: UndefVarError: CUDAnative not defined
Stacktrace:
[1] current_device at /home/mohd/.julia/v0.6/GPUArrays/src/backends/opencl/opencl.jl:59 [inlined]
[2] current_context() at /home/mohd/.julia/v0.6/GPUArrays/src/backends/opencl/opencl.jl:64
[3] #call#5(::Array{Any,1}, ::Type{GPUArrays.GPUArray}, ::Array{Float32,2}) at /home/mohd/.julia/v0.6/GPUArrays/src/abstractarray.jl:85
[4] GPUArrays.GPUArray(::Array{Float32,2}) at /home/mohd/.julia/v0.6/GPUArrays/src/abstractarray.jl:85
It seems that active_backends and current_backend are no longer supported. Is this just a leftover from the GPUArrays evolution, or is that part of the infrastructure currently missing?
Is there a way to choose the device on which we allocate the arrays (and perform computations)?
I can see what seems to be the current device:
julia> CLBackend.current_context()
CLContext: AMD Radeon HD - FirePro D300 Compute Engine
I see, though, that there are several possible devices (in this case too many: there are only 2 GPUs in my machine, not 3).
julia> CLBackend.all_contexts()
3-element Array{GPUArrays.CLBackend.CLContext,1}:
CLContext: AMD Radeon HD - FirePro D300 Compute Engine
CLContext: AMD Radeon HD - FirePro D300 Compute Engine
CLContext: AMD Radeon HD - FirePro D300 Compute Engine
In ArrayFire, for example, ArrayFire.AFInfo() gives you device info, and it can be used to select a device.
julia> ArrayFire.AFInfo()
ArrayFire v3.3.2 (OpenCL, 64-bit Mac OSX, build f65dd97)
[0] APPLE : AMD Radeon HD - FirePro D300 Compute Engine, 2048 MB
[1] APPLE : AMD Radeon HD - FirePro D300 Compute Engine, 2048 MB
What are the main differences between GPUArrays and ArrayFire?
Hey, dumb question: how does a user initialise a GPUArray?
julia> using GPUArrays
julia> a = rand(10)
10-element Array{Float64,1}:
0.480675
0.185595
0.668687
0.315772
0.722896
0.704025
0.443428
0.626663
0.00459752
0.414371
julia> GPUArrays.GPUArray(a)
ERROR: BoundsError: attempt to access 0-element Array{GPUArrays.Context,1} at index [0]
in GPUArrays.GPUArray{T,N,B,C}(::Array{Float64,1}) at /home/ranjan/.julia/v0.6/GPUArrays/src/arrays.jl:60
uprev = GPUArray(rand(Float32, 32, 32))
k1 = GPUArray(rand(Float32, 32, 32))
k2 = GPUArray(rand(Float32, 32, 32))
k3 = GPUArray(rand(Float32, 32, 32))
k4 = GPUArray(rand(Float32, 32, 32))
dt = 1.2f0
b1 = 1.3f0
b2 = 1.4f0
b3 = 1.5f0
b4 = 1.6f0
utilde = similar(uprev)
@. utilde = uprev + dt*(b1*k1 + b2*k2 + b3*k3 + b4*k4)
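For reference, in Julia 0.6 the @. line above fuses into a single in-place broadcast! call, roughly equivalent to the sketch below (dt and b1..b4 are captured from the enclosing scope; on GPU backends the closure must be transpilable for this to compile):

```julia
broadcast!(utilde, uprev, k1, k2, k3, k4) do u, x1, x2, x3, x4
    u + dt*(b1*x1 + b2*x2 + b3*x3 + b4*x4)
end
```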
OpenCL backend
Hey there! I wanted to test out GPUArrays given that it's had its first 0.0.1 release (congrats!), but ran into some trouble. I tried pulling down ddfc16d on OS X on Julia 0.6, and received build errors for CLFFT, CLBLAS (OS X not officially supported), and CUDArt (@windows not defined). GPUArrays still installed, however attempting to load it with using GPUArrays fails with could not open file .../.julia/v0.6/GPUArrays/src/backends/supported_backends.jl. Here's my versioninfo:
julia> versioninfo()
Julia Version 0.6.0-pre.alpha.0
Commit 29bc2ac* (2017-02-28 13:15 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin14.5.0)
CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
I'm wondering if there's any easy way around these errors, or if I'll need to manually build CLFFT and CLBLAS (as their errors point me to do on OS X), and somehow manually patch or remove the CUDArt dependency?
Regardless of these issues, you've done some amazing work so far!
I'm trying to get GPUArrays.jl working on my new MacBook Pro (with a Radeon card) but I got an error trying to use Transpiler:
julia> Pkg.add("Transpiler")
INFO: Cloning cache of Matcha from https://github.com/SimonDanisch/Matcha.jl.git
INFO: Cloning cache of OpenCL from https://github.com/JuliaGPU/OpenCL.jl.git
INFO: Cloning cache of Sugar from https://github.com/SimonDanisch/Sugar.jl.git
INFO: Cloning cache of Transpiler from https://github.com/SimonDanisch/Transpiler.jl.git
INFO: Installing Matcha v0.0.2
INFO: Installing OpenCL v0.5.2
INFO: Installing Sugar v0.3.0
INFO: Installing Transpiler v0.1.0
INFO: Package database updated
julia> Pkg.add("GPUArrays")
INFO: Cloning cache of CLBLAS from https://github.com/JuliaGPU/CLBLAS.jl.git
INFO: Cloning cache of CLFFT from https://github.com/JuliaGPU/CLFFT.jl.git
INFO: Cloning cache of CUBLAS from https://github.com/JuliaGPU/CUBLAS.jl.git
INFO: Cloning cache of CUDAdrv from https://github.com/JuliaGPU/CUDAdrv.jl.git
INFO: Cloning cache of CUDArt from https://github.com/JuliaGPU/CUDArt.jl.git
INFO: Cloning cache of CUFFT from https://github.com/JuliaGPU/CUFFT.jl.git
INFO: Cloning cache of GPUArrays from https://github.com/JuliaGPU/GPUArrays.jl.git
INFO: Cloning cache of Primes from https://github.com/JuliaMath/Primes.jl.git
INFO: Installing CLBLAS v1.1.1
INFO: Installing CLFFT v0.4.0
INFO: Installing CUBLAS v0.1.0
INFO: Installing CUDAdrv v0.4.2
INFO: Installing CUDArt v0.3.1
INFO: Installing CUFFT v0.0.4
INFO: Installing GPUArrays v0.0.2
INFO: Installing Primes v0.1.3
INFO: Building CLBLAS
===============================[ ERROR: CLBLAS ]================================
LoadError: OSX not oficially supported.
Find manual build instructions on: https://github.com/clMathLibraries/clBLAS/wiki/Build
while loading /Users/solver/.julia/v0.6/CLBLAS/deps/build.jl, in expression starting on line 38
================================================================================
INFO: Building CLFFT
================================[ ERROR: CLFFT ]================================
LoadError: OSX not oficially supported.
Find manual build instructions on: https://github.com/clMathLibraries/clBLAS/wiki/Build
while loading /Users/solver/.julia/v0.6/CLFFT/deps/build.jl, in expression starting on line 34
================================================================================
INFO: Building CUDAdrv
===============================[ ERROR: CUDAdrv ]===============================
LoadError: Could not find the CUDA driver library (specify the path to libcuda using the CUDA_DRIVER environment variable).
while loading /Users/solver/.julia/v0.6/CUDAdrv/deps/build.jl, in expression starting on line 119
================================================================================
INFO: Building CUDArt
===============================[ ERROR: CUDArt ]================================
LoadError: CUDA runtime library cannot be found.
while loading /Users/solver/.julia/v0.6/CUDArt/deps/build.jl, in expression starting on line 380
================================================================================
INFO: Building GPUArrays
INFO: This process will figure out which acceleration Packages you have installed
and therefore which backends GPUArrays can offer.
Theoretically available:
:cudanative, :julia, :opencl
:julia is the default backend, which should always work.
Just start Julia with:
`JULIA_NUM_THREADS=8 julia -O3` to get it some threads.
8 is just an example and should be chosen depending on the processor you have.
`-O3` is completely optional, but when you're already fishing for multhithreaded
acceleration, you might as well want optimization level 3!
In the future, OpenCL, CUDA and OpenGL will be added as another backend.
INFO: Not installing CUDAnative backend. If you've installed CUDAnative.jl not in the
default location, consider building GPUArrays like this:
ENV[CUDANATIVE_PATH] = "path/to/CUDAnative/"
Pkg.build("GPUArrays")
If not installed, you can get CUDAnative like this:
Install CUDA runtime
Build Julia from the branch: tb/cuda.
Then:
Pkg.clone("https://github.com/JuliaGPU/CUDAnative.jl.git") #
Pkg.test("CUDAnative")
Pkg.checkout("CUDAdrv")
Pkg.checkout("LLVM")
INFO: julia added as a backend.
INFO: CUDAnative doesn't seem to be usable and it won't be installed as a backend. Error: ArgumentError("Module CUDAnative not found in current path.\nRun `Pkg.add(\"CUDAnative\")` to install the CUDAnative package.")
INFO: If error fixed, try Pkg.build("GPUArrays") again!
INFO: OpenCL added as backend!
INFO: import of CLBLAS did not work, not added
WARNING: deprecated syntax "typealias clfftNumber Union{Float64,Float32,Complex128,Complex64}" at /Users/solver/.julia/v0.6/CLFFT/src/CLFFT.jl:55.
Use "const clfftNumber = Union{Float64,Float32,Complex128,Complex64}" instead.
WARNING: deprecated syntax "typealias clfftReal Union{Float64,Float32}" at /Users/solver/.julia/v0.6/CLFFT/src/CLFFT.jl:56.
Use "const clfftReal = Union{Float64,Float32}" instead.
WARNING: deprecated syntax "typealias clfftComplex Union{Complex128,Complex64}" at /Users/solver/.julia/v0.6/CLFFT/src/CLFFT.jl:57.
Use "const clfftComplex = Union{Complex128,Complex64}" instead.
WARNING: deprecated syntax "typealias clfftDouble Union{Float64,Complex128}" at /Users/solver/.julia/v0.6/CLFFT/src/CLFFT.jl:58.
Use "const clfftDouble = Union{Float64,Complex128}" instead.
WARNING: deprecated syntax "typealias clfftSingle Union{Float32,Complex64}" at /Users/solver/.julia/v0.6/CLFFT/src/CLFFT.jl:59.
Use "const clfftSingle = Union{Float32,Complex64}" instead.
WARNING: deprecated syntax "typealias clfftTypeDouble Union{Type{Float64},Type{Complex128}}" at /Users/solver/.julia/v0.6/CLFFT/src/CLFFT.jl:60.
Use "const clfftTypeDouble = Union{Type{Float64},Type{Complex128}}" instead.
WARNING: deprecated syntax "typealias clfftTypeSingle Union{Type{Float32},Type{Complex64}}" at /Users/solver/.julia/v0.6/CLFFT/src/CLFFT.jl:61.
Use "const clfftTypeSingle = Union{Type{Float32},Type{Complex64}}" instead.
WARNING: deprecated syntax "typealias PlanHandle Csize_t" at /Users/solver/.julia/v0.6/CLFFT/src/CLFFT.jl:63.
Use "const PlanHandle = Csize_t" instead.
WARNING: deprecated syntax "typealias PlanHandle Csize_t" at /Users/solver/.julia/v0.6/CLFFT/src/api.jl:33.
Use "const PlanHandle = Csize_t" instead.
WARNING: deprecated syntax "typealias Callback Ptr{Void}" at /Users/solver/.julia/v0.6/CLFFT/src/api.jl:34.
Use "const Callback = Ptr{Void}" instead.
WARNING: deprecated syntax "typealias UserData Ptr{Void}" at /Users/solver/.julia/v0.6/CLFFT/src/api.jl:35.
Use "const UserData = Ptr{Void}" instead.
WARNING: deprecated syntax "typealias Precision Cint" at /Users/solver/.julia/v0.6/CLFFT/src/api.jl:37.
Use "const Precision = Cint" instead.
WARNING: deprecated syntax "typealias Dim Cint" at /Users/solver/.julia/v0.6/CLFFT/src/api.jl:38.
Use "const Dim = Cint" instead.
WARNING: deprecated syntax "typealias Direction Cint" at /Users/solver/.julia/v0.6/CLFFT/src/api.jl:39.
Use "const Direction = Cint" instead.
WARNING: deprecated syntax "typealias Layout Cint" at /Users/solver/.julia/v0.6/CLFFT/src/api.jl:40.
Use "const Layout = Cint" instead.
WARNING: deprecated syntax "typealias ResultLocation Cint" at /Users/solver/.julia/v0.6/CLFFT/src/api.jl:41.
Use "const ResultLocation = Cint" instead.
WARNING: deprecated syntax "typealias ResultTransposed Cint" at /Users/solver/.julia/v0.6/CLFFT/src/api.jl:42.
Use "const ResultTransposed = Cint" instead.
INFO: import of CLFFT did not work, not added
================================[ BUILD ERRORS ]================================
WARNING: CUDArt, CUDAdrv, CLBLAS and CLFFT had build errors.
- packages with build errors remain installed in /Users/solver/.julia/v0.6
- build the package(s) and all dependencies with `Pkg.build("CUDArt", "CUDAdrv", "CLBLAS", "CLFFT")`
- build a single package by running its `deps/build.jl` script
================================================================================
INFO: Package database updated
julia> using GPUArrays
INFO: Precompiling module GPUArrays.
ERROR: LoadError: LoadError: LoadError: LoadError: UndefVarError: CLTranspiler not defined
Stacktrace:
[1] include_from_node1(::String) at ./loading.jl:569
[2] include(::String) at ./sysimg.jl:14
[3] include_from_node1(::String) at ./loading.jl:569
[4] include(::String) at ./sysimg.jl:14
[5] include_from_node1(::String) at ./loading.jl:569
[6] include(::String) at ./sysimg.jl:14
[7] include_from_node1(::String) at ./loading.jl:569
[8] include(::String) at ./sysimg.jl:14
[9] anonymous at ./<missing>:2
while loading /Users/solver/.julia/v0.6/GPUArrays/src/backends/opencl/opencl.jl, in expression starting on line 17
while loading /Users/solver/.julia/v0.6/GPUArrays/src/backends/supported_backends.jl, in expression starting on line 9
while loading /Users/solver/.julia/v0.6/GPUArrays/src/backends/backends.jl, in expression starting on line 52
while loading /Users/solver/.julia/v0.6/GPUArrays/src/GPUArrays.jl, in expression starting on line 10
ERROR: Failed to precompile GPUArrays to /Users/solver/.julia/lib/v0.6/GPUArrays.ji.
Stacktrace:
[1] compilecache(::String) at ./loading.jl:703
[2] _require(::Symbol) at ./loading.jl:490
[3] require(::Symbol) at ./loading.jl:398
On an Intel HD GPU with Beignet drivers.
julia> v=GPUArray(rand(2,2))
GPUArray with ctx: CLContext: Intel(R) HD Graphics 5500 BroadWell U-Processor GT2:
2×2 Array{Float64,2}:
0.841527 0.436476
0.23317 0.0211867
julia> v[1,1]
Couldn't compile kernel:
1 : // dependencies
2 : // (GPUArrays.linear_index, Tuple{Transpiler.CLIntrinsics.CLArray{Float64,2},Float32})
3 : uint linear_index_1(__global float * restrict x2unused2, float state)
4 : {
5 : return (get_global_id)(0) + (uint){1};
6 : }
7 : // Type{UInt32}
8 : typedef int Type3UInt324; // placeholder type instance
9 : __constant Type3UInt324 TYP_INST_Type3UInt324;
10 :
11 : // (GPUArrays._ind2sub, Tuple{Tuple{UInt32},UInt32})
12 : uint _ind2sub_5(uint indslast, uint ind)
13 : {
14 : return (uint){ind + (uint){1}};
15 : }
16 : // (Base.argtail, Tuple{UInt32,UInt32})
17 : uint argtail_6(uint x, uint rest)
18 : {
19 : return rest;
20 : }
21 : // (Base.tail, Tuple{Tuple{UInt32,UInt32}})
22 : uint tail_7(uint2 x)
23 : {
24 : uint2 x22_apply_tmp2659;
25 : x22_apply_tmp2659 = x;
26 : return (argtail_6)(x22_apply_tmp2659.s0, (uint){x22_apply_tmp2659.s1});
27 : }
28 : // (GPUArrays._ind2sub, Tuple{Tuple{UInt32,UInt32},UInt32})
29 : uint2 _ind2sub_2(uint2 inds, uint ind)
30 : {
31 : uint l;
32 : uint f;
33 : uint indnext;
34 : uint r1;
35 : r1 = inds.s0;
36 : indnext = ind / r1;
37 : f = (uint){1};
38 : l = r1;
39 : uint x22_apply_tmp2658;
40 : x22_apply_tmp2658 = (_ind2sub_5)((tail_7)(inds), indnext);
41 : return (uint2){(ind - l * indnext) + f, x22_apply_tmp2658};
42 : }
43 : // (GPUArrays.gpu_ind2sub, Tuple{Tuple{UInt32,UInt32},UInt32})
44 : uint2 gpu_ind2sub_2(uint2 dims, uint ind)
45 : {
46 : return (_ind2sub_2)((uint2){dims}, ind - (uint){1});
47 : }
48 : // Tuple{}
49 : typedef int Tuple_; // empty type emitted as an int
50 : // (Base.argtail, Tuple{UInt32})
51 : Tuple_ argtail_8(uint x, Tuple_ rest)
52 : {
53 : return rest;
54 : }
55 : // (Base.tail, Tuple{Tuple{UInt32}})
56 : Tuple_ tail_9(uint x)
57 : {
58 : uint x22_apply_tmp2663;
59 : x22_apply_tmp2663 = x;
60 : return (argtail_8)(x22_apply_tmp2663, (Tuple_){0.0f});
61 : }
62 : // (GPUArrays._sub2ind, Tuple{Tuple{},UInt32,UInt32})
63 : uint _sub2ind_10(Tuple_ x, uint L, uint ind)
64 : {
65 : return ind;
66 : }
67 : // (GPUArrays._sub2ind, Tuple{Tuple{UInt32},UInt32,UInt32,UInt32})
68 : uint _sub2ind_11(uint inds, uint L, uint ind, uint i, Tuple_ I)
69 : {
70 : uint r1;
71 : r1 = inds;
72 : Tuple_ x22_apply_tmp2662;
73 : x22_apply_tmp2662 = I;
74 : return (_sub2ind_10)((tail_9)(inds), L * r1, ind + (i - (uint){1}) * L);
75 : }
76 : // (GPUArrays._sub2ind, Tuple{Tuple{UInt32,UInt32},UInt32,UInt32,UInt32,UInt32})
77 : uint _sub2ind_12(uint2 inds, uint L, uint ind, uint i, uint I)
78 : {
79 : uint r1;
80 : r1 = inds.s0;
81 : uint x22_apply_tmp2661;
82 : x22_apply_tmp2661 = I;
83 : return (_sub2ind_11)((tail_7)(inds), L * r1, ind + (i - (uint){1}) * L, x22_apply_tmp2661, (Tuple_){0.0f});
84 : }
85 : // (GPUArrays.gpu_sub2ind, Tuple{Tuple{UInt32,UInt32},Tuple{UInt32,UInt32}})
86 : uint gpu_sub2ind_3(uint2 dims, uint2 I)
87 : {
88 : uint2 x22_apply_tmp2660;
89 : x22_apply_tmp2660 = I;
90 : return (_sub2ind_12)((uint2){dims}, (uint){1}, (uint){1}, x22_apply_tmp2660.s0, (uint){x22_apply_tmp2660.s1});
91 : }
92 : // ########################
93 : // Main inner function
94 : // (GPUArrays.copy_kernel!, (Float32, Transpiler.CLIntrinsics.CLArray{Float64,2}, Tuple{UInt32,UInt32}, Transpiler.CLIntrinsics.CLArray{Float64,2}, Tuple{UInt32,UInt32}, Tuple{UInt32,UInt32}, Tuple{UInt32,UInt32}, Tuple{UInt32,UInt32}, UInt32))
95 : __kernel void copy_kernel1_4(float state, __global float * restrict dest, uint2 dest_offsets, __global float * restrict src, uint2 src_offsets, uint2 shape, uint2 shape_dest, uint2 shape_source, uint length)
96 : {
97 : uint src_idx;
98 : uint dest_idx;
99 : uint2 idx;
100 : uint i;
101 : i = (linear_index_1)(dest, state);
102 : if(i <= length){
103 : idx = (gpu_ind2sub_2)(shape, i);
104 : dest_idx = (gpu_sub2ind_3)(shape_dest, idx + dest_offsets);
105 : src_idx = (gpu_sub2ind_3)(shape_source, idx + src_offsets);
106 : ;
107 : float _ssavalue_0;
108 : _ssavalue_0 = (src)[src_idx - 0x00000001];
109 : (dest)[dest_idx - 0x00000001] = _ssavalue_0;
110 : ;
111 : };
112 : ;
113 : }
114 :
With the following build errors:
stringInput.cl:5:13: error: taking address of function is not allowed
stringInput.cl:9:25: error: variable in constant address space must be initialized
stringInput.cl:26:13: error: taking address of function is not allowed
stringInput.cl:40:26: error: taking address of function is not allowed
stringInput.cl:40:39: error: taking address of function is not allowed
stringInput.cl:46:13: error: taking address of function is not allowed
stringInput.cl:60:13: error: taking address of function is not allowed
stringInput.cl:74:13: error: taking address of function is not allowed
stringInput.cl:74:27: error: taking address of function is not allowed
stringInput.cl:83:13: error: taking address of function is not allowed
stringInput.cl:83:27: error: taking address of function is not allowed
stringInput.cl:90:13: error: taking address of function is not allowed
stringInput.cl:101:10: error: taking address of function is not allowed
stringInput.cl:103:16: error: taking address of function is not allowed
stringInput.cl:104:21: error: taking address of function is not allowed
stringInput.cl:105:20: error: taking address of function is not allowed
ERROR: CLError(code=-11, CL_BUILD_PROGRAM_FAILURE)
Stacktrace:
[1] macro expansion at /home/carlo/.julia/v0.6/OpenCL/src/macros.jl:6 [inlined]
[2] #build!#113(::String, ::Bool, ::Function, ::OpenCL.cl.Program) at /home/carlo/.julia/v0.6/OpenCL/src/program.jl:101
[3] (::OpenCL.cl.#kw##build!)(::Array{Any,1}, ::OpenCL.cl.#build!, ::OpenCL.cl.Program) at ./<missing>:0
[4] (::Transpiler.##41#42{Tuple{Float32,GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},UInt32},GPUArrays.#copy_kernel!,OpenCL.cl.CmdQueue,OpenCL.cl.Context,NTuple{9,DataType}})() at /home/carlo/.julia/v0.6/Transpiler/src/clike/opencl/compilation.jl:96
[5] get!(::Transpiler.##41#42{Tuple{Float32,GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},UInt32},GPUArrays.#copy_kernel!,OpenCL.cl.CmdQueue,OpenCL.cl.Context,NTuple{9,DataType}}, ::Dict{Any,Transpiler.CLFunction}, ::Tuple{GPUArrays.#copy_kernel!,NTuple{9,DataType}}) at ./dict.jl:449
[6] Transpiler.CLFunction(::Function, ::Tuple{Float32,GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},UInt32}, ::OpenCL.cl.CmdQueue) at /home/carlo/.julia/v0.6/Transpiler/src/clike/opencl/compilation.jl:77
[7] gpu_call(::Function, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::Tuple{GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},UInt32}, ::Int64, ::Void) at /home/carlo/.julia/v0.6/GPUArrays/src/backends/opencl/opencl.jl:220
[8] copy!(::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::CartesianRange{CartesianIndex{2}}, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::CartesianRange{CartesianIndex{2}}) at /home/carlo/.julia/v0.6/GPUArrays/src/abstractarray.jl:329
[9] copy!(::Array{Float64,2}, ::CartesianRange{CartesianIndex{2}}, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::CartesianRange{CartesianIndex{2}}) at /home/carlo/.julia/v0.6/GPUArrays/src/abstractarray.jl:360
[10] getindex(::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::Int64, ::Int64) at /home/carlo/.julia/v0.6/GPUArrays/src/abstractarray.jl:401
[11] macro expansion at ./REPL.jl:97 [inlined]
[12] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
and analogously
julia> v * v
Couldn't compile kernel:
1 : // dependencies
2 : // (GPUArrays.linear_index, Tuple{Transpiler.CLIntrinsics.CLArray{Float64,2},Float32})
3 : uint linear_index_1(__global float * restrict x2unused2, float state)
4 : {
5 : return (get_global_id)(0) + (uint){1};
6 : }
7 : // Type{UInt32}
8 : typedef int Type3UInt324; // placeholder type instance
9 : __constant Type3UInt324 TYP_INST_Type3UInt324;
10 :
11 : // (GPUArrays._ind2sub, Tuple{Tuple{UInt32},UInt32})
12 : uint _ind2sub_5(uint indslast, uint ind)
13 : {
14 : return (uint){ind + (uint){1}};
15 : }
16 : // (Base.argtail, Tuple{UInt32,UInt32})
17 : uint argtail_6(uint x, uint rest)
18 : {
19 : return rest;
20 : }
21 : // (Base.tail, Tuple{Tuple{UInt32,UInt32}})
22 : uint tail_7(uint2 x)
23 : {
24 : uint2 x22_apply_tmp2671;
25 : x22_apply_tmp2671 = x;
26 : return (argtail_6)(x22_apply_tmp2671.s0, (uint){x22_apply_tmp2671.s1});
27 : }
28 : // (GPUArrays._ind2sub, Tuple{Tuple{UInt32,UInt32},UInt32})
29 : uint2 _ind2sub_2(uint2 inds, uint ind)
30 : {
31 : uint l;
32 : uint f;
33 : uint indnext;
34 : uint r1;
35 : r1 = inds.s0;
36 : indnext = ind / r1;
37 : f = (uint){1};
38 : l = r1;
39 : uint x22_apply_tmp2670;
40 : x22_apply_tmp2670 = (_ind2sub_5)((tail_7)(inds), indnext);
41 : return (uint2){(ind - l * indnext) + f, x22_apply_tmp2670};
42 : }
43 : // (GPUArrays.gpu_ind2sub, Tuple{Tuple{UInt32,UInt32},UInt32})
44 : uint2 gpu_ind2sub_2(uint2 dims, uint ind)
45 : {
46 : return (_ind2sub_2)((uint2){dims}, ind - (uint){1});
47 : }
48 : // Tuple{}
49 : typedef int Tuple_; // empty type emitted as an int
50 : // (Base.argtail, Tuple{UInt32})
51 : Tuple_ argtail_8(uint x, Tuple_ rest)
52 : {
53 : return rest;
54 : }
55 : // (Base.tail, Tuple{Tuple{UInt32}})
56 : Tuple_ tail_9(uint x)
57 : {
58 : uint x22_apply_tmp2675;
59 : x22_apply_tmp2675 = x;
60 : return (argtail_8)(x22_apply_tmp2675, (Tuple_){0.0f});
61 : }
62 : // (GPUArrays._sub2ind, Tuple{Tuple{},UInt32,UInt32})
63 : uint _sub2ind_10(Tuple_ x, uint L, uint ind)
64 : {
65 : return ind;
66 : }
67 : // (GPUArrays._sub2ind, Tuple{Tuple{UInt32},UInt32,UInt32,UInt32})
68 : uint _sub2ind_11(uint inds, uint L, uint ind, uint i, Tuple_ I)
69 : {
70 : uint r1;
71 : r1 = inds;
72 : Tuple_ x22_apply_tmp2674;
73 : x22_apply_tmp2674 = I;
74 : return (_sub2ind_10)((tail_9)(inds), L * r1, ind + (i - (uint){1}) * L);
75 : }
76 : // (GPUArrays._sub2ind, Tuple{Tuple{UInt32,UInt32},UInt32,UInt32,UInt32,UInt32})
77 : uint _sub2ind_12(uint2 inds, uint L, uint ind, uint i, uint I)
78 : {
79 : uint r1;
80 : r1 = inds.s0;
81 : uint x22_apply_tmp2673;
82 : x22_apply_tmp2673 = I;
83 : return (_sub2ind_11)((tail_7)(inds), L * r1, ind + (i - (uint){1}) * L, x22_apply_tmp2673, (Tuple_){0.0f});
84 : }
85 : // (GPUArrays.gpu_sub2ind, Tuple{Tuple{UInt32,UInt32},Tuple{UInt32,UInt32}})
86 : uint gpu_sub2ind_3(uint2 dims, uint2 I)
87 : {
88 : uint2 x22_apply_tmp2672;
89 : x22_apply_tmp2672 = I;
90 : return (_sub2ind_12)((uint2){dims}, (uint){1}, (uint){1}, x22_apply_tmp2672.s0, (uint){x22_apply_tmp2672.s1});
91 : }
92 : // ########################
93 : // Main inner function
94 : // (GPUArrays.copy_kernel!, (Float32, Transpiler.CLIntrinsics.CLArray{Float64,2}, Tuple{UInt32,UInt32}, Transpiler.CLIntrinsics.CLArray{Float64,2}, Tuple{UInt32,UInt32}, Tuple{UInt32,UInt32}, Tuple{UInt32,UInt32}, Tuple{UInt32,UInt32}, UInt32))
95 : __kernel void copy_kernel1_4(float state, __global float * restrict dest, uint2 dest_offsets, __global float * restrict src, uint2 src_offsets, uint2 shape, uint2 shape_dest, uint2 shape_source, uint length)
96 : {
97 : uint src_idx;
98 : uint dest_idx;
99 : uint2 idx;
100 : uint i;
101 : i = (linear_index_1)(dest, state);
102 : if(i <= length){
103 : idx = (gpu_ind2sub_2)(shape, i);
104 : dest_idx = (gpu_sub2ind_3)(shape_dest, idx + dest_offsets);
105 : src_idx = (gpu_sub2ind_3)(shape_source, idx + src_offsets);
106 : ;
107 : float _ssavalue_0;
108 : _ssavalue_0 = (src)[src_idx - 0x00000001];
109 : (dest)[dest_idx - 0x00000001] = _ssavalue_0;
110 : ;
111 : };
112 : ;
113 : }
114 :
With the following build error:
stringInput.cl:5:13: error: taking address of function is not allowed
stringInput.cl:9:25: error: variable in constant address space must be initialized
stringInput.cl:26:13: error: taking address of function is not allowed
stringInput.cl:40:26: error: taking address of function is not allowed
stringInput.cl:40:39: error: taking address of function is not allowed
stringInput.cl:46:13: error: taking address of function is not allowed
stringInput.cl:60:13: error: taking address of function is not allowed
stringInput.cl:74:13: error: taking address of function is not allowed
stringInput.cl:74:27: error: taking address of function is not allowed
stringInput.cl:83:13: error: taking address of function is not allowed
stringInput.cl:83:27: error: taking address of function is not allowed
stringInput.cl:90:13: error: taking address of function is not allowed
stringInput.cl:101:10: error: taking address of function is not allowed
stringInput.cl:103:16: error: taking address of function is not allowed
stringInput.cl:104:21: error: taking address of function is not allowed
stringInput.cl:105:20: error: taking address of function is not allowed
ERROR: CLError(code=-11, CL_BUILD_PROGRAM_FAILURE)
Stacktrace:
[1] macro expansion at /home/carlo/.julia/v0.6/OpenCL/src/macros.jl:6 [inlined]
[2] #build!#113(::String, ::Bool, ::Function, ::OpenCL.cl.Program) at /home/carlo/.julia/v0.6/OpenCL/src/program.jl:101
[3] (::OpenCL.cl.#kw##build!)(::Array{Any,1}, ::OpenCL.cl.#build!, ::OpenCL.cl.Program) at ./<missing>:0
[4] (::Transpiler.##41#42{Tuple{Float32,GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},UInt32},GPUArrays.#copy_kernel!,OpenCL.cl.CmdQueue,OpenCL.cl.Context,NTuple{9,DataType}})() at /home/carlo/.julia/v0.6/Transpiler/src/clike/opencl/compilation.jl:96
[5] get!(::Transpiler.##41#42{Tuple{Float32,GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},UInt32},GPUArrays.#copy_kernel!,OpenCL.cl.CmdQueue,OpenCL.cl.Context,NTuple{9,DataType}}, ::Dict{Any,Transpiler.CLFunction}, ::Tuple{GPUArrays.#copy_kernel!,NTuple{9,DataType}}) at ./dict.jl:449
[6] Transpiler.CLFunction(::Function, ::Tuple{Float32,GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},UInt32}, ::OpenCL.cl.CmdQueue) at /home/carlo/.julia/v0.6/Transpiler/src/clike/opencl/compilation.jl:77
[7] gpu_call(::Function, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::Tuple{GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},Tuple{UInt32,UInt32},UInt32}, ::Int64, ::Void) at /home/carlo/.julia/v0.6/GPUArrays/src/backends/opencl/opencl.jl:220
[8] copy!(::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::CartesianRange{CartesianIndex{2}}, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::CartesianRange{CartesianIndex{2}}) at /home/carlo/.julia/v0.6/GPUArrays/src/abstractarray.jl:329
[9] copy!(::Array{Float64,2}, ::CartesianRange{CartesianIndex{2}}, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::CartesianRange{CartesianIndex{2}}) at /home/carlo/.julia/v0.6/GPUArrays/src/abstractarray.jl:360
[10] getindex(::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::Int64, ::Int64) at /home/carlo/.julia/v0.6/GPUArrays/src/abstractarray.jl:401
[11] matmul2x2!(::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::Char, ::Char, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}) at ./linalg/matmul.jl:670
[12] gemm_wrapper!(::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::Char, ::Char, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}) at ./linalg/matmul.jl:360
[13] A_mul_B!(::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}) at ./linalg/matmul.jl:148
[14] *(::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::GPUArrays.GPUArray{Float64,2,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}) at ./linalg/matmul.jl:146
[15] macro expansion at ./REPL.jl:97 [inlined]
[16] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
I get the following error when running using GPUArrays for the first time. Any idea what is wrong?
julia> using GPUArrays
INFO: Precompiling module GPUArrays.
ERROR: LoadError: LoadError: LoadError: LoadError: UndefVarError: CLTranspiler not defined
Stacktrace:
[1] include_from_node1(::String) at ./loading.jl:569
[2] include(::String) at ./sysimg.jl:14
[3] include_from_node1(::String) at ./loading.jl:569
[4] include(::String) at ./sysimg.jl:14
[5] include_from_node1(::String) at ./loading.jl:569
[6] include(::String) at ./sysimg.jl:14
[7] include_from_node1(::String) at ./loading.jl:569
[8] include(::String) at ./sysimg.jl:14
[9] anonymous at ./<missing>:2
while loading /home/mohd/.julia/v0.6/GPUArrays/src/backends/opencl/opencl.jl, in expression starting on line 17
while loading /home/mohd/.julia/v0.6/GPUArrays/src/backends/supported_backends.jl, in expression starting on line 9
while loading /home/mohd/.julia/v0.6/GPUArrays/src/backends/backends.jl, in expression starting on line 52
while loading /home/mohd/.julia/v0.6/GPUArrays/src/GPUArrays.jl, in expression starting on line 10
ERROR: Failed to precompile GPUArrays to /home/mohd/.julia/lib/v0.6/GPUArrays.ji.
Stacktrace:
[1] compilecache(::String) at ./loading.jl:703
[2] _require(::Symbol) at ./loading.jl:490
[3] require(::Symbol) at ./loading.jl:398
Hi,
I want to take advantage of the threaded broadcast of your package but I ran into this:
using GPUArrays
a = JLArray(rand(10))
which gives:
GPU: 10-element Array{Float64,1}:
0.709778
0.691078
0.974742
0.546105
0.962552
0.737111
0.297245
0.985732
0.495128
0.496053
Tests pass for the CL context, but I am not sure where to go with this:
using GPUArrays
u0 = GPUArray(rand(Float32, 32, 32))
abs.(u0)
Couldn't compile kernel:
1 : // dependant type declarations
2 : typedef struct {
3 : float empty; // structs can't be empty
4 : }Base21abs;
5 :
6 : // dependant function declarations
7 : float broadcast_index_2(__global float * restrict arg, int2 shape, int i)
8 : {
9 : ;
10 : return arg[i - 1];
11 : ;
12 : }
13 : // Main inner function
14 : __kernel void broadcast_kernel_7(__global float * restrict A, Base21abs f, int2 sz, __global float * restrict arg_1)
15 : {
16 : int i;
17 : i = get_global_id(0) + 1;
18 : float _ssavalue_0;
19 : _ssavalue_0 = abs(broadcast_index_2(arg_1, sz, i));
20 : A[i - 1] = _ssavalue_0;
21 : ;
22 : }
23 :
With the following build error:
<kernel>:19:19: error: call to 'abs' is ambiguous
_ssavalue_0 = abs(broadcast_index_2(arg_1, sz, i));
^~~
cl_kernel.h:2310:24: note: candidate function
uchar __OVERLOADABLE__ abs(char);
^
cl_kernel.h:2311:24: note: candidate function
uchar __OVERLOADABLE__ abs(uchar);
^
cl_kernel.h:2312:25: note: candidate function
ushort __OVERLOADABLE__ abs(short);
^
cl_kernel.h:2313:25: note: candidate function
ushort __OVERLOADABLE__ abs(ushort);
^
cl_kernel.h:2314:23: note: candidate function
uint __OVERLOADABLE__ abs(int);
^
cl_kernel.h:2315:23: note: candidate function
uint __OVERLOADABLE__ abs(uint);
^
cl_kernel.h:2316:24: note: candidate function
ulong __OVERLOADABLE__ abs(long);
^
cl_kernel.h:2317:24: note: candidate function
ulong __OVERLOADABLE__ abs(ulong);
^
cl_kernel.h:2319:25: note: candidate function
uchar2 __OVERLOADABLE__ abs(char2);
^
cl_kernel.h:2321:25: note: candidate function
uchar3 __OVERLOADABLE__ abs(char3);
^
cl_kernel.h:2323:25: note: candidate function
uchar4 __OVERLOADABLE__ abs(char4);
^
cl_kernel.h:2324:25: note: candidate function
uchar8 __OVERLOADABLE__ abs(char8);
^
cl_kernel.h:2325:26: note: candidate function
uchar16 __OVERLOADABLE__ abs(char16);
^
cl_kernel.h:2326:25: note: candidate function
uchar2 __OVERLOADABLE__ abs(uchar2);
^
cl_kernel.h:2328:25: note: candidate function
uchar3 __OVERLOADABLE__ abs(uchar3);
^
cl_kernel.h:2330:25: note: candidate function
uchar4 __OVERLOADABLE__ abs(uchar4);
^
cl_kernel.h:2331:25: note: candidate function
uchar8 __OVERLOADABLE__ abs(uchar8);
^
cl_kernel.h:2332:26: note: candidate function
uchar16 __OVERLOADABLE__ abs(uchar16);
^
cl_kernel.h:2333:26: note: candidate function
ushort2 __OVERLOADABLE__ abs(short2);
^
cl_kernel.h:2335:26: note: candidate function
ushort3 __OVERLOADABLE__ abs(short3);
^
cl_kernel.h:2337:26: note: candidate function
ushort4 __OVERLOADABLE__ abs(short4);
^
cl_kernel.h:2338:26: note: candidate function
ushort8 __OVERLOADABLE__ abs(short8);
^
cl_kernel.h:2339:27: note: candidate function
ushort16 __OVERLOADABLE__ abs(short16);
^
cl_kernel.h:2340:26: note: candidate function
ushort2 __OVERLOADABLE__ abs(ushort2);
^
cl_kernel.h:2342:26: note: candidate function
ushort3 __OVERLOADABLE__ abs(ushort3);
^
cl_kernel.h:2344:26: note: candidate function
ushort4 __OVERLOADABLE__ abs(ushort4);
^
cl_kernel.h:2345:26: note: candidate function
ushort8 __OVERLOADABLE__ abs(ushort8);
^
cl_kernel.h:2346:27: note: candidate function
ushort16 __OVERLOADABLE__ abs(ushort16);
^
cl_kernel.h:2347:24: note: candidate function
uint2 __OVERLOADABLE__ abs(int2);
^
cl_kernel.h:2349:24: note: candidate function
uint3 __OVERLOADABLE__ abs(int3);
^
cl_kernel.h:2351:24: note: candidate function
uint4 __OVERLOADABLE__ abs(int4);
^
cl_kernel.h:2352:24: note: candidate function
uint8 __OVERLOADABLE__ abs(int8);
^
cl_kernel.h:2353:25: note: candidate function
uint16 __OVERLOADABLE__ abs(int16);
^
cl_kernel.h:2354:24: note: candidate function
uint2 __OVERLOADABLE__ abs(uint2);
^
cl_kernel.h:2356:24: note: candidate function
uint3 __OVERLOADABLE__ abs(uint3);
^
cl_kernel.h:2358:24: note: candidate function
uint4 __OVERLOADABLE__ abs(uint4);
^
cl_kernel.h:2359:24: note: candidate function
uint8 __OVERLOADABLE__ abs(uint8);
^
cl_kernel.h:2360:25: note: candidate function
uint16 __OVERLOADABLE__ abs(uint16);
^
cl_kernel.h:2361:25: note: candidate function
ulong2 __OVERLOADABLE__ abs(long2);
^
cl_kernel.h:2363:25: note: candidate function
ulong3 __OVERLOADABLE__ abs(long3);
^
cl_kernel.h:2365:25: note: candidate function
ulong4 __OVERLOADABLE__ abs(long4);
^
cl_kernel.h:2366:25: note: candidate function
ulong8 __OVERLOADABLE__ abs(long8);
^
cl_kernel.h:2367:26: note: candidate function
ulong16 __OVERLOADABLE__ abs(long16);
^
cl_kernel.h:2368:25: note: candidate function
ulong2 __OVERLOADABLE__ abs(ulong2);
^
cl_kernel.h:2370:25: note: candidate function
ulong3 __OVERLOADABLE__ abs(ulong3);
^
cl_kernel.h:2372:25: note: candidate function
ulong4 __OVERLOADABLE__ abs(ulong4);
^
cl_kernel.h:2373:25: note: candidate function
ulong8 __OVERLOADABLE__ abs(ulong8);
^
cl_kernel.h:2374:26: note: candidate function
ulong16 __OVERLOADABLE__ abs(ulong16);
^
CLError(code=-11, CL_BUILD_PROGRAM_FAILURE)
in broadcast at GPUArrays\src\abstractarray.jl:186
in acc_broadcast! at GPUArrays\src\backends\opencl\opencl.jl:168
in Type at Transpiler\src\clike/opencl\compilation.jl:79
in get! at base\dict.jl:449
in #27 at Transpiler\src\clike/opencl\compilation.jl:104
in #build! at base\<missing>
in #build!#113 at OpenCL\src\program.jl:101
in macro expansion at OpenCL\src\macros.jl:6
We rely on Base to route A * B calls to the correct BLAS routines.
But Base actually reroutes small matrices to a Julia implementation here:
https://github.com/JuliaLang/julia/blob/master/base/linalg/matmul.jl#L336
We need to overload this and change the behaviour for GPUArrays.
I'm almost inclined to throw an error, since no one should actually multiply a 4x4 matrix on the GPU like that.
Ref: #44
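A minimal sketch of what such an overload could look like (blas_gemm! is a hypothetical name standing in for the backend's gemm! wrapper, not the package's actual API):

```julia
# Sketch: route all GPUArray matrix products straight to the BLAS path,
# bypassing Base's small-matrix fallbacks (matmul2x2!/matmul3x3!).
import Base: A_mul_B!

function A_mul_B!(C::GPUArray{T,2}, A::GPUArray{T,2}, B::GPUArray{T,2}) where T
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("A * B"))
    # blas_gemm! is a placeholder for the backend-specific gemm! call
    blas_gemm!('N', 'N', one(T), A, B, zero(T), C)
    return C
end
```

Because this method is more specific than Base's gemm_wrapper! entry point, the 2x2/3x3 special cases (which do scalar getindex on the GPU) are never reached.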
u0 = GPUArray(rand(Float32, 32, 32))
u0[1]
This was surprising. I'm guessing it's an issue with limited memory:
julia> using GPUArrays
julia> CUBackend.init()
CUContext
julia> x = GPUArray(rand(Float32, 100000));
julia> @time x .= sqrt.(x);
9.759133 seconds (3.52 M allocations: 192.281 MiB, 1.01% gc time)
julia> @time x .= sqrt.(x);
0.000380 seconds (36 allocations: 944 bytes)
julia> y = GPUArray(rand(Float32, 100000));
julia> @time atan2.(y,x);
0.480490 seconds (279.18 k allocations: 15.781 MiB, 1.60% gc time)
julia> @time atan2.(y,x);
ERROR: Illegal memory access (CUDA error #700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] macro expansion at /Users/solver/.julia/v0.6/CUDAdrv/src/base.jl:157 [inlined]
[2] upload(error in running finalizer: CUDAdrv.CuError(code=700, info=Base.Nullable{String}(hasvalue=false, value=#<null>))
::CUDAdrv.DevicePtr{CUDAnative.CuDeviceArray{Float32,1}}, ::Base.RefValue{CUDAnative.CuDeviceArray{Float32,1}}, ::Int64) at /Users/solver/.julia/v0.6/CUDAdrv/src/memory.jl:48
[3] upload(::CUDAdrv.DevicePtr{CUDAnative.CuDeviceArray{Float32,1}}, ::CUDAnative.CuDeviceArray{Float32,1}) at /Users/solver/.julia/v0.6/CUDAdrv/src/memory.jl:98
[4] macro expansion at /Users/solver/.julia/v0.6/CUDAnative/src/execution.jl:122 [inlined]
[5] generated_cuda at /Users/solver/.julia/v0.6/CUDAnative/src/execution.jl:185 [inlined]
[6] call_cuda at /Users/solver/.julia/v0.6/GPUArrays/src/backends/cudanative/cudanative.jl:102 [inlined]
[7] acc_broadcast!(::Base.Math.#atan2, ::GPUArrays.GPUArray{Float32,1,CUDAdrv.CuArray{Float32,1},GPUArrays.CUBackend.CUContext}, ::Tuple{GPUArrays.GPUArray{Float32,1,CUDAdrv.CuArray{Float32,1},GPUArrays.CUBackend.CUContext},GPUArrays.GPUArray{Float32,1,CUDAdrv.CuArray{Float32,1},GPUArrays.CUBackend.CUContext}}) at /Users/solver/.julia/v0.6/GPUArrays/src/backends/cudanative/cudanative.jl:227
[8] broadcast(::Function, ::GPUArrays.GPUArray{Float32,1,CUDAdrv.CuArray{Float32,1},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float32,1,CUDAdrv.CuArray{Float32,1},GPUArrays.CUBackend.CUContext}) at /Users/solver/.julia/v0.6/GPUArrays/src/abstractarray.jl:196
julia> using GPUArrays
julia> GPUArray{Float32}(10,10)
ERROR: MethodError: no method matching similar(::Type{GPUArrays.GPUArray{Float32,N} where N}, ::Type{Float32}, ::Tuple{Int64,Int64})
Closest candidates are:
similar(::Array, ::Type, ::Tuple{Vararg{Int64,N}}) where N at array.jl:194
similar(::SubArray, ::Type, ::Tuple{Vararg{Int64,N}} where N) at subarray.jl:58
similar(::Base.ReshapedArray, ::Type, ::Tuple{Vararg{Int64,N}} where N) at reshapedarray.jl:167
...
Stacktrace:
[1] GPUArrays.GPUArray{Float32,N} where N(::Int64, ::Int64) at /Users/solver/.julia/v0.6/GPUArrays/src/construction.jl:37
julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b (2017-06-19 13:05 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin16.7.0)
CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
WORD_SIZE: 64
BLAS: libgfortblas
LAPACK: liblapack
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
I got the following error when I tried simply transposing a GPUArray. I was using the most recent GPUArrays and CUDAnative, Julia 0.6, and macOS 10.12.6.
julia> b = z';
ERROR: MethodError: no method matching copy!(::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::CartesianRange{CartesianIndex{2}}, ::Array{Float64,1}, ::CartesianRange{CartesianIndex{1}})
Closest candidates are:
copy!(::GPUArrays.AbstractAccArray{T,N} where N, ::CartesianRange{CartesianIndex{1}}, ::AbstractArray{T,N} where N, ::CartesianRange{CartesianIndex{1}}) where T at /Users/shhong/.julia/v0.6/GPUArrays/src/abstractarray.jl:271
copy!(::GPUArrays.AbstractAccArray, ::Integer, ::AbstractArray, ::Integer, ::Integer) at /Users/shhong/.julia/v0.6/GPUArrays/src/abstractarray.jl:254
copy!(::AbstractArray, ::Integer, ::AbstractArray) at abstractarray.jl:672
...
Stacktrace:
[1] setindex!(::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::Float64, ::Int64, ::Int64) at /Users/shhong/.julia/v0.6/GPUArrays/src/abstractarray.jl:382
[2] transposeblock!(::Base.#transpose, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::Int64, ::Int64, ::Int64, ::Int64) at ./linalg/transpose.jl:68
[3] transposeblock!(::Base.#transpose, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::Int64, ::Int64, ::Int64, ::Int64) at ./linalg/transpose.jl:74
[4] transposeblock!(::Base.#transpose, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::Int64, ::Int64, ::Int64, ::Int64) at ./linalg/transpose.jl:78
[5] transposeblock!(::Base.#transpose, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::Int64, ::Int64, ::Int64, ::Int64) at ./linalg/transpose.jl:74
[6] transposeblock!(::Base.#transpose, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::Int64, ::Int64, ::Int64, ::Int64) at ./linalg/transpose.jl:78
[7] transposeblock!(::Base.#transpose, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::Int64, ::Int64, ::Int64, ::Int64) at ./linalg/transpose.jl:74
[8] transposeblock!(::Base.#transpose, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::Int64, ::Int64, ::Int64, ::Int64) at ./linalg/transpose.jl:78
[9] transposeblock!(::Base.#transpose, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::Int64, ::Int64, ::Int64, ::Int64) at ./linalg/transpose.jl:74
[10] transposeblock!(::Base.#transpose, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::Int64, ::Int64, ::Int64, ::Int64) at ./linalg/transpose.jl:78
[11] transpose_f!(::Base.#transpose, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}) at ./linalg/transpose.jl:59
[12] transpose!(::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}) at ./linalg/transpose.jl:16
[13] transpose(::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}) at ./linalg/transpose.jl:121
[14] ctranspose(::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}) at ./linalg/transpose.jl:130
The following code:
A = GPUArray(randn(Float32, 3, 2))
n = 2
A .^ n
gives an error:
ERROR: LLVM error: Cannot select: 0xf179100: f32 = fpow 0xf178f40, 0xf179090
0xf178f40: f32,ch = load<LD4[%30](tbaa=<0xd4c1048>)> 0xf176c80, 0xf177930, undef:i64
0xf177930: i64 = add 0xf1777e0, Constant:i64<-4>
0xf1777e0: i64 = add 0xf174cd0, 0xf1792c0
0xf174cd0: i64,ch = CopyFromReg 0xeb554a0:1, Register:i64 %vreg15
0xeb55740: i64 = Register %vreg15
0xf1792c0: i64 = NVPTXISD::MUL_WIDE_UNSIGNED 0xf1773f0, Constant:i32<4>
0xf1773f0: i32 = NVPTXISD::IMAD 0xf1772a0, 0xeb557b0, 0xeb54c50
0xf1772a0: i32 = add 0xf1771c0, Constant:i32<-1>
0xf1771c0: i32 = select 0xf176f20, 0xeb555f0, 0xf178d10
0xf176f20: i1 = xor 0xf177310, Constant:i1<-1>
0xf177310: i1 = truncate 0xf1751a0
0xf176eb0: i1 = Constant<-1>
0xeb555f0: i32,ch = CopyFromReg 0xeb54fd0:1, Register:i32 %vreg10
0xeb55190: i32 = Register %vreg10
0xf178d10: i32,ch,glue = NVPTXISD::LoadParam<LDST4[<unknown>]> 0xf1775b0:1, Constant:i32<1>, Constant:i32<4>, 0xf1775b0:2
0xf176e40: i32 = Constant<1>
0xf175a60: i32 = Constant<4>
0xf1775b0: i32,ch,glue = NVPTXISD::LoadParam<LDST4[<unknown>]> 0xf177690, Constant:i32<1>, Constant:i32<0>, 0xf177690:1
0xf177230: i32 = Constant<-1>
0xeb557b0: i32,ch = CopyFromReg 0xeb59f50, Register:i32 %vreg1
0xeb55820: i32 = Register %vreg1
0xeb54c50: i32 = select 0xeb54b70, 0xeb54fd0, 0xf1775b0
0xeb54b70: i1 = xor 0xf175600, Constant:i1<-1>
0xf175600: i1 = truncate 0xf174f00
0xf174f00: i32,ch,glue = NVPTXISD::LoadParam<LDST4[<unknown>](align=1)> 0xf174fe0, Constant:i32<1>, Constant:i32<0>, 0xf174fe0:1
0xf176eb0: i1 = Constant<-1>
0xeb54fd0: i32,ch = CopyFromReg 0xeb59f50, Register:i32 %vreg9
0xf177540: i32 = Register %vreg9
0xf1775b0: i32,ch,glue = NVPTXISD::LoadParam<LDST4[<unknown>]> 0xf177690, Constant:i32<1>, Constant:i32<0>, 0xf177690:1
0xf176e40: i32 = Constant<1>
0xf175130: i32 = Constant<0>
0xf177690: ch,glue = NVPTXISD::CallArgEnd 0xf177770, Constant:i32<1>, 0xf177770:1
0xf176e40: i32 = Constant<1>
0xf177770: ch,glue = NVPTXISD::LastCallArg 0xf1756e0, Constant:i32<1>, Constant:i32<1>, 0xf1756e0:1
0xf175a60: i32 = Constant<4>
0xf178c30: i64 = Constant<-4>
0xf176dd0: i64 = undef
0xf179090: f32 = sint_to_fp 0xf179020
0xf179020: i64,ch = CopyFromReg 0xeb59f50, Register:i64 %vreg16
0xf178fb0: i64 = Register %vreg16
In function: ptxcall_broadcast_kernel__61772
Stacktrace:
[1] handle_error(::Cstring) at /home/dfdx/.julia/v0.6/LLVM/src/core/context.jl:96
[2] macro expansion at /home/dfdx/.julia/v0.6/LLVM/src/util/logging.jl:102 [inlined]
[3] macro expansion at /home/dfdx/.julia/v0.6/LLVM/src/base.jl:20 [inlined]
[4] LLVMTargetMachineEmitToMemoryBuffer(::Ptr{LLVM.API.LLVMOpaqueTargetMachine}, ::Ptr{LLVM.API.LLVMOpaqueModule}, ::UInt32, ::Base.RefValue{Cstring}, ::Base.RefValue{Ptr{LLVM.API.LLVMOpaqueMemoryBuffer}}) at /home/dfdx/.julia/v0.6/LLVM/src/../lib/3.9/libLLVM_h.jl:301
[5] emit(::LLVM.TargetMachine, ::LLVM.Module, ::UInt32) at /home/dfdx/.julia/v0.6/LLVM/src/targetmachine.jl:39
[6] #mcgen#46(::Bool, ::Function, ::LLVM.Module, ::LLVM.Function, ::VersionNumber) at /home/dfdx/.julia/v0.6/CUDAnative/src/jit.jl:296
[7] (::CUDAnative.#kw##mcgen)(::Array{Any,1}, ::CUDAnative.#mcgen, ::LLVM.Module, ::LLVM.Function, ::VersionNumber) at ./<missing>:0
[8] #compile_function#47(::Bool, ::Function, ::Any, ::Any, ::VersionNumber) at /home/dfdx/.julia/v0.6/CUDAnative/src/jit.jl:319
[9] cufunction(::CUDAdrv.CuDevice, ::Any, ::Any) at /home/dfdx/.julia/v0.6/CUDAnative/src/jit.jl:356
[10] macro expansion at /home/dfdx/.julia/v0.6/CUDAnative/src/execution.jl:106 [inlined]
[11] _cuda(::Tuple{Int64,Int64}, ::Int64, ::CUDAdrv.CuStream, ::GPUArrays.#broadcast_kernel!, ::Float32, ::Base.#^, ::CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::Tuple{UInt32,UInt32}, ::UInt32, ::Tuple{GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0}}, ::CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::Int64) at /home/dfdx/.julia/v0.6/CUDAnative/src/execution.jl:79
[12] gpu_call(::Function, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::Tuple{Base.#^,GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext},Tuple{UInt32,UInt32},UInt32,Tuple{GPUArrays.BroadcastDescriptorN{Array,2},GPUArrays.BroadcastDescriptorN{Any,0}},GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext},Int64}, ::Int64, ::Void) at /home/dfdx/.julia/v0.6/GPUArrays/src/backends/cudanative/cudanative.jl:194
[13] _broadcast!(::Function, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::Tuple{Tuple{Bool,Bool},Tuple{}}, ::Tuple{Tuple{Int64,Int64},Tuple{}}, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::Tuple{Int64}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{2}}) at /home/dfdx/.julia/v0.6/GPUArrays/src/broadcast.jl:66
[14] broadcast_t(::Function, ::Type{Float32}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{2}}, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::Int64) at /home/dfdx/.julia/v0.6/GPUArrays/src/broadcast.jl:33
[15] broadcast_c at ./broadcast.jl:314 [inlined]
[16] broadcast(::Function, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::Int64) at ./broadcast.jl:434
Note that using a constant power, e.g. A .^ 2, works fine.
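A possible workaround (an assumption about the failing fpow selection, not a confirmed fix) is to match the exponent's type to the element type, so the kernel lowers to a supported float pow instead of f32 ^ i64:

```julia
# Hypothetical workaround: convert the runtime exponent to Float32 so the
# broadcast kernel only ever sees float^float.
A = GPUArray(randn(Float32, 3, 2))
n = 2
A .^ Float32(n)
```

A .^ 2 with a literal works because Base lowers literal powers to repeated multiplication, which never hits the pow intrinsic.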
Version info:
Julia Version 0.6.0
Commit 9036443 (2017-06-19 13:05 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
GPU: GeForce GTX 960M
CUDA 8.0
When initializing a backend such as OpenCL, the first device is selected by default, as seen in lines 29-38 of opencl.jl:
https://github.com/JuliaGPU/GPUArrays.jl/blob/master/src/backends/opencl/opencl.jl
dev = if isempty(devlist)
devlist = cl.devices(:cpu)
if isempty(devlist)
error("no device found to be supporting opencl")
else
first(devlist)
end
else
first(devlist)
end
Can we add functionality to select from multiple supported devices when initializing? For example, most laptops with discrete graphics also have integrated graphics as an option. On these computers the first device is the integrated chip, which is normally not what you want to use for GPU acceleration.
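A sketch of what explicit selection could look like with OpenCL.jl (the device keyword on init is an assumption about a future API, not the current one):

```julia
# Prefer a discrete GPU when one exists, otherwise fall back to the
# first available OpenCL device of any type.
using OpenCL

gpus = cl.devices(:gpu)
dev = isempty(gpus) ? first(cl.devices()) : first(gpus)

# A user-facing version might then be spelled as:
# CLBackend.init(device = cl.devices(:gpu)[2])
```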
This is surprising:
julia> x = GPUArray(rand(Float32, 1000000));
julia> @time x.=sqrt.(x);
0.000559 seconds (36 allocations: 944 bytes)
julia> @time x .= x.*x;
0.135318 seconds (52.30 k allocations: 3.387 MiB, 5.70% gc time)
julia> versioninfo()
Julia Version 0.6.0-rc1.0
Commit 6bdb3950bd (2017-05-07 00:00 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin16.5.0)
CPU: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
WORD_SIZE: 64
BLAS: libgfortblas
LAPACK: liblapack
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)
Not sure if it's expected, but the following:
X = GPUArray(randn(Float32, 10, 10))
sum(X, 2)
gives an error:
ERROR: MethodError: no method matching indexlength(::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::Int64, ::CartesianIndex{1})
Closest candidates are:
indexlength(::Any, ::Any, ::AbstractArray) at /home/dfdx/.julia/v0.6/GPUArrays/src/abstractarray.jl:368
indexlength(::Any, ::Any, ::Number) at /home/dfdx/.julia/v0.6/GPUArrays/src/abstractarray.jl:369
indexlength(::Any, ::Any, ::Colon) at /home/dfdx/.julia/v0.6/GPUArrays/src/abstractarray.jl:370
Stacktrace:
[1] (::GPUArrays.##22#23{GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext},Tuple{Int64,CartesianIndex{1}}})(::Int64) at /home/dfdx/.julia/v0.6/GPUArrays/src/abstractarray.jl:375
[2] ntuple at ./tuple.jl:128 [inlined]
[3] setindex!(::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::Float32, ::Int64, ::CartesianIndex{1}) at /home/dfdx/.julia/v0.6/GPUArrays/src/abstractarray.jl:374
[4] macro expansion at ./reducedim.jl:203 [inlined]
[5] macro expansion at ./simdloop.jl:73 [inlined]
[6] _mapreducedim!(::Base.#identity, ::Base.#+, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}) at ./reducedim.jl:202
[7] mapreducedim!(::Function, ::Function, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}) at ./reducedim.jl:210
[8] sum(::GPUArrays.GPUArray{Float32,2,CUDAdrv.CuArray{Float32,2},GPUArrays.CUBackend.CUContext}, ::Int64) at ./reducedim.jl:572
Hey, I was trying to install GPUArrays. I have a working version of CUDAnative, but after a Pkg.clone I find I need to have GLAbstraction, GLFW, etc. installed. Is it possible to decouple the package from all the backends, so the user can simply use the backend they care about?
I am curious: how would the global compute_context be accessed and set from a backend, in the case that the backend is also a module? From the Julia documentation:
http://docs.julialang.org/en/release-0.4/manual/variables-and-scoping/#global-scope
Each module introduces a new global scope, separate from the global scope of all other modules; there is no all-encompassing global scope
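Since module globals are isolated, one option (a sketch, not the package's actual design) is to keep the context in a Ref owned by the parent module and let backend submodules write to it through a relative import:

```julia
module GPUArrays

# Hypothetical shared slot for the active context; a Ref so submodules
# can mutate its contents without rebinding a global.
const compute_context = Ref{Any}(nothing)

module CLBackend
    import ..compute_context   # relative import from the parent module
    activate!(ctx) = (compute_context[] = ctx)
end

end
```

Backends then all see the same slot: GPUArrays.CLBackend.activate!(ctx) makes ctx visible as GPUArrays.compute_context[] everywhere.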
Base defines a vecnorm function that calls BLAS if the array is bigger than 32 elements.
CC @MikeInnes
setindex! / getindex: view or not? Brought up here: https://discourse.julialang.org/t/list-of-most-desired-features-for-julia-v1-x/4481/80?u=chrisrackauckas
The idea is to transform the loops to broadcast:
@gpu for i in 2:length(A)-1
A[i] = 2A[i] - A[i-1] - A[i+1]
end
That would be a really nice usability feature for writing stencils. Broadcast + views can be the implementation, right?
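One way the loop above could lower (a sketch of the broadcast + views idea, shown on a plain Vector; @gpu is the proposed macro, not an existing one):

```julia
# Equivalent broadcast formulation of the stencil, using shifted views.
# The result is written into a copy, since the shifted views alias A and
# an in-place update would read already-overwritten elements.
A = collect(1.0:5.0)
n = length(A)
B = copy(A)
view(B, 2:n-1) .= 2 .* view(A, 2:n-1) .- view(A, 1:n-2) .- view(A, 3:n)
```

The same broadcast expression would fuse into a single kernel on a GPU array, which is exactly what makes this attractive for stencils.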
This is with CuArrays' own indexing disabled. Strangely, plain arrays seem to work, but not CuArrays.
julia> cu([1:5;])[[5,2,3]]
3-element CuArray{Float32,1}:
5.0
2.0
3.0
julia> cu([1:5;])[cu[5,2,3]]
ERROR: scalar getindex is disabled
A_ldiv_B! and \ support for GPUArrays is pretty essential, and I keep assuming it already exists. It seems like this is implemented, but commented out.
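Until a native solver lands, a stopgap could round-trip through the CPU (a sketch assuming one is willing to pay the transfer cost; the GPUArray constructor call is illustrative):

```julia
# CPU fallback for left division on GPU arrays: copy to host, solve with
# the regular LAPACK-backed \, and upload the result back.
import Base: \

\(A::GPUArray{T,2}, b::GPUArray{T,1}) where T = GPUArray(Array(A) \ Array(b))
```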
Running the latest tagged versions of CUDAdrv, CUDArt, and GPUArrays, Pkg.test("GPUArrays") yields (among other things):
WARNING: Couldn't find cuda compilation lib in default location.
This will disable the ability to compile a CUDA kernel from a string
To fix, install CUDAdrv in default location.
The warning (and subsequent test failure) is caused by this line in the backend definition, which refers to a file that was moved from CUDAdrv to CUDArt by this commit.
Should be a simple three character fix from
compile_lib = Pkg.dir("CUDAdrv", "examples", "compilation", "library.jl")
to
compile_lib = Pkg.dir("CUDArt", "examples", "compilation", "library.jl")
so it didn't seem worthy of a PR.
Hello,
I installed the package with Pkg.add, but some of the tests don't pass (Julia 0.6).
My output is:
julia> Pkg.test("GPUArrays")
INFO: Computing test dependencies for GPUArrays...
INFO: No packages to install, update or remove
INFO: Testing GPUArrays
Test Summary: | Pass Total
julia | 48 48
broadcast Complex64: Test Failed
Expression: ERROR (unhandled task failure): MethodError: no method matching unsafe_string(::Ptr{Void})
Closest candidates are:
unsafe_string(::Cstring) at c.jl:79
unsafe_string(::Union{Ptr{Int8}, Ptr{UInt8}}) at strings/string.jl:39
unsafe_string(::Union{Ptr{Int8}, Ptr{UInt8}}, ::Integer) at strings/string.jl:35
Stacktrace:
[1] macro expansion at /Users/davidbuchacaprats/.julia/v0.6/OpenCL/src/context.jl:148 [inlined]
[2] (::OpenCL.cl.##43#44)() at ./task.jl:335
all((x->begin
x == angle(10.0f0im)
end), Array(B))
Stacktrace:
[1] macro expansion at /Users/davidbuchacaprats/.julia/v0.6/GPUArrays/test/opencl.jl:36 [inlined]
[2] macro expansion at ./test.jl:856 [inlined]
[3] anonymous at ./<missing>:?
Test Summary: | Pass Fail Total
opencl | 44 1 45
broadcast Float32 | 5 5
broadcast Complex64 | 4 1 5
Custom kernel from Julia function | 1 1
Custom kernel from string function | 1 1
transpose | 1 1
mapreduce Float32 (4048,) | 4 4
mapreduce Int32 (4048,) | 4 4
mapreduce Float32 (1024, 1024) | 4 4
mapreduce Int32 (1024, 1024) | 4 4
mapreduce Float32 (77,) | 4 4
mapreduce Int32 (77,) | 4 4
mapreduce Float32 (1923, 209) | 4 4
mapreduce Int32 (1923, 209) | 4 4
ERROR: LoadError: Some tests did not pass: 44 passed, 1 failed, 0 errored, 0 broken.
while loading /Users/davidbuchacaprats/.julia/v0.6/GPUArrays/test/runtests.jl, in expression starting on line 24
==================================[ ERROR: GPUArrays ]==================================
failed process: Process(`/Applications/Julia-0.6.app/Contents/Resources/julia/bin/julia -Ccore2 -J/Applications/Julia-0.6.app/Contents/Resources/julia/lib/julia/sys.dylib --compile=yes --depwarn=yes --check-bounds=yes --code-coverage=none --color=yes --compilecache=yes /Users/davidbuchacaprats/.julia/v0.6/GPUArrays/test/runtests.jl`, ProcessExited(1)) [1]
========================================================================================
INFO: No packages to install, update or remove
ERROR: GPUArrays had test errors
Any ideas how to solve it?
All tests pass:
julia> Pkg.test("GPUArrays")
INFO: Computing test dependencies for GPUArrays...
INFO: No packages to install, update or remove
INFO: Testing GPUArrays
Test Summary: | Pass Total
julia | 48 48
Test Summary: | Pass Total
opencl | 51 51
Test Summary: | Pass Total
BLAS | 8 8
Test Summary: | Pass Total
Shared | 6 6
INFO: GPUArrays tests passed
INFO: No packages to install, update or remove
Code:
julia> x = GPUArray(ones(100,100,100)); y = GPUArray(randn(100,100,100)); z = x .* y;
Couldn't compile kernel:
1 : // dependant type declarations
2 : typedef struct {
3 : int field1;
4 : int field2;
5 : int field3;
6 : }int[3];
7 :
8 : typedef struct {
9 : float empty; // structs can't be empty
10 : }Base123;
11 :
12 : // dependant function declarations
13 : float broadcast_index_1(__global float * restrict arg, int[3] shape, int i)
14 : {
15 : ;
16 : return arg[i - 1];
17 : ;
18 : }
19 : // Main inner function
20 : __kernel void broadcast_kernel_2(__global float * restrict A, Base123 f, int[3] sz, __global float * restrict arg_1, __global float * restrict arg_2)
21 : {
22 : int i;
23 : i = get_global_id(0) + 1;
24 : float _ssavalue_0;
25 : _ssavalue_0 = broadcast_index_1(arg_1, sz, i) * broadcast_index_1(arg_2, sz, i);
26 : A[i - 1] = _ssavalue_0;
27 : ;
28 : }
29 :
With following build error:
<kernel>:6:2: error: expected ';' after struct
}int[3];
^
;
<kernel>:2:1: warning: typedef requires a name
typedef struct {
^~~~~~~
<kernel>:6:5: error: expected identifier or '('
}int[3];
^
<kernel>:13:64: error: expected ')'
float broadcast_index_1(__global float * restrict arg, int[3] shape, int i)
^
<kernel>:13:24: note: to match this '('
float broadcast_index_1(__global float * restrict arg, int[3] shape, int i)
^
<kernel>:13:60: error: parameter name omitted
float broadcast_index_1(__global float * restrict arg, int[3] shape, int i)
^
<kernel>:16:16: error: use of undeclared identifier 'i'
return arg[i - 1];
^
<kernel>:20:82: error: expected ')'
__kernel void broadcast_kernel_2(__global float * restrict A, Base123 f, int[3] sz, __global float * restrict arg_1, __global float * restrict arg_2)
^
<kernel>:20:33: note: to match this '('
__kernel void broadcast_kernel_2(__global float * restrict A, Base123 f, int[3] sz, __global float * restrict arg_1, __global float * restrict arg_2)
^
<kernel>:20:78: error: parameter name omitted
__kernel void broadcast_kernel_2(__global float * restrict A, Base123 f, int[3] sz, __global float * restrict arg_1, __global float * restrict arg_2)
^
<kernel>:25:37: error: use of undeclared identifier 'arg_1'
_ssavalue_0 = broadcast_index_1(arg_1, sz, i) * broadcast_index_1(arg_2, sz, i);
^
<kernel>:25:71: error: use of undeclared identifier 'arg_2'
_ssavalue_0 = broadcast_index_1(arg_1, sz, i) * broadcast_index_1(arg_2, sz, i);
^
ERROR: CLError(code=-11, CL_BUILD_PROGRAM_FAILURE)
Stacktrace:
[1] macro expansion at C:\Users\jecs\.julia\v0.6\OpenCL\src\macros.jl:6 [inlined]
[2] #build!#113(::String, ::Bool, ::Function, ::OpenCL.cl.Program) at C:\Users\jecs\.julia\v0.6\OpenCL\src\program.jl:101
[3] (::OpenCL.cl.#kw##build!)(::Array{Any,1}, ::OpenCL.cl.#build!, ::OpenCL.cl.Program) at .\<missing>:0
[4] (::Transpiler.CLTranspiler.##27#28{Tuple{GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Base.#*,Tuple{Int32,Int32,Int32},GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}},GPUArrays.CLBackend.#broadcast_kernel,OpenCL.cl.CmdQueue,OpenCL.cl.Context,NTuple{5,DataType}})() at C:\Users\jecs\.julia\v0.6\Transpiler\src\clike/opencl\compilation.jl:104
[5] get!(::Transpiler.CLTranspiler.##27#28{Tuple{GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Base.#*,Tuple{Int32,Int32,Int32},GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}},GPUArrays.CLBackend.#broadcast_kernel,OpenCL.cl.CmdQueue,OpenCL.cl.Context,NTuple{5,DataType}}, ::Dict{Any,Transpiler.CLTranspiler.CLFunction}, ::Tuple{GPUArrays.CLBackend.#broadcast_kernel,NTuple{5,DataType}}) at .\dict.jl:449
[6] Transpiler.CLTranspiler.CLFunction(::Function, ::Tuple{GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},Base.#*,Tuple{Int32,Int32,Int32},GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}}, ::OpenCL.cl.CmdQueue) at C:\Users\jecs\.julia\v0.6\Transpiler\src\clike/opencl\compilation.jl:79
[7] acc_broadcast!(::Base.#*, ::GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::Tuple{GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext},GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}}) at C:\Users\jecs\.julia\v0.6\GPUArrays\src\backends\opencl\opencl.jl:168
[8] broadcast(::Function, ::GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}, ::GPUArrays.GPUArray{Float64,3,OpenCL.cl.Buffer{Float64},GPUArrays.CLBackend.CLContext}) at C:\Users\jecs\.julia\v0.6\GPUArrays\src\abstractarray.jl:196
Version info:
Julia Version 0.6.0
Commit 903644385b* (2017-06-19 13:05 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, skylake)
GPUArrays.jl status:
julia> Pkg.status("GPUArrays")
- GPUArrays 0.0.2+ master
During the JuliaCon demo @SimonDanisch seemed to use a free
function. Yet that function is not available after importing GPUArrays. I have seen issue #30, but I am making a separate issue only for the memory problem.
I am experiencing some problems when using GPUArrays in a notebook.
Out of curiosity I copied the same code that was printing a nice "The kernel appears to have died. It will restart automatically" into a file, and it runs just fine.
This is the code
using GPUArrays
using BenchmarkTools
sizes = [x for x in 100:100:1000];
cpu_times = Dict()
gpu_times = Dict()
println("\nCPU times")
for s in sizes
X = rand(Float32,s,s);
X_result = zeros(X);
res_cpu = @elapsed A_mul_B!(X_result, X,X)
println("size: ", s, " x ", s, " seconds: ", res_cpu, " seconds")
#cpu_times[s] = mean(res_cpu.times)/10^6
end
println("\nGPU times")
for s in sizes
X = rand(Float32,s,s);
X_result = zeros(X);
X_gpu = GPUArray(X);
X_result_gpu = GPUArray(zeros(Float32,s,s));
res_gpu = @elapsed A_mul_B!(X_result_gpu, X_gpu, X_gpu)
println("size: ", s, " x ", s, " seconds: ", res_gpu, " seconds")
#gpu_times[s] = mean(res_gpu.times)/10^6
end
I really don't know what GPUArrays is doing internally in the for loop.
For example in the following code
s = 100
X = rand(Float32,s,s);
X_result = zeros(X);
X_gpu = GPUArray(X);
s = 200
X = rand(Float32,s,s);
X_result = zeros(X);
X_gpu = GPUArray(X);
Is the first array freed? Is there a free function?
function power_method(M, v)
T = eltype(v)
for i in 1:100
v = M*v # repeatedly creates a new vector and destroys the old v
v ./= T(norm(v))
end
return v, T(norm(M*v)) / T(norm(v)) # or (M*v) ./ v
end
M = [2 1; 1 1.]
v = [1., 1]
MM = GPUArray(M)
vv = GPUArray(v)
# OpenCL backend on a K80
vec, val = power_method(MM, vv)
# => ([0.000285175, 2.02974], 1.4145115942187871)
vec, val = power_method(M, v)
# => ([0.850651, 0.525731], 2.618033988749895)
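For reference, the CPU result above matches the dominant eigenvalue of M, which for [2 1; 1 1] is (3 + sqrt(5)) / 2 ≈ 2.618, with eigenvector ratio equal to the golden ratio. A sketch on plain Arrays (modern Julia, using LinearAlgebra for norm):

```julia
using LinearAlgebra

function power_method(M, v)
    for _ in 1:100
        v = M * v        # amplify the dominant eigendirection
        v /= norm(v)     # renormalize each iteration
    end
    return v, norm(M * v) / norm(v)
end

M = [2.0 1.0; 1.0 1.0]
vhat, val = power_method(M, [1.0, 1.0])
```

The OpenCL result ([0.000285175, 2.02974], 1.4145…) is far from this, so the GPU path is computing something else entirely.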
using GPUArrays
GPUArrays.init(:cudanative)
x = GPUArray(rand(100, 200))
log.(x)
gives
GPUArray with ctx: CUDAnative context with:
Device: CU GeForce GTX 960M
threads: 1024
blocks: (1024, 1024, 64)
global_memory: 2100.232192 mb
free_global_memory: 1274.478592 mb
local_memory: 0.065536 mb
:
Error showing value of type GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}:
ERROR: CUDA error: an illegal memory access was encountered (code #700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] macro expansion at /home/dfdx/.julia/v0.6/CUDAdrv/src/base.jl:130 [inlined]
[2] download(::Base.RefArray{Float64,Array{Float64,2},Void}, ::CUDAdrv.OwnedPtr{Float64}, ::Int64) at /home/dfdx/.julia/v0.6/CUDAdrv/src/memory.jl:141
[3] copy!(::Array{Float64,2}, ::Int64, ::CUDAdrv.CuArray{Float64,2}, ::Int64, ::Int64) at /home/dfdx/.julia/v0.6/GPUArrays/src/backends/cudanative/cudanative.jl:154
[4] copy!(::Array{Float64,2}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}) at /home/dfdx/.julia/v0.6/GPUArrays/src/abstractarray.jl:294
[5] Type at /home/dfdx/.julia/v0.6/GPUArrays/src/abstractarray.jl:103 [inlined]
[6] Type at /home/dfdx/.julia/v0.6/GPUArrays/src/abstractarray.jl:96 [inlined]
[7] show(::IOContext{Base.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}) at /home/dfdx/.julia/v0.6/GPUArrays/src/abstractarray.jl:48
[8] display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::MIME{Symbol("text/plain")}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}) at ./REPL.jl:122
[9] display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}) at ./REPL.jl:125
[10] display(::GPUArrays.GPUArray{Float64,2,CUDAdrv.CuArray{Float64,2},GPUArrays.CUBackend.CUContext}) at ./multimedia.jl:194
[11] eval(::Module, ::Any) at ./boot.jl:235
[12] print_response(::Base.Terminals.TTYTerminal, ::Any, ::Void, ::Bool, ::Bool, ::Void) at ./REPL.jl:144
[13] print_response(::Base.REPL.LineEditREPL, ::Any, ::Void, ::Bool, ::Bool) at ./REPL.jl:129
[14] (::Base.REPL.#do_respond#16{Bool,Base.REPL.##26#36{Base.REPL.LineEditREPL,Base.REPL.REPLHistoryProvider},Base.REPL.LineEditREPL,Base.LineEdit.Prompt})(::Base.LineEdit.MIState, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Bool) at ./REPL.jl:646
Works fine on OpenCL backend.
With the CUDAnative backend, this package searches for a specific Julia source file (library.jl), which is expected to be under CUDAdrv/examples/compilation. What is that file? I was getting an error due to this, but I think the error disappeared after I ran the CUDAdrv tests.
u0 = GPUArray(rand(Float32, 32, 32))
k1 = similar(u0)
uprev = similar(u0)
muladd.(2,k1,uprev)
MethodError: no method matching clintrinsic(::Tuple{Expr,DataType})
Closest candidates are:
clintrinsic(!Matched::Base.#getindex, !Matched::Type{Tuple{Tuple{Vararg{T,N}},I<:Integer}}) where {N, T, I<:Integer} at C:\Users\Chris\.julia\v0.6\Transpiler\src\clike/opencl\intrinsics.jl:172
clintrinsic(!Matched::Base.#getindex, !Matched::Type{Tuple{T,I<:Union{Int32, Int64, UInt64}}}) where {T, I<:Union{Int32, Int64, UInt64}} at C:\Users\Chris\.julia\v0.6\Transpiler\src\clike/opencl\intrinsics.jl:178
clintrinsic(!Matched::Base.#setindex!, !Matched::Type{Tuple{T<:Union{Transpiler.CLTranspiler.CLIntrinsics.CLArray, Transpiler.CLTranspiler.CLIntrinsics.LocalMemory},Val,I<:Integer}}) where {T<:Union{Transpiler.CLTranspiler.CLIntrinsics.CLArray, Transpiler.CLTranspiler.CLIntrinsics.LocalMemory}, Val, I<:Integer} at C:\Users\Chris\.julia\v0.6\Transpiler\src\clike/opencl\intrinsics.jl:183
...
isintrinsic(::Sugar.LazyMethod{:CL}) at intrinsics.jl:157
_dependencies!(::DataStructures.OrderedSet{Sugar.LazyMethod}, ::Sugar.LazyMethod{:CL}) at methods.jl:330
_dependencies!(::Sugar.LazyMethod{:CL}, ::Sugar.LazyMethod{:CL}) at methods.jl:323
_dependencies!(::DataStructures.OrderedSet{Sugar.LazyMethod}, ::Sugar.LazyMethod{:CL}) at methods.jl:331
_dependencies!(::Sugar.LazyMethod{:CL}, ::Sugar.LazyMethod{:CL}) at methods.jl:323
_dependencies!(::DataStructures.OrderedSet{Sugar.LazyMethod}, ::Sugar.LazyMethod{:CL}) at methods.jl:331
_dependencies!(::Sugar.LazyMethod{:CL}, ::Sugar.LazyMethod{:CL}) at methods.jl:323
_dependencies!(::DataStructures.OrderedSet{Sugar.LazyMethod}, ::Sugar.LazyMethod{:CL}) at methods.jl:331
dependencies!(::Sugar.LazyMethod{:CL}, ::Bool) at methods.jl:310
(::Transpiler.CLTranspiler.##27#28{Tuple{GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext},##13#14,Tuple{Int32,Int32},GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext},GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}},GPUArrays.CLBackend.#broadcast_kernel,OpenCL.cl.CmdQueue,OpenCL.cl.Context,NTuple{5,DataType}})() at compilation.jl:84
get!(::Transpiler.CLTranspiler.##27#28{Tuple{GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext},##13#14,Tuple{Int32,Int32},GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext},GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}},GPUArrays.CLBackend.#broadcast_kernel,OpenCL.cl.CmdQueue,OpenCL.cl.Context,NTuple{5,DataType}}, ::Dict{Any,Transpiler.CLTranspiler.CLFunction}, ::Tuple{GPUArrays.CLBackend.#broadcast_kernel,NTuple{5,DataType}}) at dict.jl:449
Transpiler.CLTranspiler.CLFunction(::Function, ::Tuple{GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext},##13#14,Tuple{Int32,Int32},GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext},GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}}, ::OpenCL.cl.CmdQueue) at compilation.jl:79
acc_broadcast!(::##13#14, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}, ::Tuple{GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext},GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}}) at opencl.jl:168
broadcast(::Function, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}) at abstractarray.jl:196
include_string(::String, ::String) at loading.jl:515
eval(::Module, ::Any) at boot.jl:235
(::Atom.##61#64)() at eval.jl:102
withpath(::Atom.##61#64, ::Void) at utils.jl:30
withpath(::Function, ::Void) at eval.jl:38
macro expansion at eval.jl:101 [inlined]
(::Atom.##60#63{Dict{String,Any}})() at task.jl:80
Would a simple fallback be fine?
uprev .+ 2.*k1
That works. If it can FMA, that's better.
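A quick CPU check that the fallback computes the same thing (a sketch with invented names; for these exactly representable values fma introduces no extra rounding, though in general the fused result can differ in the last bit):

```julia
k1    = Float32[1, 2, 3]
uprev = Float32[10, 10, 10]
a = muladd.(2f0, k1, uprev)   # fused multiply-add per element
b = uprev .+ 2f0 .* k1        # the proposed fallback
```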
u0 = GPUArray(rand(Float32, 32, 32))
tmp = ones(u0)
uprev = ones(u0)
k1 = ones(u0)
a = Float32(2.0)
tmp .= uprev.+a.*k1
indexing not defined for GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}
getindex at abstractarray.jl:874 [inlined]
_getindex at abstractarray.jl:921 [inlined]
getindex at abstractarray.jl:875 [inlined]
_broadcast_getindex at broadcast.jl:133 [inlined]
_broadcast_getindex at broadcast.jl:130 [inlined]
macro expansion at broadcast.jl:151 [inlined]
macro expansion at simdloop.jl:73 [inlined]
macro expansion at broadcast.jl:147 [inlined]
_broadcast!(::##5#6, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}, ::Tuple{Tuple{Bool,Bool},Tuple{},Tuple{Bool,Bool}}, ::Tuple{Tuple{Int64,Int64},Tuple{},Tuple{Int64,Int64}}, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}, ::Tuple{Float32,GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}}, ::Type{Val{2}}, ::CartesianRange{CartesianIndex{2}}) at broadcast.jl:139
broadcast_c! at broadcast.jl:211 [inlined]
broadcast!(::Function, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}, ::Float32, ::GPUArrays.GPUArray{Float32,2,OpenCL.cl.Buffer{Float32},GPUArrays.CLBackend.CLContext}) at broadcast.jl:204
include_string(::String, ::String) at loading.jl:515
include_string(::String, ::String, ::Int64) at eval.jl:30
include_string(::Module, ::String, ::String, ::Int64, ::Vararg{Int64,N} where N) at eval.jl:34
(::Atom.##49#52{String,Int64,String})() at eval.jl:50
withpath(::Atom.##49#52{String,Int64,String}, ::String) at utils.jl:30
withpath(::Function, ::String) at eval.jl:38
macro expansion at eval.jl:49 [inlined]
(::Atom.##48#51{Dict{String,Any}})() at task.jl:80
I'm interested to know why CLContext is 10 times slower than CUContext on the same GeForce GTX 950.
N-dimensional broadcast operations are buggy with the CUDAnative backend. Example code snippet:
julia> using GPUArrays
julia> CUBackend.init()
CUContext
julia> g1 = GPUArray(rand(4,5,3));
julia> g2 = GPUArray(rand(1,5,3));
julia> a1 = Array(g1); a2 = Array(g2);
julia> isapprox(Array(g1 .+ g2), a1 .+ a2)
false
julia> g3 = GPUArray(rand(1,5,1)); a3 = Array(g3);
julia> isapprox(Array(g1 .+ g3), a1 .+ a3)
false
GPUArrays seems to have the same problem that occurred here: JuliaGPU/CuArrays.jl/issues/30
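For reference, a minimal CPU illustration of the broadcasting rule the backend should reproduce: a size-1 dimension is virtually repeated, i.e. index 1 is reused along that dimension (invented names):

```julia
a1 = reshape(collect(1.0:60.0), 4, 5, 3)
a2 = reshape(collect(0.1:0.1:1.5), 1, 5, 3)   # singleton first dimension
r = a1 .+ a2
# along dim 1, every r[i, j, k] adds the same a2[1, j, k]
```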
Since scalar indexing is a slow operation, it would be nice to make it emit a warning when used, so we know when we're "safe".
Hello,
I've tried your Black-Scholes script on a Mac and I get the following results. It seems to work, but an ERROR message is printed. Do you see the same behaviour?
INFO: Running benchmarks number of threads: 1
| Backend | Time (μs) for N = 10^1 |
| ---- | ---- |
| JLContext Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz with 1 threads | 0.00 μs|
| CLContext: AMD Radeon HD - FirePro D300 Compute Engine | 0.00 μs|
| Backend | Time (μs) for N = 10^2 |
| ---- | ---- |
| JLContext Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz with 1 threads | 0.00 μs|
| CLContext: AMD Radeon HD - FirePro D300 Compute Engine | 0.00 μs|
ERROR (unhandled task failure): MethodError: no method matching unsafe_string(::Ptr{Void})
Closest candidates are:
unsafe_string(!Matched::Cstring) at c.jl:79
unsafe_string(!Matched::Union{Ptr{Int8}, Ptr{UInt8}}) at strings/string.jl:39
unsafe_string(!Matched::Union{Ptr{Int8}, Ptr{UInt8}}, !Matched::Integer) at strings/string.jl:35
Stacktrace:
[1] macro expansion at /Users/macpro/.julia/v0.6/OpenCL/src/context.jl:95 [inlined]
[2] (::OpenCL.cl.##43#44)() at ./task.jl:335
| Backend | Time (μs) for N = 10^3 |
| ---- | ---- |
| JLContext Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz with 1 threads | 0.00 μs|
| CLContext: AMD Radeon HD - FirePro D300 Compute Engine | 0.00 μs|
| Backend | Time (μs) for N = 10^4 |
| ---- | ---- |
| JLContext Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz with 1 threads | 0.00 μs|
| CLContext: AMD Radeon HD - FirePro D300 Compute Engine | 0.00 μs|
| Backend | Time (μs) for N = 10^5 |
| ---- | ---- |
| JLContext Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz with 1 threads | 0.01 μs|
| CLContext: AMD Radeon HD - FirePro D300 Compute Engine | 0.00 μs|
| Backend | Time (μs) for N = 10^6 |
| ---- | ---- |
| JLContext Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz with 1 threads | 0.09 μs|
| CLContext: AMD Radeon HD - FirePro D300 Compute Engine | 0.00 μs|
| Backend | Time (μs) for N = 10^7 |
| ---- | ---- |
| JLContext Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz with 1 threads | 0.92 μs|
| CLContext: AMD Radeon HD - FirePro D300 Compute Engine | 0.04 μs|