juliagpu / CuArrays.jl
A Curious Cumulation of CUDA Cuisine
Home Page: https://juliagpu.org/cuda/
License: Other
CuArrays has `accumulate!`, but it's limited: it does not support anything but vectors, does not support the `init` keyword, and is slow (it should use the shmem/shfl optimizations from https://github.com/JuliaGPU/CUDAnative.jl/blob/master/examples/scan.jl).
Old post:
@dpsanders and I just ran into a situation where we wanted to do a `cumsum` on a CuArray. CUDAnative has it as an example, but we should probably add the functionality to CuArrays: https://github.com/JuliaGPU/CUDAnative.jl/blob/master/examples/scan.jl
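For reference, the usage we'd want to support is a device-side scan; a minimal sketch (the `cumsum` method on CuArray is the hypothetical addition, and the host round-trip is the current workaround):

```julia
using CuArrays

xs = cu(rand(Float32, 1024))

# Desired: cumsum(xs) running on the device, ideally built on the
# shmem/shfl scan from the CUDAnative example linked above.

# Current workaround: scan on the host and copy back.
ys = CuArray(cumsum(collect(xs)))
```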
This is a huge barrier to adoption. Will building Julia from source always be a requirement for this package?
CuArrays downgrades CUDAapi to 0.2.1, CUDAdrv to 0.6.1, and CUDAnative to 0.5.3, just like CUDNN does, which causes problems with the tests in CUDAnative. See JuliaGPU/CUDAnative.jl#144 and JuliaAttic/CUDNN.jl#13.
If I define a module like this:
module M
using CuArrays
# using CUDAnative
function cross_entropy_loss(ŷ::AbstractMatrix, y::AbstractMatrix)
sublosses = -sum(y .* ŷ, 1) .+ CUDAnative.log.(sum(exp.(ŷ), 1))
return mean(sublosses)
end
end
and then invoke it from the REPL or another file like this:
using CuArrays
function main()
y = CuArray(rand(Float32, 3, 4))
ŷ = CuArray(rand(Float32, 3, 4))
M.cross_entropy_loss(ŷ, y)
end
I'm getting:
ERROR: Broadcast output type Any is not concrete
Stacktrace:
[1] broadcast_t at /home/dfdx/.julia/v0.6/CuArrays/src/broadcast.jl:34 [inlined]
[2] broadcast_c at /home/dfdx/.julia/v0.6/CuArrays/src/broadcast.jl:63 [inlined]
[3] broadcast at ./broadcast.jl:434 [inlined]
[4] cross_entropy_loss(::CuArray{Float32,2}, ::CuArray{Float32,2}) at /home/dfdx/Downloads/cross_entropy_test.jl:7
[5] main() at /home/dfdx/Downloads/cross_entropy_test.jl:18
`@code_warntype` shows that the function indeed returns `Any`, but the generated code is too complex for me to infer further details.
If we uncomment `using CUDAnative` inside the module, the error disappears. It also disappears if I define `cross_entropy_loss` in the same module as the calling code, or if I simplify the function. So I won't be surprised if it's not reproducible even under slightly different conditions.
I'm using Julia 0.6.0 and the latest master of both CuArrays (67444add60fee8cfcd346ab7c97dd64bfae4b1ba) and CUDAnative (997142a0281034e96d53132da631d01e6646ca6b).
I'm having some problems with `A_mul_Bc!`. For example:
julia> a0 = cu(rand(1,1) + im*rand(1,1));
julia> ar = cu(rand(1,1) + im*rand(1,1));
julia> en = cu(rand(1,1) + im*rand(1,1));
julia> A_mul_Bc!(en, ar, a0)
ERROR: ReadOnlyMemoryError()
For larger matrix sizes, Julia segfaults in some cases. At the same time, other `A_mul_B`-style variants seem to work fine.
I get the following error during precompile on an 8 GPU machine where devices 0 and 3 are fully occupied:
ERROR: LoadError: InitError: CUDA error: out of memory (code #2, ERROR_OUT_OF_MEMORY)
Stacktrace:
[1] macro expansion at /dev/shm/dyuret/.julia/v0.6/CUDAdrv/src/base.jl:148 [inlined]
[2] CUDAdrv.CuContext(::CUDAdrv.CuDevice, ::CUDAdrv.CUctx_flags) at /dev/shm/dyuret/.julia/v0.6/CUDAdrv/src/context.jl:118
[3] __init__() at /dev/shm/dyuret/.julia/v0.6/CUDAnative/src/CUDAnative.jl:67
Knet always finds the device with the greatest amount of available memory and initializes there by default. Is there a way to do this with CuArrays, either manually or automatically?
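A manual sketch of that policy, assuming the CUDAdrv API of the time (`devices()`, and `Mem.info()` reporting free and total bytes for the active context; exact names may differ between versions):

```julia
using CUDAdrv

# Pick the CUDA device with the most free memory by briefly creating a
# context on each device and querying it. Hypothetical helper, not a
# CuArrays API.
function freest_device()
    best, bestfree = nothing, 0
    for dev in CUDAdrv.devices()
        ctx = CuContext(dev)
        free, total = CUDAdrv.Mem.info()
        destroy!(ctx)
        if free > bestfree
            best, bestfree = dev, free
        end
    end
    return best
end
```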
Hi all,
I just installed CuArrays on top of CUDAdrv. I found that `@everywhere using CuArrays` automatically creates GPU contexts, one per worker, all on device 0. See an example using 5 workers below.
GPU_ID %GPU GPU_MEM PID
0 0 196.8MiB 169014
0 0 197.8MiB 169021
0 0 196.8MiB 169023
0 0 196.8MiB 169027
0 0 196.8MiB 169025
1 0 0
Do I have control over which device a worker initiates its context on? I think it would be nice if the context could be created in a more controllable way, for example only when calling `CUDAdrv.CuContext(CUDAdrv.CuDevice(devInt))`. Ideally, the following code would create a context only when `CUDAdrv.CuContext` is explicitly called:
manager = MPIManager(np=4)
cpus = addprocs(manager)
gpus = [0,0,1,1]
@everywhere using CuArrays
for worker in cpus
@spawnat worker CUDAdrv.CuContext(CUDAdrv.CuDevice(gpus[worker-1]))
end
@parallel (+) for i in 1:4 (xl=cu(rand(10^4,10^4));xr=cu(rand(10^4,10^4)); x=xl*xr;collect(x)) end
Cheers
Yue
CuArrays does not cover deconvolution (transposed convolution) yet, which is essential for matrix upsampling tasks.
Yet another BLAS tolerance test failure:
elty = Float32: Test Failed
Expression: ≈(C[:L], dL, rtol=0.01)
Stacktrace:
[1] macro expansion at /var/lib/buildbot/workers/julia/CuArrays-julia06-x86-64bit/packages/v0.6/CuArrays/test/blas.jl:1173 [inlined]
[2] macro expansion at ./test.jl:921 [inlined]
[3] macro expansion at /var/lib/buildbot/workers/julia/CuArrays-julia06-x86-64bit/packages/v0.6/CuArrays/test/blas.jl:1150 [inlined]
[4] macro expansion at ./test.jl:860 [inlined]
[5] macro expansion at /var/lib/buildbot/workers/julia/CuArrays-julia06-x86-64bit/packages/v0.6/CuArrays/test/blas.jl:1149 [inlined]
[6] macro expansion at ./test.jl:860 [inlined]
[7] macro expansion at /var/lib/buildbot/workers/julia/CuArrays-julia06-x86-64bit/packages/v0.6/CuArrays/test/blas.jl:37 [inlined]
[8] macro expansion at ./test.jl:860 [inlined]
[9] anonymous at ./<missing>:?
elty = Float32: Test Failed
Expression: ≈(C[:U], dU, rtol=0.01)
Stacktrace:
[1] macro expansion at /var/lib/buildbot/workers/julia/CuArrays-julia06-x86-64bit/packages/v0.6/CuArrays/test/blas.jl:1174 [inlined]
[2] macro expansion at ./test.jl:921 [inlined]
[3] macro expansion at /var/lib/buildbot/workers/julia/CuArrays-julia06-x86-64bit/packages/v0.6/CuArrays/test/blas.jl:1150 [inlined]
[4] macro expansion at ./test.jl:860 [inlined]
[5] macro expansion at /var/lib/buildbot/workers/julia/CuArrays-julia06-x86-64bit/packages/v0.6/CuArrays/test/blas.jl:1149 [inlined]
[6] macro expansion at ./test.jl:860 [inlined]
[7] macro expansion at /var/lib/buildbot/workers/julia/CuArrays-julia06-x86-64bit/packages/v0.6/CuArrays/test/blas.jl:37 [inlined]
[8] macro expansion at ./test.jl:860 [inlined]
[9] anonymous at ./<missing>:?
However, bumping the `rtol` beyond 1% seems awfully high. Is something else going on?
With your comments on the blogpost, I tried to use CuArrays with ForwardDiff but it seems like I've hit a wall with the broadcast mechanisms:
using CuArrays
import CUDAnative
import CUDAdrv: synchronize
# seems missing in CuArrays?
Base.Broadcast.promote_containertype(::Type{CuArray}, ::Type{CuArray}) = CuArray
Base.Broadcast.promote_containertype(::Type{CuArray}, ct) = CuArray
Base.Broadcast.promote_containertype(ct, ::Type{CuArray}) = CuArray
# I couldn't get CuArrays to work with Base intrinsics, shouldn't the cufunc hack solve that?
# HACK: @define_diffrule cannot handle CUDAnative.x
@inline cuda_log10(x) = CUDAnative.log10(x)
@inline cuda_erf(x) = CUDAnative.erf(x)
@inline cuda_sqrt(x) = CUDAnative.sqrt(x)
@inline cuda_exp(x) = CUDAnative.exp(x)
# HACK: diff rules for CUDAnative intrinsics
import DiffBase: @define_diffrule, DiffRule # HACK: @define_diffrule wrongly escapes
@define_diffrule cuda_log10(x) = :( inv($x) / CUDAnative.log(10) )
@define_diffrule cuda_erf(x) = :( (2 / CUDAnative.sqrt(π)) * CUDAnative.exp(-$x * $x) )
@define_diffrule cuda_sqrt(x) = :( inv(2 * CUDAnative.sqrt($x)) )
@define_diffrule cuda_exp(x) = :( CUDAnative.exp($x) )
@inline cndf2(in::AbstractArray{T}) where {T<:Real} = T(0.5) .+ T(0.5) .* cuda_erf.(T(0.707106781) .* in)
function blackscholes(sptprice::AbstractArray{<:Real}, strike::AbstractArray{<:Real},
rate::AbstractArray{<:Real}, volatility::AbstractArray{<:Real},
time::AbstractArray{<:Real})
logterm = cuda_log10.(sptprice ./ strike)
powterm = eltype(volatility)(.5) .* volatility .* volatility
den = volatility .* cuda_sqrt.(time)
d1 = (((rate .+ powterm) .* time) .+ logterm) ./ den
d2 = d1 .- den
NofXd1 = cndf2(d1)
NofXd2 = cndf2(d2)
futureValue = strike .* cuda_exp.(- rate .* time)
c1 = futureValue .* NofXd2
call = sptprice .* NofXd1 .- c1
return call .- futureValue .+ sptprice
end
iterations = 10#^7
sptprice = Float32[ 42.0 for i = 1:iterations ]
strike = Float32[ 40.0 + (i / iterations) for i = 1:iterations ]
rate = Float32[ 0.5 for i = 1:iterations ]
volatility = Float32[ 0.2 for i = 1:iterations ]
time = Float32[ 0.5 for i = 1:iterations ]
sptprice_dev = CuArray(sptprice)
strike_dev = CuArray(strike)
rate_dev = CuArray(rate)
volatility_dev = CuArray(volatility)
time_dev = CuArray(time)
out = zeros(sptprice)
using ForwardDiff
blackscholes_time(time) = blackscholes(sptprice_dev, strike_dev, rate_dev, volatility_dev, time)
g = time -> ForwardDiff.gradient(blackscholes_time, time)
@show g(time_dev)
This doesn't work because CuArrays' broadcast refuses `Any`:
f = #5
A = CuArray(Float32[0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2])
Bs = (CuArray(ForwardDiff.Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65},Float32,10}[Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65}}(0.5,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65}}(0.5,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65}}(0.5,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65}}(0.5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65}}(0.5,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65}}(0.5,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65}}(0.5,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0), Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65}}(0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0), Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65}}(0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0), Dual{ForwardDiff.Tag{#blackscholes_time,0xd4f33f37fa3a8c65}}(0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0)]),)
T = _broadcast_eltype(f, A, Bs...) = Any
ERROR: LoadError: Broadcast output type Any is not concrete
When I was running a Flux model, I encountered an error about there not being a canonical binary representation of some data I had called `gpu` on, which caused the program to abort. That seemed fine and dandy because I could fix the error and re-run, but now any time I try to import CuArrays with `using CuArrays` (e.g., with `julia -e "using CuArrays"` from Bash), I get the following error message:
*** Error in `/home/maetshju/julia-0.6.2/julia': double free or corruption (!prev): 0x0000560346920010 ***
signal (6): Aborted
while loading no file, in expression starting on line 0
__libc_signal_restore_set at /build/glibc-itYbWN/glibc-2.26/signal/../sysdeps/unix/sysv/linux/nptl-signals.h:80 [inlined]
raise at /build/glibc-itYbWN/glibc-2.26/signal/../sysdeps/unix/sysv/linux/raise.c:48
abort at /build/glibc-itYbWN/glibc-2.26/stdlib/abort.c:90
__libc_message at /build/glibc-itYbWN/glibc-2.26/libio/../sysdeps/posix/libc_fatal.c:181
malloc_printerr at /build/glibc-itYbWN/glibc-2.26/malloc/malloc.c:5426
_int_free at /build/glibc-itYbWN/glibc-2.26/malloc/malloc.c:4175
__libc_free at /build/glibc-itYbWN/glibc-2.26/malloc/malloc.c:3145
unknown function (ip: 0x7fd3b8477d7b)
unknown function (ip: 0x7fd3b8477dc2)
unknown function (ip: 0x7fd3b8478063)
unknown function (ip: 0x7fd3b836a92f)
unknown function (ip: 0x7fd3b8344abb)
cuInit at /usr/lib/x86_64-linux-gnu/libcuda.so.390.30 (unknown line)
macro expansion at /home/maetshju/.julia/v0.6/CUDAdrv/src/base.jl:143 [inlined]
init at /home/maetshju/.julia/v0.6/CUDAdrv/src/init.jl:10
__init__ at /home/maetshju/.julia/v0.6/CUDAdrv/src/init.jl:29
unknown function (ip: 0x7fd3bd2360ff)
signal (11): Segmentation fault
while loading no file, in expression starting on line 0
I'm not quite sure how to give code to reproduce the error, unless you want to run my current script for the neural network model I'm developing and hope that it fails in such a way as to get CuArrays stuck like this, but I'm happy to provide any extra information I can.
It seems permutedims is not correctly permuting CuArrays in some cases. For example:
julia> e = cu(rand(2, 2, 2))
2×2×2 CuArray{Float64,3}:
[:, :, 1] =
0.223654 0.428071
0.498901 0.423364
[:, :, 2] =
0.257477 0.138125
0.612776 0.565442
julia> permutedims(e, (3, 1, 2))
2×2×2 CuArray{Float64,3}:
[:, :, 1] =
0.223654 0.257477
0.428071 0.138125
[:, :, 2] =
0.498901 0.612776
0.423364 0.565442
julia> permutedims(collect(e), (3, 1, 2))
2×2×2 Array{Float64,3}:
[:, :, 1] =
0.223654 0.498901
0.257477 0.612776
[:, :, 2] =
0.428071 0.423364
0.138125 0.565442
where it appears CuArrays has implemented the permutation (2, 3, 1) instead. As an aside, it would be useful if CuArrays also accepted the square-bracket notation [3, 1, 2] for the permutation, as this is what the docs of Base.permutedims suggest.
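For comparison, Base accepts both forms on the CPU; a minimal illustration:

```julia
e = rand(2, 2, 2)
# Base.permutedims accepts the permutation as a tuple or a vector:
permutedims(e, (3, 1, 2)) == permutedims(e, [3, 1, 2])  # true
```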
What's the best way to reduce along only one dimension? For example, `sum(x, 1)`?
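For context, this is the Julia 0.6 dimension-reducing syntax in question, with a host fallback as the workaround (a sketch; whether the GPU path exists is exactly the question being asked):

```julia
using CuArrays

x = cu(rand(Float32, 4, 5))

# Julia 0.6 dimension-reducing form: sum over dimension 1, giving 1×5.
s = sum(x, 1)

# Workaround if the GPU path is missing: reduce on the host.
s_host = sum(collect(x), 1)
```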
I am using Julia 0.6.2 in Atom (GTX 1060, CUDA installed, Ryzen 5 1600X).
I know I can't use the newest version on 0.6.2, but if I install CuArrays it automatically installs an older version of LLVM, and I get a LoadError:
===[ ERROR: LLVM ]===
LoadError: Unknown OS
while loading C:\Users\Max.julia\v0.6\LLVM\deps\build.jl, in expression starting on line 104
Part of the problem:
INFO: Building CUDAnative
ERROR: LoadError: ArgumentError: Module Unicode not found in current path.
Run Pkg.add("Unicode") to install the Unicode package.
If I install the newest version of LLVM, `Pkg.test("CuArrays")` gives:
WARNING: julia is fixed at 0.6.2 conflicting with requirement for LLVM: [0.7.0-DEV.2915,∞)
How can I solve this problem without Julia 0.7?
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: https://docs.julialang.org
_ _ _| |_ __ _ | Type "?help" for help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 0.6.0 (2017-06-19 13:05 UTC)
_/ |\__'_|_|_|\__'_| |
|__/ | x86_64-linux-gnu
julia> using CuArrays
julia> CuArrays.allowslow(false)
false
julia> x = cu(abs.(randn(10, 10)))
10×10 CuArray{Float64,2}:
1.01167 0.926641 0.122783 0.433447 0.471071 … 0.641311 0.322102 1.11901 1.08687
0.174375 0.902149 0.708563 0.100777 1.17209 0.984024 1.54349 2.31564 1.42121
0.880497 0.260067 0.182906 0.321197 0.419981 1.0425 1.03248 2.39686 0.466474
0.455405 0.189683 0.970617 1.02481 1.20051 0.548943 1.0183 0.922669 0.674622
1.65143 0.423125 0.199559 0.756164 0.769089 1.65847 0.336149 0.0769364 0.728361
1.13424 0.463791 0.487644 0.422425 0.642065 … 0.582538 0.699725 0.0974422 0.725078
1.42058 0.551611 1.55393 1.81143 0.848366 2.38784 0.995038 1.15988 2.07894
0.0542946 1.25654 1.99043 1.3079 0.0102366 0.260701 1.28124 0.301421 0.744825
2.18924 0.345994 1.44563 1.96132 1.38548 1.22024 0.151539 0.120201 0.810535
1.06216 0.0061378 0.0229676 0.110931 0.036234 0.386171 0.931272 0.23151 0.688115
julia> x .^ 1.5
ERROR: LLVM error: Cannot select: 0x17890df0: f64 = fpow 0x17890990, ConstantFP:f64<1.500000e+00>
0x17890990: f64,ch = load<LD8[null(addrspace=101)]> 0x18940cc0, TargetExternalSymbol:i64'julia__7_61683_param_0', undef:i64
0x178908b0: i64 = TargetExternalSymbol'julia__7_61683_param_0'
0x17890920: i64 = undef
0x17890d80: f64 = ConstantFP<1.500000e+00>
In function: julia__7_61683
Stacktrace:
[1] handle_error(::Cstring) at /home/alha02/.julia/v0.6/LLVM/src/core/context.jl:96
[2] macro expansion at /home/alha02/.julia/v0.6/LLVM/src/util/logging.jl:102 [inlined]
[3] macro expansion at /home/alha02/.julia/v0.6/LLVM/src/base.jl:18 [inlined]
[4] LLVMTargetMachineEmitToMemoryBuffer(::Ptr{LLVM.API.LLVMOpaqueTargetMachine}, ::Ptr{LLVM.API.LLVMOpaqueModule}, ::UInt32, ::Base.RefValue{Cstring}, ::Base.RefValue{Ptr{LLVM.API.LLVMOpaqueMemoryBuffer}}) at /home/alha02/.julia/v0.6/LLVM/src/../lib/3.9/libLLVM_h.jl:301
[5] emit(::LLVM.TargetMachine, ::LLVM.Module, ::UInt32) at /home/alha02/.julia/v0.6/LLVM/src/targetmachine.jl:39
[6] #mcgen#45(::Bool, ::Function, ::LLVM.Module, ::LLVM.Function, ::VersionNumber) at /home/alha02/.julia/v0.6/CUDAnative/src/jit.jl:303
[7] (::CUDAnative.#kw##mcgen)(::Array{Any,1}, ::CUDAnative.#mcgen, ::LLVM.Module, ::LLVM.Function, ::VersionNumber) at ./<missing>:0
[8] #compile_function#46(::Bool, ::Function, ::Any, ::Any, ::VersionNumber) at /home/alha02/.julia/v0.6/CUDAnative/src/jit.jl:328
[9] cufunction(::CUDAdrv.CuDevice, ::Any, ::Any) at /home/alha02/.julia/v0.6/CUDAnative/src/jit.jl:369
[10] macro expansion at /home/alha02/.julia/v0.6/CUDAnative/src/execution.jl:107 [inlined]
[11] _cuda(::Tuple{Int64,Int64}, ::Int64, ::CUDAdrv.CuStream, ::CuArrays.#broadcast_kernel, ::##7#8, ::CUDAnative.CuDeviceArray{Float64,2,CUDAnative.AS.Global}, ::Tuple{Tuple{Bool,Bool}}, ::Tuple{Tuple{Int64,Int64}}, ::CUDAnative.CuDeviceArray{Float64,2,CUDAnative.AS.Global}, ::Tuple{}) at /home/alha02/.julia/v0.6/CUDAnative/src/execution.jl:80
[12] _broadcast! at /home/alha02/.julia/v0.6/CuArrays/src/broadcast.jl:22 [inlined]
[13] broadcast_t at /home/alha02/.julia/v0.6/CuArrays/src/broadcast.jl:37 [inlined]
[14] broadcast_c at /home/alha02/.julia/v0.6/CuArrays/src/broadcast.jl:58 [inlined]
[15] broadcast(::Function, ::CuArray{Float64,2}) at ./broadcast.jl:434
Bringing the array to host first works:
julia> collect(x) .^ 1.5
10×10 Array{Float64,2}:
1.01756 0.892005 0.0430235 0.285367 … 0.513574 0.182806 1.18373 1.13309
0.0728158 0.856875 0.596441 0.0319922 0.976132 1.9176 3.52376 1.69429
0.826213 0.132626 0.078224 0.182036 1.06443 1.04911 3.71077 0.318597
0.307324 0.0826121 0.95625 1.03744 0.406716 1.02758 0.886275 0.554103
2.12221 0.275235 0.089147 0.657543 2.1358 0.194894 0.0213402 0.621613
1.20798 0.315852 0.340529 0.274552 … 0.444618 0.585317 0.0304173 0.617415
1.69316 0.409684 1.93708 2.43799 3.68983 0.992566 1.24916 2.99752
0.0126513 1.40853 2.80814 1.49575 0.133111 1.45026 0.165486 0.642808
3.23921 0.203518 1.73815 2.74676 1.34793 0.0589909 0.0416736 0.729722
1.09467 0.00048086 0.00348075 0.0369472 0.239977 0.898701 0.111392 0.570809
and integer-valued powers work:
julia> x .^ 2.0
10×10 CuArray{Float64,2}:
1.02349 0.858663 0.0150756 0.187876 … 0.41128 0.10375 1.25218 1.18129
0.0304066 0.813873 0.502062 0.010156 0.968303 2.38237 5.36219 2.01984
0.775276 0.067635 0.0334544 0.103167 1.08681 1.06602 5.74494 0.217598
0.207394 0.0359798 0.942097 1.05023 0.301338 1.03694 0.851318 0.455115
2.72721 0.179035 0.0398237 0.571784 2.75051 0.112996 0.00591921 0.53051
1.28651 0.215102 0.237797 0.178443 … 0.339351 0.489616 0.00949498 0.525738
2.01804 0.304275 2.4147 3.28127 5.70176 0.9901 1.34532 4.32198
0.0029479 1.5789 3.9618 1.71059 0.0679649 1.64158 0.0908546 0.554764
4.79275 0.119712 2.08985 3.84676 1.48899 0.022964 0.0144482 0.656967
1.12818 3.76726e-5 0.000527509 0.0123058 0.149128 0.867268 0.0535968 0.473502
Maybe related to #72, but the behaviour is different: there is no error, but the result of `cat(3, cu(x), cu(y))` differs from that of `cat(3, x, y)`:
julia> using CuArrays;
julia> x = rand(2, 1);
julia> y = rand(2, 1);
julia> cat(3,x,y)
2×1×2 Array{Float64,3}:
[:, :, 1] =
0.151418
0.388829
[:, :, 2] =
0.732151
0.991128
julia> cat(3,cu(x),cu(y))
2×1×2 CuArray{Float32,3}:
[:, :, 1] =
0.151418
0.388829
[:, :, 2] =
-1.25053f-6
3.89817f-6
After checking out CuArrays, I ran into some test errors.
julia> Pkg.checkout("CuArrays")
INFO: Checking out CuArrays master...
INFO: Pulling CuArrays latest master...
INFO: Cloning cache of Adapt from https://github.com/MikeInnes/Adapt.jl.git
INFO: Installing Adapt v0.1.0
INFO: Upgrading CUDAapi: v0.2.1 => v0.3.0
INFO: Upgrading CUDAdrv: v0.6.1 => v0.7.3
INFO: Upgrading CUDAnative: v0.5.3 => v0.5.4
INFO: Building CUDAdrv
WARNING: Found multiple CUDA driver installations: /usr/lib/x86_64-linux-gnu and /usr
INFO: Building LLVM
INFO: LLVM.jl has already been built for this toolchain, no need to rebuild
INFO: Building CUDAnative
WARNING: Found multiple CUDA toolkit installations: /usr/local/cuda and /usr/local/cuda-8.0
julia> Pkg.test("CuArrays")
INFO: Testing CuArrays
ERROR: LoadError: UndefVarError: configured not defined
Stacktrace:
[1] include_from_node1(::String) at ./loading.jl:576
[2] include(::String) at ./sysimg.jl:14
[3] anonymous at ./<missing>:2
while loading /home/zhuj6/.julia/v0.6/CuArrays/src/CuArrays.jl, in expression starting on line 13
ERROR: LoadError: Failed to precompile CuArrays to /home/zhuj6/.julia/lib/v0.6/CuArrays.ji.
Stacktrace:
[1] compilecache(::String) at ./loading.jl:710
[2] _require(::Symbol) at ./loading.jl:463
[3] require(::Symbol) at ./loading.jl:405
[4] include_from_node1(::String) at ./loading.jl:576
[5] include(::String) at ./sysimg.jl:14
[6] process_options(::Base.JLOptions) at ./client.jl:305
[7] _start() at ./client.jl:371
while loading /home/zhuj6/.julia/v0.6/CuArrays/test/runtests.jl, in expression starting on line 15
ERROR: LoadError: failed process: Process(`/home/zhuj6/julia/usr/bin/julia -Cnative -J/home/zhuj6/julia/usr/lib/julia/sys.so --compile=yes --depwarn=yes --color=yes --compilecache=yes --startup-file=yes --code-coverage=none /home/zhuj6/.julia/v0.6/CuArrays/test/runtests.jl`, ProcessExited(1)) [1]
Stacktrace:
[1] pipeline_error(::Base.Process) at ./process.jl:682
[2] run(::Cmd) at ./process.jl:651
[3] include_from_node1(::String) at ./loading.jl:576
[4] include(::String) at ./sysimg.jl:14
[5] process_options(::Base.JLOptions) at ./client.jl:305
[6] _start() at ./client.jl:371
while loading /home/zhuj6/.julia/v0.6/CuArrays/test/runtests.jl, in expression starting on line 4
==============================[ ERROR: CuArrays ]===============================
failed process: Process(`/home/zhuj6/julia/usr/bin/julia -Cnative -J/home/zhuj6/julia/usr/lib/julia/sys.so --compile=yes --depwarn=yes --check-bounds=yes --code-coverage=none --color=yes --compilecache=yes /home/zhuj6/.julia/v0.6/CuArrays/test/runtests.jl`, ProcessExited(1)) [1]
================================================================================
ERROR: CuArrays had test errors
Hi Mike,
Thanks for such an effort! I think non-contiguous array indexing is the only missing CuArray feature that we use heavily in our models. Example code snippet:
julia> a1 = rand(Float32, 4,5)
4×5 Array{Float32,2}:
0.698076 0.795957 0.501911 0.148559 0.416837
0.817677 0.430873 0.383991 0.443963 0.201368
0.359252 0.210537 0.277426 0.985338 0.454552
0.821997 0.0168654 0.453663 0.40859 0.908441
julia> c1 = CuArray(a1)
4×5 CuArray{Float32,2}:
0.698076 0.795957 0.501911 0.148559 0.416837
0.817677 0.430873 0.383991 0.443963 0.201368
0.359252 0.210537 0.277426 0.985338 0.454552
0.821997 0.0168654 0.453663 0.40859 0.908441
julia> a1[[1,20]]
2-element Array{Float32,1}:
0.698076
0.908441
julia> g1[[1,20]]
ERROR: MethodError: Cannot `convert` an object of type Tuple{Int64} to an object of type Array{Float64,3}
This may have arisen from a call to the constructor Array{Float64,3}(...),
since type constructors fall back to convert methods.
Stacktrace:
[1] getindex(::GPUArrays.GPUArray{Float64,3,CUDAdrv.CuArray{Float64,3},GPUArrays.CUBackend.CUContext}, ::Array{Int64,1}) at /KUFS/scratch/ikesen16/.julia/newnode/v0.6/GPUArrays/src/abstractarray.jl:398
julia> a1[[1,2],:]
2×5 Array{Float32,2}:
0.698076 0.795957 0.501911 0.148559 0.416837
0.817677 0.430873 0.383991 0.443963 0.201368
julia> c1[[1,2],:]
ERROR: don't know how to handle argument of type Array{Int64,1}
Stacktrace:
[1] cudaconvert(::Array{Int64,1}) at /KUFS/scratch/ikesen16/.julia/newnode/v0.6/CUDAnative/src/execution.jl:20
[2] broadcast(::Function, ::Tuple{Array{Int64,1},Base.Slice{Base.OneTo{Int64}}}) at ./broadcast.jl:17
[3] _unsafe_getindex!(::CuArray{Float32,2}, ::CuArray{Float32,2}, ::Array{Int64,1}, ::Base.Slice{Base.OneTo{Int64}}, ::Vararg{Base.Slice{Base.OneTo{Int64}},N} where N) at /KUFS/scratch/ikesen16/.julia/newnode/v0.6/CuArrays/src/indexing.jl:50
[4] macro expansion at ./multidimensional.jl:460 [inlined]
[5] _unsafe_getindex(::IndexLinear, ::CuArray{Float32,2}, ::Array{Int64,1}, ::Base.Slice{Base.OneTo{Int64}}) at ./multidimensional.jl:453
[6] macro expansion at ./multidimensional.jl:442 [inlined]
[7] _getindex at ./multidimensional.jl:438 [inlined]
[8] getindex(::CuArray{Float32,2}, ::Array{Int64,1}, ::Colon) at ./abstractarray.jl:882
julia> a1[1:2:end,1:2:end]
2×3 Array{Float32,2}:
0.698076 0.501911 0.416837
0.359252 0.277426 0.454552
julia> c1[1:2:end,1:2:end]
2×3 CuArray{Float32,2}:
0.698076 0.501911 0.416837
0.359252 0.277426 0.454552
I think if the last case works, the others should not be hard to implement. I will try to test CuArrays with my dynamic neural-net benchmark examples (for now, without indexing).
We need to wrap more BLAS kernels; currently we only have matmul.
As part of JuliaGPU/CUDAdrv.jl#63, we'll need to fold CUBLAS.jl into this package.
Is there a way I can create a `CuArray` filled with a constant?
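A couple of ways this can be done, as far as I know (Julia 0.6-era API; the exact constructors may differ between versions):

```julia
using CuArrays

# Fill on the host, then upload:
a = cu(fill(3f0, 10, 10))

# Or allocate on the device and fill! in place:
b = fill!(CuArray{Float32}(10, 10), 3f0)
```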
Same as #21 but for CUDNN. See also JuliaGPU/CUDAdrv.jl#63.
I was trying to nail down why I was getting an error in `Flux.crossentropy` and came up with this minimal example. `+`, `-`, `./`, `.*`, `*`, and `exp.` work, but `log.` doesn't work for me. Any ideas why I'm getting this error?
> using CuArrays
> x = CuArray([2.f0]);
> sum(exp.(x))
7.389056f0
> sum(log.(x))
ERROR: CUDA error: unspecified launch failure (code #719, ERROR_LAUNCH_FAILED)
Stacktrace:
[1] macro expansion at /home/ec2-user/.julia/v0.6/CUDAdrv/src/base.jl:148 [inlined]
[2] #download!#5(::Bool, ::Function, ::Base.RefArray{Float32,Array{Float32,1},Void}, ::CUDAdrv.Mem.Buffer, ::Int64, ::CUDAdrv.CuStream) at /home/ec2-user/.julia/v0.6/CUDAdrv/src/memory.jl:224
[3] (::CUDAdrv.Mem.#kw##download!)(::Array{Any,1}, ::CUDAdrv.Mem.#download!, ::Base.RefArray{Float32,Array{Float32,1},Void}, ::CUDAdrv.Mem.Buffer, ::Int64, ::CUDAdrv.CuStream) at ./<missing>:0
[4] #download!#8 at /home/ec2-user/.julia/v0.6/CUDAdrv/src/memory.jl:292 [inlined]
[5] download! at /home/ec2-user/.julia/v0.6/CUDAdrv/src/memory.jl:291 [inlined] (repeats 2 times)
[6] copy!(::Array{Float32,1}, ::CuArray{Float32,1}) at /home/ec2-user/.julia/v0.6/CuArrays/src/array.jl:66
[7] convert at /home/ec2-user/.julia/v0.6/GPUArrays/src/construction.jl:95 [inlined]
[8] convert at ./abstractarray.jl:839 [inlined]
[9] Type at ./sysimg.jl:77 [inlined]
[10] acc_mapreduce(::Function, ::Function, ::Float32, ::CuArray{Float32,1}, ::Tuple{}) at /home/ec2-user/.julia/v0.6/GPUArrays/src/mapreduce.jl:138
[11] sum(::CuArray{Float32,1}) at ./reduce.jl:359
versions:
> using CUDAdrv
> CuDevice(0)
CuDevice(0): Tesla K80
$ ../../julia/julia --version
julia version 0.6.2
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44
$ uname
Linux
The readme uses the example zs .= sin.(xs) .+ ys .* 2
to demo broadcasting, but then also says
When broadcasting, watch out for errors like:
julia> sin.(cos.(xs))
ERROR: CUDA error: invalid program counter (code #718, ERROR_INVALID_PC)
A current limitation of CUDAnative means that you'll need to restart Julia and use CuArrays.sin, CuArrays.cos etc. in this case.
It's not clear to me why the usage of `sin` is OK in the first case but not in the second. It might help to expand on what exactly to watch out for in the second case.
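For context, the workaround the readme alludes to looks roughly like this (a sketch, following the readme's own suggestion):

```julia
using CuArrays

xs = cu(rand(Float32, 10))

# Per the readme, after restarting Julia, use the CuArrays-provided
# versions instead of Base's when composing functions in a broadcast:
ys = CuArrays.sin.(CuArrays.cos.(xs))
```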
It would be nice to have a new release so that CURAND is usable in a released version of CuArrays.
Just wondering if there is an obvious way to enable QR decompositions here. My code is now much faster thanks to this package, but I still have to do QR on the CPU and hence have a bit of copying back and forth.
In trying to get ForwardDiff's main example to work, I'm trying to reduce with binary operators:
xs = rand(5)
sum(sin, xs)
Initially, it looked like CuArrays supports this:
xs = CuArray(rand(5))
sum(sin, xs)
But as it turns out, this does a CPU reduction because of (what I presume to be) a missing definition:
Base.sum(f::Base.Callable, xs::CuArray) = reduce(f, 0, xs)
But now it shows how `CuArrays.reduce_grid` calls `op` in a binary fashion, which obviously fails:
CUDAnative.code_warntype(CuArrays.reduce_grid,
(typeof(CUDAnative.sin), Float64,
CUDAnative.CuDeviceArray{Float64,1,CUDAnative.AS.Global},
CUDAnative.CuDeviceArray{Float64,1,CUDAnative.AS.Global},
Int32))
val::Any = (op::CUDAnative.#sin)(val::Any, (Base.pointerref)((Core.getfield)((Core.getfield)(input::CUDAnative.CuDeviceArray{Float64,1,CUDAnative.AS.Global}, :ptr)::CUDAnative.DevicePtr{Float64,CUDAnative.AS.Global}, :ptr)::Ptr{Float64}, (Base.zext_int)(Int64, i::UInt32)::Int64, 8)::Float64)::Any
Base.sum(f::Base.Callable, xs::CuArray) = reduce(f, 0, xs)
xs = CuArray(rand(5))
sum(sin, xs)
ERROR: LoadError: error compiling reduce_grid: emit_allocobj for CuArrays/src/reduction.jl:58 requires the dynamic_alloc language feature, which is disabled
When running the simple code
x = cu(Float32[1, 2, 3.0])
log.(exp.(x - maximum(x)))
I got the following error:
warning: ignoring debug info with an invalid version (0) in
3-element CuArray{Float32,1}:
Error showing value of type CuArray{Float32,1}:
ERROR: CUDA error: an illegal memory access was encountered (code #700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] macro expansion at /home/xiucheng/.julia/v0.6/CUDAdrv/src/base.jl:148 [inlined]
[2] download(::Ptr{Float32}, ::CUDAdrv.OwnedPtr{Void}, ::Int64) at /home/xiucheng/.julia/v0.6/CUDAdrv/src/memory.jl:141
[3] copy!(::Array{Float32,1}, ::CuArray{Float32,1}) at /home/xiucheng/.julia/v0.6/CuArrays/src/array.jl:59
[4] #showarray#1(::Bool, ::Function, ::IOContext{Base.Terminals.TTYTerminal}, ::CuArray{Float32,1}, ::Bool) at /home/xiucheng/.julia/v0.6/CuArrays/src/array.jl:121
[5] display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::MIME{Symbol("text/plain")}, ::CuArray{Float32,1}) at ./REPL.jl:122
[6] display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::CuArray{Float32,1}) at ./REPL.jl:125
[7] display(::CuArray{Float32,1}) at ./multimedia.jl:218
[8] eval(::Module, ::Any) at ./boot.jl:235
[9] print_response(::Base.Terminals.TTYTerminal, ::Any, ::Void, ::Bool, ::Bool, ::Void) at ./REPL.jl:144
[10] print_response(::Base.REPL.LineEditREPL, ::Any, ::Void, ::Bool, ::Bool) at ./REPL.jl:129
[11] (::Base.REPL.#do_respond#16{Bool,Base.REPL.##26#36{Base.REPL.LineEditREPL,Base.REPL.REPLHistoryProvider},Base.REPL.LineEditREPL,Base.LineEdit.Prompt})(::Base.LineEdit.MIState, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Bool) at ./REPL.jl:646
But it works normally if I separate the composed computation into two steps:
x = cu(Float32[1, 2, 3.0])
x = exp.(x - maximum(x))
log.(x)
The system and version info:
The URL of this package does not match that stored in METADATA.jl.
cc: @MikeInnes
I get the following error after many prediction queries to a Flux-based LSTM. This is the error on CUDA 9.0; the error on CUDA 8.0 is similar, but it relates to garbage collection (when calling `free`). Does it make sense? Is the source of the error obvious, and is it easily fixable? I could try to work on a replicable example, but it's pretty difficult, as I am not sure exactly what triggers the error (a very large code base is involved). My guess is that it's garbage collection.
Just to clarify a bit: this error never occurs when training the LSTM, only when doing inference with it, where SGD is being performed on a vector of parameters (unrelated to the LSTM). The predictions from the LSTM are used to define the loss. Prior to running this SGD on the GPU, the LSTM is "untracked" with mTᵏ = Flux.mapleaves(Flux.Tracker.data, mTᵏ). The whole procedure runs fine on CPUs. Is this a CuArrays issue or a Flux issue? Anything I can do to further diagnose? Much appreciated!
signal (11): Segmentation fault
while loading no file, in expression starting on line 0
cfree at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
cudnnDestroyTensorDescriptor at /usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.1 (unknown line)
unknown function (ip: 0x7f2dd8194a55)
cudnnRNNBackwardData at /usr/lib/x86_64-linux-gnu/libcudnn.so.7.1.1 (unknown line)
macro expansion at /home/ubuntu/.julia/v0.6/CuArrays/src/dnn/error.jl:17 [inlined]
cudnnRNNBackwardData at /home/ubuntu/.julia/v0.6/Flux/src/cuda/cudnn.jl:189
unknown function (ip: 0x7f2dfc1c217b)
jl_call_fptr_internal at /usr/local/src/julia/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /usr/local/src/julia/src/julia_internal.h:358 [inlined]
jl_invoke at /usr/local/src/julia/src/gf.c:41
backwardData at /home/ubuntu/.julia/v0.6/Flux/src/cuda/cudnn.jl:206
unknown function (ip: 0x7f2dfc1c12e0)
jl_call_fptr_internal at /usr/local/src/julia/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /usr/local/src/julia/src/julia_internal.h:358 [inlined]
jl_apply_generic at /usr/local/src/julia/src/gf.c:1926
back_ at /home/ubuntu/.julia/v0.6/Flux/src/cuda/cudnn.jl:363
back_ at /home/ubuntu/.julia/v0.6/Flux/src/tracker/back.jl:25
unknown function (ip: 0x7f2dfc1c077d)
Hi Mike, would it be possible to add support for setindex! or append!? So e.g.
x = cu(randn(10))
x[1:5] .= cu(randn(5))
or
x = cu(randn(5))
append!(x, cu(randn(5)))
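In the meantime, a workaround sketch (hedged: setrange is a hypothetical helper, not a CuArrays API): stage the update on the host, which costs two full transfers but only relies on whole-array copies.

```julia
using CuArrays

# Hypothetical helper: update a range of a CuArray by round-tripping
# through the host. Slow, but avoids device-side setindex!.
function setrange(x::CuArray, r::UnitRange, v::CuArray)
    h = Array(x)       # download the full array to the host
    h[r] = Array(v)    # do the setindex! on the CPU
    copy!(x, h)        # upload the result back to the device
    return x
end
```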
Use case: collect both sum(x)
and sum(abs(x))
in one pass.
a = randn(3, 3)
reduce((x0,x)->(x0[1]+x, x0[2]+abs(x)), (0.,0.), a)
works, but:
a = cu(a)
reduce((x0,x)->(x0[1]+x, x0[2]+abs(x)), (0.,0.), a)
ERROR: MethodError: Cannot `convert` an object of type Tuple{Float64,Float64} to an object of
type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
Stacktrace:
[1] reduce(::Function, ::Tuple{Float64,Float64}, ::CuArray{Float64,2}) at /.julia/v0
.6/CuArrays/src/reduction.jl:87
Probably harder, but it would be nice to make it work along some dimension as well, for example to collect sum(x, 1)
and sum(abs(x), 1)
in one pass.
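Until tuple-valued accumulators work in the GPU reduce, one fallback I'd sketch is simply two single-valued passes (correct, at the cost of reading the data twice and one temporary for abs.(a)):

```julia
using CuArrays

a = cu(randn(3, 3))
s  = reduce(+, 0f0, a)         # sum(x)
sa = reduce(+, 0f0, abs.(a))   # sum(abs(x)), via a temporary abs.(a)
```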
My Flux
code is taking 10-50x longer to run on a top GPU (Tesla V100) than on an old CPU. There is also a Flux issue open for this, but the problem is actually with CuArrays
, as demonstrated below.
The problem is in the 2nd line of Flux.back
:
function back(::typeof(getindex), Δ, xs::Flux.TrackedArray, i...)
Δ′ = zeros(xs.data)
Δ′[i...] = Δ
Flux.Tracker.@back(xs, Δ′)
end
This kills performance on GPUs. Is there any temporary workaround for this? Our code relies heavily on doing something like out = [out[i,:] for i=1:(3*nparams)]
in advance, as part of inference, so we can't avoid this type of indexing (I really tried to avoid it!!!). Any bandaid solution, anything? We only need back
to work for indices of the type out[i,:]
.
We are trying to wrap up a project that has become solely dependent on Flux at this point. We are completely stuck, as our Flux code is effectively unsuitable for GPUs. It's also impossible for us to finish this project with CPUs. Would it be possible to help us? Please? Any advice would be highly appreciated!
Here is an example that highlights the performance hit and the type of indexing that we need.
Please note that it's much faster to copy the array to the CPU, do the setindex!
, then copy back to the GPU!!! This is shown below in back_hack
. There's got to be a more performant GPU-only solution, no???
julia> using Flux
julia> using CuArrays
julia> testx = rand(2,100);
julia> x = param(testx);
julia> idx = (1,:); # (1, Colon())
julia> l = Flux.getindex(x,idx...);
julia> l2 = sum(l);
julia> @time Flux.back!(l2);
0.000010 seconds (9 allocations: 2.844 KiB)
julia> xg = param(testx) |> gpu;
julia> l = Flux.getindex(xg,idx...);
julia> l2 = sum(l);
julia> @time Flux.back!(l2);
0.044698 seconds (1.77 k allocations: 86.266 KiB)
Here is comparison to CPU version:
function back(::typeof(getindex), Δ, xs::Flux.TrackedArray, i...)
Δ′ = zeros(xs.data)
Δ′[i...] = Δ
Flux.Tracker.@back(xs, Δ′)
end
function back_hack(::typeof(getindex), Δ, xs::Flux.TrackedArray, i...)
Δ′ = zeros(xs.data|>cpu)
Δ′[i...] = (Δ|>cpu)
Flux.Tracker.@back(xs, Δ′|>gpu)
end
## directly call back
## ON CPU:
@time back(getindex, Flux.Tracker.grad(x)[1,1:100], x, idx...)
# 0.000019 seconds (11 allocations: 2.922 KiB)
## ON GPU
@time back(getindex, Flux.Tracker.grad(xg)[1,1:100], xg, idx...)
# 0.030649 seconds (1.78 k allocations: 86.656 KiB)
## even moving to CPU then back to GPU is doing better:
@time back_hack(getindex, Flux.Tracker.grad(xg)[1,1:100], xg, idx...)
# 0.000290 seconds (97 allocations: 6.297 KiB)
# Note that the entire run time is dominated by setindex!
xg = rand(2,100) |> gpu;
repl = zeros(100) |> gpu;
@time xg[1,:] = repl;
# 0.030225 seconds (1.71 k allocations: 84.547 KiB)
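One possible bandaid for the out[i,:] pattern, avoiding setindex! on the device altogether (a hedged sketch, not Flux's actual back method; rowback is a hypothetical name): build a one-hot mask on the host, then form the full gradient with a single broadcast kernel.

```julia
using CuArrays

# Sketch: gradient of getindex for an index of the form (i, :).
# The (n,1) one-hot mask times the (1,m) row broadcasts to the full
# (n,m) gradient in one fused GPU kernel, with no scalar indexing.
function rowback(Δ::CuArray, xs::CuArray, i::Int)
    mask = cu(Float32.((1:size(xs, 1)) .== i))  # one-hot column, built on host
    return mask .* reshape(Δ, 1, length(Δ))
end
```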
Just curious: I am trying to install CuArrays on Julia 0.6.2, but it requires LLVM. LLVM compiles out of the box on Julia 0.7-dev but does not want to on Julia 0.6.2. The other way around: CuArrays does not want to build itself on Julia 0.7-dev.
Is there any possible way out of this circle? Or has someone succeeded recently with the build on 0.6.2? Thank you in advance.
On julia 0.6
julia> using CuArrays
julia> x=cu(rand(3))
3-element CuArray{Float32,1}:
0.130609
0.036018
0.938617
julia> using SpecialFunctions
julia> erf.(x)
3-element CuArray{Float32,1}:
Error showing value of type CuArray{Float32,1}:
ERROR: CUDA error: unspecified launch failure (code #719, ERROR_LAUNCH_FAILED)
Stacktrace:
[1] macro expansion at /home/lucibello/.julia/v0.6/CUDAdrv/src/base.jl:148 [inlined]
[2] #download!#5(::Bool, ::Function, ::Base.RefArray{Float32,Array{Float32,1},Void}, ::CUDAdrv.Mem.Buffer, ::Int64, ::CUDAdrv.CuStream) at /home/lucibello/.julia/v0.6/CUDAdrv/src/memory.jl:224
[3] (::CUDAdrv.Mem.#kw##download!)(::Array{Any,1}, ::CUDAdrv.Mem.#download!, ::Base.RefArray{Float32,Array{Float32,1},Void}, ::CUDAdrv.Mem.Buffer, ::Int64, ::CUDAdrv.CuStream) at ./<missing>:0
[4] #download!#8 at /home/lucibello/.julia/v0.6/CUDAdrv/src/memory.jl:292 [inlined]
[5] download! at /home/lucibello/.julia/v0.6/CUDAdrv/src/memory.jl:291 [inlined] (repeats 2 times)
[6] copy!(::Array{Float32,1}, ::CuArray{Float32,1}) at /home/lucibello/.julia/v0.6/CuArrays/src/array.jl:66
[7] #showarray#3(::Bool, ::Function, ::IOContext{Base.Terminals.TTYTerminal}, ::CuArray{Float32,1}, ::Bool) at /home/lucibello/.julia/v0.6/CuArrays/src/array.jl:143
[8] display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::MIME{Symbol("text/plain")}, ::CuArray{Float32,1}) at ./REPL.jl:122
[9] display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::CuArray{Float32,1}) at ./REPL.jl:125
[10] display(::CuArray{Float32,1}) at ./multimedia.jl:194
[11] eval(::Module, ::Any) at ./boot.jl:235
[12] print_response(::Base.Terminals.TTYTerminal, ::Any, ::Void, ::Bool, ::Bool, ::Void) at ./REPL.jl:144
[13] print_response(::Base.REPL.LineEditREPL, ::Any, ::Void, ::Bool, ::Bool) at ./REPL.jl:129
[14] (::Base.REPL.#do_respond#16{Bool,Base.REPL.##26#36{Base.REPL.LineEditREPL,Base.REPL.REPLHistoryProvider},Base.REPL.LineEditREPL,Base.LineEdit.Prompt})(::Base.LineEdit.MIState, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Bool) at ./REPL.jl:646
julia> x
3-element CuArray{Float32,1}:
Error showing value of type CuArray{Float32,1}:
ERROR: CUDA error: unspecified launch failure (code #719, ERROR_LAUNCH_FAILED)
Stacktrace:
[1] macro expansion at /home/lucibello/.julia/v0.6/CUDAdrv/src/base.jl:148 [inlined]
[2] #download!#5(::Bool, ::Function, ::Base.RefArray{Float32,Array{Float32,1},Void}, ::CUDAdrv.Mem.Buffer, ::Int64, ::CUDAdrv.CuStream) at /home/lucibello/.julia/v0.6/CUDAdrv/src/memory.jl:224
[3] (::CUDAdrv.Mem.#kw##download!)(::Array{Any,1}, ::CUDAdrv.Mem.#download!, ::Base.RefArray{Float32,Array{Float32,1},Void}, ::CUDAdrv.Mem.Buffer, ::Int64, ::CUDAdrv.CuStream) at ./<missing>:0
[4] #download!#8 at /home/lucibello/.julia/v0.6/CUDAdrv/src/memory.jl:292 [inlined]
[5] download! at /home/lucibello/.julia/v0.6/CUDAdrv/src/memory.jl:291 [inlined] (repeats 2 times)
[6] copy!(::Array{Float32,1}, ::CuArray{Float32,1}) at /home/lucibello/.julia/v0.6/CuArrays/src/array.jl:66
[7] #showarray#3(::Bool, ::Function, ::IOContext{Base.Terminals.TTYTerminal}, ::CuArray{Float32,1}, ::Bool) at /home/lucibello/.julia/v0.6/CuArrays/src/array.jl:143
[8] display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::MIME{Symbol("text/plain")}, ::CuArray{Float32,1}) at ./REPL.jl:122
[9] display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::CuArray{Float32,1}) at ./REPL.jl:125
[10] display(::CuArray{Float32,1}) at ./multimedia.jl:194
[11] eval(::Module, ::Any) at ./boot.jl:235
[12] print_response(::Base.Terminals.TTYTerminal, ::Any, ::Void, ::Bool, ::Bool, ::Void) at ./REPL.jl:144
[13] print_response(::Base.REPL.LineEditREPL, ::Any, ::Void, ::Bool, ::Bool) at ./REPL.jl:129
[14] (::Base.REPL.#do_respond#16{Bool,Base.REPL.##26#36{Base.REPL.LineEditREPL,Base.REPL.REPLHistoryProvider},Base.REPL.LineEditREPL,Base.LineEdit.Prompt})(::Base.LineEdit.MIState, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Bool) at ./REPL.jl:646
using CuArrays
x = cu(rand(2))
y = cu(rand(2))
vcat(x,y)
Gives me warnings and error:
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_box_uint32", %59 = call i8** @jl_box_uint32(i32 zeroext %58), !dbg !17))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_gc_pool_alloc", %60 = call i8** @jl_gc_pool_alloc(i8* %ptls_i8, i32 1456, i32 32), !dbg !17))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_apply_generic", %63 = call i8** @jl_apply_generic(i8*** %9, i32 3), !dbg !17))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_apply_generic", %64 = call i8** @jl_apply_generic(i8*** %12, i32 2), !dbg !17))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_apply_generic", %66 = call i8** @jl_apply_generic(i8*** %4, i32 4), !dbg !17))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_f_getfield", %67 = call i8** @jl_f_getfield(i8** null, i8*** %7, i32 2), !dbg !17))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_f_getfield", %68 = call i8** @jl_f_getfield(i8** null, i8*** %11, i32 2), !dbg !17))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_apply_generic", %70 = call i8** @jl_apply_generic(i8*** %5, i32 4), !dbg !17))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_f_getfield", %71 = call i8** @jl_f_getfield(i8** null, i8*** %15, i32 2), !dbg !17))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_f_getfield", %72 = call i8** @jl_f_getfield(i8** null, i8*** %16, i32 2), !dbg !17))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_f_apply_type", %78 = call i8** @jl_f_apply_type(i8** null, i8*** %8, i32 4), !dbg !23))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_box_int64", %79 = call i8** @jl_box_int64(i64 signext %0), !dbg !23))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_gc_pool_alloc", %80 = call i8** @jl_gc_pool_alloc(i8* %ptls_i8, i32 1432, i32 16), !dbg !23))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_new_structv", %86 = call i8** @jl_new_structv(i8** %78, i8*** %13, i32 3), !dbg !23))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_apply_generic", %87 = call i8** @jl_apply_generic(i8*** %6, i32 2), !dbg !23))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_f_tuple", %88 = call i8** @jl_f_tuple(i8** null, i8*** %10, i32 1), !dbg !23))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_f_tuple", %89 = call i8** @jl_f_tuple(i8** null, i8*** %14, i32 2), !dbg !23))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_f_getfield", %53 = call i8** @jl_f_getfield(i8** null, i8*** %11, i32 2), !dbg !31))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_f_getfield", %54 = call i8** @jl_f_getfield(i8** null, i8*** %12, i32 2), !dbg !31))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_gc_pool_alloc", %55 = call i8** @jl_gc_pool_alloc(i8* %ptls_i8, i32 1480, i32 48), !dbg !32))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_apply_generic", %58 = call i8** @jl_apply_generic(i8*** %14, i32 3), !dbg !32))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_f_getfield", %59 = call i8** @jl_f_getfield(i8** null, i8*** %13, i32 2), !dbg !32))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_apply_generic", %60 = call i8** @jl_apply_generic(i8*** %15, i32 3), !dbg !32))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_gc_pool_alloc", %61 = call i8** @jl_gc_pool_alloc(i8* %ptls_i8, i32 1432, i32 16), !dbg !32))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_box_uint32", %65 = call i8** @jl_box_uint32(i32 zeroext %30), !dbg !32))
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_apply_generic", %66 = call i8** @jl_apply_generic(i8*** %10, i32 5), !dbg !32))
ERROR: LLVM IR generated for kernel(Int64, CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 6.1.0 is not compatible
Stacktrace:
[1] #compile_function#58(::Bool, ::Function, ::Any, ::Any, ::VersionNumber) at /home/david/.julia/v0.6/CUDAnative/src/jit.jl:422
[2] cufunction(::CUDAdrv.CuDevice, ::Any, ::Any) at /home/david/.julia/v0.6/CUDAnative/src/jit.jl:476
[3] macro expansion at /home/david/.julia/v0.6/CUDAnative/src/execution.jl:108 [inlined]
[4] _cuda(::Tuple{Int64,Int64}, ::Int64, ::CUDAdrv.CuStream, ::CuArrays.#kernel#10, ::Int64, ::CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}, ::Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at /home/david/.julia/v0.6/CUDAnative/src/execution.jl:80
[5] _cat(::Int64, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::Vararg{CuArray{Float32,1},N} where N) at /home/david/.julia/v0.6/CuArrays/src/utils.jl:95
[6] cat_t(::Int64, ::Type{T} where T, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::Vararg{CuArray{Float32,1},N} where N) at /home/david/.julia/v0.6/CuArrays/src/utils.jl:103
[7] vcat(::CuArray{Float32,1}, ::CuArray{Float32,1}) at /home/david/.julia/v0.6/CuArrays/src/utils.jl:106
I built Julia 0.6.2 from source and am running on Ubuntu 16.04. The GPU is a GeForce 1060 (compute capability 6.1.0) and I am running CUDA 8.0 with cuDNN 7.0.5.
Has anyone seen this error before? Any help would be greatly appreciated! :)
Similar to #55 but this issue refers to hcat
.
julia> using CuArrays;
julia> x = cu(rand(2));
julia> y = cu(rand(2));
julia> hcat(x,y)
WARNING: Encountered incompatible LLVM IR for kernel(Int64, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 3.7.0: CUDAnative.InvalidIRError("calls the Julia runtime", ("jl_bounds_error_unboxed_int", call void @jl_bounds_error_unboxed_int(i8* %50, i8** inttoptr (i64 139624390903792 to i8**), i64 %.), !dbg !49))
ERROR: LLVM IR generated for kernel(Int64, CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at capability 3.7.0 is not compatible
Stacktrace:
[1] #compile_function#58(::Bool, ::Function, ::Any, ::Any, ::VersionNumber) at ~/.julia/v0.6/CUDAnative/src/jit.jl:434
[2] cufunction(::CUDAdrv.CuDevice, ::Any, ::Any) at ~/.julia/v0.6/CUDAnative/src/jit.jl:488
[3] macro expansion at ~/.julia/v0.6/CUDAnative/src/execution.jl:107 [inlined]
[4] _cuda(::Tuple{Int64,Int64}, ::Int64, ::CUDAdrv.CuStream, ::CuArrays.#kernel#12, ::Int64, ::CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::Tuple{CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global},CUDAnative.CuDeviceArray{Float32,1,CUDAnative.AS.Global}}) at ~/.julia/v0.6/CUDAnative/src/execution.jl:80
[5] _cat(::Int64, ::CuArray{Float32,2}, ::CuArray{Float32,1}, ::Vararg{CuArray{Float32,1},N} where N) at ~/.julia/v0.6/CuArrays/src/utils.jl:96
[6] cat_t(::Int64, ::Type{T} where T, ::CuArray{Float32,1}, ::CuArray{Float32,1}, ::Vararg{CuArray{Float32,1},N} where N) at ~/.julia/v0.6/CuArrays/src/utils.jl:104
[7] hcat(::CuArray{Float32,1}, ::CuArray{Float32,1}) at ~/.julia/v0.6/CuArrays/src/utils.jl:108
The tag name "0.2.0" is not of the appropriate SemVer form (vX.Y.Z).
cc: @MikeInnes
Line 29 in 12269b8
Hi,
I've noticed that after a while, especially when creating a lot of small CuArrays, performance of creating a new CuArray goes down by at least 5 orders of magnitude. If I comment out the line I highlighted, none of this happens. I've also not found any instabilities or crashes after commenting this line out, not even when the memory of the GPU is 99% used. Is there any reason for this gc pass to be there?
This should be pretty easy, we just need to hook into Base's serialize
/deserialize
, however that's done.
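A minimal sketch of that hook, assuming the Julia 0.6 serializer API (Base.AbstractSerializer, Base.serialize_type) and that round-tripping through a host Array is acceptable:

```julia
import Base: serialize, deserialize

# Serialize a CuArray as its host copy; re-upload on deserialize.
function serialize(s::AbstractSerializer, x::CuArray)
    Base.serialize_type(s, CuArray)
    serialize(s, Array(x))
end
deserialize(s::AbstractSerializer, ::Type{CuArray}) = CuArray(deserialize(s))
```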
julia> x = cu(rand(1000,1000));
julia> x[1:10, [8;6]]
error in running finalizer: CUDAdrv.CuError(code=700, meta=nothing)
10×2 CuArray{Float64,2}:
Error showing value of type CuArray{Float64,2}:
ERROR: CUDA error: an illegal memory access was encountered (code #700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] macro expansion at /home/viralbshah/.julia/v0.6/CUDAdrv/src/base.jl:130 [inlined]
[2] download(::Ptr{Float64}, ::CUDAdrv.OwnedPtr{Float64}, ::Int64) at /home/viralbshah/.julia/v0.6/CUDAdrv/src/memory.jl:141
[3] copy!(::Array{Float64,2}, ::CuArray{Float64,2}) at /home/viralbshah/.julia/v0.6/CuArrays/src/array.jl:55
[4] #showarray#1(::Bool, ::Function, ::IOContext{Base.Terminals.TTYTerminal}, ::CuArray{Float64,2}, ::Bool) at /home/viralbshah/.julia/v0.6/CuArrays/src/array.jl:111
[5] display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::MIME{Symbol("text/plain")}, ::CuArray{Float64,2}) at ./REPL.jl:122
[6] display(::Base.REPL.REPLDisplay{Base.REPL.LineEditREPL}, ::CuArray{Float64,2}) at ./REPL.jl:125
[7] display(::CuArray{Float64,2}) at ./multimedia.jl:194
[8] eval(::Module, ::Any) at ./boot.jl:235
[9] print_response(::Base.Terminals.TTYTerminal, ::Any, ::Void, ::Bool, ::Bool, ::Void) at ./REPL.jl:144
[10] print_response(::Base.REPL.LineEditREPL, ::Any, ::Void, ::Bool, ::Bool) at ./REPL.jl:129
[11] (::Base.REPL.#do_respond#16{Bool,Base.REPL.##26#36{Base.REPL.LineEditREPL,Base.REPL.REPLHistoryProvider},Base.REPL.LineEditREPL,Base.LineEdit.Prompt})(::Base.LineEdit.MIState, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Bool) at ./REPL.jl:646
The current pool allocator never frees memory except when allocations fail. This quickly leads to CuArrays/Flux using all of the available memory, as frequently observed on cyclops
. We should probably have an idle thread scan for unused pool entries and free those up.
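A rough sketch of the idle-scan idea (CuArrays.reclaim_unused! is an assumed name, not the actual pool interface): fire a timer every few seconds and free pool entries that have not been used since the last scan.

```julia
# Periodically reclaim idle pool memory from a timer callback.
# CuArrays.reclaim_unused! is hypothetical; the real pool API may differ.
const pool_reclaimer = Timer(5, 5) do _
    CuArrays.reclaim_unused!()
end
```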
Earlier I posted a similar issue to FluxML/NNlib.jl, but this one is unrelated to that repo.
The following code:
using CuArrays
A = cu(rand(Float32, 10, 100))
B = cu(ones(Float32, 10, 100))
C = cu(ones(Float32, 1, 100))
D = cu(ones(Float32, 1, 100))
E = cu(rand(Float32, 10, 100))
CUDAnative.exp.(1 ./ A) .* (B .* (C ./ D)) .+ E
generates this error:
WARNING: Method definition convert(Type{LLVM.LLVMType}, Type{T} where T) in module Interop at /home/dfdx/.julia/v0.6/LLVM/src/interop/base.jl:54 overwritten in module CUDAnative at /home/dfdx/.julia/v0.6/CUDAnative/src/cgutils.jl:159.
warning: ignoring debug info with an invalid version (0) in #1
warning: ignoring debug info with an invalid version (0) in
ERROR: LoadError: LLVM error: All DICompileUnits must be listed in llvm.dbg.cu
Stacktrace:
[1] verify(::LLVM.Module) at /home/dfdx/.julia/v0.6/LLVM/src/analysis.jl:11
[2] #add_entry!#26(::Bool, ::Function, ::LLVM.Module, ::Any, ::Any) at /home/dfdx/.julia/v0.6/CUDAnative/src/jit.jl:251
[3] (::CUDAnative.#kw##add_entry!)(::Array{Any,1}, ::CUDAnative.#add_entry!, ::LLVM.Module, ::Any, ::Any) at ./<missing>:0
[4] #compile_function#51(::Bool, ::Function, ::Any, ::Any, ::VersionNumber) at /home/dfdx/.julia/v0.6/CUDAnative/src/jit.jl:402
[5] cufunction(::CUDAdrv.CuDevice, ::Any, ::Any) at /home/dfdx/.julia/v0.6/CUDAnative/src/jit.jl:465
[6] macro expansion at /home/dfdx/.julia/v0.6/CUDAnative/src/execution.jl:108 [inlined]
[7] _cuda(::Tuple{Int64,Int64}, ::Int64, ::CUDAdrv.CuStream, ::CuArrays.#broadcast_kernel, ::##1#2, ::CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::NTuple{5,Tuple{Bool,Bool}}, ::NTuple{5,Tuple{Int64,Int64}}, ::CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}, ::NTuple{4,CUDAnative.CuDeviceArray{Float32,2,CUDAnative.AS.Global}}) at /home/dfdx/.julia/v0.6/CUDAnative/src/execution.jl:80
[8] _broadcast! at /home/dfdx/.julia/v0.6/CuArrays/src/broadcast.jl:22 [inlined]
[9] broadcast_t(::Function, ::Type{T} where T, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}, ::CuArray{Float32,2}, ::CuArray{Float32,2}, ::CuArray{Float32,2}, ::CuArray{Float32,2}, ::CuArray{Float32,2}) at /home/dfdx/.julia/v0.6/CuArrays/src/broadcast.jl:37
[10] broadcast_c(::Function, ::Type{CuArrays.CuArray}, ::CuArray{Float32,2}, ::CuArray{Float32,2}, ::Vararg{CuArray{Float32,2},N} where N) at /home/dfdx/.julia/v0.6/CuArrays/src/broadcast.jl:58
[11] broadcast(::Function, ::CuArray{Float32,2}, ::CuArray{Float32,2}, ::CuArray{Float32,2}, ::Vararg{CuArray{Float32,2},N} where N) at ./broadcast.jl:455
[12] include_from_node1(::String) at ./loading.jl:576
[13] include(::String) at ./sysimg.jl:14
while loading /home/dfdx/Downloads/broadcast_fail.jl, in expression starting on line 10
I use Julia 0.6.2, CUDAnative 0.5.3 and the latest master of CuArrays. I'm open to testing it on Julia 0.7 and the latest CUDAnative master if it's assumed to be fixed there, but right now that setup seems to be broken (or at least I couldn't build the latest CUDAnative on my machine), so it will take time to resolve.
Currently I have to do cu(rand(...))
. It would be nice to have a native version.
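Until a native GPU RNG is hooked up, a thin convenience wrapper is easy to sketch (curand here is a hypothetical name; it still generates on the CPU and uploads):

```julia
using CuArrays

# Hypothetical convenience: a rand-like constructor returning a CuArray.
curand(dims::Integer...) = cu(rand(Float32, dims...))
```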
This exists here but we need to figure out how to integrate it with CuArrays.
I was trying the "mnist/mlp.jl" file, which on CPU runs just fine. However, when I uncomment the using CuArrays
line I get a broadcast error on the call to crossentropy
:
julia> include("mlp.jl")
INFO: Recompiling stale cache file /home/carlo/.julia/lib/v0.6/Flux.ji for module Flux.
INFO: Recompiling stale cache file /home/carlo/.julia/lib/v0.6/CuArrays.ji for module CuArrays.
ERROR: LoadError: Broadcast output type Any is not concrete
Stacktrace:
[1] broadcast_t at /home/carlo/.julia/v0.6/CuArrays/src/broadcast.jl:34 [inlined]
[2] broadcast_c at /home/carlo/.julia/v0.6/CuArrays/src/broadcast.jl:63 [inlined]
[3] broadcast at ./broadcast.jl:455 [inlined]
[4] tracked_broadcast(::Function, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}, ::TrackedArray{…,CuArray{Float32,2}}, ::Int64) at /home/carlo/.julia/v0.6/Flux/src/tracker/array.jl:278
[5] #crossentropy#71(::Int64, ::Function, ::TrackedArray{…,CuArray{Float32,2}}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/carlo/.julia/v0.6/Flux/src/layers/stateless.jl:8
[6] crossentropy(::TrackedArray{…,CuArray{Float32,2}}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/carlo/.julia/v0.6/Flux/src/layers/stateless.jl:8
[7] loss(::CuArray{Float32,2}, ::Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}) at /home/carlo/Programs/Devel/deeplearning/model-zoo/mnist/mlp.jl:21
[8] #train!#130(::Flux.#throttled#14, ::Function, ::Function, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{CuArray{Float32,2},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}}, ::Flux.Optimise.##71#75) at /home/carlo/.julia/v0.6/Flux/src/optimise/train.jl:39
[9] (::Flux.Optimise.#kw##train!)(::Array{Any,1}, ::Flux.Optimise.#train!, ::Function, ::Base.Iterators.Take{Base.Iterators.Repeated{Tuple{CuArray{Float32,2},Flux.OneHotMatrix{CuArray{Flux.OneHotVector,1}}}}}, ::Function) at ./<missing>:0
[10] include_from_node1(::String) at ./loading.jl:576
[11] include(::String) at ./sysimg.jl:14
while loading /home/carlo/Programs/Devel/deeplearning/model-zoo/mnist/mlp.jl, in expression starting on line 29
I can't figure out what's the issue. I'm on Julia 0.6.3-pre and I tried to get the latest versions of all packages. Any ideas?
permutedims(x, (2,1))
works really fast. transpose
does not. May just need to be hooked up.
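The hook-up could be as simple as routing the 2-D case through the fast kernel (a sketch; it ignores lazy/recursive transpose semantics and conjugation):

```julia
using CuArrays

# Eagerly materialize a matrix transpose via the fast permutedims kernel.
Base.transpose(x::CuArray{T,2}) where T = permutedims(x, (2, 1))
```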
It boils down to this:
a = cu(zeros(100))
b = cu(fill(1,100))
then
copy!(a,b)
throws
ERROR: MethodError: no method matching transfer!(::CUDAdrv.Mem.Buffer, ::CUDAdrv.Mem.Buffer)
From CuArrays, the copy! method calls Mem.transfer! from CUDAdrv's memory.jl module:
function Base.copy!(dst::CuArray{T}, src::CuArray{T}) where T
@assert length(dst) == length(src)
Mem.transfer!(unsafe_buffer(dst), unsafe_buffer(src))
return dst
end
But Mem.transfer! requires at least the number of bytes:
function transfer!(dst::Buffer, src::Buffer, nbytes::Integer,
stream::CuStream=CuDefaultStream(); async::Bool=false)
if async
@apicall(:cuMemcpyDtoDAsync,
(Ptr{Cvoid}, Ptr{Cvoid}, Csize_t, CuStream_t),
dst, src, nbytes, stream)
else
@assert stream==CuDefaultStream()
@apicall(:cuMemcpyDtoD,
(Ptr{Cvoid}, Ptr{Cvoid}, Csize_t),
dst, src, nbytes)
end
end
It also seems like this should be caught in Pkg.test("CuArrays").
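A minimal fix consistent with the two signatures quoted above would be to pass the byte count explicitly (a sketch against the current method, not a tested patch):

```julia
function Base.copy!(dst::CuArray{T}, src::CuArray{T}) where T
    @assert length(dst) == length(src)
    # Mem.transfer! needs the transfer size; derive it from the element count.
    Mem.transfer!(unsafe_buffer(dst), unsafe_buffer(src),
                  length(src) * sizeof(T))
    return dst
end
```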
Please see denizyuret/Knet.jl#198.