theano / libgpuarray

Library to manipulate tensors on the GPU.

License: Other

libgpuarray's Introduction

============================================================================================================
MILA has stopped developing Theano: https://groups.google.com/d/msg/theano-users/7Poq8BZutbY/rNCIfvAEAwAJ

The PyMC developers have forked Theano to a new project called PyTensor that is being actively developed: https://github.com/pymc-devs/pytensor
============================================================================================================

libgpuarray's Issues

Clarification of design goals

In odl, an open source package for inverse problems, we're looking for an alternative CUDA backend with N-D support. libgpuarray looks interesting, but I have some questions:

libgpuarray is obviously tightly coupled with Theano, but is it intended to be used by other packages?
Do you intend to accept external pull requests? For bug fixes only, or for actual features too?
The documentation says: "we need a NumPy ndarray on the GPU". But the default "GpuArray" lacks even basics such as addition, and users are instead referred to compiling their own ElemwiseKernel because such methods would be un-optimized. Is libgpuarray intended to be a drop-in replacement for numpy, or to support only an optimized subset of it?
Compilation times are currently great, but the trade-off is that barely any functions are pre-compiled. Do you intend to ship libgpuarray with the standard ufuncs (add, exp, etc.) compiled, or should we do that in our own library?

Verify that the PTX version in extcpy is actually working on all supported cuda version

pygpu.gpuarray.GpuArrayException: Invalid value

pygpu.init can't find Radeon GPU card:

pygpu.init('opencl0:0').devname
u'Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz' // CPU ok
pygpu.init('opencl0:1').devname
Traceback (most recent call last):
File "", line 1, in
File "gpuarray.pyx", line 587, in pygpu.gpuarray.init (pygpu/gpuarray.c:7113)
File "gpuarray.pyx", line 558, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:7051)
File "gpuarray.pyx", line 962, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:10398)
pygpu.gpuarray.GpuArrayException: Invalid value

opencl0:1 is valid in pyopencl:
pyopencl.Device 'ATI Radeon HD 6970M' on 'Apple' at 0x1021b00

pygpu.init('opencl0:2').devname
ValueError: No device 2 // OK, only 0 and 1 are valid devices

Is this card not supported or did something go wrong during installation/compilation?

Missing import for SkipTest in test_blas

Ran into this while running tests:

https://github.com/Theano/libgpuarray/blob/master/pygpu/tests/test_blas.py#L12

Python 2.7
Missing Scipy installation
Mac OS X (Yosemite)

Do something about the fact that cudaFree does the equivalent of a DeviceSync()

We are considering a number of options to fix this problem.

One of those is to implement our own malloc inside a couple of giant blocks of memory we get from the card.

The other would be to add an allocation cache that would reuse blocks of the same size (avoiding a free).

We welcome other ideas to deal with this problem if you have any.
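
The second option (an allocation cache keyed on block size) could be sketched roughly as follows. This is only an illustration in Python, not libgpuarray code: `SizeBucketCache`, `fake_alloc` and `fake_free` are hypothetical names, and the device allocator is simulated with bytearrays.

```python
from collections import defaultdict


class SizeBucketCache:
    """Reuse freed blocks of an identical size instead of releasing them."""

    def __init__(self, raw_alloc, raw_free):
        self._raw_alloc = raw_alloc            # hypothetical stand-in for cudaMalloc
        self._raw_free = raw_free              # hypothetical stand-in for cudaFree
        self._free_blocks = defaultdict(list)  # size -> list of idle blocks

    def alloc(self, size):
        bucket = self._free_blocks[size]
        if bucket:
            return bucket.pop()                # cache hit: no driver call at all
        return self._raw_alloc(size)

    def free(self, block, size):
        # Keep the block for reuse; the real (synchronizing) free is deferred.
        self._free_blocks[size].append(block)

    def purge(self):
        """Release every cached block, e.g. when a fresh allocation fails."""
        for bucket in self._free_blocks.values():
            while bucket:
                self._raw_free(bucket.pop())


calls = {"alloc": 0, "free": 0}


def fake_alloc(size):
    calls["alloc"] += 1
    return bytearray(size)


def fake_free(block):
    calls["free"] += 1


cache = SizeBucketCache(fake_alloc, fake_free)
a = cache.alloc(1024)
cache.free(a, 1024)
b = cache.alloc(1024)   # served from the cache, no second raw allocation
```

The obvious cost of this design is that cached blocks stay reserved on the device, so some purge step is needed when memory runs low.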

ERROR (theano.sandbox.gpuarray): Could not initialize pygpu, support disabled

Hi,
in Ubuntu 14.04 with Nvidia GeForce GTX 770, I installed via sudo apt-get http://packages.ubuntu.com/trusty/devel/nvidia-cuda-toolkit (5.5.22-3ubuntu1)
Followed the step-by-step-guide of libgpuarray: http://deeplearning.net/software/libgpuarray/installation.html#requirements

But when testing Theano with cuda, the output is:

marco@marco-All-Series:~/Theano-Testing$ THEANO_FLAGS=device=cuda0 python check1.py
ERROR (theano.sandbox.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/gpuarray/__init__.py", line 44, in
init_dev(config.device)
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/gpuarray/__init__.py", line 36, in init_dev
context = pygpu.init(dev)
File "gpuarray.pyx", line 575, in pygpu.gpuarray.init (pygpu/gpuarray.c:7317)
File "gpuarray.pyx", line 546, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:7246)
File "gpuarray.pyx", line 950, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:10820)
GpuArrayException: No CUDA devices avaiable
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 3.85683894157 seconds
Result is [ 1.23178032 1.61879341 1.52278065 ..., 2.20771815 2.29967753
1.62323285]
Used the cpu

Any clues to solve the problem?

Looking forward to your kind hints.
Kind regards.
Marco

Make default gpu cuda or cuda0, not opencl.

The OpenCL back-end is lagging behind and will stay that way for some time.

ga_double, ga_byte : Is libgpuarray the authority?

I'm asking because Theano/theano/sandbox/gpuarray/elemwise.py also includes these in its preamble, and they appear to be the CUDA versions only. (I know I should also ask in the Theano issues, but I figure I could start here...)

Also, I think that libgpuarray's list could be checked against the equivalent in /usr/include/CL/cl_platform.h (or whatever is equivalent on your system), since the OpenCL device-independent types are set out there too. For example, rather than libgpuarray independently defining ga_byte as char on OpenCL platforms, perhaps it should define it as int8_t, or as cl_char for clarity.

Build failing on OS X 10.9

Hi,

I need some help to install libgpuarray on my mac. When I type "cmake .. -DCMAKE_BUILD_TYPE=Release", I get the following:
-- The C compiler identification is AppleClang 6.0.0.6000056
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Found CUDA: /usr/local/cuda (found version "6.5")
-- Found OpenCL: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk/System/Library/Frameworks/OpenCL.framework
-- Looking for strlcat
-- Looking for strlcat - found
-- Looking for mkstemp
-- Looking for mkstemp - found
Cuda 6.0+ does not come in universal flavor anymore so we disable the universal build.
-- Could NOT find clBLAS (missing: CLBLAS_LIBRARIES CLBLAS_INCLUDE_DIRS)
-- Found PkgConfig: /usr/local/bin/pkg-config (found version "0.28")
-- checking for one of the modules 'check'
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/dokookchoe/research/stuff/libgpuarray/Build

I think it has trouble finding "check". When I type "make", I get error messages:
Linking C executable check_buffer
ld: library not found for -lcheck
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [tests/check_buffer] Error 1
make[1]: *** [tests/CMakeFiles/check_buffer.dir/all] Error 2

For some reason, it couldn't find "check" and failed to link against it when "make" was invoked (I think). I installed "check" using "brew install check". "check" is installed in the following directories:
/usr/local/Cellar/check
/usr/local/Cellar/check/0.9.14/share/doc/check
/usr/local/Library/LinkedKegs/check
/usr/local/opt/check
/usr/local/share/doc/check

Can anyone help me out?

Thanks,
DK

Error installing libgpuarray on Mac OSX

System: MacBook Air mid 2014. Mac OSX 10.10.5

Tried to install libgpuarray from this guide: http://deeplearning.net/software/libgpuarray/installation.html

on "make" I get this error:

mycomp:Build myuser$ make
[ 4%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_buffer_opencl.c.o
In file included from /Users/brandonbrown/Desktop/Projects/libgpuarray/src/gpuarray_buffer_opencl.c:4:
/Users/brandonbrown/Desktop/Projects/libgpuarray/src/private_opencl.h:7:10: fatal error:
'OpenCL/opencl.h' file not found

#include <OpenCL/opencl.h>

1 error generated.
make[2]: *** [src/CMakeFiles/gpuarray.dir/gpuarray_buffer_opencl.c.o] Error 1
make[1]: *** [src/CMakeFiles/gpuarray.dir/all] Error 2
make: *** [all] Error 2

Mac OSX comes with OpenCL drivers and the opencl.h header files are in the frameworks directory. Not sure why it's failing. Some linking issue?

Enable and use peer2peer communication

Currently, we don't enable this on the GPUs, so the code always falls back to transferring via the CPU.

Return values of elemwise1, etc., are not cached

From #66:
I'm also wondering if you consider it an issue that ndgpuarray's __add__ and friends recompile their kernels every single time (because they create a new ElemwiseKernel/ReductionKernel object and self is part of the cache key).

The workaround I implemented was to add another layer of caching around elemwise1, elemwise2, and reduce1 that returns a cached ElemwiseKernel/ReductionKernel object if everything (context, operation, argument/result types) matches.

Another possibility would be to add __hash__ and __eq__ methods to ElemwiseKernel/ReductionKernel so that the cache can recognize when two kernel objects are equivalent. I believe that would be the simpler solution.

I'd be happy to try either of the above, or possibly something else if you have any ideas.
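
For reference, the extra caching layer described above could look roughly like this. It is a minimal sketch, not the patch itself: `cached_kernel` and `fake_factory` are hypothetical names, the factory stands in for building an ElemwiseKernel/ReductionKernel, and the context is assumed to be hashable.

```python
_kernel_cache = {}


def cached_kernel(factory, context, operation, arg_types, res_type):
    """Return one kernel object per (context, operation, types) combination."""
    key = (context, operation, tuple(arg_types), res_type)
    try:
        return _kernel_cache[key]
    except KeyError:
        kernel = factory(context, operation, arg_types, res_type)
        _kernel_cache[key] = kernel
        return kernel


builds = []


def fake_factory(context, operation, arg_types, res_type):
    # Hypothetical stand-in for the expensive kernel construction step.
    builds.append(operation)
    return object()


k1 = cached_kernel(fake_factory, "ctx0", "a + b", ["float32", "float32"], "float32")
k2 = cached_kernel(fake_factory, "ctx0", "a + b", ["float32", "float32"], "float32")
k3 = cached_kernel(fake_factory, "ctx0", "a + b", ["int8", "float32"], "float32")
```

Because the same object comes back for matching arguments, a downstream cache that keys on the kernel object itself would then see a stable key.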

lfu_cache purges most-recently added items

Hi, even after patch #65, I find that lfu_cache is still causing too many kernel recompiles. The reason seems to be that when an item is ejected from the cache, its count is also reset. When an item is added, its count is therefore 1, and is liable to get ejected again right away. However, I don't know the most appropriate fix -- perhaps an LRU policy instead?
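
To illustrate the behaviour, here is a toy comparison. It is a sketch under assumptions, not pygpu's lfu_cache: `LFUCache`, `LRUCache` and `run` are hypothetical, and each miss stands for one kernel recompile.

```python
from collections import OrderedDict


class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}      # key -> number of uses since (re)insertion
        self.misses = 0

    def get(self, key):
        if key in self.counts:
            self.counts[key] += 1
            return
        self.misses += 1
        if len(self.counts) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]       # the victim's count is forgotten
        self.counts[key] = 1              # newcomers always start at 1


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.order = OrderedDict()        # oldest entry first
        self.misses = 0

    def get(self, key):
        if key in self.order:
            self.order.move_to_end(key)
            return
        self.misses += 1
        if len(self.order) >= self.capacity:
            self.order.popitem(last=False)  # drop the least recently used key
        self.order[key] = True


def run(cache):
    for key in ["A", "B"] * 5:   # two established kernels
        cache.get(key)
    for key in ["C", "D"] * 5:   # two newer kernels used alternately
        cache.get(key)
    return cache.misses


lfu_misses = run(LFUCache(3))
lru_misses = run(LRUCache(3))
```

With a capacity of 3, the toy LFU keeps evicting whichever of C and D was added last, while the LRU variant evicts the stale entry A once and then serves both C and D from the cache.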

Does not build on Mac OS X 10.10

After configuring, the make command results in the following -
Scanning dependencies of target gpuarray
[ 12%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_buffer_opencl.c.o
[ 25%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_blas_opencl_clblas.c.o
[ 37%] Building C object src/CMakeFiles/gpuarray.dir/util/strb.c.o
[ 50%] Building C object src/CMakeFiles/gpuarray.dir/util/halloc.c.o
/tmp/libgpuarray/src/util/halloc.c:211:13: error: no member named 'magic' in
'struct hblock'
assert(p->magic == HH_MAGIC);
~ ^
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE...
^
/tmp/libgpuarray/src/util/halloc.c:211:22: error: use of undeclared identifier
'HH_MAGIC'
assert(p->magic == HH_MAGIC);
^
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE...
^
/tmp/libgpuarray/src/util/halloc.c:242:12: error: no member named 'magic' in
'struct hblock'
assert(b->magic == HH_MAGIC);
~ ^
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE...
^
/tmp/libgpuarray/src/util/halloc.c:242:21: error: use of undeclared identifier
'HH_MAGIC'
assert(b->magic == HH_MAGIC);
^
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE...
^
/tmp/libgpuarray/src/util/halloc.c:251:12: error: no member named 'magic' in
'struct hblock'
assert(p->magic == HH_MAGIC);
~ ^
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE...
^
/tmp/libgpuarray/src/util/halloc.c:251:21: error: use of undeclared identifier
'HH_MAGIC'
assert(p->magic == HH_MAGIC);
^
/usr/include/assert.h:93:25: note: expanded from macro 'assert'
(__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__, __LINE...
^
6 errors generated.
make[2]: *** [src/CMakeFiles/gpuarray.dir/util/halloc.c.o] Error 1
make[1]: *** [src/CMakeFiles/gpuarray.dir/all] Error 2
make: *** [all] Error 2

Good error when GPU selected is used and in exclusive mode

If it is not too hard, we should give a better error message when the chosen GPU is busy in exclusive mode:

$ CUDA_VISIBLE_DEVICES=1 DEVICE=cuda0 python -c "import pygpu;pygpu.test()"
======================================================================
ERROR: Failure: GpuArrayException (invalid device ordinal)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Tmp/lisa/os_v5/anaconda/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/Tmp/lisa/os_v5/anaconda/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Tmp/lisa/os_v5/anaconda/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/u/bastienf/.local/lib64/python2.7/site-packages/pygpu-0.2.1-py2.7-linux-x86_64.egg/pygpu/tests/test_tools.py", line 5, in <module>
    from .support import (guard_devsup, rand, check_flags, check_meta, check_all,
  File "/u/bastienf/.local/lib64/python2.7/site-packages/pygpu-0.2.1-py2.7-linux-x86_64.egg/pygpu/tests/support.py", line 34, in <module>
    context = gpuarray.init(get_env_dev())
  File "pygpu/gpuarray.pyx", line 614, in pygpu.gpuarray.init (pygpu/gpuarray.c:8505)
  File "pygpu/gpuarray.pyx", line 585, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:8434)
  File "pygpu/gpuarray.pyx", line 990, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:12239)
GpuArrayException: invalid device ordinal

----------------------------------------------------------------------

function 'strndup' not defined in Microsoft code, causing build error in Windows.

There is a single use of strndup in src/gpuarray_buffer_opencl.c. Microsoft's C runtime does not implement this function, so the build fails on Windows.

python 3 compat for pygpu

We should test pygpu on python 3 and fix the problems.

/usr/local/lib/libgpuarray.so not found when importing pygpu on ubuntu 14.04

To get the instructions from http://deeplearning.net/software/libgpuarray/installation.html to work, I also needed to symlink libgpuarray.so into a directory on the default library search path:

sudo ln -s /usr/local/lib/libgpuarray.so /usr/lib/libgpuarray.so

Not sure why /usr/local/lib was not good enough, nevertheless this worked.

Before I got this message in ipython3:

In [1]: import pygpu
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-b22a183e5879> in <module>()
----> 1 import pygpu

/usr/local/lib/python3.4/dist-packages/pygpu-0.2.1-py3.4-linux-x86_64.egg/pygpu/__init__.py in     <module>()
      5     return p
      6 
----> 7 from . import gpuarray, elemwise, reduction
      8 from .gpuarray import (init, set_default_context, get_default_context,
      9                        array, zeros, empty, asarray, ascontiguousarray,

ImportError: libgpuarray.so: cannot open shared object file: No such file or directory

In [2]:
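
A quick way to check whether the loader can see the library before importing pygpu is sketched below. `describe_library` is a hypothetical helper; `ctypes.util.find_library` is standard library, and its lookup only approximates what the dynamic loader does at import time.

```python
import ctypes.util
import os


def describe_library(name="gpuarray"):
    """Report whether the usual library lookup can locate lib<name>."""
    found = ctypes.util.find_library(name)
    if found is None:
        return "lib%s not found; LD_LIBRARY_PATH=%r" % (
            name, os.environ.get("LD_LIBRARY_PATH", ""))
    return "lib%s resolves to %s" % (name, found)


print(describe_library())
```

If the library is reported as missing right after "make install", refreshing the loader cache with "sudo ldconfig" is a commonly suggested alternative to the symlink; whether that is the cause here is an assumption.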

Test errors with numpy 1.10

======================================================================
ERROR: pygpu.tests.test_elemwise.test_ielemwise2_ops_array(<built-in function iadd>, 'int8', 'float32', (50,))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Tmp/lisa/os_v5/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/u/bastienf/.local/lib64/python2.7/site-packages/pygpu-0.2.1-py2.7-linux-x86_64.egg/pygpu/tests/support.py", line 39, in f
    func(*args, **kwargs)
  File "/u/bastienf/.local/lib64/python2.7/site-packages/pygpu-0.2.1-py2.7-linux-x86_64.egg/pygpu/tests/test_elemwise.py", line 81, in ielemwise2_ops_array
    out_c = op(ac, bc)
TypeError: Cannot cast ufunc add output from dtype('float32') to dtype('int8') with casting rule 'same_kind'
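
The failure can be reproduced without pygpu. A minimal sketch, assuming NumPy 1.10 or later, where in-place ufunc calls apply the 'same_kind' casting rule by default:

```python
import numpy as np

a = np.ones(50, dtype="int8")
b = np.full(50, 2.5, dtype="float32")

try:
    a += b              # the float32 result may not be stored in int8 under 'same_kind'
    raised = False
except TypeError:
    raised = True

# Asking for the permissive behaviour explicitly still works.
np.add(a, b, out=a, casting="unsafe")
```

So the test either needs to skip dtype pairs where the output type is a different kind, or compute its reference result with an explicit casting argument.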

Debug printout of non-working kernels

I've been puzzling over getting some OpenCL tests (and then Ops, I guess) working on theano.sandbox.gpuarray. But sometimes/often, I'm getting "GpuArrayException: Program build failure" errors.

Would the project be interested in a pull request that outputs something like the following?

:38:52: error: must specify '#pragma OPENCL EXTENSION cl_khr_fp64: enable' before using 'double'
KERNEL void reduk(const unsigned int n, GLOBAL_MEM ga_double * out
                                                   ^
:30:19: note: instantiated from:
#define ga_double double
                  ^

// source.section[0].length=940
0001   #define local_barrier() barrier(CLK_LOCAL_MEM_FENCE)
0002   #define WHITHIN_KERNEL /* empty */
...
0028   #define ga_ulong ulong
0029   #define ga_float float
0030   #define ga_double double
0031   #define ga_half half
0032   #define ga_size ulong
// source.section[1].length=2087
0033   
0034   
0035   
0036   #define REDUCE(a, b) (a + b)
0037   
0038   KERNEL void reduk(const unsigned int n, GLOBAL_MEM ga_double * out
0039                       , const unsigned int dim0
0040                       , const unsigned int dim1
0041                       , GLOBAL_MEM ga_float * a_data
0042                       , const unsigned int a_offset
0043                       , const int a_str_0
0044                       , const int a_str_1
0045   ) {
0046     LOCAL_MEM ga_double ldata[1024];
0047     const unsigned int lid = LID_0;
0048     unsigned int i;
0049     GLOBAL_MEM char *tmp;
0050   
...

("..." added manually for brevity)

I guess that this also highlights other issues (but one thing at a time)
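
The numbered listing itself is simple to produce. A rough sketch of the formatting step only: `number_source` is a hypothetical helper, and the two source strings are made up for the example.

```python
def number_source(sections):
    """Return kernel source sections with running 4-digit line numbers."""
    out = []
    lineno = 1
    for index, text in enumerate(sections):
        out.append("// source.section[%d].length=%d" % (index, len(text)))
        for line in text.splitlines():
            out.append("%04d   %s" % (lineno, line))
            lineno += 1
    return "\n".join(out)


preamble = "#define ga_float float\n#define ga_double double\n"
body = "KERNEL void k(GLOBAL_MEM ga_double *out) {\n}\n"
listing = number_source([preamble, body])
print(listing)
```

Numbering runs across sections so that the line numbers match those in the compiler's error messages.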

extcpy cache can get overloaded

Hi, I've encountered a situation similar to the one addressed by PR #65, but at the level of the extcpy cache. Something in my code is causing extcpys of many different sizes, and the extcpy kernels are constantly getting regenerated. (They are cached in nvcc's ComputeCache, so the speed is still acceptable, but it does hit the disk pretty hard.) I haven't had time to dig into this sufficiently, but am submitting this issue in case you know of a quick fix similar to the power-of-two fix that worked for ReductionKernels. Thanks!
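
For context, the power-of-two idea amounts to rounding a size up before it becomes part of the cache key, so that many nearby sizes share one kernel. A minimal sketch follows; `next_power_of_two` is a hypothetical helper, and whether extcpy kernels can be parametrised this way is an assumption.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


sizes = [3, 5, 6, 7, 100, 120, 128, 129]
distinct_exact = len(set(sizes))
distinct_bucketed = len(set(next_power_of_two(s) for s in sizes))
```

In this made-up workload the eight distinct sizes collapse into four buckets, which is the kind of reduction in cache keys the fix relies on.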

Can't import pygpu on Ubuntu in python after libgpuarray was installed

I installed libgpuarray by following the instructions at http://deeplearning.net/software/libgpuarray/installation.html. Everything looks more or less OK (without clBLAS):

root@xxx:~/libgpuarray/Build# cmake .. -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is GNU 4.8.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Found CUDA: /usr/local/cuda (found version "7.0")
-- Found OpenCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so
-- Looking for strlcat
-- Looking for strlcat - not found
-- Looking for mkstemp
-- Looking for mkstemp - found
Building with NVRTC
-- Looking for cublasSgemmEx
-- Looking for cublasSgemmEx - not found
-- Could NOT find clBLAS (missing: CLBLAS_LIBRARIES CLBLAS_INCLUDE_DIRS)
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.26")
-- checking for one of the modules 'check'
Tests disabled because Check was not found
-- Configuring done
CMake Warning at src/CMakeLists.txt:123 (add_library):
Cannot generate a safe runtime search path for target gpuarray because
files in some directories may conflict with libraries in implicit
directories:

runtime library [libOpenCL.so.1] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
  /usr/local/cuda/lib64

Some of these libraries may not be found correctly.

-- Generating done
-- Build files have been written to: /home/ubuntu/libgpuarray/Build

root@xxx:~/libgpuarray/Build# make
Scanning dependencies of target gpuarray
[ 3%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_types.c.o
[ 7%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_error.c.o
[ 10%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_util.c.o
[ 14%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_buffer.c.o
[ 17%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_array.c.o
[ 21%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_array_blas.c.o
[ 25%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_kernel.c.o
[ 28%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_extension.c.o
[ 32%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_strl.c.o
[ 35%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_buffer_cuda.c.o
[ 39%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_blas_cuda_cublas.c.o
[ 42%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_buffer_opencl.c.o
In file included from /usr/local/cuda/include/CL/opencl.h:44:0,
from /home/ubuntu/libgpuarray/src/private_opencl.h:9,
from /home/ubuntu/libgpuarray/src/gpuarray_buffer_opencl.c:4:
/usr/local/cuda/include/CL/cl_gl_ext.h:44:4: warning: "/*" within comment [-Wcomment]

/* cl_VEN_extname extension */
^
[ 46%] Building C object src/CMakeFiles/gpuarray.dir/util/strb.c.o
Linking C shared library ../../lib/libgpuarray.so
[ 50%] Built target gpuarray
Scanning dependencies of target gpuarray-static
[ 53%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_types.c.o
[ 57%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_error.c.o
[ 60%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_util.c.o
[ 64%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_buffer.c.o
[ 67%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_array.c.o
[ 71%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_array_blas.c.o
[ 75%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_kernel.c.o
[ 78%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_extension.c.o
[ 82%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_strl.c.o
[ 85%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_buffer_cuda.c.o
[ 89%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_blas_cuda_cublas.c.o
[ 92%] Building C object src/CMakeFiles/gpuarray-static.dir/gpuarray_buffer_opencl.c.o
In file included from /usr/local/cuda/include/CL/opencl.h:44:0,
from /home/ubuntu/libgpuarray/src/private_opencl.h:9,
from /home/ubuntu/libgpuarray/src/gpuarray_buffer_opencl.c:4:
/usr/local/cuda/include/CL/cl_gl_ext.h:44:4: warning: "/*" within comment [-Wcomment]
/* cl_VEN_extname extension */
^
[ 96%] Building C object src/CMakeFiles/gpuarray-static.dir/util/strb.c.o
Linking C static library ../../lib/libgpuarray-static.a
[100%] Built target gpuarray-static

root@xxx:~/libgpuarray/Build# make install
[ 50%] Built target gpuarray
[100%] Built target gpuarray-static
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/include/gpuarray/array.h
-- Installing: /usr/local/include/gpuarray/blas.h
-- Installing: /usr/local/include/gpuarray/buffer.h
-- Installing: /usr/local/include/gpuarray/buffer_blas.h
-- Installing: /usr/local/include/gpuarray/config.h
-- Installing: /usr/local/include/gpuarray/error.h
-- Installing: /usr/local/include/gpuarray/extension.h
-- Installing: /usr/local/include/gpuarray/ext_cuda.h
-- Installing: /usr/local/include/gpuarray/kernel.h
-- Installing: /usr/local/include/gpuarray/types.h
-- Installing: /usr/local/include/gpuarray/util.h
-- Installing: /usr/local/lib/libgpuarray.so
-- Removed runtime path from "/usr/local/lib/libgpuarray.so"
-- Installing: /usr/local/lib/libgpuarray-static.a

root@ip-10-202-167-238:/libgpuarray/Build# cd ..
root@ip-10-202-167-238:/libgpuarray# python setup.py build
Compiling pygpu/gpuarray.pyx because it changed.
Compiling pygpu/blas.pyx because it changed.
[1/2] Cythonizing pygpu/blas.pyx
[2/2] Cythonizing pygpu/gpuarray.pyx
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/pygpu
copying pygpu/tools.py -> build/lib.linux-x86_64-2.7/pygpu
copying pygpu/__init__.py -> build/lib.linux-x86_64-2.7/pygpu
copying pygpu/elemwise.py -> build/lib.linux-x86_64-2.7/pygpu
copying pygpu/parser.py -> build/lib.linux-x86_64-2.7/pygpu
copying pygpu/reduction.py -> build/lib.linux-x86_64-2.7/pygpu
copying pygpu/_array.py -> build/lib.linux-x86_64-2.7/pygpu
copying pygpu/dtypes.py -> build/lib.linux-x86_64-2.7/pygpu
copying pygpu/operations.py -> build/lib.linux-x86_64-2.7/pygpu
creating build/lib.linux-x86_64-2.7/pygpu/tests
copying pygpu/tests/test_operations.py -> build/lib.linux-x86_64-2.7/pygpu/tests
copying pygpu/tests/test_reduction.py -> build/lib.linux-x86_64-2.7/pygpu/tests
copying pygpu/tests/test_tools.py -> build/lib.linux-x86_64-2.7/pygpu/tests
copying pygpu/tests/__init__.py -> build/lib.linux-x86_64-2.7/pygpu/tests
copying pygpu/tests/test_parser.py -> build/lib.linux-x86_64-2.7/pygpu/tests
copying pygpu/tests/main.py -> build/lib.linux-x86_64-2.7/pygpu/tests
copying pygpu/tests/support.py -> build/lib.linux-x86_64-2.7/pygpu/tests
copying pygpu/tests/test_elemwise.py -> build/lib.linux-x86_64-2.7/pygpu/tests
copying pygpu/tests/test_blas.py -> build/lib.linux-x86_64-2.7/pygpu/tests
copying pygpu/tests/test_gpu_ndarray.py -> build/lib.linux-x86_64-2.7/pygpu/tests
running build_ext
building 'pygpu.gpuarray' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/pygpu
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DGPUARRAY_SHARED -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c pygpu/gpuarray.c -o build/temp.linux-x86_64-2.7/pygpu/gpuarray.o
In file included from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarraytypes.h:1804:0,
from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarrayobject.h:17,
from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/arrayobject.h:4,
from pygpu/gpuarray.c:265:
/usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
In file included from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ufuncobject.h:317:0,
from pygpu/gpuarray.c:266:
/usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/__ufunc_api.h:241:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
_import_umath(void)
^
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/pygpu/gpuarray.o -lgpuarray -o build/lib.linux-x86_64-2.7/pygpu/gpuarray.so
building 'pygpu.blas' extension
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DGPUARRAY_SHARED -I/usr/local/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c pygpu/blas.c -o build/temp.linux-x86_64-2.7/pygpu/blas.o
In file included from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarraytypes.h:1804:0,
from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarrayobject.h:17,
from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/arrayobject.h:4,
from pygpu/blas.c:265:
/usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
In file included from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ndarrayobject.h:26:0,
from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/arrayobject.h:4,
from pygpu/blas.c:265:
/usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/__multiarray_api.h:1629:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
_import_array(void)
^
In file included from /usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/ufuncobject.h:317:0,
from pygpu/blas.c:266:
/usr/local/lib/python2.7/dist-packages/numpy/core/include/numpy/__ufunc_api.h:241:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
_import_umath(void)
^
pygpu/blas.c:1158:25: warning: ‘__pyx_f_5pygpu_8gpuarray_typecode_to_dtype’ defined but not used [-Wunused-variable]
static PyArray_Descr _(___pyx_f_5pygpu_8gpuarray_typecode_to_dtype)(int); /proto/
^
pygpu/blas.c:1159:14: warning: ‘__pyx_f_5pygpu_8gpuarray_get_typecode’ defined but not used [-Wunused-variable]
static int (___pyx_f_5pygpu_8gpuarray_get_typecode)(PyObject *); /proto/
^
pygpu/blas.c:1160:37: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_default_context’ defined but not used [-Wunused-variable]
static struct PyGpuContextObject *(___pyx_f_5pygpu_8gpuarray_pygpu_default_context)(void); /proto/
^
pygpu/blas.c:1161:14: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_GpuArray_Check’ defined but not used [-Wunused-variable]
static int (*__pyx_f_5pygpu_8gpuarray_pygpu_GpuArray_Check)(PyObject *); /*proto*/
^
pygpu/blas.c:1162:37: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_init’ defined but not used [-Wunused-variable]
static struct PyGpuContextObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_init)(PyObject *); /*proto*/
^
pygpu/blas.c:1165:35: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_fromhostdata’ defined but not used [-Wunused-variable]
static struct PyGpuArrayObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_fromhostdata)(void *, int, unsigned int, size_t const *, Py_ssize_t const *, struct PyGpuContextObject *, PyTypeObject *); /*proto*/
^
pygpu/blas.c:1166:35: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_fromgpudata’ defined but not used [-Wunused-variable]
static struct PyGpuArrayObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_fromgpudata)(gpudata *, size_t, int, unsigned int, size_t const *, Py_ssize_t const *, struct PyGpuContextObject *, int, PyObject *, PyTypeObject *); /*proto*/
^
pygpu/blas.c:1168:14: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_move’ defined but not used [-Wunused-variable]
static int (*__pyx_f_5pygpu_8gpuarray_pygpu_move)(struct PyGpuArrayObject *, struct PyGpuArrayObject *); /*proto*/
^
pygpu/blas.c:1169:35: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_view’ defined but not used [-Wunused-variable]
static struct PyGpuArrayObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_view)(struct PyGpuArrayObject *, PyTypeObject *); /*proto*/
^
pygpu/blas.c:1170:14: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_sync’ defined but not used [-Wunused-variable]
static int (*__pyx_f_5pygpu_8gpuarray_pygpu_sync)(struct PyGpuArrayObject *); /*proto*/
^
pygpu/blas.c:1171:35: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_empty_like’ defined but not used [-Wunused-variable]
static struct PyGpuArrayObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_empty_like)(struct PyGpuArrayObject *, ga_order, int); /*proto*/
^
pygpu/blas.c:1172:25: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_as_ndarray’ defined but not used [-Wunused-variable]
static PyArrayObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_as_ndarray)(struct PyGpuArrayObject *); /*proto*/
^
pygpu/blas.c:1173:35: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_index’ defined but not used [-Wunused-variable]
static struct PyGpuArrayObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_index)(struct PyGpuArrayObject *, Py_ssize_t const *, Py_ssize_t const *, Py_ssize_t const *); /*proto*/
^
pygpu/blas.c:1174:35: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_reshape’ defined but not used [-Wunused-variable]
static struct PyGpuArrayObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_reshape)(struct PyGpuArrayObject *, unsigned int, size_t const *, ga_order, int, int); /*proto*/
^
pygpu/blas.c:1175:35: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_transpose’ defined but not used [-Wunused-variable]
static struct PyGpuArrayObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_transpose)(struct PyGpuArrayObject *, unsigned int const *); /*proto*/
^
pygpu/blas.c:1176:35: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_transfer’ defined but not used [-Wunused-variable]
static struct PyGpuArrayObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_transfer)(struct PyGpuArrayObject *, struct PyGpuContextObject *, int); /*proto*/
^
pygpu/blas.c:1177:35: warning: ‘__pyx_f_5pygpu_8gpuarray_pygpu_concatenate’ defined but not used [-Wunused-variable]
static struct PyGpuArrayObject *(*__pyx_f_5pygpu_8gpuarray_pygpu_concatenate)(GpuArray const **, size_t, unsigned int, int, PyTypeObject *, struct PyGpuContextObject *); /*proto*/
^
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/pygpu/blas.o -lgpuarray -o build/lib.linux-x86_64-2.7/pygpu/blas.so
root@ip-10-202-167-238:~/libgpuarray# python setup.py install
running install
running bdist_egg
running egg_info
creating pygpu.egg-info
writing pbr to pygpu.egg-info/pbr.json
writing requirements to pygpu.egg-info/requires.txt
writing pygpu.egg-info/PKG-INFO
writing top-level names to pygpu.egg-info/top_level.txt
writing dependency_links to pygpu.egg-info/dependency_links.txt
writing manifest file 'pygpu.egg-info/SOURCES.txt'
reading manifest file 'pygpu.egg-info/SOURCES.txt'
writing manifest file 'pygpu.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/pygpu
copying build/lib.linux-x86_64-2.7/pygpu/tools.py -> build/bdist.linux-x86_64/egg/pygpu
copying build/lib.linux-x86_64-2.7/pygpu/__init__.py -> build/bdist.linux-x86_64/egg/pygpu
copying build/lib.linux-x86_64-2.7/pygpu/elemwise.py -> build/bdist.linux-x86_64/egg/pygpu
copying build/lib.linux-x86_64-2.7/pygpu/parser.py -> build/bdist.linux-x86_64/egg/pygpu
creating build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/tests/test_operations.py -> build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/tests/test_reduction.py -> build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/tests/test_tools.py -> build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/tests/__init__.py -> build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/tests/test_parser.py -> build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/tests/main.py -> build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/tests/support.py -> build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/tests/test_elemwise.py -> build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/tests/test_blas.py -> build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/tests/test_gpu_ndarray.py -> build/bdist.linux-x86_64/egg/pygpu/tests
copying build/lib.linux-x86_64-2.7/pygpu/blas.so -> build/bdist.linux-x86_64/egg/pygpu
copying build/lib.linux-x86_64-2.7/pygpu/reduction.py -> build/bdist.linux-x86_64/egg/pygpu
copying build/lib.linux-x86_64-2.7/pygpu/_array.py -> build/bdist.linux-x86_64/egg/pygpu
copying build/lib.linux-x86_64-2.7/pygpu/gpuarray.so -> build/bdist.linux-x86_64/egg/pygpu
copying build/lib.linux-x86_64-2.7/pygpu/dtypes.py -> build/bdist.linux-x86_64/egg/pygpu
copying build/lib.linux-x86_64-2.7/pygpu/operations.py -> build/bdist.linux-x86_64/egg/pygpu
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tools.py to tools.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/elemwise.py to elemwise.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/parser.py to parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tests/test_operations.py to test_operations.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tests/test_reduction.py to test_reduction.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tests/test_tools.py to test_tools.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tests/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tests/test_parser.py to test_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tests/main.py to main.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tests/support.py to support.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tests/test_elemwise.py to test_elemwise.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tests/test_blas.py to test_blas.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/tests/test_gpu_ndarray.py to test_gpu_ndarray.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/reduction.py to reduction.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/_array.py to _array.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/dtypes.py to dtypes.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/operations.py to operations.pyc
creating stub loader for pygpu/gpuarray.so
creating stub loader for pygpu/blas.so
byte-compiling build/bdist.linux-x86_64/egg/pygpu/gpuarray.py to gpuarray.pyc
byte-compiling build/bdist.linux-x86_64/egg/pygpu/blas.py to blas.pyc
installing package data to build/bdist.linux-x86_64/egg
running install_data
copying pygpu/gpuarray.h -> build/bdist.linux-x86_64/egg/pygpu
copying pygpu/gpuarray_api.h -> build/bdist.linux-x86_64/egg/pygpu
copying pygpu/blas_api.h -> build/bdist.linux-x86_64/egg/pygpu
copying pygpu/numpy_compat.h -> build/bdist.linux-x86_64/egg/pygpu
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying pygpu.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pygpu.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pygpu.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pygpu.egg-info/pbr.json -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pygpu.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying pygpu.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
pygpu.__init__: module references __file__
pygpu.tests.main: module references __file__
creating dist
creating 'dist/pygpu-0.2.1-py2.7-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing pygpu-0.2.1-py2.7-linux-x86_64.egg
creating /usr/local/lib/python2.7/dist-packages/pygpu-0.2.1-py2.7-linux-x86_64.egg
Extracting pygpu-0.2.1-py2.7-linux-x86_64.egg to /usr/local/lib/python2.7/dist-packages
Adding pygpu 0.2.1 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/pygpu-0.2.1-py2.7-linux-x86_64.egg
Processing dependencies for pygpu==0.2.1
Searching for mako>=0.7
Reading https://pypi.python.org/simple/mako/
Best match: Mako 1.0.2
Downloading https://pypi.python.org/packages/source/M/Mako/Mako-1.0.2.tar.gz#md5=d0bb0b15d94d0d455fbaf047d312cc2d
Processing Mako-1.0.2.tar.gz
Writing /tmp/easy_install-5cjuuJ/Mako-1.0.2/setup.cfg
Running Mako-1.0.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-5cjuuJ/Mako-1.0.2/egg-dist-tmp-zqAaSm
warning: no files found matching '*.xml' under directory 'examples'
warning: no files found matching '*.mako' under directory 'examples'
warning: no files found matching 'distribute_setup.py'
warning: no files found matching 'ez_setup.py'
no previously-included directories found matching 'doc/build/output'
Adding Mako 1.0.2 to easy-install.pth file
Installing mako-render script to /usr/local/bin

Installed /usr/local/lib/python2.7/dist-packages/Mako-1.0.2-py2.7.egg
Searching for MarkupSafe>=0.9.2
Reading https://pypi.python.org/simple/MarkupSafe/
Best match: MarkupSafe 0.23
Downloading https://pypi.python.org/packages/source/M/MarkupSafe/MarkupSafe-0.23.tar.gz#md5=f5ab3deee4c37cd6a922fb81e730da6e
Processing MarkupSafe-0.23.tar.gz
Writing /tmp/easy_install-1ihQVy/MarkupSafe-0.23/setup.cfg
Running MarkupSafe-0.23/setup.py -q bdist_egg --dist-dir /tmp/easy_install-1ihQVy/MarkupSafe-0.23/egg-dist-tmp-tTH9Zl
Adding MarkupSafe 0.23 to easy-install.pth file

Installed /usr/local/lib/python2.7/dist-packages/MarkupSafe-0.23-py2.7-linux-x86_64.egg
Finished processing dependencies for pygpu==0.2.1

But in python I can't import pygpu:

root@xxx:~/libgpuarray# python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import pygpu
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pygpu/__init__.py", line 7, in <module>
    from . import gpuarray, elemwise, reduction
  File "pygpu/elemwise.py", line 5, in <module>
    from .tools import ScalarArg, ArrayArg, as_argument, check_contig, check_args, lru_cache
  File "pygpu/tools.py", line 13, in <module>
    from .dtypes import dtype_to_ctype, _fill_dtype_registry
  File "pygpu/dtypes.py", line 31, in <module>
    from . import gpuarray
ImportError: cannot import name gpuarray
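
One thing worth checking, offered as a guess rather than a diagnosis: the session above was started from ~/libgpuarray and the traceback shows relative paths such as pygpu/__init__.py, which hints that Python may be loading the source directory instead of the installed egg. The sketch below reports where a module would be loaded from without importing it. The helper names import_origin and shadowed_by_cwd are hypothetical, and the code assumes Python 3 (importlib.util.find_spec), not the Python 2.7 used in the report.

```python
import importlib.util
import os


def import_origin(module_name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    # Built-in and namespace modules have no file location to report.
    if spec is None or not spec.has_location:
        return None
    return os.path.realpath(spec.origin)


def shadowed_by_cwd(module_name):
    """True if the module would be loaded from under the current directory."""
    origin = import_origin(module_name)
    if origin is None:
        return False
    cwd = os.path.realpath(os.getcwd())
    try:
        return os.path.commonpath([origin, cwd]) == cwd
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

If shadowed_by_cwd("pygpu") were true in the build directory, starting Python from another directory would be a cheap way to test the guess.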

Pygpu init function doesn't process opencl device

Hi Guys,

The following diff resolved an issue I had when I tried to use pygpu with OpenCL on my ARM Chromebook:

Best Regards,
marcino239

diff --git a/pygpu/gpuarray.pyx b/pygpu/gpuarray.pyx
index 01390d0..408800f 100644
--- a/pygpu/gpuarray.pyx
+++ b/pygpu/gpuarray.pyx
@@ -539,8 +539,11 @@ cdef GpuContext pygpu_init(dev):
             devnum = int(dev[4:])
     elif dev.startswith('opencl'):
         kind = "opencl"
-        devspec = dev[6:].split(':')
-        devnum = int(devspec[0]) << 16 | int(devspec[1])
+        if dev[6:] == '':
+            devnum = 0
+        else:
+            devspec = dev[6:].split(':')
+            devnum = int(devspec[0]) << 16 | int(devspec[1])
     else:
         raise ValueError, "Unknown device format:" + dev
     return GpuContext(kind, devnum)
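
For readers who want to try the parsing rule outside Cython, here is a minimal pure-Python sketch of the logic the patched function appears to implement. parse_device is a hypothetical name, the names hi and lo for the two halves of the OpenCL spec are mine, and the bare 'cuda' case is not visible in the diff, so the sketch refuses to guess at it.

```python
def parse_device(dev):
    """Map a device string to (kind, devnum), following the patched logic."""
    if dev.startswith('cuda'):
        suffix = dev[4:]
        if suffix == '':
            # This branch is not shown in the diff; the sketch does not guess.
            raise ValueError("bare 'cuda' is outside the scope of this sketch")
        return 'cuda', int(suffix)
    if dev.startswith('opencl'):
        suffix = dev[6:]
        if suffix == '':
            # The case added by the diff: plain 'opencl' maps to devnum 0.
            return 'opencl', 0
        hi, lo = suffix.split(':')
        # First number goes in the upper 16 bits, second in the lower bits.
        return 'opencl', int(hi) << 16 | int(lo)
    raise ValueError("Unknown device format: " + dev)
```

With this rule, parse_device('opencl') and parse_device('opencl0:0') give the same result, which is the behaviour the patch adds.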

Simple calls for 1d array construction

NumPy allows both types of call:

>>> numpy.zeros(3)
array([0., 0., 0.])
>>> numpy.zeros([3])
array([0., 0., 0.])

It would be nice if libgpuarray did this as well. Currently the "non-list" case throws an error: "object of type 'int' has no len()".
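
A minimal sketch of how a shape argument could be normalised before use, so that a bare integer and a one-element sequence are treated alike. normalize_shape is a hypothetical helper, not an existing pygpu function.

```python
import operator


def normalize_shape(shape):
    """Return shape as a tuple of ints, accepting an int or a sequence."""
    try:
        # operator.index accepts anything usable as an integer index
        # and rejects floats, lists and other non-integers.
        return (operator.index(shape),)
    except TypeError:
        return tuple(operator.index(dim) for dim in shape)
```

Calling normalize_shape(3) and normalize_shape([3]) both give (3,), so the rest of the constructor only ever sees a tuple.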

Device index confusion

Hi,
I have two graphics cards in my machine and they seem to get opposite IDs depending on whether I use CUDA directly or Theano. Here is an example:

> nvidia-smi
Mon Jul  6 23:36:15 2015    
+------------------------------------------------------+                      
| NVIDIA-SMI 331.113    Driver Version: 331.113        |                      
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 8400 GS     Off  | 0000:02:00.0     N/A |                  N/A |
| N/A   51C  N/A     N/A /  N/A |     29MiB /   511MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TITAN   Off  | 0000:03:00.0     N/A |                  N/A |
| 32%   45C  N/A     N/A /  N/A |     14MiB /  6143MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
|    1            Not Supported                                               |
+-----------------------------------------------------------------------------+

From Python instead:

$ THEANO_FLAGS=device=gpu0 python -c "import theano"
Using gpu device 0: GeForce GTX TITAN
$ THEANO_FLAGS=device=gpu1 python -c "import theano"
Using gpu device 1: GeForce 8400 GS

Is this a bug or am I missing something? No big issue, just a bit confusing.

Thank you!
Giampiero

More info:
OS: Ubuntu 14.04.2
Kernel: 3.16.0-43-generic
Python 2.7.6
CUDA 5.5 (standard Ubuntu installation)
Theano 0.7.0 (installed via pip)

980GTX cuda tests failing

Hello,
I wanted to report an error I had, and a potential workaround, when installing and running the tests. (The tests fail for both CUDA and OpenCL, but this issue focuses on CUDA.)

My system:

980gtx with 346.59 drivers
cuda 7
ubuntu 15.04

The tests that fail are in test_blas.py.

In an effort to work around this, I found https://groups.google.com/forum/#!topic/theano-users/nEr9PqQF880 and bumped the PTX version number from 4.0 to 4.2. That seems to fix the issues locally for CUDA only, but I have no idea of the consequences.

The CUDA errors look like this (there are 315 of them):

======================================================================
ERROR: pygpu.tests.test_blas.test_ger(4, 5, 'float32', 'f', 1, 1, False, False)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/luke/Apps/libgpuarray/pygpu/tests/test_blas.py", line 154, in ger
    gr = gblas.ger(1.0, gX, gY, gA, overwrite_a=overwrite)
  File "pygpu/blas.pyx", line 130, in pygpu.blas.ger (pygpu/blas.c:2633)
    pygpu_blas_rger(alpha, X, Y, A, 0)
  File "pygpu/blas.pyx", line 43, in pygpu.blas.pygpu_blas_rger (pygpu/blas.c:1561)
    raise GpuArrayException(GpuArray_error(&X.ga, err), err)
GpuArrayException: ('Device does not support operation', 8)

----------------------------------------------------------------------

The OpenCL errors look like this (there are 315 of them):

ERROR: pygpu.tests.test_blas.test_gemv((32, 32), 'float32', 'f', False, False, 1, False, True, 0, 0)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/luke/Apps/libgpuarray/pygpu/tests/support.py", line 39, in f
    func(*args, **kwargs)
  File "/home/luke/Apps/libgpuarray/pygpu/tests/test_blas.py", line 61, in gemv
    overwrite_y=overwrite)
  File "pygpu/blas.pyx", line 72, in pygpu.blas.gemv (pygpu/blas.c:1925)
    pygpu_blas_rgemv(transA, alpha, A, X, beta, Y, 0)
  File "pygpu/blas.pyx", line 27, in pygpu.blas.pygpu_blas_rgemv (pygpu/blas.c:1317)
    raise GpuArrayException(GpuArray_error(&A.ga, err), err)
GpuArrayException: ('Device does not support operation', 8)

If you would like any more information or testing please let me know! Thanks!

Repeat init throws an error

If you call pygpu.init repeatedly you get an error:

pygpu.gpuarray.GpuArrayException: cannot set while device is active in this process: -1

This also applies if, at any point in the same process, cudaSetDevice has been called from some other library. That is a problem because it interferes with using libgpuarray together with other CUDA libraries. The simple fix is to ignore cudaErrorSetOnActiveProcess, as other libraries (e.g. ASTRA) do.
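
The proposed fix can be sketched with stand-ins, since the real change would live in the C code that calls cudaSetDevice. Everything here except the error name cudaErrorSetOnActiveProcess is hypothetical: FakeCudaError models a runtime error carrying a symbolic code, and set_device is whatever callable selects the device.

```python
SET_ON_ACTIVE_PROCESS = "cudaErrorSetOnActiveProcess"


class FakeCudaError(Exception):
    """Stand-in for a CUDA runtime error that carries a symbolic code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def set_device_tolerant(set_device, devno):
    """Call set_device(devno); return False if a device was already active."""
    try:
        set_device(devno)
    except FakeCudaError as exc:
        if exc.code != SET_ON_ACTIVE_PROCESS:
            raise  # any other failure is still reported to the caller
        return False
    return True
```

The only design decision is which error to swallow: every other error code is re-raised unchanged.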

a[...] should return a view, as NumPy 1.9 does now.

From the NumPy 1.9 release notes:

https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst
"""
All indexing operations return a view or a copy. No indexing operation will return the original array object. (For example arr[...])
"""
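
A toy illustration of the requested behaviour, not pygpu code: a[...] returns a new object that shares storage with the original instead of returning the original object itself. ToyArray is a hypothetical class that handles only the Ellipsis key.

```python
class ToyArray:
    """Minimal array-like wrapper used only to illustrate view semantics."""

    def __init__(self, storage):
        self.storage = storage  # shared, mutable backing list

    def __getitem__(self, key):
        if key is Ellipsis:
            # A view: a distinct object backed by the same storage.
            return ToyArray(self.storage)
        raise NotImplementedError("only a[...] is handled in this sketch")
```

With a = ToyArray([1, 2, 3]) and v = a[...], v is a different object from a, yet writing through v.storage is visible through a.storage.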

where did the "Build

syntax error in gen_types.py, line 229 with python 3.4

Easy fix: it only needs parentheses, print(...). I'm new to Python and I'm using 3.4, so I have no idea whether this could break 2.x.
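
On the 2.x question: a print call with a single parenthesised argument behaves the same on Python 2 and 3, so formatting everything into one string first is a safe pattern. The sketch below is illustrative only. I have not seen line 229 of gen_types.py, and describe is a hypothetical function.

```python
def describe(name, size):
    """Build the whole output line as one string before printing it."""
    return "%s has size %d" % (name, size)


# One argument in parentheses: valid as a print statement on Python 2
# and as a function call on Python 3.
print(describe("float", 4))
```

Passing several comma-separated arguments inside the parentheses is the case to avoid, because Python 2 would print them as a tuple.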

Tests fail with new NumPy version.

See this thread:

[theano-users] libgpuarray and theano test failures -- anything I need to worry about?

test errors

Hi,

I get the following errors when I run python -c "import pygpu; pygpu.test()".

315 GpuArrayExceptions. Below is an example.
ERROR: pygpu.tests.test_blas.test_gemv((100, 128), 'float32', 'f', False, True, 1, True, False)
Traceback (most recent call last):
  File "/Users/dokookchoe/Library/Python/2.7/lib/python/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Library/Python/2.7/site-packages/pygpu-0.2.1-py2.7-macosx-10.9-intel.egg/pygpu/tests/support.py", line 39, in f
    func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/pygpu-0.2.1-py2.7-macosx-10.9-intel.egg/pygpu/tests/test_blas.py", line 61, in gemv
    overwrite_y=overwrite)
  File "pygpu/blas.pyx", line 72, in pygpu.blas.gemv (pygpu/blas.c:1981)
  File "pygpu/blas.pyx", line 27, in pygpu.blas.pygpu_blas_rgemv (pygpu/blas.c:1373)
GpuArrayException: ('Error in BLAS call', 11)

150 NameErrors. Below is an example.
ERROR: pygpu.tests.test_elemwise.test_divmod('uint64', (50,), array(2.450000047683716, dtype=float32))
Traceback (most recent call last):
  File "/Users/dokookchoe/Library/Python/2.7/lib/python/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Library/Python/2.7/site-packages/pygpu-0.2.1-py2.7-macosx-10.9-intel.egg/pygpu/tests/support.py", line 39, in f
    func(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/pygpu-0.2.1-py2.7-macosx-10.9-intel.egg/pygpu/tests/test_elemwise.py", line 254, in divmod_mixed
    out_g = divmod(g, elem)
  File "/Library/Python/2.7/site-packages/pygpu-0.2.1-py2.7-macosx-10.9-intel.egg/pygpu/_array.py", line 134, in divmod
    if not isinstance(other, array.GpuArray):
NameError: global name 'array' is not defined

Is anyone familiar with these errors? I used the latest libgpuarray and followed all the instructions except one: I changed set(CMAKE_OSX_ARCHITECTURES i386 x86_64) to set(CMAKE_OSX_ARCHITECTURES ix86_64). My machine is an iMac running OS X 10.9.

Thanks,
DK

Performance of libgpuarray is slow

I tried the following code from the tutorial with THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1":

from time import time

import numpy
import theano

# Two 1024x1024 float32 matrices on each of the two contexts.
v01 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                    target='dev0')
v02 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                    target='dev0')
v11 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                    target='dev1')
v12 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                    target='dev1')

f = theano.function([], [theano.tensor.dot(v01, v02),
                         theano.tensor.dot(v11, v12)])

# Time 100 calls of the compiled function.
start = time()
for i in range(100):
    f()
print('Execution time: %s' % (time() - start))

Output:
Mapped name dev0 to device cuda0: GeForce GTX TITAN X
Mapped name dev1 to device cuda1: GeForce GTX TITAN X
Execution time: 0.309636116028

I then replaced target='dev1' with target='dev0', so that all dot operations run on one device, keeping the same THEANO_FLAGS. The output:
Mapped name dev0 to device cuda0: GeForce GTX TITAN X
Mapped name dev1 to device cuda1: GeForce GTX TITAN X
Execution time: 0.255516061783

Finally, I executed the following code with THEANO_FLAGS="device=gpu":

from time import time

import numpy
import theano

v01 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'))
v02 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'))
v11 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'))
v12 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'))

f = theano.function([], [theano.tensor.dot(v01, v02),
                         theano.tensor.dot(v11, v12)])

# Time 100 calls of the compiled function.
start = time()
for i in range(100):
    f()
print('Execution time: %s' % (time() - start))

Output:
Using gpu device 0: GeForce GTX TITAN X (CNMeM is enabled)
Execution time: 0.192573070526

So it looks like the current default GPU backend is the fastest. libgpuarray on one device comes second, and libgpuarray on two devices is the slowest.
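
To make the comparison concrete, the three totals above can be turned into per-call times and ratios against the device=gpu run. This is plain arithmetic on the reported numbers, and the labels are mine.

```python
ITERATIONS = 100

# Total wall-clock seconds for 100 calls, copied from the outputs above.
totals = {
    "gpuarray, two devices": 0.309636116028,
    "gpuarray, one device": 0.255516061783,
    "device=gpu": 0.192573070526,
}


def per_call_ms(total_seconds, iterations=ITERATIONS):
    """Average time of one call, in milliseconds."""
    return total_seconds / iterations * 1000.0


baseline = totals["device=gpu"]
for label, total in totals.items():
    print("%-22s %.2f ms/call, %.2fx baseline"
          % (label, per_call_ms(total), total / baseline))
```

Each call takes roughly 2 to 3 ms, so fixed per-call overhead may account for a large share of the measured time. That is only a possibility, which the numbers alone cannot confirm.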

Why is it so? How to execute operations on two GPUs faster than on one GPU?

(Mac OS with ATI FirePro D500) pygpu was configured but could not be imported

Hi,
On Mac OS Yosemite with an ATI FirePro D500 (Mac Pro Late 2013), I followed the step-by-step guide for libgpuarray: http://deeplearning.net/software/libgpuarray/installation.html#requirements

But when testing Theano with OpenCL, the output is:

ShiBotians-Mac-Pro:libgpuarray shibotian$ export THEANO_FLAGS=floatX=float32,device=opencl
shibotian$ ipython
In [1]: import theano
ERROR (theano.sandbox.gpuarray): pygpu was configured but could not be imported
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/theano/sandbox/gpuarray/__init__.py", line 16, in <module>
    import pygpu
  File "pygpu/__init__.py", line 7, in <module>
    from . import gpuarray
ImportError: cannot import name gpuarray

Any clues to solve the problem?

Looking forward to your kind hints.
Kind regards.
Friskit

(Sorry for copying this issue, and for my poor English.)

Cannot build libgpuarray

Here is the output of make:

[  2%] Generating ../../src/gpuarray_types.c, ../../src/gpuarray/types.h
[  5%] Building C object src/CMakeFiles/gpuarray.dir/cache/lru.o
[  8%] Building C object src/CMakeFiles/gpuarray.dir/cache/twoq.o
[ 11%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_types.o
[ 14%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_error.o
[ 17%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_util.o
[ 20%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_buffer.o
[ 23%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_array.o
[ 26%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_array_blas.o
[ 29%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_kernel.o
[ 32%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_extension.o
[ 35%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_strl.o
[ 38%] Building C object src/CMakeFiles/gpuarray.dir/gpuarray_buffer_cuda.o
/home/rizar/dist/libgpuarray/src/gpuarray_buffer_cuda.c: In function ‘call_compiler’:
/home/rizar/dist/libgpuarray/src/gpuarray_buffer_cuda.c:977:15: error: expected expression before ‘/’ token
/home/rizar/dist/libgpuarray/src/gpuarray_buffer_cuda.c:977:15: error: too few arguments to function ‘execl’
make[2]: *** [src/CMakeFiles/gpuarray.dir/gpuarray_buffer_cuda.o] Error 1
make[1]: *** [src/CMakeFiles/gpuarray.dir/all] Error 2
make: *** [all] Error 2

In addition, I tried to grep NVCC_BIN:

../build/src/CMakeFiles/gpuarray.dir/flags.make:C_FLAGS =  -Wall -O3 -DNDEBUG -fPIC -I/home/rizar/dist/libgpuarray/src -I/usr/local/cuda/include    -DNVCC_BIN="/usr/local/cuda/bin/nvcc"
../build/src/CMakeFiles/gpuarray-static.dir/flags.make:C_FLAGS =  -Wall -O3 -DNDEBUG -I/home/rizar/dist/libgpuarray/src -I/usr/local/cuda/include    -DNVCC_BIN="/usr/local/cuda/bin/nvcc"
../src/gpuarray_buffer_cuda.c:#define NVCC_ARGS NVCC_BIN, "-g", "-G", "-arch", arch_arg, "-x", "cu", \
../src/gpuarray_buffer_cuda.c:#define NVCC_ARGS NVCC_BIN, "-arch", arch_arg, "-x", "cu", \
../src/gpuarray_buffer_cuda.c:    sys_err = _spawnl(_P_WAIT, NVCC_BIN, NVCC_ARGS, NULL);
../src/gpuarray_buffer_cuda.c:        execl(NVCC_BIN, NVCC_ARGS, NULL);
../src/CMakeLists.txt:    add_definitions(-DNVCC_BIN="${CUDA_NVCC_EXECUTABLE}")
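
One possible reading of the error, which the issue itself does not confirm: the flag -DNVCC_BIN="/usr/local/cuda/bin/nvcc" in flags.make passes through a shell, the shell removes the double quotes, and the macro then expands to a bare path instead of a C string literal. That would match "expected expression before '/' token". The sketch uses Python's shlex module to show how POSIX-style word splitting treats the flag with and without escaped quotes.

```python
import shlex

# The flag as it appears in flags.make above.
flag = '-DNVCC_BIN="/usr/local/cuda/bin/nvcc"'

# The same flag with the quotes backslash-escaped so they survive the shell.
escaped_flag = r'-DNVCC_BIN=\"/usr/local/cuda/bin/nvcc\"'


def as_seen_by_compiler(shell_word):
    """Return the single argument left after POSIX shell quote removal."""
    (argument,) = shlex.split(shell_word)
    return argument


print(as_seen_by_compiler(flag))          # quotes are gone
print(as_seen_by_compiler(escaped_flag))  # quotes are preserved
```

If this reading is right, the definition in CMakeLists.txt would need its quotes escaped so that the compiler receives them.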

Make ielemwise raise an error, as NumPy 1.10 does

NumPy 1.10 disabled some in-place operations when the dtypes aren't the same. I think we should do the same.

Why drop CMake 2.8? Ubuntu 14.04 doesn't have CMake 3.0

At least say in the error message that conda has a recent enough version.

[Mac OS] AMD Fire Pro D500 Exception: 'compyte/array.h' file not found.

After installing Theano with pip, I started ipython with THEANO_FLAGS="floatX=float32,device=opencl0:0". Importing theano works, but when I compile an expression into a Theano function, I get errors:

Problem occurred during compilation with the command line below:
g++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -D NPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -undefined dynamic_lookup -I/Library/Python/2.7/site-packages/pygpu-0.2.1-py2.7-macosx-10.10-intel.egg/pygpu -I/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -o /Users/shibotian/.theano/compiledir_Darwin-14.0.0-x86_64-i386-64bit-i386-2.7.6-64/tmpMKZm7S/6009f7ce15109cbb7305d79e9f400e36.so /Users/shibotian/.theano/compiledir_Darwin-14.0.0-x86_64-i386-64bit-i386-2.7.6-64/tmpMKZm7S/mod.cpp -L/System/Library/Frameworks/Python.framework/Versions/2.7/lib -lcompyte
/Users/shibotian/.theano/compiledir_Darwin-14.0.0-x86_64-i386-64bit-i386-2.7.6-64/tmpMKZm7S/mod.cpp:6:10: fatal error: 'compyte/array.h' file not found
#include <compyte/array.h>
         ^
1 error generated.

Traceback (most recent call last):
  File "code/test.py", line 438, in <module>
    test()
  File "code/test.py", line 308, in test
    y: test_set_y[index * batch_size: (index + 1) * batch_size]
  File "/Library/Python/2.7/site-packages/theano/compile/function.py", line 223, in function
    profile=profile)
  File "/Library/Python/2.7/site-packages/theano/compile/pfunc.py", line 512, in pfunc
    on_unused_input=on_unused_input)
  File "/Library/Python/2.7/site-packages/theano/compile/function_module.py", line 1312, in orig_function
    defaults)
  File "/Library/Python/2.7/site-packages/theano/compile/function_module.py", line 1181, in create
    _fn, _i, _o = self.linker.make_thunk(input_storage=input_storage_lists)
  File "/Library/Python/2.7/site-packages/theano/gof/link.py", line 434, in make_thunk
    output_storage=output_storage)[:3]
  File "/Library/Python/2.7/site-packages/theano/gof/vm.py", line 847, in make_all
    no_recycling))
  File "/Library/Python/2.7/site-packages/theano/gof/op.py", line 606, in make_thunk
    output_storage=node_output_storage)
  File "/Library/Python/2.7/site-packages/theano/gof/cc.py", line 948, in make_thunk
    keep_lock=keep_lock)
  File "/Library/Python/2.7/site-packages/theano/gof/cc.py", line 891, in compile
    keep_lock=keep_lock)
  File "/Library/Python/2.7/site-packages/theano/gof/cc.py", line 1322, in cthunk_factory
    key=key, fn=self.compile_cmodule_by_step, keep_lock=keep_lock)
  File "/Library/Python/2.7/site-packages/theano/gof/cmodule.py", line 996, in module_from_key
    module = next(compile_steps)
  File "/Library/Python/2.7/site-packages/theano/gof/cc.py", line 1237, in compile_cmodule_by_step
    preargs=preargs)
  File "/Library/Python/2.7/site-packages/theano/gof/cmodule.py", line 1971, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('The following error happened while compiling the node', Shape_i{0}(<GpuArray>), '\n', "Compilation failed (return status=1): /Users/shibotian/.theano/compiledir_Darwin-14.0.0-x86_64-i386-64bit-i386-2.7.6-64/tmpMKZm7S/mod.cpp:6:10: fatal error: 'compyte/array.h' file not found. #include <compyte/array.h>. ^. 1 error generated.. ", '[Shape_i{0}(<GpuArray>)]')

What should I do??

Thank you

Disable pickling of GpuArray

Pickling fails with protocol 0 or 1. With protocol 2 it appears to work, but loading gives back invalid objects.
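
One way to make pickling fail loudly for every protocol, sketched on a toy class instead of the real Cython type: override __reduce_ex__ so that it always raises. ToyGpuArray and try_pickle are hypothetical names, and the real GpuArray may need a different hook.

```python
import pickle


class ToyGpuArray:
    """Toy stand-in for an object that holds a device pointer."""

    def __init__(self, nbytes):
        self.nbytes = nbytes

    def __reduce_ex__(self, protocol):
        # Refuse every protocol instead of producing an invalid object.
        raise TypeError("ToyGpuArray objects cannot be pickled")


def try_pickle(obj, protocol):
    """Return True if obj can be pickled with the given protocol."""
    try:
        pickle.dumps(obj, protocol)
    except TypeError:
        return False
    return True
```

With this in place, try_pickle(ToyGpuArray(16), p) is False for protocols 0, 1 and 2 alike, which removes the silent failure described above.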

Add a way to grab the first free device.

Test with ARM

It was reported not to work; see gh-24.

Make GpuArray_copy_from_host() use a fixed-size pagelocked buffer for copies

This would make the resulting array C-contiguous as could be expected and would not create a second buffer that is as large as the input array.

That inner buffer would still have to be of a large enough size so that transfers are not overshadowed by the initiation cost.

We could also use two alternating buffers to copy data to one while the other is transferring.
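
The idea can be sketched in pure Python, with a bytearray standing in for device memory and two small bytearrays standing in for the page-locked staging buffers. copy_via_staging is a hypothetical name. The sketch only shows the chunking and the alternation between two buffers. It runs sequentially, so it does not demonstrate any overlap between packing and transfer.

```python
def copy_via_staging(src, dst, staging_size=4):
    """Copy src into dst through two fixed-size staging buffers.

    Returns the number of chunks transferred.
    """
    if len(dst) < len(src):
        raise ValueError("destination is too small")
    staging = [bytearray(staging_size), bytearray(staging_size)]
    chunks = 0
    for offset in range(0, len(src), staging_size):
        buf = staging[chunks % 2]         # alternate between the two buffers
        n = min(staging_size, len(src) - offset)
        buf[:n] = src[offset:offset + n]  # pack the next chunk on the host
        dst[offset:offset + n] = buf[:n]  # stands in for the device transfer
        chunks += 1
    return chunks
```

Peak extra memory is two staging buffers regardless of the input size, which is the point of the proposal. Choosing staging_size large enough to amortise the per-transfer cost is left open here, as it is in the text.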

'import theano' doesn't work if I set device=gpu and contexts=dev0->cuda0;dev1->cuda1 at the same time

If in .theanorc I set:

device = gpu
contexts=dev0->cuda0;dev1->cuda1
....

I will get the following error during 'import theano':

ERROR (theano.sandbox.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/gpuarray/__init__.py", line 76, in <module>
    if (config.device.startswith('cuda') or
  File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/gpuarray/__init__.py", line 52, in init_dev
    ctx = pygpu.init(dev)
  File "pygpu/gpuarray.pyx", line 634, in pygpu.gpuarray.init (pygpu/gpuarray.c:8909)
  File "pygpu/gpuarray.pyx", line 590, in pygpu.gpuarray.pygpu_init (pygpu/gpuarray.c:8632)
  File "pygpu/gpuarray.pyx", line 1019, in pygpu.gpuarray.GpuContext.__cinit__ (pygpu/gpuarray.c:12725)
GpuArrayException: cannot set while device is active in this process: 0

But if I remove "device = gpu", the import works correctly.

Change default device to cuda0 or cudaN

The OpenCL back-end is lagging and will continue to lag for some time, so I think we should change the default.

We also don't have an option to let the driver select the GPU, as Theano does. So I think we should change it so that cuda means, as in Theano, "let the driver decide", and add cudaX to mean "reuse the existing context".

Another option, which may or may not be possible, is to have cuda try to reuse the existing context and, if none exists, let the driver pick a device. If this is possible, I think it would be better.

Add interface to pass information about read/write of arguments in kernels

gcc 5.1.1 : max_align_t also defined in stddef.h

There's a conflicting redefinition going on for max_align_t between :

/home/andrewsm/env/libgpuarray/src/util/halloc.c ; and
/usr/lib/gcc/x86_64-redhat-linux/5.1.1/include/stddef.h

It looks like it could be resolved by having #ifndef _GCC_MAX_ALIGN_T around the definitions in our halloc.c.

Does this make sense?