cudamat's Introduction

CUDAMat

The aim of the cudamat project is to make it easy to perform basic matrix calculations on CUDA-enabled GPUs from Python. cudamat provides a Python matrix class that performs calculations on a GPU. At present, some of the operations our GPU matrix class supports include:

  • Easy conversion to and from instances of numpy.ndarray.
  • Limited slicing support.
  • Matrix multiplication and transpose.
  • Elementwise addition, subtraction, multiplication, and division.
  • Elementwise application of exp, log, pow, sqrt.
  • Summation, maximum and minimum along rows or columns.
  • Conversion of CUDA errors into Python exceptions.

The current feature set of cudamat is biased towards features needed for implementing some common machine learning algorithms. We have included implementations of feedforward neural networks and restricted Boltzmann machines in the examples that come with cudamat.

Example:

import numpy as np
import cudamat as cm

cm.cublas_init()

# create two random matrices and copy them to the GPU
a = cm.CUDAMatrix(np.random.rand(32, 256))
b = cm.CUDAMatrix(np.random.rand(256, 32))

# perform calculations on the GPU
c = cm.dot(a, b)
d = c.sum(axis = 0)

# copy d back to the host (CPU) and print
print(d.asarray())

Documentation

An overview of the main features of cudamat can be found in the technical report:

CUDAMat: A CUDA-based matrix class for Python, Volodymyr Mnih, UTML TR 2009-004.

Download

You can obtain the latest release from the repository by typing:

git clone https://github.com/cudamat/cudamat.git

You can also download one of the releases from the releases section.

Installation

cudamat uses setuptools and can be installed via pip. For details, please see INSTALL.md.

Development

If you want to contribute new features or improvements, you're welcome to fork cudamat on GitHub and send us your pull requests! Please see CONTRIBUTE.md if you need any help with that.

cudamat's People

Contributors

aminhp, bcsharp, ebattenberg, f0k, nitishsrivastava, pallegro, scttl, untom, vladmnih, zulupro

cudamat's Issues

Sum gives wrong result

What steps will reproduce the problem?

import numpy as np

x = np.array(np.random.random((100, 100)), dtype=np.float32)
x_sum = x.sum(axis=1)

import cudamat as cm
cm.cublas_init()

y = cm.CUDAMatrix(x)
y_sum = y.sum(axis=1).asarray()

print np.abs(x_sum - y_sum).sum()

z = cm.CUDAMatrix(x)
z_sum = z.asarray().sum(axis=1)

print np.abs(x_sum - z_sum).sum()

What is the expected output? What do you see instead?

Two numbers near zero.

35255.2
0.000637054
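
A single summed absolute difference makes it hard to separate float32 rounding noise from a genuinely wrong result. The following is a minimal sketch of a per-row relative comparison, using plain Python lists in place of the arrays above. The helper name rows_match, the sample values, and the 1e-4 tolerance are assumptions for illustration and are not part of cudamat.

```python
import math

def rows_match(expected, actual, rel_tol=1e-4):
    """Return True if every pair of row sums agrees within rel_tol."""
    if len(expected) != len(actual):
        return False
    return all(math.isclose(e, a, rel_tol=rel_tol)
               for e, a in zip(expected, actual))

# Hypothetical row sums: rounding noise should pass, a real error should not.
reference = [50.0, 49.5, 51.25]
rounding_noise = [50.00001, 49.50001, 51.24999]
wrong_result = [402.5, 12.0, 51.25]

print(rows_match(reference, rounding_noise))
print(rows_match(reference, wrong_result))
```

With NumPy arrays, np.allclose(x_sum, y_sum, rtol=1e-4) performs a similar check.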

What version of the product are you using? On what operating system?

commit af7d9ca

all unit tests pass!

Ubuntu 12.04.5 LTS, 64-bit

Please provide any additional information below.

Python 2.7.3

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12

+------------------------------------------------------+
| NVIDIA-SMI 340.29     Driver Version: 340.29         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 570     Off  | 0000:01:00.0     N/A |                  N/A |
| 42%   51C    P0    N/A /  N/A |      4MiB /  1279MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

Construct a matrix from a list of vectors

I have a list of vectors that I want to stack into a matrix. Right now I am using gnumpy, which uses cudamat under the hood.

This is my code:

    arrays_size = arrays[0].size
    stack = gnumpy.zeros([arrays_size, len(arrays)])
    for i in range(len(arrays)):
        stack[:, i] = arrays[i]

When I profile this code I see a lot of synchronization steps because of the repeated assignments in Python. Is there a way to do the for loop all at once?
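
One possible approach, assuming the vectors start out as NumPy arrays on the host, is to assemble the matrix on the CPU and transfer it once. This is only a sketch: the sample data is made up, and if the vectors already live on the GPU this would add transfers instead of removing them.

```python
import numpy as np

# Hypothetical host-side input: a list of equally sized 1-D NumPy vectors.
arrays = [np.arange(4, dtype=np.float32) + i for i in range(3)]

# Each vector becomes one column: shape (arrays_size, len(arrays)).
stack_host = np.column_stack(arrays)

# With gnumpy installed, a single host-to-GPU transfer could then be
# (assumption, not run here):
# stack = gnumpy.garray(stack_host)
print(stack_host.shape)
```

This replaces len(arrays) separate column assignments with a single array construction followed by one copy.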

Cudamat install error in Windows

Hello, I am currently trying to rerun a project that uses cudamat. I ran into a lot of trouble installing cudamat and solved most of it, but one problem still keeps me from finishing.

My system is Windows 10 and my GPU is a GTX 970. I have Python 2.7.10 and the latest CUDA toolkit (v7.5) installed, as well as both Visual Studio 12 and 14 (Visual Studio 14 was not supported by CUDA's nvcc, so I installed 12). I also installed pycuda (not sure if this matters).

I first tried the code from surban's branch. pip reported a successful install (the library also showed up in pip list), but when I ran nosetests it gave an error similar to the one shown below.

Then I switched back to this branch, where the Windows issue was supposed to be solved. I ran the installation following the instructions in INSTALL.md, but got a similar error, shown below:

EE
======================================================================
ERROR: Failure: WindowsError ([Error 193] %1 is not a valid Win32 application)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Program Files (x86)\Python27\lib\site-packages\nose\loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "D:\Program Files (x86)\Python27\lib\site-packages\nose\importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "D:\Program Files (x86)\Python27\lib\site-packages\nose\importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "C:\Users\Administrator\Desktop\cudamat\test\test_cudamat.py", line 3, in <module>
    import cudamat as cm
  File "D:\Program Files (x86)\Python27\lib\site-packages\cudamat-0.3-py2.7-win32.egg\cudamat\__init__.py", line 1, in <module>
    from .cudamat import *
  File "D:\Program Files (x86)\Python27\lib\site-packages\cudamat-0.3-py2.7-win32.egg\cudamat\cudamat.py", line 18, in <module>
    _cudamat = load_library('libcudamat')
  File "D:\Program Files (x86)\Python27\lib\site-packages\cudamat-0.3-py2.7-win32.egg\cudamat\cudamat.py", line 16, in load_library
    basename + ext))
  File "D:\Program Files (x86)\Python27\lib\ctypes\__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
  File "D:\Program Files (x86)\Python27\lib\ctypes\__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
WindowsError: [Error 193] %1 is not a valid Win32 application

======================================================================
ERROR: Failure: WindowsError ([Error 193] %1 is not a valid Win32 application)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Program Files (x86)\Python27\lib\site-packages\nose\loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "D:\Program Files (x86)\Python27\lib\site-packages\nose\importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "D:\Program Files (x86)\Python27\lib\site-packages\nose\importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "C:\Users\Administrator\Desktop\cudamat\test\test_learn.py", line 4, in <module>
    import cudamat as cm
  File "D:\Program Files (x86)\Python27\lib\site-packages\cudamat-0.3-py2.7-win32.egg\cudamat\__init__.py", line 1, in <module>
    from .cudamat import *
  File "D:\Program Files (x86)\Python27\lib\site-packages\cudamat-0.3-py2.7-win32.egg\cudamat\cudamat.py", line 18, in <module>
    _cudamat = load_library('libcudamat')
  File "D:\Program Files (x86)\Python27\lib\site-packages\cudamat-0.3-py2.7-win32.egg\cudamat\cudamat.py", line 16, in load_library
    basename + ext))
  File "D:\Program Files (x86)\Python27\lib\ctypes\__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
  File "D:\Program Files (x86)\Python27\lib\ctypes\__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
WindowsError: [Error 193] %1 is not a valid Win32 application

----------------------------------------------------------------------
Ran 2 tests in 0.101s

FAILED (errors=2)

The error is about loading the DLL file libcudamat.dll, but the code looks good to me and the file is located where it is supposed to be. I tried adding the directory to the PATH, but it does not help.

I hope someone could help me with this. Thanks.
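
Error 193 is typically what Windows reports when a DLL was built for a different architecture than the process loading it, and the win32 tag in the egg name points to a 32-bit Python. Whether that is the cause here is an assumption. A minimal sketch for checking which architecture a DLL targets, by reading the Machine field of its PE header (the function name dll_bits is hypothetical):

```python
import struct

# Machine field values from the PE/COFF header.
PE_MACHINE_BITS = {0x014C: 32, 0x8664: 64}  # x86 and x64

def dll_bits(path):
    """Return 32 or 64 depending on the architecture the DLL targets."""
    with open(path, "rb") as f:
        if f.read(2) != b"MZ":
            raise ValueError("not a PE file: missing MZ signature")
        f.seek(0x3C)  # location of the offset to the PE signature
        (pe_offset,) = struct.unpack("<I", f.read(4))
        f.seek(pe_offset)
        if f.read(4) != b"PE\0\0":
            raise ValueError("not a PE file: missing PE signature")
        (machine,) = struct.unpack("<H", f.read(2))
    if machine not in PE_MACHINE_BITS:
        raise ValueError("unrecognized machine type 0x%04X" % machine)
    return PE_MACHINE_BITS[machine]
```

The result of dll_bits on libcudamat.dll can be compared with struct.calcsize("P") * 8, which gives the pointer size of the running Python interpreter in bits.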

Create CUDAMatrix from device pointer

Hi all,
Is there a C API for CUDAMat? I want to create a CUDAMat matrix in Python from GPU memory that is already allocated, and I can only access that memory through a pointer in C. Is there any C/C++ API for CUDAMat to accomplish this?
Many thanks in advance!

Installation should recompile cudamat every time

I tested some compilation flags (namely sm_50), but it broke cudamat. When I reran python setup.py install (without any compilation flag), it didn't recompile, so my error was not fixed. Since I use an automatic configuration manager, this could become a problem.
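
distutils decides whether to rebuild an extension by comparing file timestamps, so a change in compilation flags alone does not trigger a recompile. One workaround, sketched here under the assumption that the stale objects live in the project's build directory, is to delete that directory before installing. The helper name remove_build_artifacts is hypothetical.

```python
import os
import shutil

def remove_build_artifacts(project_dir):
    """Delete project_dir/build so the next install has to recompile.

    Returns True if a build directory existed before the call.
    """
    build_dir = os.path.join(project_dir, "build")
    existed = os.path.isdir(build_dir)
    shutil.rmtree(build_dir, ignore_errors=True)
    return existed
```

The build_ext command of distutils also has a --force option that ignores timestamps; whether cudamat's customized build step honors it has not been checked here.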

Problems when installing cudamat on windows

Hi,

I have issues installing cudamat on Windows. I am using 64-bit Windows 10 with Python 2.7.12 (Anaconda 4.2). This issue is similar to #69, but I get different errors.

First of all, I installed CUDA and have a functional nvcc command. I tried to install cudamat both via pip and by running setup.py directly; both failed with the same kind of errors.

First, https://wiki.python.org/moin/WindowsCompilers suggests using the C++ compiler version that matches the Python version. As I am using Python 2.7, I tried to install with Visual C++ 9. This gives the following error:

Microsoft (R) C/C++ Optimizing Compiler Version 15.00.30729.01 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

tmpxft_00001e0c_00000000-1.cpp
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvcc fatal   : Host compiler targets unsupported OS.
error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v8.0\\bin\\nvcc.exe' failed with exit status 1

Then I read at http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#axzz4T75SPmzM that only Visual C++ 11 onwards is actually supported. I therefore tried Visual C++ 14 and got a different error:

nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
cudamat.obj
cudamat_kernels.obj
LINK : fatal error LNK1181: cannot open input file 'ID=2.obj'
error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\nvcc.exe' failed with exit status 2

Is there anything that can be done? Or is it simply not going to work, so that I have to migrate to Python 3.5? (But then I would get the same problems as in #69, I guess.)

Has anybody been able to install cudamat correctly on Windows?

Thanks!

rbm_cudamat.py and nn_cudamat.py both crashed after 30 epochs

The following error line is repeated multiple times:
Exception cudamat.cudamat.CUDAMatException: CUDAMatException('CUBLAS error.',) in <bound method CUDAMatrix.__del__ of <cudamat.cudamat.CUDAMatrix object at 0x7fbe47b16950>> ignored

I am using CUDA 7.5 with a GTX 980 Ti on Ubuntu 14.04 LTS.
All tests mentioned in INSTALL.md (nosetests, python ../examples/bench_cudamat.py) worked perfectly. Any help is appreciated.

Windows install issue: corecrt.h not found

Windows 10
Python 2.7
CUDA toolkit 8
MS Visual Studio 14 with C++ compiler

I was attempting to get the "solution" from #57 to work, but when I run the following from the directory containing cudamat.cu...
nvcc -c -O -o cudamat.obj cudamat.cu

I get the following error...
C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/../../VC/INCLUDE\crtdefs.h(10): fatal error C1083: Cannot open include file: 'corecrt.h': No such file or directory

It turns out that corecrt.h is found in...
C:\Program Files (x86)\Windows Kits\10\Include\10.0.14393.0\ucrt\

I added that directory to my path variable, but I still get the same error...
C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/../../VC/INCLUDE\crtdefs.h(10): fatal error C1083: Cannot open include file: 'corecrt.h': No such file or directory
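
The PATH variable only affects where executables are found. Header files are located through the compiler's include search path, which MSVC reads from the INCLUDE environment variable and which nvcc extends with its -I option. A minimal sketch that builds the same nvcc command with the ucrt directory added as an include path follows; the helper name nvcc_compile_command is hypothetical, and further Windows Kits directories may turn out to be needed as well.

```python
def nvcc_compile_command(source, output, include_dirs):
    """Build an nvcc argument list with extra header search directories."""
    command = ["nvcc", "-c", "-O", "-o", output]
    for directory in include_dirs:
        command += ["-I", directory]
    command.append(source)
    return command

# Directory reported above as containing corecrt.h.
ucrt_include = r"C:\Program Files (x86)\Windows Kits\10\Include\10.0.14393.0\ucrt"
command = nvcc_compile_command("cudamat.cu", "cudamat.obj", [ucrt_include])
print(command)
```

The resulting list can be passed to subprocess.call, which avoids quoting problems with the spaces in the Windows paths.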

Exception in CUDAMatrix.__del__

When running python test_cudamat.py, it always finishes with two exceptions:

Exception cudamat.CUDAMatException: CUDAMatException('CUBLAS error.',) in <bound method CUDAMatrix.__del__ of <cudamat.CUDAMatrix object at 0x24710d0>> ignored
Exception cudamat.CUDAMatException: CUDAMatException('CUBLAS error.',) in <bound method CUDAMatrix.__del__ of <cudamat.CUDAMatrix object at 0x2471110>> ignored

This has also been reported on google code both for test_cudamat.py and for nn_cudamat.py.

So far I've found the following:

  • When the exception is thrown, get_last_cuda_error() returns "Invalid device pointer".
  • When I disable both test_gamma() and test_lgamma() by renaming them, the exceptions disappear.
  • When I add del m1 and del m2 to the end of test_gamma() (and leave test_lgamma() unmodified), the exceptions disappear.
  • By maintaining sets of ct.addressof(self.mat.data_device.contents) and ct.addressof(self.mat) in CUDAMatrix class variables, updated in __del__(), I can see that the two exceptions occur for device pointers that have already been freed before, but with different cudamat structs, so there seems to be an attempted double free.
  • By tracking the ct.addressof(self.mat.data_device.contents) also in __init__(), it seems that sometimes a CUDAMatrix initialized from a numpy array gets the same device pointer as a previously created and not-yet-deleted CUDAMatrix (which leads to an attempted double free of said device pointer later). This is weird.

I cannot find anything wrong in the code, though. Any assignment to data_device either comes from a memory allocation or is accompanied by owns_data = 0.

Maybe it's dependent on the CUDA version. Who else can reproduce this problem?
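
The bookkeeping described in the findings above can be sketched independently of cudamat. The class below is a hypothetical illustration, not the code used for the investigation: it records device addresses as plain integers and reports when an address is handed out while still live, or freed twice.

```python
class PointerTracker:
    """Records live device pointers to flag reuse and double frees."""

    def __init__(self):
        self.live = set()
        self.freed = set()

    def on_alloc(self, address):
        # An address handed out while still live means two owners alias it.
        if address in self.live:
            return "duplicate live pointer"
        self.live.add(address)
        self.freed.discard(address)
        return "ok"

    def on_free(self, address):
        if address in self.live:
            self.live.remove(address)
            self.freed.add(address)
            return "ok"
        if address in self.freed:
            return "double free"
        return "unknown pointer"
```

Calling on_alloc from __init__() and on_free from __del__() with the device address would reproduce the two observations: a duplicate live pointer at creation time, followed later by a double free.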

Installing cudamat Win 8.1 error (building 'cudamat.libcudamat' extension)

I recently tried to install cudamat so that I can use the GPU to speed up my neural networks on an NVIDIA GeForce 920M. Running the installation produced the following error. Any help on how to get rid of it would be appreciated.
Regards,
Ankit.

C:\Users\Ankit\Desktop\cudamat-master>python setup.py install
running install
running bdist_egg
running egg_info
writing cudamat.egg-info\PKG-INFO
writing top-level names to cudamat.egg-info\top_level.txt
writing dependency_links to cudamat.egg-info\dependency_links.txt
reading manifest file 'cudamat.egg-info\SOURCES.txt'
writing manifest file 'cudamat.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
building 'cudamat.libcudamat' extension
Traceback (most recent call last):
  File "setup.py", line 121, in <module>
    cmdclass={'build_ext': CUDA_build_ext})
  File "C:\Python27\lib\distutils\core.py", line 151, in setup
    dist.run_commands()
  File "C:\Python27\lib\distutils\dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "C:\Python27\lib\site-packages\setuptools\command\install.py", line 67, in run
    self.do_egg_install()
  File "C:\Python27\lib\site-packages\setuptools\command\install.py", line 109, in do_egg_install
    self.run_command('bdist_egg')
  File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "C:\Python27\lib\site-packages\setuptools\command\bdist_egg.py", line 160, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "C:\Python27\lib\site-packages\setuptools\command\bdist_egg.py", line 146, in call_command
    self.run_command(cmdname)
  File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "C:\Python27\lib\site-packages\setuptools\command\install_lib.py", line 10, in run
    self.build()
  File "C:\Python27\lib\distutils\command\install_lib.py", line 111, in build
    self.run_command('build_ext')
  File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "C:\Python27\lib\site-packages\setuptools\command\build_ext.py", line 49, in run
    _build_ext.run(self)
  File "C:\Python27\lib\distutils\command\build_ext.py", line 339, in run
    self.build_extensions()
  File "setup.py", line 41, in build_extensions
    build_ext.build_extensions(self)
  File "C:\Python27\lib\distutils\command\build_ext.py", line 448, in build_extensions
    self.build_extension(ext)
  File "C:\Python27\lib\site-packages\setuptools\command\build_ext.py", line 174, in build_extension
    _build_ext.build_extension(self, ext)
  File "C:\Python27\lib\distutils\command\build_ext.py", line 498, in build_extension
    depends=ext.depends)
  File "C:\Python27\lib\distutils\msvc9compiler.py", line 546, in compile
    extra_postargs)
  File "setup.py", line 70, in spawn
    os.path.dirname(find_executable("cl.exe", PATH))
  File "C:\Python27\lib\ntpath.py", line 215, in dirname
    return split(p)[0]
  File "C:\Python27\lib\ntpath.py", line 180, in split
    d, p = splitdrive(p)
  File "C:\Python27\lib\ntpath.py", line 115, in splitdrive
    if len(p) > 1:
TypeError: object of type 'NoneType' has no len()

My Python version is:
Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 20:40:30) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
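
The TypeError at the end of the traceback arises because find_executable("cl.exe", PATH) returned None, meaning that cl.exe was not found on the search path, and None was then passed to os.path.dirname. A minimal sketch of a lookup that fails with a clearer message is shown below. It uses shutil.which, which requires Python 3.3 or later, and the helper name require_executable is hypothetical.

```python
import shutil

def require_executable(name):
    """Return the full path of an executable or fail with a clear message."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            "%s was not found on PATH; run the build from a compiler "
            "command prompt or add its directory to PATH" % name)
    return path
```

On Python 2.7, distutils.spawn.find_executable plays the role of shutil.which and also returns None when nothing is found.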

test cudamat error

Hello!
I am running Windows 10
CUDA 5.5
Visual Studio 2008 Professional

I installed cudamat, seemingly without problems.

Now when I run the cudamat tests I get:

C:\PRG\cudamat-master\test>python test_cudamat.py
Traceback (most recent call last):
  File "test_cudamat.py", line 3, in <module>
    import cudamat as cm
  File "C:\Python27\lib\site-packages\cudamat\__init__.py", line 1, in <module>
    from .cudamat import *
  File "C:\Python27\lib\site-packages\cudamat\cudamat.py", line 12, in <module>
    _cudamat = ct.cdll.LoadLibrary('libcudamat.dll')
  File "C:\Python27\lib\ctypes\__init__.py", line 440, in LoadLibrary
    return self._dlltype(name)
  File "C:\Python27\lib\ctypes\__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
WindowsError: [Error 126] The specified module could not be found

What might be the problem?

UPDATE: I located libcudamat.dll in C:\PRG\cudamat-master\cudamat and added that directory to my environment variables. Now I get:

C:\PRG\cudamat-master\test>python test_cudamat.py
Traceback (most recent call last):
  File "test_cudamat.py", line 3, in <module>
    import cudamat as cm
  File "C:\Python27\lib\site-packages\cudamat-0.3-py2.7.egg\cudamat\__init__.py", line 1, in <module>
    from .cudamat import *
  File "C:\Python27\lib\site-packages\cudamat-0.3-py2.7.egg\cudamat\cudamat.py", line 12, in <module>
    _cudamat = ct.cdll.LoadLibrary('libcudamat.dll')
  File "C:\Python27\lib\ctypes\__init__.py", line 440, in LoadLibrary
    return self._dlltype(name)
  File "C:\Python27\lib\ctypes\__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
WindowsError: [Error 193] %1 is not a valid Win32 application

I wonder if it has something to do with compiling cudamat for the incorrect platform? I don't know whether the platform is incorrect.

C:\PRG\cudamat-master>python setup.py install

Microsoft (R) Program Maintenance Utility Version 9.00.21022.08
Copyright (C) Microsoft Corporation.  All rights reserved.

        nvcc -O --ptxas-options=-v -o libcudamat.dll --shared cudamat.cu cudamat_kernels.cu -lcublas
ptxas : info : 0 bytes gmem
ptxas : info : Compiling entry function '__cuda_dummy_entry__' for 'sm_10'
ptxas : info : Used 0 registers
cudamat_kernels.cu(747): warning: division by zero

cudamat_kernels.cu(747): warning: division by zero

cudamat_kernels.cu(771): warning: division by zero

cudamat_kernels.cu(771): warning: division by zero

cudamat_kernels.cu(747): warning: division by zero

cudamat_kernels.cu(747): warning: division by zero

cudamat_kernels.cu(771): warning: division by zero

cudamat_kernels.cu(771): warning: division by zero

ptxas : info : 0 bytes gmem
ptxas : info : Compiling entry function '_Z10kApplyTanhPfS_j' for 'sm_10'
ptxas : info : Used 7 registers, 36 bytes smem, 16 bytes cmem[1]
ptxas : info : Compiling entry function '_Z16kMultByRowVectorPfS_S_jj' for 'sm_10'
ptxas : info : Used 11 registers, 48 bytes smem
ptxas : info : Compiling entry function '_Z7kEqualsPfS_S_j' for 'sm_10'
ptxas : info : Used 8 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z5kSqrtPfS_j' for 'sm_10'
ptxas : info : Used 6 registers, 36 bytes smem
ptxas : info : Compiling entry function '_Z15kRandomGaussianPjPyPfj' for 'sm_10'
ptxas : info : Used 14 registers, 44 bytes smem, 12 bytes cmem[1]
ptxas : info : Compiling entry function '_Z13kAssignScalarPffj' for 'sm_10'
ptxas : info : Used 5 registers, 32 bytes smem
ptxas : info : Compiling entry function '_Z14kMaxColumnwisePfS_jj' for 'sm_10'
ptxas : info : Used 5 registers, 168 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z4kLogPfS_j' for 'sm_10'
ptxas : info : Used 6 registers, 36 bytes smem
ptxas : info : Compiling entry function '_Z9kSubtractPfS_S_j' for 'sm_10'
ptxas : info : Used 8 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z8kMaximumPfS_S_j' for 'sm_10'
ptxas : info : Used 8 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z13kAddColVectorPfS_S_jj' for 'sm_10'
ptxas : info : Used 11 registers, 48 bytes smem
ptxas : info : Compiling entry function '_Z9kLessThanPfS_S_j' for 'sm_10'
ptxas : info : Used 8 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z16kSetSelectedRowsPfS_S_iii' for 'sm_10'
ptxas : info : Used 9 registers, 180 bytes smem, 8 bytes cmem[1]
ptxas : info : Compiling entry function '_Z17kArgMaxColumnwisePfS_jj' for 'sm_10'
ptxas : info : Used 6 registers, 296 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z5kSignPfS_j' for 'sm_10'
ptxas : info : Used 6 registers, 36 bytes smem, 8 bytes cmem[1]
ptxas : info : Compiling entry function '_Z11kAddColMultPfS_S_fjj' for 'sm_10'
ptxas : info : Used 11 registers, 52 bytes smem
ptxas : info : Compiling entry function '_Z12kGreaterThanPfS_S_j' for 'sm_10'
ptxas : info : Used 8 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z6kGammaPfS_j' for 'sm_10'
ptxas : info : Used 11 registers, 36 bytes smem, 72 bytes cmem[1]
ptxas : info : Compiling entry function '_Z11kSeedRandomPjPyj' for 'sm_10'
ptxas : info : Used 10 registers, 36 bytes smem, 12 bytes cmem[1]
ptxas : info : Compiling entry function '_Z5kMultPfS_S_j' for 'sm_10'
ptxas : info : Used 8 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z14kMinColumnwisePfS_jj' for 'sm_10'
ptxas : info : Used 5 registers, 168 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z9kApplyAbsPfS_j' for 'sm_10'
ptxas : info : Used 9 registers, 36 bytes smem
ptxas : info : Compiling entry function '_Z15kDivByRowVectorPfS_S_jj' for 'sm_10'
ptxas : info : Used 11 registers, 48 bytes smem, 8 bytes cmem[1]
ptxas : info : Compiling entry function '_Z8kMinimumPfS_S_j' for 'sm_10'
ptxas : info : Used 8 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z10kPowMatrixPfS_S_j' for 'sm_10'
ptxas : info : Used 14 registers, 44 bytes smem, 88 bytes cmem[1]
ptxas : info : Compiling entry function '_Z12kSetRowSlicePfS_iiii' for 'sm_10'
ptxas : info : Used 8 registers, 48 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z10kAddScalarPffS_j' for 'sm_10'
ptxas : info : Used 6 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z17kArgMinColumnwisePfS_jj' for 'sm_10'
ptxas : info : Used 6 registers, 296 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z19kApplySoftThresholdPffS_j' for 'sm_10'
ptxas : info : Used 6 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z15kDivByColVectorPfS_S_jj' for 'sm_10'
ptxas : info : Used 11 registers, 48 bytes smem, 8 bytes cmem[1]
ptxas : info : Compiling entry function '_Z13kEqualsScalarPffS_j' for 'sm_10'
ptxas : info : Used 6 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z4kPowPffS_j' for 'sm_10'
ptxas : info : Used 14 registers, 44 bytes smem, 88 bytes cmem[1]
ptxas : info : Compiling entry function '_Z12kGetRowSlicePfS_iiii' for 'sm_10'
ptxas : info : Used 8 registers, 48 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z13kDivideScalarPffS_j' for 'sm_10'
ptxas : info : Used 7 registers, 44 bytes smem, 8 bytes cmem[1]
ptxas : info : Compiling entry function '_Z11kMaxRowwisePfS_jj' for 'sm_10'
ptxas : info : Used 6 registers, 168 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z4kExpPfS_j' for 'sm_10'
ptxas : info : Used 6 registers, 36 bytes smem
ptxas : info : Compiling entry function '_Z7kDividePfS_S_j' for 'sm_10'
ptxas : info : Used 8 registers, 44 bytes smem, 8 bytes cmem[1]
ptxas : info : Compiling entry function '_Z14kMaximumScalarPffS_j' for 'sm_10'
ptxas : info : Used 6 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z13kAddRowVectorPfS_S_jj' for 'sm_10'
ptxas : info : Used 11 registers, 48 bytes smem
ptxas : info : Compiling entry function '_Z15kLessThanScalarPffS_j' for 'sm_10'
ptxas : info : Used 6 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z6kWherePfS_S_S_j' for 'sm_10'
ptxas : info : Used 6 registers, 52 bytes smem
ptxas : info : Compiling entry function '_Z14kArgMaxRowwisePfS_jj' for 'sm_10'
ptxas : info : Used 7 registers, 296 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z13kApplySigmoidPfS_j' for 'sm_10'
ptxas : info : Used 6 registers, 36 bytes smem
ptxas : info : Compiling entry function '_Z16kMultByColVectorPfS_S_jj' for 'sm_10'
ptxas : info : Used 11 registers, 48 bytes smem
ptxas : info : Compiling entry function '_Z18kGreaterThanScalarPffS_j' for 'sm_10'
ptxas : info : Used 6 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z9kLogGammaPfS_j' for 'sm_10'
ptxas : info : Used 14 registers, 36 bytes smem, 240 bytes cmem[1]
ptxas : info : Compiling entry function '_Z14kRandomUniformPjPyPfj' for 'sm_10'
ptxas : info : Used 11 registers, 44 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z11kMultScalarPffS_j' for 'sm_10'
ptxas : info : Used 6 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z11kMinRowwisePfS_jj' for 'sm_10'
ptxas : info : Used 6 registers, 168 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z17kApplyLog1PlusExpPfS_j' for 'sm_10'
ptxas : info : Used 6 registers, 36 bytes smem, 4 bytes cmem[1]
ptxas : info : Compiling entry function '_Z4kAddPfS_S_j' for 'sm_10'
ptxas : info : Used 8 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z14kMinimumScalarPffS_j' for 'sm_10'
ptxas : info : Used 6 registers, 44 bytes smem
ptxas : info : Compiling entry function '_Z11kReciprocalPfS_j' for 'sm_10'
ptxas : info : Used 6 registers, 36 bytes smem
ptxas : info : Compiling entry function '_Z10kTransposePfS_ii' for 'sm_10'
ptxas : info : Used 8 registers, 1128 bytes smem, 8 bytes cmem[1]
ptxas : info : Compiling entry function '_Z11kSelectRowsPfS_S_iii' for 'sm_10'
ptxas : info : Used 9 registers, 180 bytes smem, 8 bytes cmem[1]
ptxas : info : Compiling entry function '_Z14kArgMinRowwisePfS_jj' for 'sm_10'
ptxas : info : Used 7 registers, 296 bytes smem, 4 bytes cmem[1]
   Creating library libcudamat.lib and object libcudamat.exp
        nvcc -O --ptxas-options=-v -o libcudalearn.dll --shared learn.cu learn_kernels.cu -lcublas
ptxas : info : 0 bytes gmem
ptxas : info : Compiling entry function '__cuda_dummy_entry__' for 'sm_10'
ptxas : info : Used 0 registers
ptxas : info : 0 bytes gmem
ptxas : info : Compiling entry function '_Z22kMultiplyBySigmoidGradPfS_j' for 'sm_10'
ptxas : info : Used 7 registers, 36 bytes smem
   Creating library libcudalearn.lib and object libcudalearn.exp
running install
running bdist_egg
running egg_info
writing cudamat.egg-info\PKG-INFO
writing top-level names to cudamat.egg-info\top_level.txt
writing dependency_links to cudamat.egg-info\dependency_links.txt
reading manifest file 'cudamat.egg-info\SOURCES.txt'
writing manifest file 'cudamat.egg-info\SOURCES.txt'
installing library code to build\bdist.win32\egg
running install_lib
running build_py
creating build\bdist.win32\egg
creating build\bdist.win32\egg\cudamat
copying build\lib\cudamat\cudamat.py -> build\bdist.win32\egg\cudamat
copying build\lib\cudamat\learn.py -> build\bdist.win32\egg\cudamat
copying build\lib\cudamat\rnd_multipliers_32bit.txt -> build\bdist.win32\egg\cudamat
copying build\lib\cudamat\__init__.py -> build\bdist.win32\egg\cudamat
byte-compiling build\bdist.win32\egg\cudamat\cudamat.py to cudamat.pyc
byte-compiling build\bdist.win32\egg\cudamat\learn.py to learn.pyc
byte-compiling build\bdist.win32\egg\cudamat\__init__.py to __init__.pyc
creating build\bdist.win32\egg\EGG-INFO
copying cudamat.egg-info\PKG-INFO -> build\bdist.win32\egg\EGG-INFO
copying cudamat.egg-info\SOURCES.txt -> build\bdist.win32\egg\EGG-INFO
copying cudamat.egg-info\dependency_links.txt -> build\bdist.win32\egg\EGG-INFO
copying cudamat.egg-info\top_level.txt -> build\bdist.win32\egg\EGG-INFO
zip_safe flag not set; analyzing archive contents...
cudamat.cudamat: module references __file__
cudamat.learn: module references __file__
creating 'dist\cudamat-0.3-py2.7.egg' and adding 'build\bdist.win32\egg' to it
removing 'build\bdist.win32\egg' (and everything under it)
Processing cudamat-0.3-py2.7.egg
removing 'c:\python27\lib\site-packages\cudamat-0.3-py2.7.egg' (and everything under it)
creating c:\python27\lib\site-packages\cudamat-0.3-py2.7.egg
Extracting cudamat-0.3-py2.7.egg to c:\python27\lib\site-packages
cudamat 0.3 is already the active version in easy-install.pth

Installed c:\python27\lib\site-packages\cudamat-0.3-py2.7.egg
Processing dependencies for cudamat==0.3
Finished processing dependencies for cudamat==0.3

Incompatible compiler when installing cudamat on windows

For some time now I have been trying to install cudamat on Windows 10 x64 so that I can later install gnumpy. I have done quite a lot of research on the subject and am now pretty confused.
I read that to compile a Python module I must use the same compiler that was used to compile Python itself. Since I have Python 3.5, that means I need to use Visual Studio 14.0 (2015). But when I type
python setup.py install
in the VS2015 Developer Command Prompt, from the cudamat-master directory, I get the following error:

running install
running bdist_egg
running egg_info
writing cudamat.egg-info\PKG-INFO
writing dependency_links to cudamat.egg-info\dependency_links.txt
writing top-level names to cudamat.egg-info\top_level.txt
reading manifest file 'cudamat.egg-info\SOURCES.txt'
writing manifest file 'cudamat.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
building 'cudamat.libcudamat' extension
creating build\temp.win-amd64-3.5
creating build\temp.win-amd64-3.5\Release
creating build\temp.win-amd64-3.5\Release\cudamat
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin\nvcc.exe --compiler-bindir "C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN" -c -IC:\Users\Wojtek\Anaconda3\include -IC:\Users\Wojtek\Anaconda3\include "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\winrt" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\ATLMFC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\winrt" cudamat/cudamat.cu -o build\temp.win-amd64-3.5\Release\cudamat/cudamat.obj -O --ptxas-options=-v --compiler-options=/nologo,/Ox,/W3,/GL,/DNDEBUG,/MD
nvcc fatal   : nvcc cannot find a supported version of Microsoft Visual Studio. Only the versions 2010, 2012, and 2013 are supported
error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v7.5\\bin\\nvcc.exe' failed with exit status 1

The error suggests that I cannot use VS2015... What am I doing wrong?
When I tried to use VS2013, I got the well-known "missing vcvarsall.bat" error.
At this point I am completely lost and would appreciate any help.
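
The constraint described above can be written down as a small lookup. This is only a sketch: the Python-to-compiler pairs are the ones reported on this page (2.7 with Visual Studio 9.0, 3.4 with 10.0, 3.5 with 14.0), the CUDA 7.5 list is taken from the nvcc error message, and the names PYTHON_TO_MSVC, CUDA_75_MSVC and compatible are made up for illustration.

```python
# MSVC version that distutils expects for each CPython release on Windows,
# as reported in the issues on this page.
PYTHON_TO_MSVC = {
    (2, 7): "9.0",   # Visual Studio 2008
    (3, 4): "10.0",  # Visual Studio 2010
    (3, 5): "14.0",  # Visual Studio 2015
}

# Host compilers accepted by nvcc from CUDA 7.5, per its error message:
# "Only the versions 2010, 2012, and 2013 are supported".
CUDA_75_MSVC = {"10.0", "11.0", "12.0"}


def compatible(python_version, cuda_msvc=CUDA_75_MSVC):
    """Return True if the compiler distutils wants is one nvcc accepts."""
    msvc = PYTHON_TO_MSVC.get(tuple(python_version[:2]))
    return msvc is not None and msvc in cuda_msvc


for version in sorted(PYTHON_TO_MSVC):
    print(version, "works with CUDA 7.5:", compatible(version))
```

Under these assumptions only the Python 3.4 row passes, which matches the combination another reporter on this page ended up using.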

gcc version compatibility

I ran into the problem that CUDA is not compatible with gcc version 6, which is what the server has, so I suggest adding nvcc's --compiler-bindir flag to setup.py to avoid this problem for other users.
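
A minimal sketch of what such an option could look like. The environment variable CUDAMAT_CCBIN and the function name are hypothetical and not part of cudamat; only the nvcc flag --compiler-bindir itself is real.

```python
import os


def nvcc_host_compiler_args(environ=os.environ):
    """Return extra nvcc arguments selecting the host compiler, if requested.

    CUDAMAT_CCBIN is a hypothetical variable name used for illustration only.
    """
    ccbin = environ.get("CUDAMAT_CCBIN")
    return ["--compiler-bindir", ccbin] if ccbin else []


# Example: point nvcc at an older gcc that the installed CUDA release accepts.
print(nvcc_host_compiler_args({"CUDAMAT_CCBIN": "/usr/bin/gcc-5"}))
```

The returned list would be appended to every nvcc invocation, so users with an unsupported default gcc would not have to edit setup.py themselves.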

CUDAMatrix.init_random crashes on Windows with Python 3

I have successfully installed the latest cudamat from the master branch (pip reported cudamat-0.3). My environment is:

  • Windows 10 64-bit
  • Anaconda Python 3.4.4 (64-bit)
  • Using Visual Studio 2010 (AKA v. 10.0)
  • CUDA 7.5
  • GeForce GTX 650 Ti

I am using Python 3.4 because neither Visual Studio 9.0 (required for Python 2.7) nor Visual Studio 14.0 (required for Python 3.5) is supported by CUDA 7.5.

The compilation goes fine, so do all the tests but one: test_random.

This test crashes the process on the call cm.CUDAMatrix.init_random(1)
Within init_random the offending line is:

        err_code = _cudamat.init_random(CUDAMatrix.rnd_state_p,
                                        ct.c_int(seed),
                                        cudamat_path)

The reason is that cudamat_path is Unicode under Python 3, while _cudamat.init_random has signature
int init_random(rnd_struct* rnd_state, int seed, char* cudamatpath)

When I change the function signature to
int init_random(rnd_struct* rnd_state, int seed, wchar_t* cudamatpath)
and use _wfopen instead of fopen within the function, everything works fine.

This is just a quick fix, not a solution; a proper solution should work for both the Python 2 and Python 3 series, on Windows as well as on other operating systems.
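
One alternative that leaves the C signature as char* is to encode the path to bytes on the Python side, since ctypes passes bytes as char* and a Python 3 str as a wide-character pointer when no argtypes are declared. This is a sketch, not the fix adopted by the project; the helper name to_c_path is made up, and paths with characters outside the file system encoding may still need the wchar_t approach on Windows.

```python
import sys


def to_c_path(path):
    """Return `path` as bytes, suitable for a C `char*` parameter."""
    if isinstance(path, bytes):
        return path  # Python 2 str, or a path that is already encoded
    encoding = sys.getfilesystemencoding() or "utf-8"
    return path.encode(encoding)


# Intended use inside init_random (not executed here):
#   err_code = _cudamat.init_random(CUDAMatrix.rnd_state_p,
#                                   ct.c_int(seed),
#                                   to_c_path(cudamat_path))
print(to_c_path("rnd_multipliers_32bit.txt"))
```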

Install error on Ubuntu 14.04, CUDA 6.5 with setup.py/pip

nvcc -I/usr/include/python2.7 -c cudamat/cudamat.cu -o build/temp.linux-x86_64-2.7/cudamat/cudamat.o -O --ptxas-options=-v --compiler-options '-fPIC'
unable to execute nvcc: No such file or directory
error: command 'nvcc' failed with exit status 1

There is little chance that this comes from my configuration, since I reverted to a346369 and ran make and the tests successfully.
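
The message "unable to execute nvcc" means the build process could not find nvcc on its search path, which can differ from the path of an interactive shell (for example under sudo). A small check like the following, run in the same environment as the build, can confirm that. The function name require_nvcc is made up for this sketch.

```python
import shutil


def require_nvcc(search_path=None):
    """Return the full path to nvcc or raise with a readable message.

    `search_path` defaults to the PATH of the current process.
    """
    nvcc = shutil.which("nvcc", path=search_path)
    if nvcc is None:
        raise RuntimeError(
            "nvcc was not found on the search path; add the CUDA bin "
            "directory to PATH before running setup.py or pip")
    return nvcc


if __name__ == "__main__":
    try:
        print("using", require_nvcc())
    except RuntimeError as exc:
        print(exc)
```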

cudamat/cudamat/libcudamat.so: cannot open shared object file: No such file or directory

copying cudamat.egg-info/pbr.json -> build/bdist.linux-x86_64/egg/EGG-INFO
copying cudamat.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
cudamat.cudamat: module references __file__
cudamat.learn: module references __file__
creating dist
creating 'dist/cudamat-0.3-py2.7-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing cudamat-0.3-py2.7-linux-x86_64.egg
removing '/usr/local/lib/python2.7/dist-packages/cudamat-0.3-py2.7-linux-x86_64.egg' (and everything under it)
creating /usr/local/lib/python2.7/dist-packages/cudamat-0.3-py2.7-linux-x86_64.egg
Extracting cudamat-0.3-py2.7-linux-x86_64.egg to /usr/local/lib/python2.7/dist-packages
cudamat 0.3 is already the active version in easy-install.pth

Installed /usr/local/lib/python2.7/dist-packages/cudamat-0.3-py2.7-linux-x86_64.egg
Processing dependencies for cudamat==0.3
Finished processing dependencies for cudamat==0.3
ubgpu@ubgpu:~/github/cudamat$ nosetests

EEE

ERROR: Failure: OSError (/home/ubgpu/github/cudamat/cudamat/libcudamat.so: cannot open shared object file: No such file or directory)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/cudamat/cudamat/__init__.py", line 1, in <module>
from .cudamat import *
File "/home/ubgpu/github/cudamat/cudamat/cudamat.py", line 14, in <module>
os.path.dirname(__file__) or os.path.curdir, 'libcudamat' + sysconfig.get_config_var('SO')))
File "/usr/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python2.7/ctypes/__init__.py", line 365, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/ubgpu/github/cudamat/cudamat/libcudamat.so: cannot open shared object file: No such file or directory

ERROR: Failure: OSError (/home/ubgpu/github/cudamat/cudamat/libcudamat.so: cannot open shared object file: No such file or directory)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/cudamat/test/test_cudamat.py", line 3, in <module>
import cudamat as cm
File "/home/ubgpu/github/cudamat/cudamat/__init__.py", line 1, in <module>
from .cudamat import *
File "/home/ubgpu/github/cudamat/cudamat/cudamat.py", line 14, in <module>
os.path.dirname(__file__) or os.path.curdir, 'libcudamat' + sysconfig.get_config_var('SO')))
File "/usr/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python2.7/ctypes/__init__.py", line 365, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/ubgpu/github/cudamat/cudamat/libcudamat.so: cannot open shared object file: No such file or directory

ERROR: Failure: OSError (/home/ubgpu/github/cudamat/cudamat/libcudamat.so: cannot open shared object file: No such file or directory)

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/nose/loader.py", line 411, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 47, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.7/dist-packages/nose/importer.py", line 94, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/ubgpu/github/cudamat/test/test_learn.py", line 4, in <module>
import cudamat as cm
File "/home/ubgpu/github/cudamat/cudamat/__init__.py", line 1, in <module>
from .cudamat import *
File "/home/ubgpu/github/cudamat/cudamat/cudamat.py", line 14, in <module>
os.path.dirname(__file__) or os.path.curdir, 'libcudamat' + sysconfig.get_config_var('SO')))
File "/usr/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python2.7/ctypes/__init__.py", line 365, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/ubgpu/github/cudamat/cudamat/libcudamat.so: cannot open shared object file: No such file or directory


Ran 3 tests in 0.102s

FAILED (errors=3)
ubgpu@ubgpu:~/github/cudamat$

Another libcublas.so.7.5: cannot open shared object file problem

Great library. Exactly what I'm looking for but....

I was able to install cudamat and then run nosetests and bench_cudamat.py without errors. However, when I run nosetests a number of times, I intermittently get a "CUDAMatrix.sum exceeded threshold" or "cudamat.pow exceeded threshold" failure (the full errors are in the Appendix at the bottom).

I can run the example from the README.md in both the Python shell and IPython, but if I run it in PyCharm I get the following:


/home/paul/anaconda/bin/python /home/paul/PycharmProjects/untitled/po_cudamattest1967.py
Traceback (most recent call last):
File "/home/paul/PycharmProjects/untitled/po_cudamattest1967.py", line 3, in <module>
import cudamat as cm
File "/home/paul/.local/lib/python2.7/site-packages/cudamat-0.3-py2.7-linux-x86_64.egg/cudamat/__init__.py", line 1, in <module>
from .cudamat import *
File "/home/paul/.local/lib/python2.7/site-packages/cudamat-0.3-py2.7-linux-x86_64.egg/cudamat/cudamat.py", line 18, in <module>
_cudamat = load_library('libcudamat')
File "/home/paul/.local/lib/python2.7/site-packages/cudamat-0.3-py2.7-linux-x86_64.egg/cudamat/cudamat.py", line 16, in load_library
basename + ext))
File "/home/paul/anaconda/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
return self._dlltype(name)
File "/home/paul/anaconda/lib/python2.7/ctypes/__init__.py", line 365, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcublas.so.7.5: cannot open shared object file: No such file or directory

Process finished with exit code 1


I'm running Ubuntu 14.04, CUDA 7.5 and PyCharm 5.0 on an i7-5950X with 32 GB of RAM and a Titan X. My Theano programs run fine. I also tried it on PyCharm 4.5.

I compiled for a 5.2 Compute Capability via:
NVCCFLAGS=-arch=sm_52 python setup.py install --user

Any thoughts would be welcome. I am still not great at Ubuntu, so a little patience would be appreciated.

Regards
Paul
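
One thing worth checking for the PyCharm failure above (an assumption, not a confirmed diagnosis) is whether the process started by the IDE sees the same LD_LIBRARY_PATH as the shell in which the example works. The sketch below looks for a library file in the directories listed in that variable; the helper name is made up, and a result of None does not prove the library is missing, because the dynamic loader also consults other locations.

```python
import os


def find_in_library_path(name, environ=os.environ):
    """Return the first LD_LIBRARY_PATH directory that contains `name`."""
    for directory in environ.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if directory and os.path.exists(os.path.join(directory, name)):
            return directory
    return None


if __name__ == "__main__":
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH"))
    print("libcublas.so.7.5 found in:", find_in_library_path("libcublas.so.7.5"))
```

Running this once from the shell and once from the IDE shows whether the two environments differ.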

Appendix:
These are the two errors I get intermittently, in between runs with no errors.

paul@speed:~/paul/cudamat/test$ nosetests

.........................................................

Ran 57 tests in 0.790s

OK

paul@speed:~/paul/cudamat/test$ nosetests

................F........................................

FAIL: test_cudamat.test_sum_trans

Traceback (most recent call last):
File "/home/paul/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(self.arg)
File "/home/paul/paul/cudamat/test/test_cudamat.py", line 367, in test_sum_trans
assert np.max(np.abs(c2 - mt2.numpy_array)) < 10**-3, "Error in CUDAMatrix.sum exceeded threshold"
AssertionError: Error in CUDAMatrix.sum exceeded threshold


Ran 57 tests in 0.831s

FAILED (failures=1)
paul@speed:~/paul/cudamat/test$ nosetests

..................................F......................

FAIL: test_cudamat.test_pow_matrix

Traceback (most recent call last):
File "/home/paul/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(self.arg)
File "/home/paul/paul/cudamat/test/test_cudamat.py", line 791, in test_pow_matrix
assert np.max(np.abs(c - m1.numpy_array)) < 10**-2, "Error in cudamat.pow exceeded threshold"
AssertionError: Error in cudamat.pow exceeded threshold


Ran 57 tests in 0.793s

FAILED (failures=1)

cm.dot with all-ones-matrix faster than matrix.sum

See the following code, which sums a matrix along axis 0 in two different ways:

from __future__ import print_function  # same output on Python 2 and 3

import time
import numpy as np
import cudamat as cm
cm.cublas_init()

x, y = 100, 200
X = cm.CUDAMatrix(np.random.random([x,y]))
ones = cm.CUDAMatrix(np.ones([1,x]))
d1 = cm.empty([1,y])
d2 = cm.empty([1,y])

def test1():
    # sum along axis 0 with the built-in reduction
    for i in range(10000):
        X.sum(0, target=d1)

def test2():
    # same result as a (1 x x) row of ones multiplied by X
    for i in range(10000):
        cm.dot(ones, X, target=d2)

print("Timing sum")
t0 = time.time()
test1()
t1 = time.time()
print("Runtime sum: ", t1 - t0)
print("--------------")
print("Timing cm.dot")
t0 = time.time()
test2()
t1 = time.time()
print("Runtime cm.dot ", t1 - t0)

# number of entries on which both methods agree exactly
print(np.sum(d2.asarray() == d1.asarray()))

that returns

Timing sum
Runtime sum:  0.40524315834
--------------
Timing cm.dot
Runtime cm.dot  0.171550035477
200

So it is faster to multiply by a vector of ones than to call sum.
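
For reference, the two approaches compute the same quantity: multiplying a 1 x m row of ones by an m x n matrix adds up the m rows, which is the sum along axis 0. A plain-Python sketch of that equivalence, with made-up helper names and no GPU involved:

```python
def sum_axis0(matrix):
    """Sum an m x n matrix (list of rows) along axis 0, giving n values."""
    return [sum(column) for column in zip(*matrix)]


def ones_dot(matrix):
    """Multiply a 1 x m row of ones by the m x n matrix."""
    ones = [1.0] * len(matrix)
    n = len(matrix[0])
    return [sum(o * row[j] for o, row in zip(ones, matrix)) for j in range(n)]


M = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
print(sum_axis0(M))
print(ones_dot(M))
```

This only shows that the results match; it says nothing about why the GPU timings in the report differ.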

Error on compiling on Windows with cmdclass

Hello, and thanks for your great efforts on adding Windows support. I just cloned the repository and tried installing, but I ran into a problem at the line cmdclass={'build_ext': CUDA_build_ext}). I am guessing that it has something to do with being on Windows rather than Linux, because the error pertains to drives. Here is the full traceback; I would appreciate any help.

running install
running bdist_egg
running egg_info
writing cudamat.egg-info\PKG-INFO
writing top-level names to cudamat.egg-info\top_level.txt
writing dependency_links to cudamat.egg-info\dependency_links.txt
writing pbr to cudamat.egg-info\pbr.json
reading manifest file 'cudamat.egg-info\SOURCES.txt'
writing manifest file 'cudamat.egg-info\SOURCES.txt'
installing library code to build\bdist.win32\egg
running install_lib
running build_py
running build_ext
building 'cudamat.libcudamat' extension
Traceback (most recent call last):
  File "setup.py", line 121, in <module>
    cmdclass={'build_ext': CUDA_build_ext})
  File "C:\Python27\lib\distutils\core.py", line 151, in setup
    dist.run_commands()
  File "C:\Python27\lib\distutils\dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "build\bdist.win32\egg\setuptools\command\install.py", line 67, in run
  File "build\bdist.win32\egg\setuptools\command\install.py", line 109, in do_egg_install
  File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "build\bdist.win32\egg\setuptools\command\bdist_egg.py", line 160, in run
  File "build\bdist.win32\egg\setuptools\command\bdist_egg.py", line 146, in call_command
  File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "build\bdist.win32\egg\setuptools\command\install_lib.py", line 10, in run
  File "C:\Python27\lib\distutils\command\install_lib.py", line 111, in build
    self.run_command('build_ext')
  File "C:\Python27\lib\distutils\cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "C:\Python27\lib\distutils\dist.py", line 972, in run_command
    cmd_obj.run()
  File "build\bdist.win32\egg\setuptools\command\build_ext.py", line 50, in run
  File "C:\Python27\lib\distutils\command\build_ext.py", line 337, in run
    self.build_extensions()
  File "setup.py", line 41, in build_extensions
    build_ext.build_extensions(self)
  File "C:\Python27\lib\site-packages\Pyrex\Distutils\build_ext.py", line 82, in build_extensions
    self.build_extension(ext)
  File "build\bdist.win32\egg\setuptools\command\build_ext.py", line 183, in build_extension
  File "C:\Python27\lib\distutils\command\build_ext.py", line 496, in build_extension
    depends=ext.depends)
  File "C:\Python27\lib\distutils\msvc9compiler.py", line 546, in compile
    extra_postargs)
  File "setup.py", line 70, in spawn
    os.path.dirname(find_executable("cl.exe", PATH))
  File "C:\Python27\lib\ntpath.py", line 215, in dirname
    return split(p)[0]
  File "C:\Python27\lib\ntpath.py", line 180, in split
    d, p = splitdrive(p)
  File "C:\Python27\lib\ntpath.py", line 115, in splitdrive
    if len(p) > 1:
TypeError: object of type 'NoneType' has no len()
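
The last frames of the traceback show os.path.dirname being called on the result of find_executable("cl.exe", PATH), and find_executable returns None when the executable is not on the given path. A guarded version of that lookup could look like the sketch below. It uses shutil.which instead of distutils, and the function name compiler_dir is made up.

```python
import os
import shutil


def compiler_dir(executable="cl.exe", search_path=None):
    """Return the directory containing `executable`, or fail with a clear error."""
    found = shutil.which(executable, path=search_path)
    if found is None:
        raise RuntimeError(
            "%s was not found on the search path; run the build from a "
            "Visual Studio command prompt or add its directory to PATH"
            % executable)
    return os.path.dirname(found)


if __name__ == "__main__":
    try:
        print(compiler_dir())
    except RuntimeError as exc:
        print(exc)
```

With a check like this the user sees that cl.exe is missing rather than a TypeError from ntpath.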

EE ===ERROR: Failure: OSError (/home/facecnn/cudamat-master/test/libcudamat.so: cannot open shared object file: No such file or directory)

EE ====================================================================== 
ERROR: Failure: OSError (/home/facecnn/cudamat-master/test/libcudamat.so: cannot open shared object file: No such file or directory)
Traceback (most recent call last):
  File "/home/facecnn/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/facecnn/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/facecnn/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/facecnn/cudamat-master/test/test_cudamat.py", line 3, in <module>
    import cudamat as cm
  File "/home/facecnn/cudamat-master/test/cudamat.py", line 19, in <module>
    _cudamat = load_library('libcudamat')
  File "/home/facecnn/cudamat-master/test/cudamat.py", line 17, in load_library
    basename + ext))
  File "/home/facecnn/anaconda2/lib/python2.7/ctypes/__init__.py", line 440, in LoadLibrary
    return self._dlltype(name)
  File "/home/facecnn/anaconda2/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/facecnn/cudamat-master/test/libcudamat.so: cannot open shared object file: No such file or directory

======================================================================
ERROR: Failure: OSError (/home/facecnn/cudamat-master/test/libcudamat.so: cannot open shared object file: No such file or directory)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/facecnn/anaconda2/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/facecnn/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/facecnn/anaconda2/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/facecnn/cudamat-master/test/test_learn.py", line 4, in <module>
    import cudamat as cm
  File "/home/facecnn/cudamat-master/test/cudamat.py", line 19, in <module>
    _cudamat = load_library('libcudamat')
  File "/home/facecnn/cudamat-master/test/cudamat.py", line 17, in load_library
    basename + ext))
  File "/home/facecnn/anaconda2/lib/python2.7/ctypes/__init__.py", line 440, in LoadLibrary
    return self._dlltype(name)
  File "/home/facecnn/anaconda2/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/facecnn/cudamat-master/test/libcudamat.so: cannot open shared object file: No such file or directory

----------------------------------------------------------------------
Ran 2 tests in 0.031s

FAILED (errors=2)

Using "make" from deepnet bundle

Hello, I want to start using deepnet (https://github.com/nitishsrivastava/deepnet). It is built on top of cudamat, so I need to install cudamat first, and I have read the tutorial. But I run into trouble when I call 'make':

~/Documents/deepnet-master/cudamat$ sudo make
nvcc -O3
-v
-gencode=arch=compute_10,code=sm_10
-gencode=arch=compute_20,code=sm_20
-gencode=arch=compute_30,code=sm_30
--compiler-options '-fPIC' -o libcudamat.so
--shared cudamat.cu cudamat_kernels.cu -lcublas -L
nvcc fatal : argument expected after '-L'
make: *** [libcudamat.so] Error 255

Why does this happen?
Thank you.
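
The command above ends in a bare -L, which suggests that whatever was meant to supply the CUDA library directory expanded to nothing. That is an assumption, since the Makefile itself is not shown here. The sketch below illustrates the general idea of only emitting -L when a directory is actually known; the function name and the example path are made up.

```python
def link_flags(cuda_lib_dir):
    """Build linker flags for cublas, adding -L only when a directory is given."""
    flags = ["-lcublas"]
    if cuda_lib_dir:
        # nvcc rejects a bare "-L", so the directory must be attached.
        flags.append("-L" + cuda_lib_dir)
    return flags


print(link_flags(""))                        # no directory known: no -L at all
print(link_flags("/usr/local/cuda/lib64"))   # example path, adjust to the local install
```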

nvcc fatal error during installation on mac (2014 MBP)

The following error was seen. It seems that setup.py was trying to link with -arch x86_64, which is not a valid value for nvcc's gpu-architecture option. Overriding it with "NVCCFLAGS=-arch=sm_30 python setup.py install" didn't help.

nvcc --shared -arch x86_64 build/temp.macosx-10.5-x86_64-2.7/cudamat/learn.o build/temp.macosx-10.5-x86_64-2.7/cudamat/learn_kernels.o -L/Users/hyan/anaconda/lib -lcublas -o build/lib.macosx-10.5-x86_64-2.7/cudamat/libcudalearn.so
nvcc fatal : Value 'x86_64' is not defined for option 'gpu-architecture'

NVCC version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_19:13:24_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12
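
In the failing command, -arch x86_64 looks like a host architecture flag that nvcc interprets as --gpu-architecture. Where that flag comes from is not shown in the report, so the following is only a sketch of one possible workaround: drop -arch pairs whose value is not a GPU architecture before the command is handed to nvcc. The function name is made up.

```python
def strip_host_arch(args):
    """Remove `-arch <value>` pairs unless the value names a GPU architecture."""
    result = []
    skip_next = False
    for i, arg in enumerate(args):
        if skip_next:
            skip_next = False
            continue
        if (arg == "-arch" and i + 1 < len(args)
                and not args[i + 1].startswith(("sm_", "compute_"))):
            skip_next = True  # also drop the value that follows
            continue
        result.append(arg)
    return result


cmd = ["nvcc", "--shared", "-arch", "x86_64", "learn.o", "-lcublas"]
print(strip_host_arch(cmd))
```

Flags written as a single token, such as -arch=sm_30, are left untouched by this filter.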

nvcc fatal : Path to libdevice library not specified

Hi, I'm running a fresh Ubuntu 14.04 installation with CUDA 6.5 and a GT650M. I downloaded the deb files from the official NVIDIA CUDA site, added them via dpkg and executed 'sudo apt-get install cuda'. This installed CUDA 6.5 and the nvidia-340 drivers. However, I run into trouble when I try to install cudamat with 'sudo python setup.py install'; see the output:

running install
running bdist_egg
running egg_info
writing cudamat.egg-info/PKG-INFO
writing top-level names to cudamat.egg-info/top_level.txt
writing dependency_links to cudamat.egg-info/dependency_links.txt
reading manifest file 'cudamat.egg-info/SOURCES.txt'
writing manifest file 'cudamat.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'cudamat.libcudamat' extension
nvcc -I/usr/include/python2.7 -c cudamat/cudamat.cu -o build/temp.linux-x86_64-2.7/cudamat/cudamat.o -O --ptxas-options=-v --compiler-options '-fPIC'
nvcc fatal : Path to libdevice library not specified
error: command 'nvcc' failed with exit status 1

I have been googling and trying different things for hours now and can't find a solution. The CUDA examples run without problems. Can you help me out?
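
One thing that can be checked here is whether the nvcc found by the build belongs to a toolkit that ships libdevice. The sketch below assumes the usual toolkit layout, with bin/nvcc next to nvvm/libdevice; that layout and the helper name are assumptions, and under sudo the search path may resolve to a different nvcc than in a normal shell.

```python
import os
import shutil


def libdevice_dir(nvcc_path):
    """Return <toolkit>/nvvm/libdevice for the toolkit owning `nvcc_path`.

    Returns None when that directory does not exist.
    """
    toolkit_root = os.path.dirname(os.path.dirname(os.path.realpath(nvcc_path)))
    candidate = os.path.join(toolkit_root, "nvvm", "libdevice")
    return candidate if os.path.isdir(candidate) else None


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    print("nvcc:", nvcc)
    if nvcc:
        print("libdevice:", libdevice_dir(nvcc))
```

Running it both with and without sudo shows whether the two environments pick up the same nvcc.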

Add NVCCCompiler to setup.py

While #42 finally adds back Windows support to cudamat (after changing it to properly install via setup.py), the solution is quite dirty: We let distutils' MSVCCompiler generate build commands for what it thinks is a Python C extension, and then go to great lengths to change these commands into something that calls nvcc instead.
For Linux and Mac OS, it's actually the same: We let distutils create commands for what it thinks is a Python C extension, but tell it to use nvcc instead of the default compiler. In contrast to Windows, just switching the compiler executable to nvcc is about all that's needed, which is why it seemed like a good idea.

Things could be a lot less cumbersome, though: (a) We do not want to build a Python C extension that links against Python headers and is directly importable as a Python module, we just want to build a shared library that we can load with ctypes. (b) nvcc already incorporates a lot of the logic needed to support different platforms, i.e., it knows how to call gcc on Linux/Mac and how to call cl.exe on Windows. On all three platforms, the following is enough to build the cudamat shared library [*]:

# build an object file
nvcc -c -O -o <file>.obj <file>.cu
# link to library
nvcc --shared -o <file>.<so_ext> <file1>.obj <file2>.obj -lcublas

Where <so_ext> would be sysconfig.get_config_var('SO') (i.e., .so on Linux and Mac OS, and .pyd on Windows).

It should be possible to write a simple CCompiler subclass called NVCCCompiler that issues exactly these commands [*]. It should work on all three platforms without any of the hacks currently present in setup.py.

*: On Windows, it would also need the PATH and find_executable("cl.exe") trick currently present in setup.py to avoid Anaconda's cl.exe taking precedence over a more recent one available on the search path. And on Linux/Mac, it would need -Xcompiler=-fPIC for the object file compilation so it can be compiled into a shared library later. But that's still a lot less effort than the current setup.py.
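
To make the proposal concrete, here is a sketch of the command construction only. It is not a CCompiler subclass and not the project's implementation; the function name and parameters are made up, and the commands are the two shown above plus the -Xcompiler=-fPIC flag from the footnote.

```python
import os


def nvcc_build_commands(sources, library, so_ext, windows=False, nvcc="nvcc"):
    """Return the nvcc command lines to build `library` + `so_ext` from `sources`."""
    commands = []
    objects = []
    for source in sources:
        obj = os.path.splitext(source)[0] + ".obj"
        compile_cmd = [nvcc, "-c", "-O", "-o", obj, source]
        if not windows:
            # position-independent code is needed for a shared library on Linux/Mac
            compile_cmd.append("-Xcompiler=-fPIC")
        commands.append(compile_cmd)
        objects.append(obj)
    # link all object files into one shared library against cublas
    commands.append([nvcc, "--shared", "-o", library + so_ext] + objects + ["-lcublas"])
    return commands


for cmd in nvcc_build_commands(["cudamat.cu", "cudamat_kernels.cu"], "libcudamat", ".so"):
    print(" ".join(cmd))
```

A real NVCCCompiler would run these lists with a spawn call and, on Windows, apply the PATH adjustment mentioned in the footnote first.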

test_add_sums, test_pow_matrix, test_sum_trans in the test_cudamat.py module sometimes randomly fail

I am testing this on Windows 10 with CUDA 6.5, a GTX 970 and an i4690k. Most of the time the tests finish cleanly in 0.6 seconds on my machine, but in 10-20% of runs one of these three tests, and only these three, fails, sometimes two of them during the same run. I am unsure how to dig deeper into this.

Also the tests only work for me if I run them through nose. Without it, I get the error that CUDAMatrix does not have the field 'ones.'

how to install lib?

How do I install the library?

It would be good if I could install it using pip, e.g. "pip install cudamat".

On Windows XP (32-bit) I ran make in the folder and then ran python test_cudamat.py,

but it gives me this error:

Traceback (most recent call last):
File "test_cudamat.py", line 4, in <module>
import cudamat as cm
File "C:\Documents and Settings\User\Рабочий стол\cudamat-master\cudamat-master\cudamat.py", line 8, in <module>
_cudamat = ct.cdll.LoadLibrary('libcudamat.dll')
File "C:\Python27\lib\ctypes\__init__.py", line 443, in LoadLibrary
return self._dlltype(name)
File "C:\Python27\lib\ctypes\__init__.py", line 365, in __init__
self._handle = _dlopen(self._name, mode)
WindowsError: [Error 126]

It seems that the DLL is missing?

I only have .so and .lib files.

If I rename .so to .dll in the makefile, I get this error:

Traceback (most recent call last):
File "test_cudamat.py", line 4, in <module>
import cudamat as cm
File "C:\Documents and Settings\User\Рабочий стол\cudamat-master\cudamat-master\cudamat.py", line 12, in <module>
_cudamat.get_last_cuda_error.restype = ct.c_char_p
File "C:\Python27\lib\ctypes\__init__.py", line 378, in __getattr__
func = self.__getitem__(name)
File "C:\Python27\lib\ctypes\__init__.py", line 383, in __getitem__
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: function 'get_last_cuda_error' not found

Error while calling test_cudamat.py

I get an error when I run test_cudamat.py, but when I call import cudamat in Python, everything is fine.

Here are the details:

python test_cudamat.py
Traceback (most recent call last):
File "test_cudamat.py", line 4, in <module>
import cudamat as cm
File "/home/aries/Documents/deepnet-master/cudamat/cudamat.py", line 10, in <module>
_cudamat = ct.cdll.LoadLibrary('libcudamat.so')
File "/usr/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
return self._dlltype(name)
File "/usr/lib/python2.7/ctypes/__init__.py", line 365, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcudamat.so: cannot open shared object file: No such file or directory
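
LoadLibrary('libcudamat.so') with a bare file name leaves the lookup to the dynamic loader, so whether it succeeds can depend on the environment of the calling process. Other tracebacks on this page show cudamat building the path from os.path.dirname(__file__) instead. A sketch of that idea, with a made-up helper name:

```python
import os


def library_path(module_file, basename="libcudamat", ext=".so"):
    """Return the absolute path of the shared library next to `module_file`."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, basename + ext)


# Intended use inside cudamat.py (not executed here):
#   _cudamat = ct.cdll.LoadLibrary(library_path(__file__))
print(library_path(os.path.join("deepnet-master", "cudamat", "cudamat.py")))
```

The library is then found regardless of the current working directory, as long as it sits next to cudamat.py.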

Windows install error

Hi, I am currently trying to install cudamat on Windows 8.

I have installed the Microsoft Visual C++ Compiler for Python 2.7 as well as the NVIDIA CUDA toolkit.
I downloaded the cudamat-master folder from git into my anaconda/lib/site-packages folder, then ran python setup.py --verbose install, and got the following error:

running install
running bdist_egg
running egg_info
writing cudamat.egg-info\PKG-INFO
writing top-level names to cudamat.egg-info\top_level.txt
writing dependency_links to cudamat.egg-info\dependency_links.txt
reading manifest file 'cudamat.egg-info\SOURCES.txt'
writing manifest file 'cudamat.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
not copying cudamat\cudamat.py (output up-to-date)
not copying cudamat\learn.py (output up-to-date)
not copying cudamat\__init__.py (output up-to-date)
not copying cudamat\cudamat.cu (output up-to-date)
not copying cudamat\cudamat_kernels.cu (output up-to-date)
not copying cudamat\learn.cu (output up-to-date)
not copying cudamat\learn_kernels.cu (output up-to-date)
not copying cudamat\rnd_multipliers_32bit.txt (output up-to-date)
running build_ext
building 'cudamat.libcudamat' extension
Calling 'vcvarsall.bat amd64' (version=9.0)
error: [Error 2] The system cannot find the file specified

Install Error on Windows

Hi,
I'm trying to learn cudamat and am getting an install error on Windows. Any hints on where to start with this?
I have an old version of CUDA running on my machine, which works well with Jacket/Matlab, but I'm trying to transition to Python. The error below is pretty unclear, so I'm hoping someone knows the next step. Thanks!

c:\ProgramData\Anaconda3>python -m pip install c:\users\Nate\cudamat
Processing c:\users\nate\cudamat
Installing collected packages: cudamat
Running setup.py install for cudamat ... error
Complete output from command c:\ProgramData\Anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\Nate\AppData\Local\Temp\pip-q620_npk-build\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\Nate\AppData\Local\Temp\pip-nc21r8pg-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.6
creating build\lib.win-amd64-3.6\cudamat
copying cudamat\cudamat.py -> build\lib.win-amd64-3.6\cudamat
copying cudamat\learn.py -> build\lib.win-amd64-3.6\cudamat
copying cudamat\__init__.py -> build\lib.win-amd64-3.6\cudamat
running egg_info
creating cudamat.egg-info
writing cudamat.egg-info\PKG-INFO
writing dependency_links to cudamat.egg-info\dependency_links.txt
writing top-level names to cudamat.egg-info\top_level.txt
writing manifest file 'cudamat.egg-info\SOURCES.txt'
warning: manifest_maker: standard file '-c' not found

reading manifest file 'cudamat.egg-info\SOURCES.txt'
writing manifest file 'cudamat.egg-info\SOURCES.txt'
copying cudamat\cudamat.cu -> build\lib.win-amd64-3.6\cudamat
copying cudamat\cudamat_kernels.cu -> build\lib.win-amd64-3.6\cudamat
copying cudamat\learn.cu -> build\lib.win-amd64-3.6\cudamat
copying cudamat\learn_kernels.cu -> build\lib.win-amd64-3.6\cudamat
copying cudamat\rnd_multipliers_32bit.txt -> build\lib.win-amd64-3.6\cudamat

running build_ext
building 'cudamat.libcudamat' extension
creating build\temp.win-amd64-3.6
creating build\temp.win-amd64-3.6\Release
creating build\temp.win-amd64-3.6\Release\cudamat
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Nate\AppData\Local\Temp\pip-q620_npk-build\setup.py", line 121, in <module>
    cmdclass={'build_ext': CUDA_build_ext})
  File "c:\ProgramData\Anaconda3\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "c:\ProgramData\Anaconda3\lib\distutils\dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "c:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "c:\ProgramData\Anaconda3\lib\site-packages\setuptools-27.2.0-py3.6.egg\setuptools\command\install.py", line 61, in run
  File "c:\ProgramData\Anaconda3\lib\distutils\command\install.py", line 545, in run
    self.run_command('build')
  File "c:\ProgramData\Anaconda3\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "c:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "c:\ProgramData\Anaconda3\lib\distutils\command\build.py", line 135, in run
    self.run_command(cmd_name)
  File "c:\ProgramData\Anaconda3\lib\distutils\cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "c:\ProgramData\Anaconda3\lib\distutils\dist.py", line 974, in run_command
    cmd_obj.run()
  File "c:\ProgramData\Anaconda3\lib\site-packages\setuptools-27.2.0-py3.6.egg\setuptools\command\build_ext.py", line 77, in run
  File "c:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 185, in run
    _build_ext.build_ext.run(self)
  File "c:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 339, in run
    self.build_extensions()
  File "C:\Users\Nate\AppData\Local\Temp\pip-q620_npk-build\setup.py", line 41, in build_extensions
    build_ext.build_extensions(self)
  File "c:\ProgramData\Anaconda3\lib\site-packages\Cython\Distutils\old_build_ext.py", line 193, in build_extensions
    self.build_extension(ext)
  File "c:\ProgramData\Anaconda3\lib\site-packages\setuptools-27.2.0-py3.6.egg\setuptools\command\build_ext.py", line 198, in build_extension
  File "c:\ProgramData\Anaconda3\lib\distutils\command\build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "c:\ProgramData\Anaconda3\lib\distutils\_msvccompiler.py", line 382, in compile
    self.spawn(args)
  File "C:\Users\Nate\AppData\Local\Temp\pip-q620_npk-build\setup.py", line 70, in spawn
    os.path.dirname(find_executable("cl.exe", PATH))
File "c:\ProgramData\Anaconda3\lib\ntpath.py", line 242, in dirname
return split(p)[0]
File "c:\ProgramData\Anaconda3\lib\ntpath.py", line 204, in split
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

----------------------------------------

Command "c:\ProgramData\Anaconda3\python.exe -u -c "import setuptools, tokenize;
file='C:\Users\Nate\AppData\Local\Temp\pip-q620_npk-build\setup.py';f
=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f
.close();exec(compile(code, file, 'exec'))" install --record C:\Users\Nate\A
ppData\Local\Temp\pip-nc21r8pg-record\install-record.txt --single-version-extern
ally-managed --compile" failed with error code 1 in C:\Users\Nate\AppData\Local
Temp\pip-q620_npk-build\
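
The `TypeError` at the end of the traceback suggests that `find_executable("cl.exe", PATH)` returned `None`, that is, the MSVC compiler was not found on the search path, and `os.path.dirname` was then called on that `None`. Below is a minimal sketch of a guard that fails with a clearer message instead. It uses the standard library's `shutil.which`; the helper name `compiler_dir` is hypothetical and not part of cudamat's `setup.py`.

```python
import os
import shutil


def compiler_dir(executable, path=None):
    """Return the directory containing `executable`, or raise a clear error.

    Hypothetical helper for illustration; not part of cudamat's setup.py.
    """
    # shutil.which returns None when the executable is not on the search path.
    location = shutil.which(executable, path=path)
    if location is None:
        raise RuntimeError(
            "%s was not found on the search path; check that the compiler "
            "is installed and its directory is on PATH" % executable)
    return os.path.dirname(location)
```

Calling `compiler_dir("cl.exe")` on a machine where `cl.exe` is not on `PATH` would then raise a `RuntimeError` naming the missing executable rather than the `NoneType` error shown above.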

Compile for higher GPU architectures

Changing the Makefile to match the compute capability of the device at hand can provide performance benefits. For example, on a GTX 580 (compute capability 2.0), adding -arch=sm_20 to the nvcc calls consistently improves the performance of element-wise addition and multiplication by 8% for small matrices (both with and without the dynamic blocksize feature). The current Makefile always compiles for compute capability 1.0.

We should consider adding a configure script to detect (or manually select) all compute capabilities to compile for, or even switch to a build system for the package.
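
As a rough sketch of the "manually select" option, the snippet below turns a list of compute capabilities into nvcc `-gencode` options, which a configure step could then pass to the nvcc calls. The function name `gencode_flags` and the idea of feeding it a hand-written list are assumptions for illustration; nothing like it exists in the current Makefile.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode options for a list of compute capabilities.

    `capabilities` holds strings such as "2.0" or "3.5".
    Hypothetical helper for illustration; not part of cudamat.
    """
    flags = []
    for cap in capabilities:
        major, minor = cap.split(".")
        arch = major + minor  # "2.0" becomes "20"
        flags += ["-gencode", "arch=compute_%s,code=sm_%s" % (arch, arch)]
    return flags


# Prints: -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35
print(" ".join(gencode_flags(["2.0", "3.5"])))
```

Compiling for several capabilities at once produces a larger binary but lets one build serve different devices.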

Cannot import cudamat on OS X

I installed cudamat on OS X (via pip).
The package is in /usr/local/lib/python2.7/site-packages/cudamat (since Python is installed via Homebrew).
However, importing cudamat fails.

It seems that dlopen is not able to find libcudamat.so

Error Message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cudamat/__init__.py", line 1, in <module>
    from .cudamat import *
  File "cudamat/cudamat.py", line 13, in <module>
    os.path.dirname(__file__) or os.path.curdir, 'libcudamat.so'))
  File "/usr/local/Cellar/python/2.7.7_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ctypes/__init__.py", line 443, in LoadLibrary
    return self._dlltype(name)
  File "/usr/local/Cellar/python/2.7.7_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ctypes/__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(cudamat/libcudamat.so, 6): image not found

cudamat cannot initialize

import cudamat as cm
cm.cublas_init()
Traceback (most recent call last):
File "", line 1, in
File "cudamat.py", line 1561, in cublas_init
raise CUDAMatException('error initializing CUBLAS: (err=%u)' % err)

cudamat.CUDAMatException: error initializing CUBLAS: (err=-2)

Any ideas?

On "minmax" method in class CUDAMatrix

Hi, all

Is there a "minmax" method in the CUDAMatrix class in some version of cudamat?

I am trying to run the ctc-stanford scripts, which call a minmax method on cuda.cuda.CUDAMatrix objects. For example, self.hActsFor.minmax(0.0,self.maxAct,col=0), where hActsFor is such an object.

install error on Windows

(C:\Anaconda3) C:\Anaconda3\cudamat-master>python setup.py install
running install
running bdist_egg
running egg_info
writing top-level names to cudamat.egg-info\top_level.txt
writing cudamat.egg-info\PKG-INFO
writing dependency_links to cudamat.egg-info\dependency_links.txt
reading manifest file 'cudamat.egg-info\SOURCES.txt'
writing manifest file 'cudamat.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
building 'cudamat.libcudamat' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools
But I have already installed Visual C++. How can I solve this?

Fail to build on Windows 64 bit

I currently have Python 2.7 installed and just purchased a GTX 980 (after having a 7970) to be able to use cudamat to speed up my theano neural nets.

When I run "python setup.py install" I get this error:

$ python setup.py install
running install
running bdist_egg
running egg_info
writing cudamat.egg-info\PKG-INFO
writing top-level names to cudamat.egg-info\top_level.txt
writing dependency_links to cudamat.egg-info\dependency_links.txt
reading manifest file 'cudamat.egg-info\SOURCES.txt'
writing manifest file 'cudamat.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
creating build\lib.win-amd64-2.7
creating build\lib.win-amd64-2.7\cudamat
copying cudamat\cudamat.py -> build\lib.win-amd64-2.7\cudamat
copying cudamat\learn.py -> build\lib.win-amd64-2.7\cudamat
copying cudamat\__init__.py -> build\lib.win-amd64-2.7\cudamat
copying cudamat\cudamat.cu -> build\lib.win-amd64-2.7\cudamat
copying cudamat\cudamat_kernels.cu -> build\lib.win-amd64-2.7\cudamat
copying cudamat\learn.cu -> build\lib.win-amd64-2.7\cudamat
copying cudamat\learn_kernels.cu -> build\lib.win-amd64-2.7\cudamat
copying cudamat\rnd_multipliers_32bit.txt -> build\lib.win-amd64-2.7\cudamat
running build_ext
building 'cudamat.libcudamat' extension
error: Don't know how to compile cudamat/cudamat.cu to build\temp.win-amd64-2.7\Release\cudamat/cudamat.obj

Alternatively, running "pip install" gives me:

$ pip install
You must give at least one requirement to install (see "pip help install")

What is going wrong?

CUDAMatrix has no attribute ones

in check_ones_matrix(min_size)
   1181 
   1182 def check_ones_matrix(min_size):
-> 1183     if min_size > CUDAMatrix.ones.shape[0]:
   1184         raise CUDAMatException(
   1185             'Not enough memory allocated for reduction. '
AttributeError: type object 'CUDAMatrix' has no attribute 'ones'

I tried the example from the documentation and A.sum(axis=0, target=col_sums) gave the error above.

Support for ND arrays (or at least 3D) and data types other than 'float'?

Hi there,

I realize cudamat was built predominantly to support floating-point linear-algebra operations but I've been wondering whether it would be feasible to extend it to arrays of different data types and higher dimensionality.

After having taken a quick look at the kernels in 'cudamat_kernels.cuh' I would think that a generalization wouldn't be so outlandish given that (recent versions of) CUDA support kernel templating (at least to base types).

In addition, as the arrays you're using are always flat 1D arrays, I don't understand why CUDAMatrix has been restricted to 2D representations (or at least why no CUDAArray class exists).

Are any such extensions planned for the future? They would allow for image processing algorithms with short int values (thus saving memory), operating on data that is vectorial (RGB components), multi-image (such as medical image datasets), or both.

Regardless, thank you for making cudamat available and keep up the good work!

Division by Zero during compilation

During installation I get these warnings:
cudamat_kernels.cu(771): warning: division by zero
cudamat_kernels.cu(747): warning: division by zero

Which correspond to the following line:

target[targetRowI * nCols + colI] = sourceRowI==-1 ? (1.0/0.0 -1.0/0.0) : source[sourceRowI * nCols + colI];

Is that a mistake or some trick I don't know?
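
One possible reading, which is an assumption and not confirmed by the authors: in IEEE 754 arithmetic `1.0/0.0` evaluates to positive infinity, and infinity minus infinity is NaN, so the expression looks like a way of writing a NaN constant for rows whose source index is -1. The warning would then just be the compiler noticing the constant division. The arithmetic can be checked in Python, where `float("inf")` stands in for `1.0/0.0` because Python raises `ZeroDivisionError` on that expression:

```python
import math

inf = float("inf")  # stands in for the C expression 1.0/0.0
value = inf - inf   # infinity minus infinity has no defined value

print(math.isnan(value))  # True
```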

how to do subtract_row_vec or subtract_col_vec in cudamat

Hi, I have run into a problem while developing with cudamat. I found that there are no functions
that perform subtract_row_vec and subtract_col_vec, while add_row_vec and add_col_vec do exist.
Any hints would be greatly appreciated. Thanks a lot.

undefined symbol: __gxx_personality_v0 error when running tests

python2 test_cudamat.py 
Traceback (most recent call last):
  File "test_cudamat.py", line 3, in <module>
    import cudamat as cm
  File "/home/eg7/.local/lib/python2.7/site-packages/cudamat-0.3-py2.7-linux-x86_64.egg/cudamat/__init__.py", line 2, in <module>
    from . import learn
  File "/home/eg7/.local/lib/python2.7/site-packages/cudamat-0.3-py2.7-linux-x86_64.egg/cudamat/learn.py", line 8, in <module>
    _cudalearn = load_library('libcudalearn')
  File "/home/eg7/.local/lib/python2.7/site-packages/cudamat-0.3-py2.7-linux-x86_64.egg/cudamat/cudamat.py", line 17, in load_library
    basename + ext))
  File "/usr/lib/python2.7/ctypes/__init__.py", line 440, in LoadLibrary
    return self._dlltype(name)
  File "/usr/lib/python2.7/ctypes/__init__.py", line 362, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/eg7/.local/lib/python2.7/site-packages/cudamat-0.3-py2.7-linux-x86_64.egg/cudamat/libcudalearn.so: undefined symbol: __gxx_personality_v0

I should note that Python uses gcc 6.2.0, while cudamat was compiled using gcc-5.

Any particular reason cudamat isn't on PyPI?

Quite possibly the most trivial issue ever to be opened on GitHub, but is there any particular reason cudamat isn't on PyPI? It seems like an excellent package and would be a great addition to the index.

nose reports 0 tests run

I get the following:

alexs-mbp:cudamat alex$ ipython test_cudamat.py 

----------------------------------------------------------------------
Ran 0 tests in 0.000s

OK

I am able to fix this by changing the nose line to nose.runmodule(argv=[__file__]):

alexs-mbp:cudamat alex$ python test_cudamat.py 
........................................................
----------------------------------------------------------------------
Ran 56 tests in 1.099s

OK

float pointer used for indices

Looking at the code in cudamat_kernels.cu, I see that a float pointer is used to hold an array of indices. For example, line 727 has:
__global__ void kSelectRows(float* source, float* target, float* indices, int nRowIs, int nCols, int nSourceRows)

However, the limited accuracy of single precision causes wrong index accesses. For example, in C the value of int(float(20000001)) is 20000000. Having 20 million elements in an array is not so rare.

Is there any way to work around this?
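
The rounding can be reproduced without a GPU. The sketch below round-trips integers through a 32-bit float using the standard `struct` module; the helper name `to_float32` is made up for this illustration. Single precision has a 24-bit significand, so integers up to 2**24 (16777216) are represented exactly, while larger ones may be rounded to a neighbouring value.

```python
import struct


def to_float32(value):
    """Round a Python number to the nearest single-precision float."""
    return struct.unpack("f", struct.pack("f", value))[0]


print(int(to_float32(20000001)))  # 20000000
print(int(to_float32(16777216)))  # 16777216, i.e. 2**24
print(int(to_float32(16777217)))  # 16777216
```

Under this reading, indices stay exact only while they are at most 2**24; beyond that, some rows would be silently mapped to a neighbouring index.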
