Comments (15)

sekwiatkowski avatar sekwiatkowski commented on August 23, 2024

CUDA support and optimizations are being implemented on a step-by-step basis.

Currently, CublasProjectionLayer is a drop-in replacement for ProjectionLayer. The next step will be a kernel-based implementation of the sigmoid activation function.

Following that, I will introduce the idea of instructions (modelled as data classes) as a unified way to specify the layers of a neural network. A CPU interpreter and a CUDA interpreter will process the instructions to create architecture-specific networks. CUDA-based networks will minimize data transfer between the CPU host and the GPU device.
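For illustration, here is a minimal sketch of what instructions modelled as data classes and a CPU interpreter could look like. All names here are hypothetical, not komputation's actual API; a CUDA interpreter would map the same instructions to kernel launches and keep intermediate results on the device.

```kotlin
import kotlin.math.exp

// Hypothetical sketch: layers specified as backend-independent data.
sealed class Instruction
class Projection(val weights: Array<FloatArray>) : Instruction()
object Sigmoid : Instruction()

// A backend-specific layer produced by an interpreter.
fun interface Layer {
    fun forward(input: FloatArray): FloatArray
}

// The CPU interpreter maps each instruction to a plain JVM implementation.
// A CUDA interpreter would instead launch kernels and keep data on the device.
fun interpretOnCpu(instruction: Instruction): Layer = when (instruction) {
    is Projection -> Layer { input ->
        // Matrix-vector multiplication on the host.
        FloatArray(instruction.weights.size) { row ->
            var sum = 0.0f
            for (column in input.indices) {
                sum += instruction.weights[row][column] * input[column]
            }
            sum
        }
    }
    is Sigmoid -> Layer { input ->
        FloatArray(input.size) { i -> 1.0f / (1.0f + exp(-input[i])) }
    }
}
```

Because the instruction layer is pure data, the same network specification could be handed to either interpreter unchanged.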

You can find the changelog here: https://medium.com/@komputation

Starting with v0.7.1, it includes some notes on host/device communication.

zjuhasz avatar zjuhasz commented on August 23, 2024

Are there any plans for an OpenCL interpreter along the same lines as the CUDA interpreter?

sekwiatkowski avatar sekwiatkowski commented on August 23, 2024

Do you have a specific use case in mind for an OpenCL interpreter?

austin-sandia avatar austin-sandia commented on August 23, 2024

Nice progress, @sekwiatkowski!

Question: does your design allow for forward and backward propagation of a batch of training examples at once? This would be faster because it would allow you to do more matrix-matrix multiplication instead of just matrix-vector. I can't tell 100% yet whether or not you are doing it.
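For what it's worth, a hypothetical sketch of the shape argument (none of this is komputation code): stacking the batch into one matrix turns many matrix-vector products into a single matrix-matrix product, which an optimized BLAS such as cuBLAS can execute in one call.

```kotlin
// weights: (outputDim x inputDim); each example is a column of length inputDim.

// One matrix-vector product per example.
fun forwardOneByOne(weights: Array<FloatArray>, examples: List<FloatArray>) =
    examples.map { example ->
        FloatArray(weights.size) { row ->
            var sum = 0.0f
            for (k in example.indices) sum += weights[row][k] * example[k]
            sum
        }
    }

// The same work as one matrix-matrix product: the batch is a single
// (inputDim x batchSize) matrix, so one sgemm call can replace batchSize sgemv calls.
fun forwardBatched(weights: Array<FloatArray>, batch: Array<FloatArray>): Array<FloatArray> {
    val batchSize = batch[0].size
    return Array(weights.size) { row ->
        FloatArray(batchSize) { column ->
            var sum = 0.0f
            for (k in batch.indices) sum += weights[row][k] * batch[k][column]
            sum
        }
    }
}
```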

austin-sandia avatar austin-sandia commented on August 23, 2024

Ok, I see now in Network and CudaNetwork that it is definitely doing one training example at a time... am I correct that other libraries generally do multiple at once?

sekwiatkowski avatar sekwiatkowski commented on August 23, 2024

Parallel propagation for mini-batches will probably be added in v0.8 (sometime next week).

austin-sandia avatar austin-sandia commented on August 23, 2024

nice!

austin-sandia avatar austin-sandia commented on August 23, 2024

So... floats vs. doubles... never an easy question in Java. I get the sense that most people run networks with floats on the GPU. I'm sure you've thought about this to some extent... easier said than done.

sekwiatkowski avatar sekwiatkowski commented on August 23, 2024

Doubles will probably be replaced by floats in the near future, but I have to run some benchmarks first.
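As an aside, a rough sketch of the kind of host-side check such a benchmark might start from; a serious comparison should use JMH, and the GPU side matters even more, since consumer cards typically throttle FP64 throughput relative to FP32.

```kotlin
import kotlin.system.measureNanoTime

// Rough float-vs-double throughput check; use JMH for a serious benchmark.
fun main() {
    val size = 10_000_000
    val floats = FloatArray(size) { it.toFloat() }
    val doubles = DoubleArray(size) { it.toDouble() }

    var floatSum = 0.0f
    val floatTime = measureNanoTime {
        for (value in floats) floatSum += value * 0.5f
    }

    var doubleSum = 0.0
    val doubleTime = measureNanoTime {
        for (value in doubles) doubleSum += value * 0.5
    }

    // Print the sums so the JIT cannot eliminate the loops entirely.
    println("float:  ${floatTime / 1_000_000} ms (sum $floatSum)")
    println("double: ${doubleTime / 1_000_000} ms (sum $doubleSum)")
}
```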

austin-sandia avatar austin-sandia commented on August 23, 2024

So what are you thinking for the batch-appropriate data type? Some sort of tensor? You clearly need all of the data for a batch to be in one long array to get the speed benefit of converting 32 matrix-vector multiplies into one big matrix-matrix multiply.

Also, I am curious about the CPU convolution plan. I am assuming you are going to use cuDNN for the GPU. One thing I found that could be useful is this:
https://github.com/01org/daal/blob/daal_2018_beta_update1/examples/java/com/intel/daal/examples/neural_networks/Conv2DLayerDenseBatch.java

As far as I can tell, this Java code calls super-optimized Intel MKL-ish C++ code.

austin-sandia avatar austin-sandia commented on August 23, 2024

Also, the obvious solution for batching is to use a matrix with every row as an observation. This runs into problems once you consider convolution, though, since each observation then carries spatial structure (height, width, channels) that a flat row does not express.

sekwiatkowski avatar sekwiatkowski commented on August 23, 2024

I will start to work on parallel propagation now. I'm inclined towards using a one-dimensional array for the entire mini-batch.
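To illustrate one way that could look (hypothetical names, not a committed design): each example occupies a contiguous slice of a single FloatArray, so the whole batch can be handed to BLAS or copied to the device in one transfer.

```kotlin
// Hypothetical layout: the entire mini-batch in one flat array,
// example after example.
class Batch(private val dimension: Int, val size: Int) {
    val data = FloatArray(dimension * size)

    // Entry `row` of example `instance` sits at a fixed offset, so the
    // array is contiguous and can be passed to BLAS/CUDA as one block.
    fun get(instance: Int, row: Int): Float = data[instance * dimension + row]

    fun set(instance: Int, row: Int, value: Float) {
        data[instance * dimension + row] = value
    }
}
```

One contiguous array also means a single host-to-device copy per batch instead of one copy per example.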

austin-sandia avatar austin-sandia commented on August 23, 2024

Agreed, all of the mini-batch data should be next to each other.

Check out this thing, in particular the MKL part. It provides a Java interface to sgemm and friends in MKL but does not require the user to install the MKL library; i.e., MKL is bundled.

https://github.com/intel-analytics/BigDL-core/tree/162d95df3941976034691b266ae63401a580902b

```xml
<dependency>
    <groupId>com.intel.analytics.bigdl.native</groupId>
    <artifactId>mkl-java</artifactId>
    <version>0.1.0</version>
</dependency>
<dependency>
    <groupId>com.intel.analytics.bigdl.dnn.native</groupId>
    <artifactId>dnn-java</artifactId>
    <version>0.1.0</version>
</dependency>
```

zjuhasz avatar zjuhasz commented on August 23, 2024

> Do you have a specific use case in mind for an OpenCL interpreter?

Just for hardware compatibility. CUDA can only be used with Nvidia GPUs, correct?

sekwiatkowski avatar sekwiatkowski commented on August 23, 2024

For the time being, only CUDA will be supported. This may change if AMD overtakes Nvidia performance-wise or if OpenCL comes up in one of my projects. I'm also keeping an eye out for entirely new architectures.
