
Comments (3)

kknox commented on July 30, 2024

Hi @fommil
I started answering this thread/issue when I responded to your comments in Issue #15. I wanted to follow up with extra comments here.

Our BLAS (and likewise FFT) API is designed the way it is to unlock the complete potential of the heterogeneous platforms that OpenCL is designed for. In order to do that, our library cannot make any guesses as to where the client's data resides. As such, the library is designed to allow the OpenCL client to completely manage their own data, meaning that the client controls where data lives and when it should be transferred to the host or the device. Our APIs assume nothing, and we are not going to change this flexibility with the clMath APIs; it has to be this way for performance.

However, I completely see value in interfaces that are easier to use: interfaces that allow programmers to prototype functionality more quickly, or that simply give more developers access to the power of heterogeneous computing that is sitting in everybody's desktops and gaming rigs. Products like AccelerEyes' ArrayFire software, or your personal project that you link above, have an important role in fleshing out the software ecosystem. I hope to see solutions for everybody in the future, where a developer can understand and properly make tradeoff decisions between ease of use and performance that are right for their own compute needs.

from clblas.

fommil commented on July 30, 2024

@kknox it's not really just about being "easy to use". This is a compatibility issue: if you create a BLAS implementation, one certainly expects to be able to use the BLAS API to interface with it. What you've created is great, but it's not BLAS. It's BLAS-like.

To take a leaf from LAPACKE (the official C API to LAPACK): they offer a version that allows the user to control memory allocation and an easy-to-use one that simply creates arrays on demand. I believe you should offer something similar, i.e. an API that matches Fortran BLAS exactly, plus the one you're currently exposing. Surely there must be a way to set up the GPU device on shared library load, and close it down cleanly on exit (or abnormal exit / segfault!), such that the only overhead is the memory transfer. From my experiments with cuBLAS (which I believe is closer to the latter API), the memory transfer starts to be negligible (compared to the computational benefits) for arrays of 100,000 elements and more.
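As a rough illustration of why the transfer overhead fades as matrices grow, here is a minimal C sketch of the standard operation and byte counts for DGEMM. The function names are hypothetical, and the model assumes double precision and that A, B and C are all uploaded and C is downloaded; it says nothing about actual bus or GPU speeds:

```c
#include <assert.h>

/* Floating-point operations for C = alpha*A*B + beta*C, with A (m x k) and
 * B (k x n): one multiply and one add per inner-product term. */
double dgemm_flops(double m, double n, double k) {
    assert(m > 0 && n > 0 && k > 0);
    return 2.0 * m * n * k;
}

/* Bytes crossing the host/device boundary, ASSUMING A, B and C are uploaded
 * and C is downloaded afterwards (8 bytes per double). */
double dgemm_transfer_bytes(double m, double n, double k) {
    assert(m > 0 && n > 0 && k > 0);
    return 8.0 * (m * k + k * n + 2.0 * m * n);
}

/* Arithmetic performed per byte transferred; larger means the copy matters less. */
double dgemm_flops_per_byte(double m, double n, double k) {
    return dgemm_flops(m, n, k) / dgemm_transfer_bytes(m, n, k);
}
```

For square n x n matrices this ratio works out to n/16 flops per transferred byte, so it grows linearly with n; where the break-even point actually lands depends on the hardware.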

(I am aware of, and clearly see, the performance advantage of leaving the arrays in the GPU memory space. That is another level of optimisation that people may wish to make, but it requires significant source code changes.)

NOTE: I may well have fallen victim to NULL operations with cuBLAS as well. When I'm next at my desktop machine, I'm going to run tests at the same time as performance runs, to ensure that the DGEMM is actually taking place.


kknox commented on July 30, 2024

Closing old clBLAS issues for the new year

This idea of creating an API to match the BLAS API should be handled in a wrapper to the clBLAS library, or implemented in another library altogether. The issue of managing OpenCL state is complicated and was not a design goal of this project.

It is entirely possible to create a project that matches the BLAS API and completely hides the OpenCL details from the end user, and this new wrapper/library could call into clBLAS to implement and manage the OpenCL kernels.
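A minimal sketch of what such a wrapper could look like in C. Everything here is hypothetical: `mock_buffer` and the `mock_*` functions stand in for OpenCL buffers and a clBLAS call and run entirely on the host, and `simple_dgemm` is an illustrative name, not part of clBLAS. It assumes column-major storage, no transposition, and strictly positive dimensions:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a device buffer (a real wrapper would hold a cl_mem). */
typedef struct { double *data; size_t count; } mock_buffer;

/* Elements spanned by a column-major (rows x cols) matrix with leading dimension ld. */
static size_t span(int rows, int cols, int ld) {
    return (size_t)ld * (size_t)(cols - 1) + (size_t)rows;
}

/* "Host to device" copy; here just a malloc and memcpy. */
static mock_buffer mock_upload(const double *host, size_t count) {
    mock_buffer b;
    b.data = malloc(count * sizeof *b.data);
    assert(b.data != NULL);
    memcpy(b.data, host, count * sizeof *b.data);
    b.count = count;
    return b;
}

/* "Device to host" copy. */
static void mock_download(double *host, const mock_buffer *b) {
    memcpy(host, b->data, b->count * sizeof *b->data);
}

static void mock_release(mock_buffer *b) {
    free(b->data);
    b->data = NULL;
    b->count = 0;
}

/* Explicit-buffer tier: the caller has already placed the data.
 * Computes C = alpha*A*B + beta*C with a naive triple loop. */
static void mock_device_dgemm(int m, int n, int k, double alpha,
                              const mock_buffer *a, int lda,
                              const mock_buffer *b, int ldb,
                              double beta, mock_buffer *c, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += a->data[i + (size_t)p * lda] * b->data[p + (size_t)j * ldb];
            double *cij = &c->data[i + (size_t)j * ldc];
            /* As in BLAS, C is not read when beta is zero. */
            *cij = (beta == 0.0) ? alpha * acc : alpha * acc + beta * *cij;
        }
    }
}

/* Convenience tier: host pointers in, result written back to the host array.
 * The wrapper owns every transfer, so callers never see a buffer object. */
void simple_dgemm(int m, int n, int k, double alpha,
                  const double *a, int lda, const double *b, int ldb,
                  double beta, double *c, int ldc) {
    assert(m > 0 && n > 0 && k > 0);
    assert(lda >= m && ldb >= k && ldc >= m);
    mock_buffer da = mock_upload(a, span(m, k, lda));
    mock_buffer db = mock_upload(b, span(k, n, ldb));
    mock_buffer dc = mock_upload(c, span(m, n, ldc));
    mock_device_dgemm(m, n, k, alpha, &da, lda, &db, ldb, beta, &dc, ldc);
    mock_download(c, &dc);
    mock_release(&da);
    mock_release(&db);
    mock_release(&dc);
}
```

The cost of this convenience is visible in `simple_dgemm`: every call uploads all three matrices and downloads the result, whereas a caller using the explicit-buffer tier directly can leave data resident between calls.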

