
Comments (4)

almarklein commented on May 29, 2024

In here, the x, y and z are ints, I assume? Do you mean the shape for how the shader is dispatched, as the n in here, or is there equivalent GLSL?

from pyshader.

axsaucedo commented on May 29, 2024

@almarklein that is correct, there is another relevant component in the shader.

You probably are already aware of this, but I'll outline it for completeness:

In your "CPU" code you have to specify the dispatch size, i.e. the number of workgroups to launch (which is the shape you mentioned). Then, in the GPU shader, you have to specify the size of each workgroup (thread block), the "layout size".

In more practical terms, if you have a buffer with 500x200 elements, you are able to define your thread block "layout size" inside the shader as:

layout (local_size_x = 5, local_size_y = 2, local_size_z = 1) in;

This means that each workgroup (thread block) processes 5*2*1 = 10 buffer elements.

Your dispatch can then be something like:

obj.dispatch(100, 100, 1)

This launches 100x100x1 workgroups of 10 invocations each, so the shader runs 100,000 times in total, once per element of the 500x200 buffer.

You can split this into different sizes to process the same dataset. For example, if your layout size is (1, 1, 1) and your dispatch size is (500, 100, 2), you still get 500*100*2 = 100,000 invocations, so you would still end up processing all the elements.
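The arithmetic above can be checked with a small sketch. This is plain Python, not pyshader or any GPU API; the helper names `dispatch_size` and `invocations` are made up for illustration, and rounding up for sizes that do not divide evenly is an assumption about how one would typically pick the group count.

```python
import math


def dispatch_size(data_shape, local_size):
    """Workgroups needed per axis so that groups * local_size covers the data.

    Hypothetical helper: rounds up, so the last group may be partly idle.
    """
    return tuple(math.ceil(n / l) for n, l in zip(data_shape, local_size))


def invocations(dispatch, local_size):
    """Total shader invocations = number of workgroups * invocations per group."""
    return math.prod(dispatch) * math.prod(local_size)


data_shape = (500, 200, 1)   # the 500x200 buffer from the example
local_size = (5, 2, 1)       # layout(local_size_x = 5, local_size_y = 2, ...)

groups = dispatch_size(data_shape, local_size)
print(groups)                                   # (100, 100, 1)
print(invocations(groups, local_size))          # 100000
print(invocations((500, 100, 2), (1, 1, 1)))    # 100000
```

Both splits give the same number of invocations; what differs is how many of them share a workgroup, and how the shader has to map its invocation id back to a buffer index.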


almarklein commented on May 29, 2024

> You probably are already aware of this, but I'll outline it for completeness:

Actually no :) TBH most of my experience with OpenGL was with the ES 2.0 subset. So thanks for the details! And this sounds like a useful feature indeed.


CaiusTSM commented on May 29, 2024

Hi, I really need this feature in order to achieve any sort of passable performance. My task is implementing matrix multiplication (GEMM), and for it to be faster than my CPU (at least a couple of hundred GFLOPS) I need local layout control and shared memory. Specifically, I need to be able to do something like this (GLSL code, an example of shared (group/local) memory):

#define SIZE 64

layout (local_size_x = SIZE, local_size_y = 1, local_size_z = 1) in;

shared float shared_data[SIZE];

...

void main() {
    // ... Here is some computation in which each work item in the work group computes one element of shared_data.
    // ... Then the work group is synchronized with barrier() / memoryBarrierShared() (each work item waits for the entire group to finish filling its part of shared_data).
    // ... Then, for example, shared_data is summed up and the output at index.x is set to that result.
}

(local invocation id is also required)
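To make the pattern concrete, here is a rough CPU-side simulation in plain Python. It is not pyshader or GLSL: the functions `run_workgroup` and `dispatch` are hypothetical, the "times two" per-item computation is a placeholder for the elided shader code, and writing one result per workgroup is an assumption about what the output step does.

```python
SIZE = 64  # mirrors the #define in the GLSL snippet


def run_workgroup(group_id, data, output):
    """Simulate one workgroup of SIZE work items, run sequentially on the CPU."""
    # Stands in for `shared float shared_data[SIZE]`: visible to the whole group.
    shared_data = [0.0] * SIZE

    # Phase 1: each work item fills its own slot, indexed by its local invocation id.
    for local_id in range(SIZE):
        global_id = group_id * SIZE + local_id
        shared_data[local_id] = data[global_id] * 2.0  # placeholder computation

    # On a GPU, barrier() would go here so every slot is written before it is read.
    # The sequential loop above already guarantees that in this simulation.

    # Phase 2: combine the group's shared data and write one result for the group.
    output[group_id] = sum(shared_data)


def dispatch(num_groups, data):
    """Run every workgroup in turn and collect the per-group results."""
    output = [0.0] * num_groups
    for group_id in range(num_groups):
        run_workgroup(group_id, data, output)
    return output


data = [float(i) for i in range(SIZE * 4)]
result = dispatch(4, data)
print(result[0])  # 4032.0, i.e. 2 * (0 + 1 + ... + 63)
```

The simulation shows why the local invocation id is needed: it is the index each work item uses to find its own slot in the shared array.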

Having shared memory could be its own GitHub issue. Being able to use shared memory for group-local computation would roughly double the performance of my matrix multiplication (still a far cry from the full possible speed, but somewhat passable). The reason this is a lot faster is that local memory is much faster than global memory. Copying a chunk of the global memory to local memory first, and then computing things on that local chunk (on-chip), is much faster. The GPU's caching system can only do so much to alleviate this problem.
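The effect of staging data in local memory can be illustrated with a simplified model in plain Python. This is only a sketch under stated assumptions: it counts element reads from "global" memory and says nothing about real GPU timings or caches; `matmul_naive` and `matmul_tiled` are illustrative names, and the tiled version assumes the matrix size is a multiple of the tile size.

```python
def matmul_naive(a, b, n):
    """n x n matrix product; every multiply reads both operands from 'global' memory."""
    reads = 0
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
                reads += 2  # one element of a, one element of b
            c[i][j] = acc
    return c, reads


def matmul_tiled(a, b, n, tile):
    """Same product, but each tile is copied to 'local' memory once and then reused."""
    assert n % tile == 0, "sketch assumes n is a multiple of the tile size"
    reads = 0
    c = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, tile):
        for bj in range(0, n, tile):
            for bk in range(0, n, tile):
                # Copy one tile of a and one tile of b out of 'global' memory.
                a_tile = [row[bk:bk + tile] for row in a[bi:bi + tile]]
                b_tile = [row[bj:bj + tile] for row in b[bk:bk + tile]]
                reads += 2 * tile * tile
                # All further accesses hit the local copies only.
                for i in range(tile):
                    for j in range(tile):
                        acc = 0.0
                        for k in range(tile):
                            acc += a_tile[i][k] * b_tile[k][j]
                        c[bi + i][bj + j] += acc
    return c, reads


n = 8
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
c_naive, reads_naive = matmul_naive(a, b, n)
c_tiled, reads_tiled = matmul_tiled(a, b, n, tile=4)
print(c_naive == c_tiled)        # True
print(reads_naive, reads_tiled)  # 1024 256
```

In this model the number of global reads drops by a factor equal to the tile size, which is the kind of reuse that shared memory in a compute shader would make possible.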

