Comments (4)
In here, the x, y and z are ints, I assume? Do you mean the shape for how the shader is dispatched, as the n in here, or is there an equivalent in GLSL?
from pyshader.
@almarklein that is correct, there is another relevant component in the shader.
You probably are already aware of this, but I'll outline it for completeness:
In your "CPU" code you have to specify the "dispatch workgroup" size, i.e. the number of times to run the shader you specified. Then, in the GPU shader, you have to specify the size of the thread block (the "layout size").
In more practical terms, if you have a buffer with 500x200 elements, you can define the thread block "layout size" inside the shader as:
layout (local_size_x = 5, local_size_y = 2, local_size_z = 1) in;
which means that each thread block processes 5*2*1 = 10 buffer elements.
Your dispatch can then be something like:
obj.dispatch(100, 100, 1)
which runs that thread block 100x100x1 = 10,000 times, covering all 500x200 = 100,000 elements.
You can split this into different sizes to process the same dataset. For example, if your layout size is (1, 1, 1) and your dispatch size is (500, 100, 2), you would still end up processing all the elements.
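The arithmetic above can be sketched in plain Python. This is just an illustrative check of the coverage math, not pyshader API; the function name is made up:

```python
# Total invocations = (dispatch size * workgroup "layout" size) per axis, multiplied out.
def total_invocations(dispatch, layout):
    dx, dy, dz = dispatch
    lx, ly, lz = layout
    return (dx * lx) * (dy * ly) * (dz * lz)

# layout (5, 2, 1) with dispatch (100, 100, 1) covers a 500x200 buffer:
assert total_invocations((100, 100, 1), (5, 2, 1)) == 500 * 200

# layout (1, 1, 1) with dispatch (500, 100, 2) covers the same 100,000 elements:
assert total_invocations((500, 100, 2), (1, 1, 1)) == 500 * 200
```

Any split with the same per-axis products reaches every element; what changes is how the work is grouped into thread blocks.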
"You probably are already aware of this, but I'll outline it for completeness:"
Actually no :) TBH, most of my experience with OpenGL was with the ES2 subset. So thanks for the details! This sounds like a useful feature indeed.
Hi, I really need this feature in order to achieve any sort of passable performance. My task is implementing matrix multiplication (GEMM), and for the speed to beat my CPU (at least a couple of hundred GFLOPS) I need local layout control and shared memory. Specifically, I need to be able to do something like this (GLSL code, an example of shared (group/local) memory):
#define SIZE 64
layout (local_size_x = SIZE, local_size_y = 1, local_size_z = 1) in;
shared float shared_data[SIZE];
...
void main() {
    // Each work item in the work group computes one element of shared_data.
    // The work group is then synchronized with barrier() / memoryBarrierShared()
    // (each work item waits for the entire group to finish filling its part of shared_data).
    // Then, for example, shared_data is summed up and the output at index.x is set to that result.
}
(the local invocation id is also required)
Having shared memory could be made its own GitHub issue. Being able to use shared memory for group-local computation would roughly double the performance of my matrix multiplication (still a far cry from the full possible speed, but somewhat passable). The reason this is a lot faster is that local memory is much faster than global memory: copying a chunk of global memory to local memory first, and then computing on that local chunk (on-chip), is much faster. The GPU's caching system can only do so much to alleviate this problem.
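The tiling idea itself is independent of the GPU. Here is a minimal plain-Python sketch of it (illustrative only; the function name and tile size are made up): each TILE x TILE block of the inputs is copied into small "local" buffers first, and the inner products then read only those buffers, which on a GPU would live in shared memory.

```python
TILE = 2  # workgroup tile size; assumed to divide n evenly

def matmul_tiled(a, b, n):
    # a, b: n x n matrices as nested lists. Returns c = a @ b.
    c = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, TILE):
        for bj in range(0, n, TILE):
            for bk in range(0, n, TILE):
                # "Load" one tile of a and one tile of b into local buffers
                # (on a GPU: cooperative copy into shared memory, then barrier()).
                a_loc = [[a[bi + i][bk + k] for k in range(TILE)] for i in range(TILE)]
                b_loc = [[b[bk + k][bj + j] for j in range(TILE)] for k in range(TILE)]
                # Accumulate partial products using only the local tiles.
                for i in range(TILE):
                    for j in range(TILE):
                        for k in range(TILE):
                            c[bi + i][bj + j] += a_loc[i][k] * b_loc[k][j]
    return c

# Same result as a naive multiply: [[1,2],[3,4]] @ [[5,6],[7,8]] = [[19,22],[43,50]]
assert matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2) == [[19, 22], [43, 50]]
```

Each global element is still read the same number of times in this CPU sketch; the GPU win comes from those repeated reads hitting fast on-chip shared memory instead of global memory.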
Related Issues (20)
- Syntax for defining input and output HOT 10
- First release to PyPi HOT 1
- pypi package is broken because readme is not included in the package. HOT 4
- Implicit type conversions HOT 4
- Trace debug info so we can produce more useful error messages. HOT 1
- Be more consistent about exception types raised by the parsers
- Add support for runtime constants (specialization)? HOT 1
- WebGPU shading language (WGSL) HOT 5
- Option to spell co_select in Python?
- Is our use of annotations ok? HOT 6
- Support shaders with jumps in bytecode >255 bytes
- Compute example crashes in create_compute_pipeline HOT 19
- Support some form of templating of entry points? HOT 2
- Compiling to file HOT 2
- Error processing resulting SPIR-V shader in Vulkan 1.2.x (Kompute v0.4.2) HOT 2
- Document math / built-in functions HOT 5
- Support for Python 3.9
- Is this worth it? HOT 9
- Archive this repo