Comments (5)
The reason why MPSCNN is faster than your convolution kernel is that Apple has a team of very smart people who spent all their time writing and optimizing such kernels. :-)
Note that you don't need to do 4 texture reads from the input texture in your loop, only one. See my (also slow) version of this kernel here (it's called conv3x3): https://github.com/hollance/Forge/blob/master/Forge/Forge/Shaders.metal
I also know that the MPSCNN kernels store their weights and biases in MTLBuffers rather than in textures, although that in itself probably wouldn't make a huge speed difference.
The biggest reason for the speed difference is most likely that MPSCNN uses a faster algorithm. There are many ways you can compute convolution (im2col, FFT, Winograd, etc). Apple has the resources to try all of them. And they also have inside knowledge of how the GPU works, something we can only guess at.
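To make one of those alternatives concrete, here is a minimal CPU-side sketch of the im2col idea in plain Python. It is only an illustration of the technique, not what MPSCNN actually does (that is not public). The function names `im2col` and `conv2d_im2col` are made up for this example, and it assumes a single channel, stride 1, no padding, and no kernel flip (cross-correlation, as is usual in CNNs).

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2-D image (list of rows) into one flat row.

    Hypothetical helper for illustration: single channel, stride 1, no padding.
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = []
    for y in range(out_h):
        for x in range(out_w):
            # Flatten the patch whose top-left corner is (y, x).
            patch = [image[y + dy][x + dx] for dy in range(kh) for dx in range(kw)]
            cols.append(patch)
    return cols, out_h, out_w


def conv2d_im2col(image, kernel):
    """Convolution expressed as (patch matrix) x (flattened kernel)."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    cols, out_h, out_w = im2col(image, kh, kw)
    # Each output pixel is one dot product: a row of the patch matrix times the kernel.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in cols]
    # Reshape the flat result back into out_h rows of out_w values.
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
kernel = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]
result = conv2d_im2col(image, kernel)
print(result)  # [[45, 54], [81, 90]]
```

The point of the rearrangement is that the whole convolution becomes one matrix multiplication, which is an operation GPUs are very good at, at the cost of extra memory for the unrolled patches.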
I would like to add a very fast conv kernel to Forge at some point, just to show how it can be done, but my time is limited...
from forge.
Another reason is that MPSCNN uses float16.
from forge.
I am really looking forward to your fast conv kernel :-)
from forge.
In Objective-C there is no float16 data type, but the "half" data type is supported in Metal kernels. How can I use float16 in my code?
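One way to think about this: on the host side a half is just a 16-bit pattern in IEEE 754 binary16 format, so the usual approach is to convert 32-bit floats to those bit patterns and store them as 16-bit integers in the buffer the kernel reads. The sketch below shows that conversion in Python using the standard `struct` module's `e` format. The function names are hypothetical, and in an Objective-C app the same conversion would be done with a native routine (the vImage conversion functions in Accelerate are one option) rather than with this code.

```python
import struct


def float_to_half_bits(value):
    """Return the IEEE 754 binary16 bit pattern of value as an int in [0, 65535]."""
    # Pack as a little-endian half ("e"), then reinterpret the 2 bytes as uint16 ("H").
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_bits_to_float(bits):
    """Inverse of float_to_half_bits: interpret a uint16 bit pattern as a half."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def pack_weights_as_half(weights):
    """Pack a list of floats into raw bytes, 2 bytes per value (hypothetical helper)."""
    return struct.pack("<%de" % len(weights), *weights)


print(hex(float_to_half_bits(1.0)))   # 0x3c00
print(hex(float_to_half_bits(-2.0)))  # 0xc000
print(len(pack_weights_as_half([1.0, 0.5, 0.25])))  # 6
```

Note that half precision only keeps about three decimal digits, so a value such as 0.1 comes back slightly different after a round trip.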
from forge.
I asked the question on the Apple developer forum: https://forums.developer.apple.com/message/229368
from forge.