
Forge's Introduction

Hi there 👋

I'm an audio software developer, working on my own plug-ins as well as freelance for other audio companies.

Most recently I've been employed as a machine learning engineer and as an iOS developer. These days I mostly write C++ audio code. I've been professionally active as a software developer since the mid 1990s.

I have also written and co-written a number of books (most notably The iOS Apprentice and Core ML Survival Guide) and have published my own apps and games as an indie developer.


Forge's Issues

Unexpected performance: Metal slower than Accelerate?

Today I tested the TensorFlow (TF) iOS example on my iPhone 6S. According to the introduction on the TF website and the source code, it uses Apple's Accelerate framework. I built protobuf and the TF source code on my Mac, then ran the iOS example and measured the time around this call:

tensorflow::Status run_status = tf_session->Run(
        {{input_layer_name, image_tensor}}, {output_layer_name}, {}, &outputs);

The time is fast, only 90 ms. The TF iOS example uses the Google Inception V1 model, while Apple's example, which uses the Inception V3 model, took 120 ms. Is Metal really slower than the Accelerate framework? I can't understand it. I don't think Inception V1 and V3 differ enough to explain the performance gap, so how can this be explained?
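
For reference, here is a minimal sketch (not taken from either sample) of how GPU time can be measured around a single command buffer on the Metal side; `encodeNetwork` is a hypothetical closure standing in for whatever MPSCNN encoding the sample app performs:

import Metal
import QuartzCore

// Hedged sketch: measure wall-clock time for one committed command buffer.
// `encodeNetwork` stands in for the MPSCNN encoding done by the sample app.
func timeInference(commandQueue: MTLCommandQueue,
                   encodeNetwork: (MTLCommandBuffer) -> Void) -> Double {
    guard let commandBuffer = commandQueue.makeCommandBuffer() else { return 0 }
    encodeNetwork(commandBuffer)
    let start = CACurrentMediaTime()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
    return CACurrentMediaTime() - start   // seconds spent waiting on the GPU
}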

Regarding the offset for picking bounding box values

Hello,

I am trying to run YOLO in Windows ML. I first converted the Darknet YOLO v2 tiny model to Keras using the yad2k script, and then used the keras2onnx converter to convert from Keras to ONNX.

So the model is successfully converted to ONNX with an output shape of NHWC (13 x 13 x 125). Now I have to generate bounding boxes, for which I tried referring to your code for the OFFSET, but I get an "Array Index: Out of bound exception". I think this is because you have 128 channels in Swift, while in Windows ML it is just 125.

So, how can I handle this?

Could you please help me on this?
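
For reference, a minimal sketch (not taken from the repository) of how the per-cell offset could be computed directly against a 13 x 13 x 125 NHWC output, assuming tiny YOLO v2 on VOC with 5 anchor boxes of 25 values each (4 box coordinates, 1 confidence, 20 class scores):

// Hedged sketch: index into a flat [Float] holding a 13 x 13 x 125 NHWC tensor.
let gridSize = 13
let boxesPerCell = 5
let valuesPerBox = 25   // tx, ty, tw, th, confidence, 20 class scores

func offset(cx: Int, cy: Int, box: Int, component: Int) -> Int {
    // NHWC: all 125 channel values for a grid cell are stored contiguously.
    return (cy * gridSize + cx) * (boxesPerCell * valuesPerBox)
         + box * valuesPerBox
         + component
}

// Example: confidence of box 2 in grid cell (6, 7):
// let confidence = features[offset(cx: 6, cy: 7, box: 2, component: 4)]

The 128 channels in the Swift code likely come from MPSImage padding the feature channels up to a multiple of 4 per texture slice, so an index scheme written for that layout will indeed run out of bounds on a plain 125-channel buffer.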

Use more sensible defaults

There are a few places where I think adding some defaults in constructors would be beneficial/sensible.

Stuff like inflightBuffers and kernel in Convolution and Pooling layers could have defaults that would reduce repetition and clean up model construction.

On the flip side, perhaps some people might not notice the default params and it could lead to errors.

Thoughts?
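
To illustrate the proposal, a hypothetical layer type with default parameter values could look like the sketch below (PaddingType and the parameter names mirror the snippets on this page; none of this is the current Forge API):

import MetalPerformanceShaders

// Hedged sketch: hypothetical spec type showing how defaults could look.
enum PaddingType { case same, valid }

struct ConvolutionSpec {
    let kernel: (Int, Int)
    let channels: Int
    let stride: (Int, Int)
    let padding: PaddingType
    let activation: MPSCNNNeuron?
    let useBias: Bool
    let name: String

    init(kernel: (Int, Int) = (3, 3),
         channels: Int,
         stride: (Int, Int) = (1, 1),
         padding: PaddingType = .same,
         activation: MPSCNNNeuron? = nil,
         useBias: Bool = true,
         name: String) {
        self.kernel = kernel
        self.channels = channels
        self.stride = stride
        self.padding = padding
        self.activation = activation
        self.useBias = useBias
        self.name = name
    }
}

// With defaults, a typical call site shrinks to:
// let spec = ConvolutionSpec(channels: 32, name: "conv1")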

Add layer

Is there some way to create an Add layer that takes two or more tensors as inputs and returns their sum (e.g. the Add layer in Keras)?
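
Not an answer from the repository, but for the two-input case Metal Performance Shaders itself offers an element-wise add kernel on iOS 11.3 and later; a minimal hedged sketch of wiring it up outside of Forge's DSL:

import MetalPerformanceShaders

// Hedged sketch, assuming iOS 11.3+: MPSCNNAdd sums two MPSImages
// element-wise. Forge would still need its own wrapper layer to expose
// this in the --> DSL; encodeAdd below is not a Forge API.
func encodeAdd(device: MTLDevice,
               commandBuffer: MTLCommandBuffer,
               a: MPSImage,
               b: MPSImage,
               destination: MPSImage) {
    let add = MPSCNNAdd(device: device)
    add.encode(commandBuffer: commandBuffer,
               primaryImage: a,
               secondaryImage: b,
               destinationImage: destination)
}

For more than two inputs, the kernel can be chained, adding one tensor at a time.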

Error: framework not found Forge for architecture arm64

I need to embed the Forge framework inside our static library. The static library builds successfully for all architectures: arm64, armv7, armv7s.
Xcode gives the following error when the static library is used in a sample app: "framework not found Forge for architecture arm64. Linker command failed with exit code 1 (use -v to see invocation)"

Update to Xcode 9.4

Hi, two problems come up when updating to Xcode 9.4.
Layers.swift has one error: "Type of expression is ambiguous without more context"

conv = MPSCNNConvolution(device: device,
                             convolutionDescriptor: desc,
                             kernelWeights: weights.pointer,
                             biasTerms: biases?.pointer,
                             flags: .none)

and LayerHelpers.swift has the same error on the following line:

  let layer = MPSCNNConvolution(device: device,
                                convolutionDescriptor: desc,
                                kernelWeights: weightsData.pointer,
                                biasTerms: biasData?.pointer,
                                flags: .none)

Unable to archive the app with Forge.

Hey!

Awesome framework, I really like using it. But here's a big issue: I'm unable to archive the app because it throws a whole bunch of unresolved errors. The app works great during development, but I'm unable to archive it. :)

Attaching a pic for you.

Do let me know how to fix this.

Thanks!

[Screenshot: archive_fail]

Getting all channels for a point

Hi!

Quick question: for post-processing, is there a version of printChannelsForPixel() that just returns all the channels at a point of an MPSImage instead of printing them? Or, to use the output MPSImage, do I need to write my own wrapper functions for slicing and indexing?

Thank you so much!
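
Not part of Forge, but here is a hedged sketch of such a helper built on top of toFloatArray(). It assumes the array is laid out slice by slice, each slice stored as height x width x 4 floats; if the actual layout differs, the indexing has to change accordingly.

import MetalPerformanceShaders

// Hedged sketch: collect every feature channel at pixel (x, y) of an MPSImage.
func channels(atX x: Int, y: Int, of image: MPSImage) -> [Float] {
    let values = image.toFloatArray()              // from Forge's MPSImage+Floats
    let slices = (image.featureChannels + 3) / 4   // channels are packed in groups of 4
    let sliceSize = image.width * image.height * 4
    var result: [Float] = []
    for s in 0..<slices {
        let base = s * sliceSize + (y * image.width + x) * 4
        for c in 0..<4 where s * 4 + c < image.featureChannels {
            result.append(values[base + c])
        }
    }
    return result
}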

Results of MPSCNNConvolution

Hi, I have the following float array as an input buffer for an MPSImage:

let buffer4c = [
// R    G     B   A    R    G     B     A
1.0, 0.0, 0.0, 1.0,  1.0, 0.0, 0.0, 1.0, 
// R    G     B   A    R    G     B     A
1.0, 0.0, 0.0, 1.0,  1.0, 0.0, 0.0, 1.0, 
]

From my understanding, this should represent a 2x2x3 tensor whose 4th channel is padded with 1.0. Then I created an MPSImage object using that buffer via the extension method defined in MPSImage+Floats.swift:

inputImg = MPSImage(device: device,
                    numberOfImages: 1,
                    width: 2,
                    height: 2,
                    featureChannels: 3,
                    array: &buffer4c,
                    count: 2*2*4)

After that, I created a weight buffer whose dimensions are 1x3x2x2 (NCHW). I understand this needs to be converted to NHWC. To keep things simple, I set all values in the buffer to 1.0:

nums = [1.0, 1.0, 1.0,  1.0, 1.0, 1.0,
        1.0, 1.0, 1.0,  1.0, 1.0, 1.0]

The last step is to set up the convolution; here is what I did:

class Conv2d : NeuralNetwork {
    typealias PredictionType = Float16
    
    var inputImg: MPSImage!
    var outputImg: MPSImage!
    var oid = MPSImageDescriptor(channelFormat: .float16, width: 1, height: 1, featureChannels: 1)
    var conv2d: MPSCNNConvolution
    
    init(device: MTLDevice, inflightBuffers: Int) {
        weightsLoader   = { name, count in ParameterLoaderBundle(name: name, count: count, suffix: "_W", ext: "txt") }
        outputImg       = MPSImage(device: device, imageDescriptor: oid)
        conv2d          = convolution(device: device, kernel: (2, 2), inChannels: 3, outChannels: 1, activation: nil, name: "conv", useBias: false)
    }
    
    func encode(commandBuffer: MTLCommandBuffer, texture: MTLTexture, inflightIndex: Int) {
        conv2d.encode(commandBuffer: commandBuffer, sourceImage: inputImg, destinationImage: outputImg)
    }
    func fetchResult(inflightIndex: Int) -> NeuralNetworkResult<Float16> {
        let probabilities = outputImg.toFloatArray()
        print(probabilities)
        return NeuralNetworkResult<Float16>()
    }
}

From my understanding, the result of the convolution should be 4.0 (I also verified this with PyTorch). However, the output was 1.0. I experimented a bit, and it seems like only the first 4 elements of the image buffer get multiplied with the corresponding weights.

Is there anything that I'm missing here?

App Store?

Any thoughts on submitting any of the demos to the App Store? I think offering a ready-to-go mobile version of Inception-v3 / YOLO / MobileNets would be awesome.

Has MobileNet-SSD been supported on iOS yet?

Hi Hollance,
I followed this tutorial: https://github.com/chuanqi305/MobileNet-SSD
After that, I tried to convert my deployed model to Core ML and I got this issue:
[libprotobuf ERROR /Users/sohaibqureshi/github/coremltools/deps/protobuf/src/google/protobuf/text_format.cc:287] Error parsing text-format caffe.NetParameter: 1177:17: Message type "caffe.LayerParameter" has no field named "permute_param".
Traceback (most recent call last):
  File "mobilenet_2_coreml.py", line 23, in <module>
    class_labels='caffe_model/synset_words.txt')
  File "/Users/ln160c/Downloads/YOLO-CoreML-MPSNNGraph-master/Convert/coreml/coreml/lib/python2.7/site-packages/coremltools/converters/caffe/_caffe_converter.py", line 171, in convert
    predicted_feature_name)
  File "/Users/ln160c/Downloads/YOLO-CoreML-MPSNNGraph-master/Convert/coreml/coreml/lib/python2.7/site-packages/coremltools/converters/caffe/_caffe_converter.py", line 230, in _export
    predicted_feature_name)
RuntimeError: Unable to load caffe network Prototxt file: caffe_model/MobileNetSSD_deploy.prototxt

I'm not sure whether Core ML supports MobileNet-SSD or not. Could you take a look?

An issue with a memory leak

I added a view controller as the first VC in MobileNetsDemo. Then I present the camera controller from the demo and then dismiss it. But I find that when I do this, about 10 MB of memory is not released, and every time I present the camera controller another 10 MB is not released. I guess the issue is that memory grows every time createNeuralNetwork is called, but I cannot solve the problem. How could I solve it?

mhh, not running

Hi, I can compile (build succeeds), but the app does not run in the simulator or on an iPhone. After the build success message... nothing.

Building my own app causes a libMobileGestalt issue

Hi,

I had to make this a new issue since it's unrelated to the previous one. I've built my app using Forge's DSL and converted the collective weights into layer by layer weights. The app builds correctly but when I try to run it, I encounter the following issue:

libMobileGestalt MobileGestaltSupport.m:153: pid 10398 (Labels) does not have sandbox access for frZQaeyWLUvLjeuEK43hmg and IS NOT appropriately entitled

I've tried to track down the bug, but I can't seem to locate it. My development environment is Xcode 8.3 and iOS 10.3. Any pointers in this direction would be appreciated.

Forge slows down after a few hundred frames

Hello,

I am building an app that requires real-time performance. I ran a few tests on the Inception v3 example and here are the results:
First run:
https://pastebin.com/G3ErxAcA
Second run:
https://pastebin.com/Se0zHifF

For the first ~300 frames the GPU execution time starts at 0.07 s, but later increases to ~0.085 s.
I thought a possible cause was GPU overheating, but right after the first run I tried running the app again, and on the second run the results were similar: the first few hundred frames are processed much faster than the last ones.

I see that it also depends on fps:
https://pastebin.com/cmFkFP5P
I use an iPad Pro for testing and I set the FPS to 15. It runs smoothly for the first 300-400 frames, then slows down a lot (even 0.44 s for one or two frames), and then runs faster at ~0.11 s, but still slower than at the beginning.
This experiment is repeatable: it is always really fast -> two frames super slow -> slower than at the beginning.

What causes this? Maybe some problem with resource management in Forge?

In my app the execution time increases from 0.12 s to 0.2 s per execution, which makes the app unusable.

Thanks for help in advance :)

Use Forge with deployment target < 10.3

Hi,

I'm trying to add Forge to my app (deployment target = 9.0).

I get this compiling error :

YOLO.swift:2:8: Module file's minimum deployment target is ios10.3 v10.3: Forge.framework/Modules/Forge.swiftmodule/arm64.swiftmodule

I've tried setting the framework as Optional and following the last part of this page, but I still get the same error.

I can't change the target of my app.

Any help would be appreciated :)

EXC_BAD_ACCESS in release builds

I tried to run this project using the release configuration, and every example crashes at:

mpscnn = MPSCNNFullyConnected(device: device, convolutionDescriptor: desc, kernelWeights: weights.pointer, biasTerms: biasTerms, flags: .none)

in createCompute in Layers.swift.

Only the MNIST project, which does not use Forge, seems to work.

This means that an app using this library cannot be built for App Store, ad hoc, or enterprise distribution.

Hello World example and extreme example

I understand that MNIST is the usual hello world (at least from my Keras learning experience). But sometimes you want to go down a level, to a very simple neural network, so you can check the weights, understand the flow, etc. Could there be a simple network for learning purposes, covering modeling, training, and running?

Thanks for your kind advice.
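
Not something that exists in the repository today, but as a sketch of what a minimal learning example could look like, using only the DSL constructs that already appear in the issues on this page (the layer name "conv1" and the Model(input:output:) construction are assumptions on my part):

import Forge
import MetalPerformanceShaders

// Hedged sketch of a tiny "hello world" network: one convolution and one
// pooling layer on a 28x28 grayscale input, small enough to inspect the
// weights and intermediate outputs by hand.
func buildTinyModel(device: MTLDevice) -> Model {
    let relu = MPSCNNNeuronReLU(device: device, a: 0.0)
    let input = Input(width: 28, height: 28, channels: 1)
    let output = input
        --> Convolution(kernel: (3, 3), channels: 8, stride: (1, 1),
                        activation: relu, useBias: true, name: "conv1")
        --> MaxPooling(kernel: (2, 2), stride: (2, 2), padding: .valid)
    return Model(input: input, output: output)
}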

[question] TensorFlow Lite

Hi, @hollance san,

Have you ever evaluated TensorFlow Lite on iOS/iPadOS for the GPU (I mean not for the Neural Engine)?

As you pointed out in one of your articles, Core ML is slower than MPSCNN.
So I expected a lot from the Metal delegate of TensorFlow Lite and tried it, but I was disappointed by its performance.
If you have any insights, could you share them?

Thanks.

Error: the destination image texture is temporary and has a readCount of 0.

When I combine MobileNet with a shortcut connection, I get this error:

/BuildRoot/Library/Caches/com.apple.xbs/Sources/MetalImage/MetalImage-100.6/MPSNeuralNetwork/Filters/MPSCNNKernel.mm:729: failed assertion `[MPSCNNConvolution encodeToCommandBuffer:sourceImage:inState:destinationImage:] Error: the destination image texture is temporary and has a readCount of 0.
Its texel storage is probably in use for another texture now.

The net summary is correct, but the error occurs in Model.encode (more precisely, in MPSCNNLayer.encode), and I cannot figure out why. The net definition is something like this:

    let relu = MPSCNNNeuronReLU(device: device, a : 0.0)
    let input = Input(width: 256, height: 512, channels:3)
    let mbv1_conv_1 = input
        --> Resize(width: 256, height: 512)
        --> Convolution(kernel: (3, 3), channels: 16, stride: (2, 2), padding: .same, activation: relu, useBias: true, name: "0")
        --> DepthwiseConvolution(kernel: (3, 3), stride: (1, 1), activation: nil, useBias: false, name: "1_d")
        --> PointwiseConvolution(channels: 32, stride: (1, 1), activation: relu, useBias: true, name: "1_p")
        --> DepthwiseConvolution(kernel: (3, 3), stride: (2, 2), activation: nil, useBias: false, name: "2_d")
        --> PointwiseConvolution(channels: 64, stride: (1, 1), activation: relu, useBias: true, name: "2_p")
    
    let mbv1_conv_2 = mbv1_conv_1
        --> DepthwiseConvolution(kernel: (3, 3), stride: (2, 2), activation: nil, useBias: false, name: "3_d")
        --> PointwiseConvolution(channels: 128, stride: (1, 1), activation: relu, useBias: true, name: "3_p")
        --> DepthwiseConvolution(kernel: (3, 3), stride: (1, 1), activation: nil, useBias: false, name: "4_d")
        --> PointwiseConvolution(channels: 128, stride: (1, 1), activation: relu, useBias: true, name: "4_p")
    
    let mbv1_conv_3 = mbv1_conv_2
        --> DepthwiseConvolution(kernel: (3, 3), stride: (1, 1), activation: nil, useBias: false, name: "5_d")
        --> PointwiseConvolution(channels: 256, stride: (1, 1), activation: relu, useBias: true, name: "5_p")
        --> DepthwiseConvolution(kernel: (3, 3), stride: (1, 1), activation: nil, useBias: false, name: "6_d")
        --> PointwiseConvolution(channels: 256, stride: (1, 1), activation: relu, useBias: true, name: "6_p")
    
    let mbv1_maxpool = mbv1_conv_1
        --> MaxPooling(kernel: (2, 2), stride: (2, 2), padding: .valid)
    
    let concat = Concatenate([ mbv1_maxpool, mbv1_conv_2, mbv1_conv_3])
    
    let mbv1_conv4 = concat
        --> DepthwiseConvolution(kernel: (3, 3), stride: (1, 1), activation: nil, useBias: false, name: "7_d")
        --> PointwiseConvolution(channels: 32, stride: (1, 1), activation: relu, useBias: true, name: "7_p")

`[MPSTemporaryImage prefetchStorageWithCommandBuffer:imageDescriptorList:] Error: the descriptor must be configured with MTLStorageModePrivate'

Hi.
I'm trying to run the YOLO application from this project.
When I try it, this error occurs:

failed assertion `[MPSTemporaryImage prefetchStorageWithCommandBuffer:imageDescriptorList:] Error: the descriptor must be configured with MTLStorageModePrivate'

in YOLO.swift line 69, which calls models.swift line 306.

I edited DataShape.swift line 52,
from

return MPSImageDescriptor(channelFormat: .float16, width: width,
                              height: height, featureChannels: channels) 

to

return MPSImageDescriptor(channelFormat: .float16, width: width,
                              height: height, featureChannels: channels,
                              storageMode: .private) // and MTLStorageMode.private instead of .private

but it doesn't work, complaining "Expression type 'MPSImageDescriptor' is ambiguous without more context".
I'm working with Xcode 9 and iOS 11 (is that the reason?).
What can I do about it?
Thank you.
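
For what it's worth, a hedged sketch of the edit that usually compiles, assuming iOS 11 or later, where storageMode is a settable property of MPSImageDescriptor rather than an initializer parameter:

import MetalPerformanceShaders

// Hedged sketch, assuming iOS 11+: configure the storage mode after creating
// the descriptor instead of passing it to the initializer.
func makeDescriptor(width: Int, height: Int, channels: Int) -> MPSImageDescriptor {
    let descriptor = MPSImageDescriptor(channelFormat: .float16,
                                        width: width,
                                        height: height,
                                        featureChannels: channels)
    descriptor.storageMode = .private
    return descriptor
}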

EXC_BAD_ACCESS code = 10

Hi!

I wrote my own CNN in the YOLO demo (I'm currently trying to replicate SqueezeDet). I changed the model layers in the init and converted my weights to .bin files in the parameters folder. The only other thing changed was the labels. When I run, I get "EXC_BAD_ACCESS code = 10" on the flags line of:

conv = MPSCNNConvolution(device: device,
                         convolutionDescriptor: desc,
                         kernelWeights: weights.pointer,
                         biasTerms: biases?.pointer,
                         flags: .none)

A screenshot is included below. What could be causing this? Any advice on solving it?

Thank you so much!

[Screenshots attached]

Edit: My problem is with this line:

let output = fire11Result --> Convolution(kernel: (3, 3), channels: 72, stride: (1, 1), activation: nil, name: "conv12") // error is caused by this line

The app runs when using fire11Result, the second-to-last layer, as the output.

Any idea?

Thank you!

Upload to App Store error

Hey! I love your framework, but I get this error when I try to upload to the App Store:

"Unexpected CFBundleExecutable Key - The bundle at '/Payload/MyApp.app/Forge/Forge/Info.plist' does not contain a bundle executable. If this bundle intentionally does not contain an executable, consider removing the CFBundleExecutable key from its Info.plist and using a CFBundlePackageType of BNDL. If this bundle is part of a third-party framework, consider contacting the developer of the framework for an update to address this issue."

I tried a lot of things over the past few days, but I was unable to fix this error. Do you have any idea what I should do? Thank you very much for your help!

How to implement an element-wise layer in Forge

Hi, I want to convert a Caffe model that includes an element-wise (sum) layer, but there is no implementation in Forge, so I want to write it myself. How can I implement this layer as quickly as possible? Please help me, thanks!

code signing blocked mmap()

I get the following error message when I'm running the app on my iphone:

dyld: Library not loaded: @rpath/Forge.framework/Forge
  Referenced from: /var/containers/Bundle/Application/xxxxxxxxxxxxxxx/Inception.app/Inception
  Reason: no suitable image found.  Did find:
	/private/var/containers/Bundle/Application/xxxxxxxxxxxxxxx/Inception.app/Frameworks/Forge.framework/Forge: code signing blocked mmap() of '/private/var/containers/Bundle/Application/xxxxxxxxxxxxxx/Inception.app/Frameworks/Forge.framework/Forge'
(lldb)

also a warning:

CodeSign /Users/adamszendrei/Library/Developer/Xcode/DerivedData/Forge-gikysnjirgrtmefxffiekblqjkzy/Build/Products/Debug-iphoneos/Inception.app/Frameworks/Forge.framework
    cd /Users/adamszendrei/ObjDetect/Forge/Examples/Inception
    export CODESIGN_ALLOCATE=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/codesign_allocate
    export PATH="/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin:/Applications/Xcode.app/Contents/Developer/usr/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"
    
Signing Identity:     "iPhone Developer: Adam Szendrei (762xxxxxxxx)"

    /usr/bin/codesign --force --sign 6xxxxxx --preserve-metadata=identifier,entitlements,flags --timestamp=none /Users/adamszendrei/Library/Developer/Xcode/DerivedData/Forge-gikysnjirgrtmefxffiekblqjkzy/Build/Products/Debug-iphoneos/Inception.app/Frameworks/Forge.framework

Warning: unable to build chain to self-signed root for signer "iPhone Developer: Adam Szendrei (762xxxxxxxx)"

Do you have any idea how I can fix it?

Using YOLO with ARKit

I tried to use YOLO in ARKit and ported your code.

I call predict in the session delegate:

func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let seekingCM = CMTimeMakeWithSeconds(frame.timestamp, 1000000);
        let timestamp = seekingCM
        let deltaTime = timestamp - lastTimestamp
        if fps == -1 || deltaTime >= CMTimeMake(1, Int32(fps)) {
            lastTimestamp = timestamp
            
            if let texture = convertToMTLTexture(pixelBuffer:frame.capturedImage){
                predict(texture: texture)
            }
            
        }
    }

and I convert the texture from a CVPixelBuffer instead of a CMSampleBuffer:

func convertToMTLTexture(pixelBuffer: CVPixelBuffer?) -> MTLTexture? {
        if let textureCache = textureCache,
            let pixelBuffer = pixelBuffer{

            let width = CVPixelBufferGetWidth(pixelBuffer)
            let height = CVPixelBufferGetHeight(pixelBuffer)
            
            var texture: CVMetalTexture?
            CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache,
                                                      pixelBuffer, nil, .bgra8Unorm, width, height, 0, &texture)
            if let texture = texture {
                return CVMetalTextureGetTexture(texture)
            }
        }
        return nil
    }

Because ARKit runs the camera full screen and outputs 1280x720, I changed the height to use a 16:9 aspect ratio:

private func show(predictions: [YOLO.Prediction]) {
        DEBUGLOG(message: predictions.count)

        for i in 0..<boundingBoxes.count {
            if i < predictions.count {
                let prediction = predictions[i]
                
                // The predicted bounding box is in the coordinate space of the input
                // image, which is a square image of 416x416 pixels. We want to show it
                // on the video preview, which is as wide as the screen and has a 4:3
                // aspect ratio. The video preview also may be letterboxed at the top
                // and bottom.
                let width = view.bounds.width
                let height = width * 16 / 9
                let scaleX = width / CGFloat(YOLO.inputWidth)
                let scaleY = height / CGFloat(YOLO.inputHeight)
//                let top = (view.bounds.height - height) / 2
                
                // Translate and scale the rectangle to our own coordinate system.
                var rect = prediction.rect
                rect.origin.x *= scaleX
                rect.origin.y *= scaleY
//                rect.origin.y += top
                rect.size.width *= scaleX
                rect.size.height *= scaleY
                
                // Show the bounding box.
                let label = String(format: "%@ %.1f", labels[prediction.classIndex], prediction.score * 100)
                let color = colors[prediction.classIndex]
                boundingBoxes[i].show(frame: rect, label: label, color: color)
                
            } else {
                boundingBoxes[i].hide()
            }
        }
    }

It runs, but the results are not right. The same bottle that can be recognized in your demo can't be recognized in mine.

What am I missing? Please help me.

ReLU6 in Metal

Could you please write a Swift class for ReLU6 to replace MPSCNNNeuronReLU? I'm new to Metal and want an example of how to define new layers other than the various convolutions. Thank you very much!
ReLU6 in TensorFlow is f(x; a) = min(a*min(0, x) + max(0, x), 6).
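
Not a custom Metal kernel, but for reference: on iOS 11 and later MPS ships a capped ReLU neuron that can express ReLU6 directly; a minimal hedged sketch:

import MetalPerformanceShaders

// Hedged sketch, assuming iOS 11+: MPSCNNNeuronReLUN computes
// f(x) = min((x >= 0 ? x : a * x), b), so a = 0 and b = 6 gives ReLU6.
func makeReLU6(device: MTLDevice) -> MPSCNNNeuronReLUN {
    return MPSCNNNeuronReLUN(device: device, a: 0.0, b: 6.0)
}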

Reshape Layer

Maybe I'm missing it in the documentation, but does Forge support a reshape layer?

Forge does not support iOS 11.3

I updated my Xcode and iPhone to iOS 11.3, and now Forge no longer builds successfully. You may need to update Forge to support iOS 11.3, because some classes have changed in iOS 11.3.

Custom TinyYOLO doesn't work

I have been trying to use Forge to detect a custom object based on the TinyYOLO model, but every time I try to run it I get the error below. My custom model has 2 classes; I tried changing several parameters and it still doesn't work.

[Screenshot attached]

The TinyYOLO model based on VOC that came with your example works fine.

Do you have any plan to implement ResNet with Forge?

First of all, your Forge makes me happy. Thank you.
Now I'm trying to implement ResNet using MPS, but I'm stuck at adding the outputs of two conv layers.
Is there an MPS API to support ResNet, or do I have to copy the two conv layers' outputs from GPU to CPU and push them back to the GPU after computing the add? (I think the latter is a bad idea.) I'd be glad if you could give me a little hint.

thank you.

Great job!

Using MTLBuffer instead of MTLTexture would give much better performance; subsampling a pixel is slow.

Supported weights file for Forge

Hi,

I wanted to know if .bin or .dat are the only file types supported for interacting with the Forge framework. Can Forge work with files like .caffemodel or .pb, or with files that do not specify weights and biases for every layer?

If .bin or .dat are the only supported file types, are you aware of any tools to convert from other binary weight file formats?

Thanks
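
For what it's worth, here is a hedged sketch of reading such a file, assuming the .bin files are raw little-endian Float32 dumps of a layer's weights (the file name "conv1_W.bin" is only an example); a converter from another format essentially just has to write the weights out in that form:

import Foundation

// Hedged sketch: load a raw Float32 weight file into a Swift array.
func loadRawWeights(at url: URL) throws -> [Float] {
    let data = try Data(contentsOf: url)
    return data.withUnsafeBytes { rawBuffer in
        Array(rawBuffer.bindMemory(to: Float.self))
    }
}

// let weights = try loadRawWeights(at: URL(fileURLWithPath: "conv1_W.bin"))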

Updated iOS and Xcode versions, can no longer run

Hi, I just updated my iOS version to 11.3, my bad!
So I also updated Xcode to match my iPhone, and I got errors like: "Module compiled with Swift 4.0.3 cannot be imported in Swift 4.1".

I also got some errors in Layer.swift: "Type of expression is ambiguous without more context"
and "'MPSRectNoClip' is only available on iOS 11.0 or newer".

Could you please help me to fix this problem?
Thank you!

Why is my Metal shader much slower than MPSCNN?

Hello, I have been following you for a long time. I'm also an iOS developer working with deep learning. Your code has helped me a lot, thank you!

Now I have a question about convolution. I have used MPSCNN to run CNN networks for a long time, for example VGG-Net, ResNet, SqueezeNet, and so on. The performance is very good: SqueezeNet only needs 20 ms, and I can recognize images in real time on my iPhone. I'm curious why MPSCNN is so fast; I just know it uses Metal and the GPU. So I wanted to write the kernel code myself and compare it to MPSCNN.

I constructed the following convolution example:
the input is 3x224x224
the convolution kernel is 64x3x3
the padding is 1
the stride is 1
so the output is 64x224x224
and the data type is float

The MPSCNN code is:

NSDate *start2 = [NSDate date];
    MPSImageDescriptor *desc = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat32 width:224 height:224 featureChannels:3];
    MPSImage *srcImage = [[MPSImage alloc] initWithDevice:self.device imageDescriptor:desc];
    
    MPSImageDescriptor *desc2 = [MPSImageDescriptor imageDescriptorWithChannelFormat:MPSImageFeatureChannelFormatFloat32 width:224 height:224 featureChannels:64];
    MPSImage *outImage = [[MPSImage alloc] initWithDevice:self.device imageDescriptor:desc2];
    
    id<MTLCommandBuffer> commandBuffer = [self.commandQueue commandBuffer];

    int co = 4*224*224;
    int kernel_size = 3;
    int pad = 1;
    int stride = 1;
    int count = 64*224*224;
    
    float *buf = new float[co];
    for(int i =0;i<co;i++){
        buf[i] = 1.0;
    }
    
    int weight_count = 3*64*kernel_size*kernel_size;
    float *weight = new float[weight_count];
    for(int i =0;i<weight_count;i++){
        weight[i] = 0.123;
    }

    float *bias = new float[64];
    for(int i =0;i<64;i++){
        bias[i] = 1.23456789;
    }
    MTLRegion region = MTLRegionMake3D(0, 0, 0,224,224,1);
    [srcImage.texture replaceRegion:region mipmapLevel:0 slice:0 withBytes:buf bytesPerRow:srcImage.width*4*sizeof(float) bytesPerImage:0];
    
    MPSCNNConvolutionDescriptor *convdesc = [MPSCNNConvolutionDescriptor cnnConvolutionDescriptorWithKernelWidth:kernel_size kernelHeight:kernel_size inputFeatureChannels:3 outputFeatureChannels:64 neuronFilter:nil];
    convdesc.strideInPixelsX = stride;
    convdesc.strideInPixelsY = stride;
    convdesc.groups = 1;
    
    MPSCNNConvolution *conv = [[MPSCNNConvolution alloc] initWithDevice:self.device convolutionDescriptor:convdesc kernelWeights:weight biasTerms:bias flags:MPSCNNConvolutionFlagsNone];
    MPSOffset offset;
    offset.x = 0;
    offset.y = 0;
    offset.z = 0;
    conv.offset = offset;
    
    
    [conv encodeToCommandBuffer:commandBuffer sourceImage:srcImage destinationImage:outImage];
    NSTimeInterval localtime2 = [[NSDate date] timeIntervalSinceDate:start2] * 1000;
    cout << "data init used " << localtime2 << "ms" << endl;
    
    
    NSDate *start = [NSDate date];

    [commandBuffer commit];
    [commandBuffer waitUntilCompleted];
    
    delete [] buf;
    delete [] weight;
    delete [] bias;
    NSTimeInterval localtime = [[NSDate date] timeIntervalSinceDate:start] * 1000;

    cout << "gpu calc used " << localtime << "ms" << endl;

My Metal code is below (because 4 channels are easier to process, I converted the input to 4x224x224):

id <MTLComputePipelineState> pipline = self.pipelineShaderTex;
    
    int co = 4*224*224;
    int kernel_size = 3;
    int pad = 1;
    int stride = 1;
    int count = 64*224*224;
    
    float *buf = new float[co];
    for(int i =0;i<co;i++){
        buf[i] = 1.0;
    }
    
    int weight_count = 4*64*kernel_size*kernel_size;
    float *weight = new float[weight_count];
    for(int i =0;i<weight_count;i++){
        weight[i] = i%4 == 3 ? 0 : 0.123;
    }

    float *bias = new float[64];
    for(int i =0;i<64;i++){
        bias[i] = 1.23456789;
    }
    
    MetalConvolutionParameter param;
    param.count = count;
    param.padSize = pad;
    param.kernelSize = kernel_size;
    param.stride = stride;
    param.inputChannel = 3;
    param.outputChannel = 64;
    param.inputWidth = 224;
    param.inputHeight = 224;
    param.outputWidth = 224;
    param.outputHeight = 224;
    
    MTLTextureDescriptor *indesc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA32Float width:224 height:224 mipmapped:NO];
    indesc.textureType = MTLTextureType2D;
    
    MTLTextureDescriptor *outdesc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA32Float width:224 height:224 mipmapped:NO];
    outdesc.textureType = MTLTextureType2DArray;
    outdesc.arrayLength = 64/4;
    
    MTLTextureDescriptor *weightdesc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA32Float width:3 height:3 mipmapped:NO];
    weightdesc.textureType = MTLTextureType2DArray;
    weightdesc.arrayLength = 64;

    MTLTextureDescriptor *biasdesc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA32Float width:1 height:1 mipmapped:NO];
    biasdesc.textureType = MTLTextureType2DArray;
    biasdesc.arrayLength = 64/4;
    
    if(!self.inTexture){
        self.inTexture = [self.device newTextureWithDescriptor:indesc];
        self.outTexture = [self.device newTextureWithDescriptor:outdesc];
        self.weightTexture = [self.device newTextureWithDescriptor:weightdesc];
        self.biasTexture = [self.device newTextureWithDescriptor:biasdesc];
        
        [self.inTexture replaceRegion:MTLRegionMake3D(0, 0, 0, 224, 224, 1) mipmapLevel:0 slice:0 withBytes:buf bytesPerRow:224*4*sizeof(float) bytesPerImage:0];
        for(int i =0;i<weightdesc.arrayLength;i++){
            [self.weightTexture replaceRegion:MTLRegionMake3D(0, 0, 0, 3, 3, 1) mipmapLevel:0 slice:i withBytes:weight+3*3*4*i bytesPerRow:3*4*sizeof(float) bytesPerImage:0];
            
        }
        for(int i =0;i<biasdesc.arrayLength;i++){
            [self.biasTexture replaceRegion:MTLRegionMake3D(0, 0, 0, 1, 1, 1) mipmapLevel:0 slice:i withBytes:bias+4*i bytesPerRow:1*4*sizeof(float) bytesPerImage:0];
        }
    }
    id<MTLBuffer> parambuffer = [self.device newBufferWithBytes:&param length:sizeof(param) options:MTLResourceCPUCacheModeDefaultCache];

    id<MTLCommandBuffer> commandBuffer = [self.commandQueue commandBuffer];
    id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
    [encoder setComputePipelineState:pipline];
    [encoder setTexture:self.inTexture atIndex:0];
    [encoder setTexture:self.outTexture atIndex:1];
    [encoder setTexture:self.weightTexture atIndex:2];
    [encoder setTexture:self.biasTexture atIndex:3];
    [encoder setBuffer:parambuffer offset:0 atIndex:0];
    
    MTLSize threadsPerGroups = MTLSizeMake(32, 16, 1);
    MTLSize threadGroups = MTLSizeMake((224 + threadsPerGroups.width -1 ) / threadsPerGroups.width,
                                       (224 + threadsPerGroups.height -1 ) / threadsPerGroups.height, 16);
    
    [encoder dispatchThreadgroups:threadGroups threadsPerThreadgroup:threadsPerGroups];
    [encoder endEncoding];
    
    NSDate *start = [NSDate date];

    [commandBuffer commit];
    [commandBuffer waitUntilCompleted];
    
    delete [] buf;
    delete [] weight;
    delete [] bias;
    NSTimeInterval localtime = [[NSDate date] timeIntervalSinceDate:start] * 1000;
    cout << "Time used " << localtime << "ms" << endl;

And this is the Metal kernel function (I do not handle the pad and stride, and the input always reads position (0,0); ignore that, I just wanted to test the compute performance):

kernel void convolutionForwardTexture(texture2d<float, access::read> inTexture [[texture(0)]],
                                      texture2d_array<float, access::write> outTexture [[texture(1)]],
                                      texture2d_array<float, access::read> weights [[ texture(2) ]],
                                      texture2d_array<float, access::read> bias [[ texture(3) ]],
                                      const device MetalConvolutionParameter *convolvParams [[ buffer(0) ]],
                                      ushort3 gid [[ thread_position_in_grid ]]){
    if(gid.x>=224||gid.y>=224){
        return;
    }
    
    float total = 0;
    float total2 = 0;
    float total3 = 0;
    float total4 = 0;
    
    float4 k,input;
    int slice = gid.z;
    for(int kh =0;kh<3;kh++){
        for(int kw =0;kw<3;kw++) {
            k = weights.read(uint2(kw,kh),slice*4);
            input = inTexture.read(uint2(0,0));
            total+=dot(k,input);
            
            k = weights.read(uint2(kw,kh),slice*4+1);
            input = inTexture.read(uint2(0,0));
            total2+=dot(k,input);
            
            k = weights.read(uint2(kw,kh),slice*4+2);
            input = inTexture.read(uint2(0,0));
            total3+=dot(k,input);
            
            k = weights.read(uint2(kw,kh),slice*4+3);
            input = inTexture.read(uint2(0,0));
            total4+=dot(k,input);
        }
    }
    
    float4 output = float4(total,total2,total3,total4) + bias.read(uint2(0,0),slice);
    outTexture.write(output,uint2(gid.x,gid.y),gid.z);
    
}

The result: MPSCNN needs only 10 ms, while my code takes 40 ms. Why is my code so slow? I don't know how MPSCNN does it. Can you give me some help?
