Comments (9)

ermig1979 commented on August 26, 2024

Hi! Look for Neon::DetectionLbpDetect16ii and Base::DetectionLbpDetect16ii. These functions are defined in SimdNeonDetection.cpp and SimdBaseDetection.cpp.

from simd.

TonyCongqianWang commented on August 26, 2024

Thank you so much for your fast reply! I think I have mainly understood the Base::Detect method, but I am still a bit unsure about what exactly the line

sum += leaves[subset[c >> 5] & (1 << (c & 31)) ? leafOffset : leafOffset + 1];

does. I believe there are two different leaf values, depending on whether the feature is active according to some condition stored in subset. Is that correct? But how exactly is this condition evaluated? If so, why are the node thresholds not part of the node, but stored in subsets instead? Does that allow for better SIMD optimization? It also looks to me as though (in both the Base and Neon versions) Calculate is always called to compute the LBP values. I would have thought that calculating the LBP features is the most expensive part of the detect function, so it would make sense to cache feature values if features are shared between stages.

As for the Neon::Detect method, I am a bit confused. I believe it does the same thing, as is confirmed by your unit tests, but I am still unsure about a few things:

  • Is it true that the 32i version evaluates 4 neighbouring windows at the same time, keeps evaluating stages while more than one window is positive, and continues with the single-window version when only one window is positive?
  • What is the meaning of shuffle and mask in the Neon::Calculate implementation?
  • What is the meaning of vmvnq_u32(vceqq_u32(value, K32_00000000))? It looks to me as though there are two bitwise NOT operations, which does not make sense.
  • Why are two _subset values loaded in leafMask regardless of the u16 or u32 implementation?
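On the third bullet, a note for readers: as far as I know, vceqq_u32 is a lane-wise equality compare that yields all-ones in lanes where the operands are equal, and vmvnq_u32 is the only bitwise NOT in that expression. Together they read as "mask of lanes where value is not zero". The scalar emulation below is only a sketch of that idiom. The names U32x4, EmuCeq, EmuMvn and NonZeroMask are hypothetical stand-ins, not NEON intrinsics and not Simd code.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for NEON's uint32x4_t (four 32-bit lanes).
using U32x4 = std::array<uint32_t, 4>;

// Emulates the lane-wise compare: all-ones where a == b, zero elsewhere.
inline U32x4 EmuCeq(const U32x4& a, const U32x4& b)
{
    U32x4 r{};
    for (int i = 0; i < 4; ++i)
        r[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
    return r;
}

// Emulates the lane-wise bitwise NOT.
inline U32x4 EmuMvn(const U32x4& a)
{
    U32x4 r{};
    for (int i = 0; i < 4; ++i)
        r[i] = ~a[i];
    return r;
}

// NOT(value == 0): all-ones in lanes where value is non-zero.
inline U32x4 NonZeroMask(const U32x4& value)
{
    const U32x4 zero{{0u, 0u, 0u, 0u}};
    return EmuMvn(EmuCeq(value, zero));
}
```

So the compare produces a mask and the single NOT flips it; nothing is inverted twice.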

It seems to me that Base::Detect is fairly easy to modify to allow trees deeper than depth 1, but the Neon version is not. It would be enough for me to use SIMD instructions only for the LBP feature calculation, which seems to be done in LeafMask. If I pass my root node thresholds to LeafMask as the subset parameter, I should get left/right traversal decisions, correct?

In general I am wondering: are you interested in adding support for depths > 1 or do you think it is not worth it?

ermig1979 commented on August 26, 2024

I wrote this code more than 10 years ago and can't remember some details.

sum += leaves[subset[c >> 5] & (1 << (c & 31)) ? leafOffset : leafOffset + 1];

subset is an array of 8 32-bit integers that stores 256 1-bit values. subset[c >> 5] & (1 << (c & 31)) gets one of these boolean values by index c.
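The lookup described above can be sketched as a pair of small helpers. This is an illustration only: SubsetContains and SelectLeaf are hypothetical names, and uint32_t is my assumption for the element type (the library may use a signed int).

```cpp
#include <cstdint>

// Hypothetical helper (not part of Simd): tests bit c of a 256-bit set
// stored as eight 32-bit words.
inline bool SubsetContains(const uint32_t subset[8], uint8_t c)
{
    // c >> 5 selects the word (c / 32); c & 31 selects the bit inside it (c % 32).
    return (subset[c >> 5] & (uint32_t(1) << (c & 31))) != 0;
}

// Hypothetical helper: chooses between the two leaves of a stump
// the same way the quoted expression does.
inline int SelectLeaf(const uint32_t subset[8], uint8_t c, int leafOffset)
{
    return SubsetContains(subset, c) ? leafOffset : leafOffset + 1;
}
```

For example, with only bit 3 of subset[1] set, the LBP code 35 selects leafOffset and every other code selects leafOffset + 1.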

TonyCongqianWang commented on August 26, 2024

Again, thanks a lot for your help! Now I understand: c >> 5 and c & 31 are equivalent to c / 32 and c % 32. Good to know that the decision boundary for a given LBP feature can be arbitrary and not just some threshold.
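To make the "arbitrary decision boundary" point concrete, here is a minimal sketch that builds such a 256-bit table from any list of LBP codes. MakeSubset and InSubset are hypothetical helpers written for this illustration; they are not part of Simd or OpenCV.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a 256-bit membership table from an
// arbitrary list of LBP codes (no ordering or threshold required).
inline std::array<uint32_t, 8> MakeSubset(const std::vector<uint8_t>& codes)
{
    std::array<uint32_t, 8> subset{};
    for (uint8_t c : codes)
        subset[c / 32] |= uint32_t(1) << (c % 32); // same word and bit as c >> 5, c & 31
    return subset;
}

// Hypothetical helper: reads the table back with the shift/mask form.
inline bool InSubset(const std::array<uint32_t, 8>& subset, uint8_t c)
{
    return (subset[c >> 5] & (uint32_t(1) << (c & 31))) != 0;
}
```

A table built from the codes 3, 200 and 255 accepts exactly those three values, which a single threshold on c could not express.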

So after some thinking I concluded that it shouldn't be too hard to convert the code to allow deeper trees. It might be as simple as adding one for loop and saving the decision in leaves (either directly as the new offset index, or as 0 / 1 for left and right), and then using the usual tree traversal logic to calculate the new offset index.

Would you be interested in adding this?
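A scalar sketch of that idea, under stated assumptions: the Node layout, the child encoding (a value > 0 is a node index, a value <= 0 is a negated leaf index) and the function EvalTree are all hypothetical and chosen for illustration. They are not the Simd data structures.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical node layout for a tree deeper than a stump.
struct Node
{
    int feature;        // index of the LBP feature evaluated at this node
    uint32_t subset[8]; // 256-bit table: LBP codes that take the left branch
    int left;           // > 0: index of the next node; <= 0: leaf index is -left
    int right;          // same encoding as left
};

// Walks one tree for one window. lbpCodes[f] is assumed to hold the
// already computed LBP code of feature f for that window.
inline float EvalTree(const std::vector<Node>& nodes,
                      const std::vector<float>& leaves,
                      const std::vector<uint8_t>& lbpCodes)
{
    int idx = 0; // start at the root
    do
    {
        const Node& n = nodes[idx];
        const uint8_t c = lbpCodes[n.feature];
        const bool goLeft = (n.subset[c >> 5] & (uint32_t(1) << (c & 31))) != 0;
        idx = goLeft ? n.left : n.right;
    } while (idx > 0);
    return leaves[-idx];
}
```

With left = 0 and right = -1 at the root this reduces to the depth-1 case, where the two outcomes are leaves[0] and leaves[1].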

ermig1979 commented on August 26, 2024

Unfortunately no. There are the following reasons:

  1. The HAAR and LBP cascade classifiers have much lower accuracy and performance compared to solutions based on neural networks, so there is no sense in optimizing legacy algorithms. My priority is optimizing DL-based algorithms.
  2. The current SIMD optimizations rely on the fact that the data of a cascade classifier (in the stump-based case, Depth = 1) is the same for every point of the image. If the algorithm has branches, it becomes very difficult to use SIMD.

Certainly you can try to optimize this case on your own. If you do, I will gladly add your solution to the main Simd branch.

TonyCongqianWang commented on August 26, 2024

Oh, thanks again for your input. What you said makes a lot of sense. Multiple windows are evaluated at the same time, and if two windows branch differently, SIMD operations don't work anymore. That is a pity. The only easy optimization would be to use the current SIMD implementation for the root and then fall back to the slower base version when branching. That does indeed defeat the purpose of the Simd library, and using the OpenCV version might be better at that point.
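The point about stumps can be illustrated with a scalar emulation of four lanes. This is only a sketch: the loop stands in for the vector lanes, and AccumulateStump, kLanes, leafIn and leafOut are hypothetical names, not the library's NEON code.

```cpp
#include <array>
#include <cstdint>

constexpr int kLanes = 4; // four neighbouring windows, one per emulated lane

// Applies one stump to all lanes. The subset and both leaf values are shared;
// only the LBP code differs per window, so every lane performs the same
// operations and the choice of leaf is a per-lane select.
inline void AccumulateStump(const uint32_t subset[8], float leafIn, float leafOut,
                            const std::array<uint8_t, kLanes>& codes,
                            std::array<float, kLanes>& sums)
{
    for (int lane = 0; lane < kLanes; ++lane)
    {
        const uint8_t c = codes[lane];
        const bool inSubset = (subset[c >> 5] & (uint32_t(1) << (c & 31))) != 0;
        sums[lane] += inSubset ? leafIn : leafOut;
    }
}
```

With deeper trees, each lane could move to a different child node with its own feature and subset, so the shared arguments above would have to become per-lane data, which is where the difficulty comes from.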

TonyCongqianWang commented on August 26, 2024

Regarding DL algorithms: are there any full pipelines implemented in Simd yet? Can you recommend an architecture that has fast CPU performance? When I use your cascade with LBP features, I need around 1 ms per image (250 x 200) on my laptop and around 15 ms on my Raspberry Pi.

ermig1979 commented on August 26, 2024

I develop Synet. This framework runs inference on trained neural models and uses Simd as its backend.

TonyCongqianWang commented on August 26, 2024

Thanks a lot, it looks great! I will definitely install it and test its performance for my use
