
Comments (5)

navsuda commented on August 22, 2024

Hi @zhangjinhong17 ,

  1. You are right, the original paper concatenates all the time steps and feeds them into a fully-connected layer, but that did not give any higher accuracy than using just the last time step on this dataset.
  2. Bi-directional GRU and layer normalization are other hyperparameters, which didn't seem to improve the accuracy on this dataset. On a new dataset you may have to try out these options too. Note that if you are using a bi-directional GRU, you may have to concatenate all the time steps and feed them into a fully-connected layer.
  3. From my understanding, in the CRNN paper, the alignments generated from DeepSpeech2 would be of the format [silence, silence, silence,...,T,T,A,A,A,L,K,K,T,T,I,I,I,M,M,E,silence,silence], which would be converted to [filler, keyword, residual filler]. For more details on CTC used in DeepSpeech2 see this. If you use such a frame-level aligned dataset to train the model, it should ideally give a more accurate model. You are right that a fully-connected layer can't handle time-shift invariance unless the training data includes such shifts (random time shift augmentation helps a bit here). If you get a chance to generate the alignments for the speech commands dataset, please consider open-sourcing/sharing it.
  4. I have not heard of another keyword spotting dataset as large as this one.
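
The conversion described in point 3 could be sketched roughly as follows. This is only an illustration under stated assumptions, not the CRNN paper's or this repository's code: the function name `segment_alignment`, the `"silence"` token, and the rule "everything from the first to the last non-silence frame is the keyword" are all hypothetical choices made for the example.

```python
from typing import List

SILENCE = "silence"  # hypothetical token for non-speech frames


def segment_alignment(frames: List[str]) -> List[str]:
    """Map per-frame characters to filler / keyword / residual filler.

    Assumption: the keyword spans the first through the last
    non-silence frame; everything before is filler, everything
    after is residual filler.
    """
    speech = [i for i, f in enumerate(frames) if f != SILENCE]
    if not speech:
        # No keyword frames at all: the whole clip is filler.
        return ["filler"] * len(frames)

    first, last = speech[0], speech[-1]
    labels = []
    for i in range(len(frames)):
        if i < first:
            labels.append("filler")
        elif i <= last:
            labels.append("keyword")
        else:
            labels.append("residual filler")
    return labels


# Frame-level alignment in the format quoted above.
frames = [SILENCE] * 3 + list("TTAAALKKTTIIIMME") + [SILENCE] * 2
labels = segment_alignment(frames)
```

Here the three leading silence frames become filler, the sixteen character frames become keyword, and the two trailing silence frames become residual filler.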

from ml-kws-for-mcu.

meixitu commented on August 22, 2024

Hi @navsuda ,
Thanks for your help.
Actually, the reason I am concerned about BiRNN and alignment is that my own training result is worse than what the original CRNN paper claimed, and I want to figure out why. BiRNN helps a little, but still not enough.
It is really very tough.

  1. I found that layer normalization is really slow, even though I have a GPU. Do you know why?
  2. I found that a BiRNN with only the last time step still works.
  3. The website you mentioned explains the principle of the CTC algorithm. But in other documents, its output is sparse and doesn't have so many repeated outputs. I think the alignment may be [filler, keyword].
    I don't know how to get the alignments for the speech commands dataset; it would take a very long time to do manually.
    Thank you again.

Jinhong
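
On the question of repeated versus sparse CTC outputs: the standard CTC collapsing rule merges consecutive repeated symbols and then removes blanks, so a frame sequence with many repeats and a spiky one that is mostly blanks can decode to the same label sequence. A minimal sketch, assuming `"_"` as the blank symbol (the function name `ctc_collapse` and the example sequences are made up for illustration):

```python
from itertools import groupby
from typing import List

BLANK = "_"  # assumed blank symbol for this sketch


def ctc_collapse(frames: List[str], blank: str = BLANK) -> str:
    """Apply the CTC collapsing rule to a per-frame symbol sequence."""
    # Step 1: merge runs of identical consecutive symbols.
    merged = [symbol for symbol, _ in groupby(frames)]
    # Step 2: drop the blank symbol.
    return "".join(s for s in merged if s != blank)


dense = list("__TTAAALLKK__")  # many repeated frames per character
spiky = list("__T__A__L_K__")  # one spike per character, blanks elsewhere

dense_text = ctc_collapse(dense)
spiky_text = ctc_collapse(spiky)
```

Both sequences collapse to the same string, which is why the decoded text alone does not tell you whether the per-frame output was dense or sparse.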


navsuda commented on August 22, 2024

Hi @zhangjinhong17,
I'm not sure why layer normalization is slow. Can you check if your GPU is being utilized at all?


meixitu commented on August 22, 2024

Hi @navsuda ,
The GPU is working; I can check the GPU status.
I will check the layer normalization speed with your original code.
Thanks
Jinhong


navsuda commented on August 22, 2024

Closing the issue due to inactivity. Please reopen it if you still face the problem.

