CRNN question about ml-kws-for-mcu HOT 5 CLOSED

meixitu commented on August 22, 2024

CRNN question

from ml-kws-for-mcu.

Comments (5)

navsuda commented on August 22, 2024

Hi @zhangjinhong17 ,

You are right, the original paper feeds all the time steps, concatenated, into the fully-connected layer, but that did not give any higher accuracy than using just the last time step on this dataset.
Bi-directional GRU and layer normalization are other hyperparameters that didn't seem to improve the accuracy on this dataset. On a new dataset you may have to try these options too. Note that if you use a bi-directional GRU, you may have to concatenate all the time steps and feed them to the fully-connected layer.
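
To make the two options concrete, here is a minimal NumPy sketch of both classifier heads. It is not the repository's TensorFlow code: the sizes, the random stand-in for the GRU output, and all variable names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
batch, time_steps, hidden, num_classes = 4, 25, 64, 12

# Stand-in for the GRU output sequence: one hidden vector per time step.
gru_out = rng.standard_normal((batch, time_steps, hidden))

# Option A: feed only the last time step to the fully-connected layer.
w_last = rng.standard_normal((hidden, num_classes)) * 0.01
logits_last = gru_out[:, -1, :] @ w_last

# Option B: concatenate all time steps, then apply the fully-connected layer.
w_all = rng.standard_normal((time_steps * hidden, num_classes)) * 0.01
logits_all = gru_out.reshape(batch, -1) @ w_all

print(logits_last.shape, logits_all.shape)  # both heads give (batch, num_classes)
print(w_last.size, w_all.size)              # option B needs time_steps times more weights
```

Both heads produce the same output shape, so the difference is in the weight count of the fully-connected layer and in what it can see: option B grows with the number of time steps, while option A relies on the recurrent state to summarize the whole sequence.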
From my understanding, in the CRNN paper, the alignments generated from DeepSpeech2 would be of the format [silence, silence, silence,...,T,T,A,A,A,L,K,K,T,T,I,I,I,M,M,E,silence,silence], which would be converted to [filler, keyword, residual filler]. For more details on CTC used in DeepSpeech2 see this. If you use such a frame-level aligned dataset to train the model, it should ideally give a more accurate model. You are right that a fully-connected layer can't handle time-shift invariance unless the training data includes it (random time-shift augmentation helps a bit here). If you get a chance to generate the alignments for the speech commands dataset, please consider open-sourcing/sharing it.
I have not heard of another keyword spotting dataset as large as this one.
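
As a rough illustration of the conversion described above, the sketch below maps a frame-level alignment to [filler, keyword, residual filler]. It assumes that every frame between the first and last non-silence frame belongs to the keyword, which is a simplification; `segment_alignment` and `run_lengths` are hypothetical helper names, not functions from the paper or from this repository.

```python
from itertools import groupby

def segment_alignment(frames, silence="silence"):
    """Map per-frame labels to 'filler', 'keyword' or 'residual filler'."""
    speech = [i for i, label in enumerate(frames) if label != silence]
    if not speech:
        # No speech at all: the whole clip is filler.
        return ["filler"] * len(frames)
    first, last = speech[0], speech[-1]
    segments = []
    for i in range(len(frames)):
        if i < first:
            segments.append("filler")
        elif i <= last:
            segments.append("keyword")
        else:
            segments.append("residual filler")
    return segments

def run_lengths(labels):
    """Collapse consecutive identical labels into (label, frame_count) pairs."""
    return [(label, len(list(group))) for label, group in groupby(labels)]

# Frame sequence taken from the example format in the comment above.
frames = ["silence"] * 3 + list("TTAAALKKTTIIIMME") + ["silence"] * 2
print(run_lengths(segment_alignment(frames)))
# [('filler', 3), ('keyword', 16), ('residual filler', 2)]
```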

meixitu commented on August 22, 2024

Hi @navsuda ,
Thanks for your help.
Actually, the reason I care about BiRNN and alignment is that my own training result is worse than what the original CRNN paper claims, and I want to figure out why. BiRNN helps a little, but still not enough.
It is really very tough.

I found that layer normalization is really slow, even though I have a GPU. Do you know why?
I found that BiRNN with only the last time step still works.
The website you mentioned explains the principle of the CTC algorithm. But in other documents, its output is sparse and doesn't have so many repeated outputs. I think the alignment may be [filler, keyword].
I don't know how to get the alignments for the speech commands dataset; doing it manually would take a very long time.
Thank you again.

Jinhong
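
For reference on the CTC point, here is a minimal sketch of the standard CTC collapse rule (merge consecutive repeats, then drop blanks). It shows that a dense frame path with many repeats and a sparse, mostly-blank path can decode to the same label sequence. The blank symbol `_` and the example paths are assumptions made up for illustration, not outputs of DeepSpeech2.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol for this sketch

def ctc_collapse(path, blank=BLANK):
    """Apply the CTC collapse rule: merge consecutive repeats, then drop blanks."""
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != blank]

dense = list("__TTAA_LLK___")  # labels repeated over several frames
spiky = list("__T_A__L_K___")  # one frame per label, blanks elsewhere

print("".join(ctc_collapse(dense)))  # TALK
print("".join(ctc_collapse(spiky)))  # TALK
```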

navsuda commented on August 22, 2024

Hi @zhangjinhong17,
I'm not sure why layer normalization is slow. Can you check if your GPU is being utilized at all?

meixitu commented on August 22, 2024

Hi @navsuda ,
The GPU is working. I can check the GPU status.
I will check the layer normalization speed with your original code.
Thanks
Jinhong

navsuda commented on August 22, 2024

Closing the issue due to inactivity. Please reopen it if you still face the issue.
