serversidehannes / las
tf 2.0 implementation of Listen, attend and spell
Hi. This is Yong Joon Lee. I am implementing a LAS model based on your code. I know you might not remember the actual code, since you implemented it three years ago, but I think the class att_rnn may have a small mistake in its code ordering. If you look at att_rnn's call method, you define s twice in a row and then move on to c, the attention context.
Your ordering is as below:
s = self.rnn(inputs = inputs, states = states) # s = m_{t}, [m_{t}, c_{t}] #m is memory(hidden) and c is carry(cell)
s = self.rnn2(inputs=s[0], states = s[1])[1] # s = m_{t+1}, c_{t+1}
c = self.attention_context([s[0], h])
But isn't it supposed to be as below?
s = self.rnn(inputs = inputs, states = states) # s = m_{t}, [m_{t}, c_{t}]
c = self.attention_context([s[0], h])
s = self.rnn2(inputs=s[0], states = s[1])[1] # s = m_{t+1}, c_{t+1}
As the original paper suggests, the attention context vector at timestep t is computed by applying attention to s_t and h, where h is the output of the pBLSTM. But I think that with your ordering, you are deriving the attention context vector from s_{t+1} and h instead. Thank you for your great work.
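To make the difference concrete, here is a minimal numpy sketch (not the repo's Keras code) contrasting the two orderings. The functions rnn and attention_context below are toy stand-ins for the real layers, so only the ordering of the calls is meaningful:

```python
import numpy as np

def rnn(x, m, c):
    # stand-in recurrent cell: returns new (memory, cell) state
    m_new = np.tanh(x + m)
    return m_new, c + 0.1 * m_new

def attention_context(s, h):
    # stand-in dot-product attention over the pBLSTM output h, query s
    scores = h @ s
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ h

x = np.ones(4)                 # decoder input at timestep t
m = np.zeros(4); c = np.zeros(4)
h = np.random.default_rng(0).standard_normal((6, 4))  # pBLSTM output, 6 frames

# Ordering in the repo: the second cell runs first, so attention queries s_{t+1}
m1, c1 = rnn(x, m, c)
m2, c2 = rnn(m1, m1, c1)       # stand-in for rnn2
ctx_repo = attention_context(m2, h)

# Proposed ordering: attention is computed from s_t, before the second cell
m1, c1 = rnn(x, m, c)
ctx_proposed = attention_context(m1, h)
m2, c2 = rnn(m1, m1, c1)
```

Because the two orderings attend with different query states, the resulting context vectors differ, which is the substance of the question above.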
Could you please explain why the output of the pBLSTM is reshaped by a factor of 4?
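For context, the usual LAS scheme (an assumption here, not taken from the repo's code) concatenates consecutive frames so that each pBLSTM layer halves the time axis while widening the feature axis. A minimal numpy sketch of that reshape:

```python
import numpy as np

def pyramid_reshape(x, factor=2):
    # Merge `factor` consecutive frames into one wider frame:
    # (batch, time, feat) -> (batch, time // factor, feat * factor)
    batch, time, feat = x.shape
    time = time - time % factor            # drop leftover frames if needed
    return x[:, :time].reshape(batch, time // factor, feat * factor)

x = np.zeros((1, 8, 16))
y = pyramid_reshape(x)                     # shape (1, 4, 32)
```

With factor=2 per layer, three stacked layers give the overall 8x time reduction from the paper; a factor-4 reshape would compress the time axis more aggressively.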
Is it necessary to use one-hot encoding, or can we use tf.keras.preprocessing.text.Tokenizer for encoding?
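One way to see the relationship (a plain-numpy sketch, not the repo's pipeline): a tokenizer such as tf.keras.preprocessing.text.Tokenizer maps tokens to integer IDs, and a one-hot step can still be applied on top of those IDs, so the two are complementary rather than alternatives:

```python
import numpy as np

# Toy character vocabulary standing in for a fitted tokenizer's word_index
vocab = {"<pad>": 0, "h": 1, "e": 2, "l": 3, "o": 4}

ids = np.array([vocab[ch] for ch in "hello"])   # integer encoding: [1, 2, 3, 3, 4]
one_hot = np.eye(len(vocab))[ids]               # (5, 5) one-hot matrix
```

Whether one-hot is required then depends on what the model's decoder input layer expects, not on the tokenizer itself.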
x_2 should have shape (batch_size, no_prev_tokens, no_tokens).
x_2 = np.random.random((1, 12, 16))
When you say "number of previous tokens", what exactly does it mean? At training time I would know all the tokens, right?
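Assuming the usual teacher-forcing setup (an assumption, not confirmed by the repo), the full target sequence is indeed known at training time, and x_2 holds the previous tokens as one-hot vectors. A numpy sketch matching the (1, 12, 16) example above:

```python
import numpy as np

no_prev_tokens, no_tokens = 12, 16

# Toy target sequence standing in for the known transcript's token IDs
targets = np.random.default_rng(0).integers(0, no_tokens, size=no_prev_tokens)

# Teacher-forcing decoder input: one-hot previous tokens, batch of 1
x_2 = np.eye(no_tokens)[targets][None]   # shape (1, 12, 16)
```

At inference time the previous tokens are not known, so they would instead be fed back one step at a time from the model's own predictions.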
I think the dimensions are not right in the Listen part:
x = pBLSTM( dim//2 )(input_1) # (..., audio_len//2, dim*2)
x = pBLSTM( dim//2 )(x) # (..., audio_len//4, dim*2)
x = pBLSTM( dim//4 )(x) # (..., audio_len//8, dim)
which I corrected to:
x = pBLSTM( dim//2 )(input_1) # (..., audio_len//4, dim*2)
x = pBLSTM( dim//2 )(x) # (..., audio_len//16, dim*2)
x = pBLSTM( dim//4 )(x) # (..., audio_len//64, dim)
Is it right?
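The two comment sets above differ only in the assumed per-layer time reduction. A small sketch that traces the time axis through three stacked pBLSTM layers makes both readings easy to check; the per-layer factor is exactly the point in question:

```python
def listen_shapes(audio_len, factor=2, layers=3):
    # Trace the time-axis length through `layers` stacked pBLSTM layers,
    # each shrinking the time axis by `factor`.
    lengths = []
    for _ in range(layers):
        audio_len //= factor
        lengths.append(audio_len)
    return lengths

# factor=2 (as in the original LAS paper) reproduces the repo's comments:
#   [audio_len//2, audio_len//4, audio_len//8]
# factor=4 reproduces the corrected comments:
#   [audio_len//4, audio_len//16, audio_len//64]
```

So the corrected comments hold if each layer really reduces time by 4; the repo's original comments hold for the paper's 2x-per-layer scheme.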
Have you tried to train the model on, e.g., the LibriSpeech dataset? I would like to see the word error rate.