las's Issues

Regarding the class att_rnn

Hi, this is Yong Joon Lee. I am implementing the LAS model based on your code. You may not remember the details, since you wrote it three years ago, but I think class att_rnn has a small mistake in the order of operations. In the call method of att_rnn, you assign s twice in a row and only then compute c, the attention context.

Your ordering is:

    s = self.rnn(inputs=inputs, states=states)   # s = m_{t}, [m_{t}, c_{t}]; m is memory (hidden), c is carry (cell)
    s = self.rnn2(inputs=s[0], states=s[1])[1]   # s = m_{t+1}, c_{t+1}
    c = self.attention_context([s[0], h])

But shouldn't it be as follows?

    s = self.rnn(inputs=inputs, states=states)   # s = m_{t}, [m_{t}, c_{t}]
    c = self.attention_context([s[0], h])
    s = self.rnn2(inputs=s[0], states=s[1])[1]   # s = m_{t+1}, c_{t+1}

As the original paper suggests, the attention context vector at timestep t is computed by applying attention to s_t and h, where h is the output of the pBLSTM encoder. With your ordering, I think the attention context vector is derived from s_{t+1} and h instead. Thank you for your great work.
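To make the question concrete, here is a minimal sketch of how the two orderings feed different state vectors into attention. It is not the repository's code: `toy_cell` is a hypothetical stand-in for `self.rnn` and `self.rnn2`, and `attention_context` uses plain dot-product scores as a simplification, so treat every name and number as an assumption.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(s, h):
    """Dot-product attention (a simplification): weight each encoder frame by its similarity to s."""
    scores = [sum(si * hi for si, hi in zip(s, frame)) for frame in h]
    weights = softmax(scores)
    dim = len(h[0])
    return [sum(w * frame[d] for w, frame in zip(weights, h)) for d in range(dim)]

def toy_cell(x, scale):
    """Hypothetical stand-in for an RNN cell: elementwise tanh of a scaled input."""
    return [math.tanh(scale * v) for v in x]

h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three made-up encoder frames
inputs = [0.5, -0.25]                      # made-up decoder input

s1 = toy_cell(inputs, scale=1.0)    # plays the role of self.rnn
s2 = toy_cell(s1, scale=-2.0)       # plays the role of self.rnn2

c_from_first = attention_context(s1, h)    # ordering proposed in this issue
c_from_second = attention_context(s2, h)   # ordering currently in the code

print(c_from_first)
print(c_from_second)
```

The two context vectors differ because attention sees a different query in each case. The sketch does not settle which state the paper's s_t corresponds to when the decoder stacks two recurrent layers.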

pBLSTM Reshape

Why is the output of pBLSTM reshaped by a factor of 4?
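One possible reading, sketched below, is that the factor of 4 applies to the feature width rather than to time. This assumes the layer runs a bidirectional LSTM with `dim` units per direction and then concatenates each pair of adjacent time steps. The layer's source is not shown in this thread, so `pyramid_reshape` and the sizes used here are illustrative only.

```python
def pyramid_reshape(frames):
    """Concatenate each pair of adjacent time steps into one wider frame."""
    if len(frames) % 2 == 1:
        # Pad with one all-zero frame so the sequence length is even.
        frames = frames + [[0.0] * len(frames[0])]
    return [frames[t] + frames[t + 1] for t in range(0, len(frames), 2)]

dim = 3                    # units per LSTM direction (illustrative)
bidi_width = 2 * dim       # forward and backward outputs side by side
sequence = [[float(t)] * bidi_width for t in range(5)]   # 5 time steps

reduced = pyramid_reshape(sequence)
print(len(sequence), len(sequence[0]))   # 5 6
print(len(reduced), len(reduced[0]))     # 3 12, a width of 4 * dim
```

Under these assumptions the time axis is halved while the width grows to 2 directions times 2 frames times `dim`, which would explain a reshape target of `dim * 4`.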

Help needed with understanding x_2.

x_2 should have shape (Batch-size, no_prev_tokens, No_tokens).
x_2 = np.random.random((1, 12, 16))
When you say number of previous tokens, what exactly does that mean?

At training time I would know all the tokens, right?
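A common convention that would fit this shape is teacher forcing, where the decoder input at each step is the one-hot encoding of the previous target token. The sketch below assumes that convention. The repository may build `x_2` differently, and `start_id`, `target_ids`, and both helper functions are hypothetical.

```python
def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def previous_token_inputs(target_ids, start_id, vocab_size):
    """Shift the target sequence right by one step and one-hot encode it."""
    shifted = [start_id] + target_ids[:-1]
    return [one_hot(t, vocab_size) for t in shifted]

vocab_size = 16      # matches the last axis of x_2 in the question
start_id = 0         # hypothetical start-of-sequence id
target_ids = [3, 7, 7, 1, 9, 2, 4, 4, 11, 5, 6, 15]   # 12 made-up token ids

x_2 = [previous_token_inputs(target_ids, start_id, vocab_size)]   # batch of 1
print(len(x_2), len(x_2[0]), len(x_2[0][0]))   # 1 12 16
```

With this reading, the whole target sequence is known at training time and the middle axis is simply its length. At inference time the same tensor would grow one predicted token at a time.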

Dimension Error in `Listen`

I think the dimensions in the comments of the Listen part are not right:

    x = pBLSTM( dim//2 )(input_1) # (..., audio_len//2, dim*2)
    x = pBLSTM( dim//2 )(x) # (..., audio_len//4, dim*2)
    x = pBLSTM( dim//4 )(x) # (..., audio_len//8, dim)

I corrected them to:

    x = pBLSTM( dim//2 )(input_1) # (..., audio_len//4, dim*2)
    x = pBLSTM( dim//2 )(x) # (..., audio_len//16, dim*2)
    x = pBLSTM( dim//4 )(x) # (..., audio_len//64, dim)

Is it right?
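One way to check the comments is to trace shapes by hand. The sketch below assumes each pBLSTM layer is a bidirectional LSTM followed by concatenation of adjacent frame pairs, so time is halved and the width is 4 times the unit count. That behaviour is an assumption about the layer, and `pblstm_shape` and `listen_shapes` are hypothetical helpers.

```python
def pblstm_shape(time_steps, units):
    """Output (time, features) of one pyramidal BLSTM layer under the stated assumption."""
    out_time = (time_steps + 1) // 2   # odd lengths are padded by one frame
    out_features = units * 2 * 2       # two directions, two frames concatenated
    return out_time, out_features

def listen_shapes(audio_len, dim):
    """Trace (time, features) through the three layers used in Listen."""
    shapes = []
    t = audio_len
    for units in (dim // 2, dim // 2, dim // 4):
        t, f = pblstm_shape(t, units)
        shapes.append((t, f))
    return shapes

print(listen_shapes(audio_len=64, dim=8))
# [(32, 16), (16, 16), (8, 8)]
```

If that assumption holds, the trace matches the first set of comments (audio_len//2, audio_len//4, audio_len//8). The second set would only apply if each layer reduced the time axis by a factor of 4.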

Word error rate

Have you tried training the model on a dataset such as LibriSpeech? I would like to see the word error rate.
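For anyone who wants to measure this themselves, word error rate is usually defined as the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. Below is a minimal sketch of that definition. The function name and example sentences are made up, and it reports no results for this model.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, w in enumerate(hyp, start=1):
            cost = 0 if r == w else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3))   # 0.333 (one substitution and one deletion over six words)
```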
