Comments (8)

ynop commented on August 19, 2024

I extended the example in the Readme a bit.
https://github.com/ynop/py-ctc-decode#beam-search-with-lm

Hope it helps.
Otherwise just ask.

from py-ctc-decode.

HuizhenShu commented on August 19, 2024

Thanks for your reply!
My test logits have shape (1, 131, 1224), i.e. (batch_size, time_step, V), and the true result is a sentence of 20 words. But the output I get from decode_batch contains only one word.

ynop commented on August 19, 2024
  • Are you sure that the given sentence can be predicted by the language model you use? Words that are not in the language model won't be predicted.
  • Have you used ' ' as space and '_' as blank in the vocabulary?
  • Have you used log probabilities?

Otherwise maybe try best-path to check if it works there.
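
As a rough illustration of that best-path check, here is a minimal NumPy sketch of greedy CTC decoding: take the most likely symbol per frame, merge consecutive repeats, then drop the blank. The helper name `best_path_decode` and the toy vocabulary are assumptions for illustration; this is not the library's own `BestPathDecoder`.

```python
import numpy as np

def best_path_decode(log_probs, vocab, blank="_"):
    """Greedy CTC decoding for one utterance; log_probs has shape (T, V)."""
    best = np.argmax(log_probs, axis=-1)  # most likely symbol index per frame
    symbols = []
    prev = None
    for idx in best:
        # Keep a symbol only if it differs from the previous frame and is not the blank.
        if idx != prev and vocab[idx] != blank:
            symbols.append(vocab[idx])
        prev = idx
    return symbols

# Toy input: 5 frames over a 3-symbol vocabulary (made-up probabilities).
vocab = ["a", "b", "_"]
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
])
decoded = best_path_decode(np.log(probs), vocab)
```

If this kind of decoding already gives a plausible sequence, the acoustic output is probably fine and the problem lies in the vocabulary or language-model setup.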

HuizhenShu commented on August 19, 2024

Thanks for your reply!

  1. Yes, I used the same corpus to train the language model and the acoustic model.

  2. '_' is in my vocabulary, but ' ' is not.

  3. Log probabilities? Do you mean transforming the logits with tf.log before computing the CTC loss? If so, yes, I did that.
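
On the log-probability point: the question above suggests the decoder expects normalised per-frame log probabilities rather than raw logits. A minimal NumPy sketch of that conversion, assuming the network output is unnormalised logits of the shape mentioned earlier (the `log_softmax` helper and the random input are illustrative only):

```python
import numpy as np

def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax along the vocabulary axis."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

# Stand-in for real network output: (batch_size, time_step, V).
logits = np.random.default_rng(0).normal(size=(1, 131, 1224))
log_probs = log_softmax(logits)
```

Taking the plain logarithm of raw logits is not equivalent: logits can be negative, and they do not sum to one across the vocabulary.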

I have tried the best-path method. What I got was a run of words with no spaces between them. After I changed BestPathDecoder.decode (line 15) to pred = ' '.join(pred).replace('_', ''), I got a reasonable result.
Should I retrain my acoustic model with a new vocabulary (a version that adds ' ')?
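
A small caveat on that one-line change, sketched under the assumption that `pred` is a list of word-level symbols that may still contain the blank '_': joining first and replacing afterwards leaves doubled and trailing spaces, while filtering the blank before joining does not. The sample list below is made up.

```python
# Hypothetical decoder output: word-level symbols with blanks still present.
pred = ["zhe4", "_", "feng1", "xin4", "_"]

# Join first, then strip the blank symbol: leaves extra spaces behind.
naive = " ".join(pred).replace("_", "")

# Drop blanks first, then join: one space between words, none at the end.
clean = " ".join(symbol for symbol in pred if symbol != "_")
```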

HuizhenShu commented on August 19, 2024

The corpus I use looks like this: 'zhe4 feng1 xin4 xie3 yu2 gong1 yuan2 yi1 liu4 wu3 si4 nian2 shi4 '
I split each sentence and map the words to ids, so in principle the space is not part of the input data. In this case, can I still use your "Beam Search with LM" in some way?

ynop commented on August 19, 2024

Hmm, the space is needed, since it is the point that triggers the language model.
If your inputs are word ids, you would have to adapt the algorithm.
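
To illustrate why the space matters, here is a simplified sketch of a character-level prefix extension in which the language model is consulted only when a space completes a word. It is not the actual py-ctc-decode code: `extend_prefix`, `lm_score`, and the toy unigram table `lm` are hypothetical, and a real language model would also condition on the preceding words.

```python
import math

# Toy unigram "language model": log probabilities for known words (made-up values).
lm = {"hello": math.log(0.6), "world": math.log(0.4)}

def lm_score(word):
    # Unknown words get -inf, so beams ending in them are effectively discarded.
    return lm.get(word, float("-inf"))

def extend_prefix(prefix, symbol):
    """Return the extended prefix and the LM log-score added by this step."""
    new_prefix = prefix + symbol
    if symbol == " ":
        # A space closes the current word, so only now is the LM consulted.
        last_word = prefix.split(" ")[-1]
        return new_prefix, lm_score(last_word)
    # Inside a word, the LM contributes nothing.
    return new_prefix, 0.0

no_lm = extend_prefix("hell", "o")   # mid-word: no LM contribution
with_lm = extend_prefix("hello", " ")  # word boundary: LM is applied
```

With a vocabulary of word ids and no space symbol, the `symbol == " "` branch never runs, so in a scheme like this the language model would never contribute.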

HuizhenShu commented on August 19, 2024

I think I understand. I will try the two methods below:

  1. Retrain the acoustic model with new data (insert a space id between consecutive word ids).
  2. When using "Beam Search with LM", add a space to each symbol I get.

Thank you for your patience. I'll reply here when I have results.

HuizhenShu commented on August 19, 2024

I have tried the second method: adding a space before the symbol when calculating the value. It works.
Thanks a lot!
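
One possible reading of that change, sketched under assumptions: when every vocabulary symbol is already a whole word, each non-blank symbol can be treated as a completed word, joined to the prefix with a space and scored by the language model right away. The names `extend_word_prefix` and `lm_score` and the toy table `lm` are hypothetical; this is not the code actually used in the thread.

```python
import math

# Toy unigram "language model" over word-level symbols (made-up values).
lm = {"zhe4": math.log(0.5), "feng1": math.log(0.3)}

def lm_score(word):
    return lm.get(word, float("-inf"))

def extend_word_prefix(prefix, symbol, blank="_"):
    """Extend a word-level prefix; every non-blank symbol triggers the LM."""
    if symbol == blank:
        # The blank adds no text and no LM score.
        return prefix, 0.0
    # Put a space before the symbol, except at the start of the sentence.
    new_prefix = symbol if not prefix else prefix + " " + symbol
    return new_prefix, lm_score(symbol)

prefix, score = extend_word_prefix("", "zhe4")
prefix, score = extend_word_prefix(prefix, "feng1")
```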
