Comments (7)

daemon avatar daemon commented on August 22, 2024

Hi,

speech_demo.py segments the speech into overlapping windows. Posteriors aren't smoothed; instead, they are compared to a threshold at every timestep, i.e., if the output probability is less than some min_keyword_prob, the label is treated as negative (see line 110 in server.py). It would be interesting to see how posterior smoothing compares.
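
To make the comparison concrete, here is a minimal sketch of the two decision rules. It is not the code in server.py: the function names, the example probabilities, the 0.5 threshold, and the moving-average window of 3 are all assumptions chosen for illustration, and only the name min_keyword_prob comes from the description above.

```python
from collections import deque


def threshold_labels(probs, min_keyword_prob=0.5):
    """Per-timestep rule: a probability below the threshold is negative."""
    return [p >= min_keyword_prob for p in probs]


def smooth_posteriors(probs, smooth_window=3):
    """Moving average over the current and the previous smooth_window - 1 values."""
    recent = deque(maxlen=smooth_window)  # oldest value drops out automatically
    smoothed = []
    for p in probs:
        recent.append(p)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Hypothetical keyword probabilities, one per timestep.
probs = [0.1, 0.7, 0.1, 0.8, 0.9, 0.9]

raw = threshold_labels(probs)
smoothed = threshold_labels(smooth_posteriors(probs))
```

With these made-up values, the isolated spike at the second timestep triggers the raw rule but not the smoothed one, while the sustained run at the end triggers both.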

[...] the wav file was basically converted into a single image, which led eventually to a single prediction.

Yep, you're correct that train.py does that. It was originally designed for the Google Speech Commands dataset, which has audio clips of only one second in length.

from honk.

waltergenchi avatar waltergenchi commented on August 22, 2024

Great, thanks!
What is the window and overlapping size?
I can't find the parameter in the speech_demo.py file.

daemon avatar daemon commented on August 22, 2024

Oops, scratch that -- speech demo doesn't stride the windows at all. server.py contains code for striding. With the default parameters, windows aren't overlapped, since they are sent in chunks of one second. You'll need to increase the chunk size or the total number of chunks sent, in the demo application code. The stride can be adjusted in server.py.

waltergenchi avatar waltergenchi commented on August 22, 2024

In server.py, line 85, the default stride_size=500 (in milliseconds) and in the speech_demo.py the default chunk_size=1000.
From that I understood that there actually is overlap between windows (of size 1 second, i.e. 1000 milliseconds).
Am I missing something?

daemon avatar daemon commented on August 22, 2024

The definition of stride is

def stride(array, stride_size, window_size):
    i = 0
    while i + window_size <= len(array):
        yield array[i:i + window_size]
        i += stride_size

Thus, no overlapping occurs if (len(array) - window_size) // stride_size is less than 1, since the generator then yields at most a single window.
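
As a quick check, the generator can be run on a one-second and a two-second input. This sketch assumes one array element per millisecond and window_size=1000, both chosen only to match the numbers quoted in this thread; the real server may index samples or bytes instead.

```python
def stride(array, stride_size, window_size):
    i = 0
    while i + window_size <= len(array):
        yield array[i:i + window_size]
        i += stride_size


# Assumption for illustration: one element per millisecond.
one_second = list(range(1000))
two_seconds = list(range(2000))

windows_1s = list(stride(one_second, stride_size=500, window_size=1000))
windows_2s = list(stride(two_seconds, stride_size=500, window_size=1000))

# First element of each window, i.e. where each window starts.
starts_2s = [w[0] for w in windows_2s]
```

The one-second input yields a single window, so nothing overlaps. The two-second input yields windows starting at 0, 500, and 1000, each sharing 500 elements with the next.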

riatzukiza avatar riatzukiza commented on August 22, 2024

so does the listen end point accept a wav file, or a pcm buffer? I'm trying to use the http server from a new client. I saw that there is some compression going on in there?

All I know is that when I sent a one second long wav file (compressed and base 64 encoded), I got results, but they were not as good as the demo.

Would the sample rate also affect this? The microphone defaults to 44100 Hz.

daemon avatar daemon commented on August 22, 2024

Yes, you're correct about the compression/b64 encoding. It was simply a hack to get the demo working in a short amount of time -- using WebSockets would have been a much better way to do it. The sample rate must be 16 kHz, so if you have only 44.1 kHz audio, you'll need to downsample first.
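
For illustration, a naive way to bring 44.1 kHz audio down to 16 kHz is linear interpolation. This is only a sketch: downsample_linear is a hypothetical helper, not part of honk, and it applies no low-pass filter, so anything above 8 kHz in the input will alias. A proper polyphase resampler (for example scipy.signal.resample_poly with up=160, down=441) would likely give better results.

```python
def downsample_linear(samples, src_rate=44100, dst_rate=16000):
    """Resample a list of numeric samples by linear interpolation.

    No anti-aliasing filter is applied, so this is for illustration only.
    """
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for k in range(n_out):
        pos = k * src_rate / dst_rate        # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)   # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# One second of silence at 44.1 kHz becomes 16000 samples at 16 kHz.
one_second_44k = [0.0] * 44100
one_second_16k = downsample_linear(one_second_44k)
```

The resampled one-second clip has 16000 samples, which matches the rate the model expects.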
