
nyumaya_audio_recognition's People

Contributors

codacy-badger, torntrousers, yodakohl


nyumaya_audio_recognition's Issues

Project not running when used with a different package

Describe the bug
On Android, I get this error when using the model:
No implementation found for long
com.xxxxx.xxxxx.wakeword.nyumaya.NyumayaLibrary.createFeatureExtractor(int, int, int, int, int, float, float) (tried
Java_com_xxxxx_wakeword_nyumaya_NyumayaLibrary_createFeatureExtractor and
Java_com_xxxxx_wakeword_nyumaya_NyumayaLibrary_createFeatureExtractor__IIIIIFF) - is
the library loaded, e.g. System.loadLibrary?

Can you add code to train custom models?

Hello, I would like to train my own model, but I cannot find the training code in the repository. As far as I can see, a standard convolutional model is used here. Could you publish the training scripts? I would like to try training for other devices with a small number of examples, and I would rather not spend time re-creating the architecture myself. Thanks!

Didn't find op for builtin opcode 'CONV_2D' version '2'

Describe the bug
I converted my own speech recognition model to tflite and used it with streaming.py.
That throws an error:
Didn't find op for builtin opcode 'CONV_2D' version '2'

Registration failed
Error creating interpreter
Segmentation fault

To Reproduce
Steps to reproduce the behavior:
Just run streaming.py with my tflite model

Platform

  • Raspberry Pi

Additional context
I am using a Raspberry Pi Zero W for this.

Hi, this is Hemsingh. I would like to learn more about speech recognition on the Raspberry Pi, mostly about training on Google Colab. Please help me.


streaming_example.py: arecord: main:788: audio open error: No such file or directory

Dusted off an old Pi Zero and I'm trying to run the streaming_example.py example, but it fails with:

python streaming_example.py --libpath ../lib/rpi/armv6/libnyumaya.so
Audio Recognition Version: 0.0.2
arecord: main:788: audio open error: No such file or directory

I'm using an I2S mic and I've been through the setup described at https://learn.adafruit.com/adafruit-i2s-mems-microphone-breakout/raspberry-pi-wiring-and-test#raspberry-pi-i2s-configuration, and that setup successfully records sound from the mic when running: arecord -D plughw:1 -c1 -r 48000 -f S32_LE -t wav -V mono -v file.wav

Any ideas? Is there some extra config somewhere I need to get streaming_example.py to work?
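If the example shells out to arecord with the ALSA default device, it will fail when the I2S mic is only visible as card 1. A minimal sketch of building the capture command with an explicit device (the device name and parameters here are assumptions, not the example's actual defaults):

```python
import subprocess

def build_arecord_cmd(device="plughw:1", rate=16000):
    """Build an arecord command that names the I2S mic explicitly
    instead of relying on the ALSA default device (assumed values)."""
    return ["arecord", "-D", device, "-r", str(rate),
            "-f", "S16_LE", "-c", "1", "-t", "raw"]

cmd = build_arecord_cmd()
# stream = subprocess.Popen(cmd, stdout=subprocess.PIPE)  # uncomment on the Pi
print(" ".join(cmd))
```

Alternatively, making plughw:1 the default capture device in ~/.asoundrc lets the unmodified example find the mic.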

Real-time detection

Have you seen how ARM is doing continuous keyword detection in their samples?
https://github.com/ARM-software/ML-KWS-for-MCU/blob/master/Deployment/Source/KWS/kws.cpp

This is the core part... they are able to reuse most of the computation from the previous detection and then quickly compute the features on the newly arrived data.

if (num_frames > recording_win) {
    // move old features left
    memmove(mfcc_buffer, mfcc_buffer + (recording_win * num_mfcc_features),
            (num_frames - recording_win) * num_mfcc_features * sizeof(float));
}
// compute features only for the newly recorded audio
int32_t mfcc_buffer_head = (num_frames - recording_win) * num_mfcc_features;
for (uint16_t f = 0; f < recording_win; f++) {
    mfcc->mfcc_compute(audio_buffer + (f * frame_shift), &mfcc_buffer[mfcc_buffer_head]);
    mfcc_buffer_head += num_mfcc_features;
}

Train a new word

Hello,
thank you for this great project. Is it possible to train a new word in French for a personal assistant?
Thank you very much.

'MultiDetector' object has no attribute 'detected_callback'

Describe the bug
After detecting a word with MultiDetector, the following exception is thrown:

Traceback (most recent call last):
File "multi_streaming_example.py", line 90, in
label_stream(FLAGS.libpath)
File "multi_streaming_example.py", line 68, in label_stream
threading.Thread(target=FTask(frame,extractor_gain)).start()
File "multi_streaming_example.py", line 39, in FTask
mDetector.run_frame(features)
File "./src/multi_detector.py", line 134, in run_frame
if(self.detected_callback):
AttributeError: 'MultiDetector' object has no attribute 'detected_callback'

To Reproduce
running multi_streaming_example.py

Platform
aarch64
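The traceback suggests that detected_callback only comes into existence once a callback has been registered. A defensive sketch of the idea (the class shape and method names here are assumptions, not the repo's actual multi_detector.py):

```python
class MultiDetector:
    def __init__(self):
        # Always define the attribute so run_frame can test it safely.
        self.detected_callback = None

    def add_detected_callback(self, callback):
        self.detected_callback = callback

    def run_frame(self, features):
        label = self._detect(features)      # placeholder detection step
        if label and self.detected_callback:
            self.detected_callback(label)

    def _detect(self, features):
        # Stand-in for the real per-frame detection logic.
        return "marvin" if features else None
```

With the attribute initialized in __init__, running without a registered callback simply skips the call instead of raising AttributeError.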

Sound recognition

In the project description I saw that the goal is audio recognition.
I am looking for a way to detect sounds like fire alarms, breaking glass, etc., to automate my smart home further.

As I understand it, this is not currently possible with this library, correct?

Thanks

False recognition when no signal is present

Some models may be prone to false detections when no signal is present. This can also trigger an initial false detection on startup. Will be fixed in the next model release.
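Until fixed models ship, one workaround is to gate detections during a short warm-up period after startup. A sketch of that idea (the frame count is an assumption, not a value from the project):

```python
class StartupGate:
    """Suppress detections for the first few frames after startup,
    while the feature pipeline is still settling (warm-up length assumed)."""

    def __init__(self, warmup_frames=20):
        self.remaining = warmup_frames

    def allow(self):
        # Returns False during warm-up, True once it has elapsed.
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True
```

Wrapping each detection in gate.allow() discards the spurious hit that can fire on the very first frames.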

Poor recognition rate

It looks like something went wrong during the 0.3 release: recognition accuracy is significantly degraded.

RTSP Stream

Hi,

Would it be possible to pull the audio from the RTSP stream of an IP CCTV camera?
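The library itself only consumes audio buffers, so one approach is to let ffmpeg decode the camera's audio track to raw PCM and feed that to the detector. A sketch (the URL and sample rate are placeholders):

```python
import subprocess

def rtsp_audio_cmd(url, rate=16000):
    """ffmpeg command that drops the video track (-vn) and writes the
    audio as raw 16-bit mono PCM to stdout."""
    return ["ffmpeg", "-i", url, "-vn",
            "-f", "s16le", "-ar", str(rate), "-ac", "1", "-"]

cmd = rtsp_audio_cmd("rtsp://camera.local/stream")
# proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
# frame = proc.stdout.read(3200)  # then pass fixed-size frames to the detector
```

Reading fixed-size chunks from stdout yields the same kind of frames the microphone examples consume.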

Command word request: Colors

For combinations with the light keyword, colors would be very useful:

Marvin: light white
Marvin: light blue

(and other common colors)

I can help by providing audio samples if needed.

Crowd Monitoring / Gun Shot Detection

Hi there,
Is your feature request related to a problem? Please describe.
My request is based on the fact that I am looking to build a crowd monitor using a Raspberry Pi. The end goal is to build a cluster of them, but for now I am looking at the software itself. That is how I found your software, but I saw that you have not implemented gunshot detection.

Describe the solution you'd like
I would like to know whether you could train the audio recognition to specifically detect gunshots and crowd commotion in general.

Describe alternatives you've considered
I have considered other available software but have not found any for the Raspberry Pi, or for any other platform for that matter. I have also considered building my own, but even though I have a fair amount of programming and machine-learning knowledge, it is not enough for something like this.

Additional context
This is an idea for a university project I am working on, and I would really appreciate any help or pointers you might have.

cpp example won't compile

It is missing an implementation for class AudioRecognition.

I think this is because you are working on branch V0.3 and not master.

Complete far field processing chain

What is a good sequence of processing for far field voice algorithms? When should you do VAD, AEC, DOA, beamforming, etc?

Here is example code that uses WebRTC audio processing to do AEC, AGC, and ANC. WebRTC audio processing can also do beamforming if you give it the direction of arrival.
https://github.com/shichaog/WebRTC-audio-processing

WebRTC also supports VAD:
https://github.com/dpirch/libfvad

I haven't located a good library for DOA. Everyone seems to be using GCC PHAT.

So my working theory is this sequence:

  1. VAD - one channel, or should KWD run continuously?
  2. KWD - one channel
  3. DOA - in parallel with KWD, or after KWD?
  4. Beamforming using DOA, AEC, AGC, ANC - all channels

But can KWD be done in the presence of audio activity (music playing)? Does AEC need to happen before KWD?
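The ordering question can be made concrete with stub stages. The wiring below is one plausible answer (AEC first, so KWD still works while music is playing), not an established reference chain; every function is a placeholder for a real algorithm:

```python
# All stages are stubs; each name marks where a real algorithm would go.
def aec(channels, playback_ref):
    return channels              # echo cancellation needs the playback signal

def doa(channels):
    return 0.0                   # direction estimate from multi-channel input

def beamform(channels, angle):
    return channels[0]           # collapse to a single enhanced channel

def agc(channel):
    return channel               # level normalization

def kwd(channel):
    return "keyword" in channel  # toy detector on the enhanced channel

def process(channels, playback_ref):
    clean = aec(channels, playback_ref)   # 1. AEC on all channels
    angle = doa(clean)                    # 2. DOA on echo-cancelled audio
    channel = beamform(clean, angle)      # 3. beamform toward the speaker
    return kwd(agc(channel))              # 4. AGC, then keyword detection
```

Running AEC before both DOA and KWD is what lets the detector keep working during playback; DOA could equally run in parallel with a first-pass KWD if latency matters.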
