nyumaya / nyumaya_audio_recognition Goto Github PK

Classify audio with neural nets on embedded systems like the Raspberry Pi

License: Apache License 2.0

Python 86.99% C++ 13.01%

machine-learning audio-recognition embedded-systems hotword raspberry-pi keyword-spotting speech-commands wake-word-detection hotword-detection

nyumaya_audio_recognition's People

Contributors

Stargazers

Watchers

Forkers

harringaymakerspace torntrousers molekuul ssgalitsky charamza byxlk edlward rjkorn makserge drsolomon jan-far vitaly-z deadlockph

nyumaya_audio_recognition's Issues

Project was not running when using with different package

Describe the bug
In android getting this error when using the model
No implementation found for long
com.xxxxx.xxxxx.wakeword.nyumaya.NyumayaLibrary.createFeatureExtractor(int, int, int, int, int, float, float) (tried
Java_com_xxxxx_wakeword_nyumaya_NyumayaLibrary_createFeatureExtractor and
Java_com_xxxxx_wakeword_nyumaya_NyumayaLibrary_createFeatureExtractor__IIIIIFF) - is
the library loaded, e.g. System.loadLibrary?

Can you add code to train custom models?

Hello, I would like to try to train my own model, but I can not find the code for learning it in the repository, as I see the usual convolutional model is used here, could you publish the files for its learning, I want to try to train it on other devices with a small number of examples, I would not want to spend time repeating the structure of this. Thanks!

Could this work on a Mac?

I'm guessing it would need a Mac version of libnyumaya.so, how are they made?

Didn't find op for builtin opcode 'CONV_2D' version '2'

Describe the bug
I converted my own speech recognition model to tflite and used with streaming.py.
That throws an error:
Didn't find op for builtin opcode 'CONV_2D' version '2'

Registration failed
Error creating interpreter
Segmentation fault

To Reproduce
Steps to reproduce the behavior:
Just run streaming.py with my tflite model

Platform

Raspberry Pi

Additional context
I am using raspberry pi zero W for this.

Hi, This is Hemsingh. I like to more about Speech Recognition on Raspberry Pi . Mostly on Training on Google Colab. Please Help me .

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

streaming_example.py: arecord: main:788: audio open error: No such file or directory

Dusted off an old Pi Zero and I'm trying to run the streaming_example.py example, but it fails with:

python streaming_example.py --libpath ../lib/rpi/armv6/libnyumaya.so
Audio Recognition Version: 0.0.2
arecord: main:788: audio open error: No such file or directory

I'm using an I2S mic and I've been through the setup described at https://learn.adafruit.com/adafruit-i2s-mems-microphone-breakout/raspberry-pi-wiring-and-test#raspberry-pi-i2s-configuration and that successfully records sound from the mic when doing arecord -D plughw:1 -c1 -r 48000 -f S32_LE -t wav -V mono -v file.wav

Any ideas? Is there some extra config somewhere I need to get streaming_example.py to work?

training instructions?

i would like to train a few models to evaluate this lib, are there instructions available?

I made a plugin for Mycroft, see https://github.com/JarbasAl/numaya_hotword , training would be really important for community adoption

Real-time detection

Have you seen how ARM is doing continuous keyword detection in their samples?
https://github.com/ARM-software/ML-KWS-for-MCU/blob/master/Deployment/Source/KWS/kws.cpp

This is the core part... they are able to reuse most of the computation from the previous detection and then quickly compute the features on the newly arrived data.

if(num_frames>recording_win) {
//move old features left
memmove(mfcc_buffer,mfcc_buffer+(recording_win*num_mfcc_features),(num_frames-recording_win)*num_mfcc_features);
}
//compute features only for the newly recorded audio
int32_t mfcc_buffer_head = (num_frames-recording_win)num_mfcc_features;
for (uint16_t f = 0; f < recording_win; f++) {
mfcc->mfcc_compute(audio_buffer+(fframe_shift),&mfcc_buffer[mfcc_buffer_head]);
mfcc_buffer_head += num_mfcc_features;
}

train a new word

Hello
thank you for this great project
is it possible to train a new word in french for a personal assistant ?
thank you very much

'MultiDetector' object has no attribute 'detected_callback'

Describe the bug
after detecting "word" with MultiDetector
throw exception

Traceback (most recent call last):
Traceback (most recent call last):
File "multi_streaming_example.py", line 90, in
label_stream(FLAGS.libpath)
File "multi_streaming_example.py", line 68, in label_stream
threading.Thread(target=FTask(frame,extractor_gain)).start()
File "multi_streaming_example.py", line 39, in FTask
mDetector.run_frame(features)
File "./src/multi_detector.py", line 134, in run_frame
if(self.detected_callback):
AttributeError: 'MultiDetector' object has no attribute 'detected_callback'

To Reproduce
running multi_streaming_example.py

Platform
aarch64

Sound recognition

In the description of the project I saw that the goal was audio recognition.
I am looking for a way to detect sounds like fire alarms, broken glasses etc, to automatize even more my smart home.

As I understood this is not possible at the moment with this library correct?

Thanks

False recognition when no signal is present

Some models may be prone to false detections when no signal is present. This can also trigger an initial false detection on startup. Will be fixed in the next model release.

Poor recognition rate

Looks like something bad happened during the 0.3 release and the recognition accuracy is
significantly degraded.

Respeaker far field hardware for RaspPi

Seeed makes a cheap ($25) 4-channel far field mic board for the Rasp Pi, you can buy it from Amazon or direct. I have one is it works without issue. The four mics appear as a 4-channel ALSA device.

https://www.seeedstudio.com/ReSpeaker-4-Mic-Array-for-Raspberry-Pi-p-2941.html

A very useful addition would be instructions for replacing Snoboy with your keyword code in the Repeaker tutorial.

[Q] What are the best free options on hotword detection?

Hi! Thanks for the project. I was wondering if you knew what the top-notch hotword detection options are? I used to use snips.ai, but they have since sold out.

RTSP Stream

Hi,

Would it be possible for it to pull the audio from an RTSP stream from a IP CCTV camera

Command word request: Colors

For combinations with the light keyword, colors would be very useful:

Marvin: light white
Marvin: light blue

(and other common colors)

If I can be of help with providing audio samples if needed

Crowd Monitoring / Gun Shot Detection

Hi there,
Is your feature request related to a problem? Please describe.
My request is based on the fact that I am looking to build a crowd monitor using a raspberry pi. The end game is to build a cluster of them but for now I am looking at the software itself. This is where I found your software but I saw that you have not implemented the feature of gun shot detection.

Describe the solution you'd like
I would like to know if it were possible that you could train the audio recognition to specifically detect gun shots and in general crowd commotions.

Describe alternatives you've considered
I have considered looking at other software available but have not found any for raspberry pi or any other distribution in fact. I have also considered creating my own but even though I have quite a bit of knowledge about programming and Machine Learning, it is not enough for something like this.

Additional context
This is an idea I have for a project I am working on in my university and would really appreciate the help or any pointers you might have.

webrtc audio also supports VAD.
https://github.com/dpirch/libfvad

I haven't located a good library for DOA. Everyone seems to be using GCC PHAT.

So my working theory is this sequence:

VAD - one channel, or should run continuous KWD?
KWD - one channel
DOA -- in parallel with KWD?
after KWD
Beamforming using DOA, AEC, AGC, ANC - all channels

But can KWD be done in the presence of audio activity (music playing)? Does AEC need to happen before KWD?

Record my own command and hotword models?

Can I record my own command and hotword models somehow?