marianne-m / brouhaha-vad
Predicts the level of noise and reverberation on your audio files
License: MIT License
I'm afraid the sliding window at inference time will make things a bit confusing for users, as the beginning and end frames are 'missing'.
It would be great if inference could align the output to the input by repeating the first and last frames N times, such that audio_duration_in_ms / nb_output_frames = 16 ms (the brouhaha frame duration).
I don't think this is possible right now, right?
PS: it would also be great to add a short description of the output to the README!
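A minimal sketch of such edge-padding, assuming the pipeline's frame-level scores are available as a NumPy array (the function name `align_to_input` and the 16 ms default are illustrative, not part of the current API):

```python
import numpy as np

def align_to_input(frames: np.ndarray, audio_duration_ms: float,
                   frame_ms: float = 16.0) -> np.ndarray:
    """Repeat the first/last frames so that
    audio_duration_ms / len(result) == frame_ms."""
    target = round(audio_duration_ms / frame_ms)
    missing = target - len(frames)
    if missing <= 0:
        return frames  # already aligned (or longer); nothing to pad
    pad_start = missing // 2
    pad_end = missing - pad_start
    return np.concatenate([
        np.repeat(frames[:1], pad_start, axis=0),  # repeat first frame
        frames,
        np.repeat(frames[-1:], pad_end, axis=0),   # repeat last frame
    ])
```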
The syntax for applying the model to data, as given in readme.md, is not correct.
This works
python main.py apply --apply_folder my/path/to_folder/ --model_path models/best/checkpoints/best.ckpt --data_dir my/path/to_folder/ --ext wav
but the command given in the readme doesn't: it is missing the --apply_folder argument and includes an extra argument (--classes) that main.py does not accept for apply.
Where is this file? I cannot find it. Can you help me with this? Please detail the path.
One of the places I looked was this:
/home/deivison/workspace/VAD/miniconda3/envs/brouhaha/lib/python3.8/site-packages/pyannote
I also tried creating the .yml in the database directory:
Could you describe this fine-tuning step in more detail?
We need to decide on license for the code and for the model.
I'd go for
Any thoughts @marianne-m @MarvinLvn ?
In brouhaha-vad, the use of einops causes this error: "AttributeError: 'PyanNet' object has no attribute 'example_output'". The solution is to replace it with the torch.permute() function:

def forward(self, x: torch.Tensor):
    ...
    out = rearrange(out, "n b t o -> b t (n o)")
    return out
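For reference, assuming `out` has shape `(n, b, t, o)`, the einops call can be replaced with plain tensor ops like this (a sketch, not the project's actual patch; `merge_heads` is a made-up name):

```python
import torch

def merge_heads(out: torch.Tensor) -> torch.Tensor:
    # Equivalent of rearrange(out, "n b t o -> b t (n o)") without einops:
    # move the n axis next to o, then flatten the two into one dimension.
    n, b, t, o = out.shape
    return out.permute(1, 2, 0, 3).reshape(b, t, n * o)
```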
How do I retrain the model (best.ckpt) with the new implementation?
This is the same issue as the one here.
Hi, thanks for your work. I ran brouhaha on a file of length 3:37:57.326, which is 13077.326 seconds. I examined the c50 and detailed_snr_labels .npy files, and their shape was (756644,). I expected that 756644 * 16 / 1000 would equal the length of the clip (16 ms per frame, as per the paper), but I saw this is not the case.
The ratio between the length of the audio file and the length of the arrays came out to 17.28 ms per frame. I manually verified this by graphing the SNR: it lines up with speech starting and ending only when I use 17.28/1000 as the conversion factor from frames to seconds. Where does this number come from? It doesn't come out to a whole number of samples at 16 kHz (it's around 276.5 samples per frame, though maybe padding can explain the .5?)
An interesting side note is that the .rttm file has correct timings, so it's not that everything is off.
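The mismatch is easy to reproduce from the numbers in the report (values taken from the issue text; 16 kHz is the model's expected sample rate):

```python
# Numbers reported above
audio_seconds = 13077.326
n_frames = 756644

frame_ms = audio_seconds * 1000 / n_frames   # observed frame duration
samples_per_frame = frame_ms / 1000 * 16000  # samples per frame at 16 kHz

print(round(frame_ms, 3))           # 17.283 ms, not the documented 16 ms
print(round(samples_per_frame, 1))  # 276.5 samples, not a whole number
```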
As far as I can tell, no output is produced by the current script. Only the directories.
% find out
out
out/.DS_Store
out/c50
out/rttm_files
out/detailed_snr_labels
I saw no installation errors so everything looked ok.
model = Model.from_pretrained("models/best/checkpoints/best.ckpt")
model.specification
# Specifications(problem=<Problem.MULTI_LABEL_CLASSIFICATION: 2>,
# resolution=<Resolution.FRAME: 1>,
# duration=6,
# warm_up=(0.0, 0.0),
# classes=['speech'],
# permutation_invariant=False)
It would be nice to update classes to ['vad', 'snr', 'c50'].
Hi @marianne-m
Can this model be converted to ONNX? I want to use C++ inference on mobile. Thanks!
pip install https://github.com/marianne-m/pyannote-brouhaha-db.git
doesn't work.
Could you help me find out why? I followed all the steps correctly. I want to fine-tune the model.
I also tried this:
pip install git+ssh://git@github.com:marianne-m/pyannote-brouhaha-db.git
and that doesn't work either.
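For what it's worth, pip needs the git+ prefix to install directly from a Git repository, and the ssh form uses a / rather than a : after the host. Assuming the repo is public, the https form should have this shape (untested here):

```shell
pip install git+https://github.com/marianne-m/pyannote-brouhaha-db.git
```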
Would be nice to make this repo pip installable.
Ideally, it would be published on PyPI:
$ pip install brouhaha
But the following alternative would be enough, I think
$ pip install https://github.com/marianne-m/brouhaha-vad/archive/main.zip
All we need is a setup.py file, which I am sure @hadware would love to prepare (it's his guilty pleasure...)
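A minimal setup.py along those lines might look like this (the distribution name, version, and dependency list are guesses for illustration, not taken from the repo):

```python
from setuptools import setup, find_packages

setup(
    name="brouhaha",                      # assumed distribution name
    version="0.1.0",                      # placeholder version
    packages=find_packages(),
    install_requires=["pyannote.audio"],  # assumed runtime dependency
)
```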
I have started playing a bit more with the model.
For each file of the VoxConverse test set, here is what I did:
Here is the result:
Great result for SNR: the lower the SNR, the more likely the pipeline got the frame wrong.
For C50 it is less obvious, but maybe diarization is not the best task for studying the impact of C50. I presume ASR would be more affected by C50.