
brouhaha-vad's Issues

Sliding window at inference time

I'm afraid the sliding window used at inference time will be a bit confusing for users, as the beginning and end frames are 'missing'.
It would be great if inference could align the output to the input by repeating the first and last frames N times, such that audio_duration_in_ms / nb_output_frames = 16 ms (the Brouhaha frame duration); a sketch of this padding follows below.

I don't think this is a thing right now, right ?
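Something like this minimal sketch is what I have in mind (the helper name and NumPy-array interface are hypothetical, not part of brouhaha-vad):

```python
import numpy as np

def align_output_to_input(frames: np.ndarray, audio_duration_ms: float,
                          frame_duration_ms: float = 16.0) -> np.ndarray:
    """Edge-pad per-frame outputs so that one output frame covers exactly 16 ms."""
    expected = int(round(audio_duration_ms / frame_duration_ms))
    missing = expected - len(frames)
    if missing <= 0:
        return frames
    left = missing // 2                  # repeat the first frame at the start
    right = missing - left               # repeat the last frame at the end
    return np.concatenate([np.repeat(frames[:1], left, axis=0),
                           frames,
                           np.repeat(frames[-1:], right, axis=0)])
```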

PS: would be great to add a short description of the output in the README too!

Syntax in the documentation is not correct

The syntax for applying the model to data, as given in the README.md, is not correct.

This works:

```
python main.py apply --apply_folder my/path/to_folder/ --model_path models/best/checkpoints/best.ckpt --data_dir my/path/to_folder/ --ext wav
```

but the command given in the README doesn't, as it is missing the --apply_folder argument and has an extra argument (--classes) that main.py does not accept for apply.

Where is the ~/.pyannote/database.yml ??

Where is this file? I cannot find it. Could you help me with this and detail the path?

One of the places I looked was this:
/home/deivison/workspace/VAD/miniconda3/envs/brouhaha/lib/python3.8/site-packages/pyannote

I also tried creating the .yml in the database directory.

Could you describe the process for this fine-tuning step in more detail?
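For reference, a minimal sketch of what a ~/.pyannote/database.yml can look like; the corpus name, paths, and protocol below are placeholder assumptions, not the ones Brouhaha's fine-tuning expects:

```yaml
# ~/.pyannote/database.yml -- hypothetical example
Databases:
  MyCorpus: /path/to/audio/{uri}.wav        # where to find the audio files
Protocols:
  MyCorpus:
    SpeakerDiarization:
      MyProtocol:
        train:
          uri: /path/to/train.lst           # list of file identifiers
          annotation: /path/to/train.rttm   # reference annotations
```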

[BUG] in einops impacting the brouhaha model forward()

I found that brouhaha-vad's use of einops causes this error: "AttributeError: 'PyanNet' object has no attribute 'example_output'". The solution is to replace the rearrange call with the torch.permute() function:

```python
def forward(self, x: torch.Tensor):
    ...
    out = rearrange(out, "n b t o -> b t (n o)")
    return out
```
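For reference, a minimal sketch of a torch-only equivalent of that rearrange, assuming out has shape (n, b, t, o):

```python
import torch

# Assumed shape from the rearrange pattern: (n, b, t, o)
out = torch.randn(4, 2, 100, 8)  # dummy tensor for illustration
n, b, t, o = out.shape

# Equivalent of rearrange(out, "n b t o -> b t (n o)") without einops:
# bring the n axis next to o, then merge the two trailing axes.
out = out.permute(1, 2, 0, 3).contiguous()  # (b, t, n, o)
out = out.reshape(b, t, n * o)              # (b, t, n*o)
```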

How do I retrain the model (best.ckpt) with the new implementation?

This is the same issue as the one reported here:

pyannote/pyannote-audio#1620

snr and c50 detailed arrays have an unexpected length

Hi, thanks for your work. I ran brouhaha on a file of length 3:37:57.326, i.e. 13077.326 seconds. I examined the c50 and detailed_snr_labels .npy files, and their shape was (756644,). I expected 756644 * 16 / 1000 to equal the length of the clip (16 ms per frame, as per the paper), but that is not the case.

The ratio between the length of the audio file and the length of the arrays comes out to 17.28 ms per frame. I verified this manually by graphing the SNR and seeing that it lines up with speech starting and ending, but only when I used 17.28/1000 as the conversion factor from frames to seconds. Where does this number come from? It doesn't come out to a whole number of samples at 16 kHz (it's around 276.5 samples per frame, though maybe padding can explain the .5?).
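For reference, the arithmetic behind these numbers:

```python
# Reproducing the numbers reported above
duration_s = 3 * 3600 + 37 * 60 + 57.326    # 3:37:57.326 -> 13077.326 s
n_frames = 756644                           # shape of the .npy arrays

print(n_frames * 16 / 1000)                 # 12106.3 s: does not match the duration
print(duration_s / n_frames * 1000)         # ~17.284 ms per frame
print(duration_s / n_frames * 16000)        # ~276.5 samples per frame at 16 kHz
```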

An interesting side-note is that the .rttm file has correct timings, so it's not that everything is off.

No output is produced

As far as I can tell, no output is produced by the current script, only the directories:

```
% find out
out
out/.DS_Store
out/c50
out/rttm_files
out/detailed_snr_labels
```

I saw no installation errors, so everything looked OK.

Wrong model specifications

```python
model = Model.from_pretrained("models/best/checkpoints/best.ckpt")
model.specification
# Specifications(problem=<Problem.MULTI_LABEL_CLASSIFICATION: 2>,
#                resolution=<Resolution.FRAME: 1>,
#                duration=6,
#                warm_up=(0.0, 0.0),
#                classes=['speech'],
#                permutation_invariant=False)
```

Would be nice to update classes to ['vad', 'snr', 'c50'].

no attribute 'introspection'

Dear Marvin,

When running the code, the following error occurred:

[screenshot of the error]

I tried on both Windows and Ubuntu, and the same error occurred. The installation itself completed without problems.

Could you let me know how to solve the problem?

regards,
Jiarui

pip installable package?

Would be nice to make this repo pip installable.

Ideally, it would be published on PyPI:

```
$ pip install brouhaha
```

But the following alternative would be enough, I think:

```
$ pip install https://github.com/marianne-m/brouhaha-vad/archive/main.zip
```

All we need is a setup.py file, which I am sure @hadware would love to prepare (it's his guilty pleasure...).
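A minimal sketch of what such a setup.py could look like; the package metadata and dependency list here are assumptions for illustration, not taken from the repo:

```python
# setup.py -- hypothetical minimal packaging sketch
from setuptools import setup, find_packages

setup(
    name="brouhaha",                 # assumed distribution name
    version="0.1.0",                 # placeholder version
    description="Voice activity detection, SNR and C50 estimation",
    packages=find_packages(),
    python_requires=">=3.8",
    install_requires=[
        "pyannote.audio",            # assumed core dependency
        "torch",
    ],
)
```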

First attempt at using SNR and C50 for speaker diarization

I have started playing a bit more with the model.

For each file of the VoxConverse test set, here is what I did:

  • apply the pretrained Brouhaha model on the whole file
  • run a speaker diarization pipeline
  • for each speech frame (at Brouhaha's 16 ms resolution), check whether the pipeline confused speakers (left column) or missed them (right column)
  • I then plotted two distributions of estimated SNR (top row) and estimated C50 (bottom row): one (in red) for frames where the system was wrong, and one (in green) for frames where it was correct.

Here is the result:

[figure: distributions of estimated SNR (top row) and estimated C50 (bottom row); red = frames where the system was wrong, green = frames where it was correct]

Great result for SNR: the lower the SNR, the more likely the pipeline got the frame wrong.

For C50, the effect is less obvious, but maybe diarization is not the best task for studying the impact of C50. I presume ASR would be more affected by it.
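For anyone who wants to reproduce this kind of per-frame analysis, a minimal sketch; the input files here are hypothetical placeholders, not the actual experiment code:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical inputs: per-frame SNR estimates and a boolean mask of the
# speech frames where the diarization pipeline made an error.
snr = np.load("out/detailed_snr_labels/some_file.npy")  # Brouhaha output
wrong = np.load("wrong_frames.npy").astype(bool)        # placeholder mask

plt.hist(snr[~wrong], bins=50, density=True, alpha=0.5, color="green", label="correct")
plt.hist(snr[wrong], bins=50, density=True, alpha=0.5, color="red", label="wrong")
plt.xlabel("estimated SNR (dB)")
plt.ylabel("density")
plt.legend()
plt.show()
```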

cc @marianne-m @MarvinLvn
