
brouhaha-vad's Issues

Sliding window at inference time

I'm afraid the sliding window used at inference time will be a bit confusing for users, as the beginning and end frames are 'missing'.
It would be great if inference could align the output to the input by repeating the first and last frames N times, such that audio_duration_in_ms / nb_output_frames = 16 ms (the Brouhaha frame duration); a sketch of this padding follows below.

I don't think this is a thing right now, right ?
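Something like this minimal sketch is what I have in mind (the helper name and NumPy-array interface are hypothetical, not part of brouhaha-vad):

```python
import numpy as np

def align_output_to_input(frames: np.ndarray, audio_duration_ms: float,
                          frame_duration_ms: float = 16.0) -> np.ndarray:
    """Edge-pad per-frame outputs so that one output frame covers exactly 16 ms."""
    expected = int(round(audio_duration_ms / frame_duration_ms))
    missing = expected - len(frames)
    if missing <= 0:
        return frames
    left = missing // 2                  # repeat the first frame at the start
    right = missing - left               # repeat the last frame at the end
    return np.concatenate([np.repeat(frames[:1], left, axis=0),
                           frames,
                           np.repeat(frames[-1:], right, axis=0)])
```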

PS: would be great to add a short description of the output in the README too!

Syntax in the documentation is not correct

The syntax for applying the model to data, as given in the README.md, is not correct.

This works:

```
python main.py apply --apply_folder my/path/to_folder/ --model_path models/best/checkpoints/best.ckpt --data_dir my/path/to_folder/ --ext wav
```

but the command given in the README doesn't, as it is missing the --apply_folder argument and has an extra argument (--classes) that main.py does not accept for apply.

Where is the ~/.pyannote/database.yml ??

Where is this file? I cannot find it. Could you help me with this and detail the path?

One of the places I looked was this:
/home/deivison/workspace/VAD/miniconda3/envs/brouhaha/lib/python3.8/site-packages/pyannote

I also tried creating the .yml in the database directory.

Could you describe the process for this fine-tuning step in more detail?
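For reference, a minimal sketch of what a ~/.pyannote/database.yml can look like; the corpus name, paths, and protocol below are placeholder assumptions, not the ones Brouhaha's fine-tuning expects:

```yaml
# ~/.pyannote/database.yml -- hypothetical example
Databases:
  MyCorpus: /path/to/audio/{uri}.wav        # where to find the audio files
Protocols:
  MyCorpus:
    SpeakerDiarization:
      MyProtocol:
        train:
          uri: /path/to/train.lst           # list of file identifiers
          annotation: /path/to/train.rttm   # reference annotations
```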

[BUG] in einops impacting the brouhaha model forward()

I found that brouhaha-vad's use of einops causes this error: "AttributeError: 'PyanNet' object has no attribute 'example_output'". The solution is to replace the rearrange call with the torch.permute() function:

```python
def forward(self, x: torch.Tensor):
    ...
    out = rearrange(out, "n b t o -> b t (n o)")
    return out
```
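For reference, a minimal sketch of a torch-only equivalent of that rearrange, assuming out has shape (n, b, t, o):

```python
import torch

# Assumed shape from the rearrange pattern: (n, b, t, o)
out = torch.randn(4, 2, 100, 8)  # dummy tensor for illustration
n, b, t, o = out.shape

# Equivalent of rearrange(out, "n b t o -> b t (n o)") without einops:
# bring the n axis next to o, then merge the two trailing axes.
out = out.permute(1, 2, 0, 3).contiguous()  # (b, t, n, o)
out = out.reshape(b, t, n * o)              # (b, t, n*o)
```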

How do I retrain the model (best.ckpt) with the new implementation?

This is the same issue as the one reported here:

pyannote/pyannote-audio#1620

snr and c50 detailed arrays have an unexpected length

Hi, thanks for your work. I ran brouhaha on a file of length 3:37:57.326, i.e. 13077.326 seconds. I examined the c50 and detailed_snr_labels .npy files, and their shape was (756644,). I expected 756644 * 16 / 1000 to equal the length of the clip (16 ms per frame, as per the paper), but that is not the case.

The ratio between the length of the audio file and the length of the arrays comes out to 17.28 ms per frame. I verified this manually by graphing the SNR and seeing that it lines up with speech starting and ending, but only when I used 17.28/1000 as the conversion factor from frames to seconds. Where does this number come from? It doesn't come out to a whole number of samples at 16 kHz (it's around 276.5 samples per frame, though maybe padding can explain the .5?).
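For reference, the arithmetic behind these numbers:

```python
# Reproducing the numbers reported above
duration_s = 3 * 3600 + 37 * 60 + 57.326    # 3:37:57.326 -> 13077.326 s
n_frames = 756644                           # shape of the .npy arrays

print(n_frames * 16 / 1000)                 # 12106.3 s: does not match the duration
print(duration_s / n_frames * 1000)         # ~17.284 ms per frame
print(duration_s / n_frames * 16000)        # ~276.5 samples per frame at 16 kHz
```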

An interesting side-note is that the .rttm file has correct timings, so it's not that everything is off.

No output is produced

As far as I can tell, no output is produced by the current script, only the directories:

```
% find out
out
out/.DS_Store
out/c50
out/rttm_files
out/detailed_snr_labels
```

I saw no installation errors, so everything looked OK.

Wrong model specifications

```python
model = Model.from_pretrained("models/best/checkpoints/best.ckpt")
model.specification
# Specifications(problem=<Problem.MULTI_LABEL_CLASSIFICATION: 2>,
#                resolution=<Resolution.FRAME: 1>,
#                duration=6,
#                warm_up=(0.0, 0.0),
#                classes=['speech'],
#                permutation_invariant=False)
```

Would be nice to update classes to ['vad', 'snr', 'c50'].

no attribute 'introspection'

Dear Marvin,

When running the code, the following error occurred:

[screenshot of the error]

I tried on both Windows and Ubuntu, and the same error occurred. The installation itself completed without problems.

Could you let me know how to solve the problem?

regards,
Jiarui

pip installable package?

Would be nice to make this repo pip installable.

Ideally, it would be published on PyPI:

```
$ pip install brouhaha
```

But the following alternative would be enough, I think:

```
$ pip install https://github.com/marianne-m/brouhaha-vad/archive/main.zip
```

All we need is a setup.py file, which I am sure @hadware would love to prepare (it's his guilty pleasure...).
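A minimal sketch of what such a setup.py could look like; the package metadata and dependency list here are assumptions for illustration, not taken from the repo:

```python
# setup.py -- hypothetical minimal packaging sketch
from setuptools import setup, find_packages

setup(
    name="brouhaha",                 # assumed distribution name
    version="0.1.0",                 # placeholder version
    description="Voice activity detection, SNR and C50 estimation",
    packages=find_packages(),
    python_requires=">=3.8",
    install_requires=[
        "pyannote.audio",            # assumed core dependency
        "torch",
    ],
)
```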

First attempt at using SNR and C50 for speaker diarization

I have started playing a bit more with the model.

For each file of the VoxConverse test set, here is what I did:

  • apply the pretrained Brouhaha model on the whole file
  • run a speaker diarization pipeline
  • for each speech frame (at Brouhaha's 16 ms resolution), check whether the pipeline confused speakers (left column) or missed them (right column)
  • I then plotted two distributions of estimated SNR (top row) and estimated C50 (bottom row): one (in red) for frames where the system was wrong, and one (in green) for frames where it was correct.

Here is the result:

[figure: distributions of estimated SNR (top row) and estimated C50 (bottom row); red = frames where the system was wrong, green = frames where it was correct]

Great result for SNR: the lower the SNR, the more likely the pipeline got the frame wrong.

For C50, the effect is less obvious, but maybe diarization is not the best task for studying the impact of C50. I presume ASR would be more affected by it.
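For anyone who wants to reproduce this kind of per-frame analysis, a minimal sketch; the input files here are hypothetical placeholders, not the actual experiment code:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical inputs: per-frame SNR estimates and a boolean mask of the
# speech frames where the diarization pipeline made an error.
snr = np.load("out/detailed_snr_labels/some_file.npy")  # Brouhaha output
wrong = np.load("wrong_frames.npy").astype(bool)        # placeholder mask

plt.hist(snr[~wrong], bins=50, density=True, alpha=0.5, color="green", label="correct")
plt.hist(snr[wrong], bins=50, density=True, alpha=0.5, color="red", label="wrong")
plt.xlabel("estimated SNR (dB)")
plt.ylabel("density")
plt.legend()
plt.show()
```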

cc @marianne-m @MarvinLvn
