
broadcast-news-videos-dataset's People

Contributors

cyrta

Forkers

obsidian1reaper

broadcast-news-videos-dataset's Issues

Dataset link

Hello. Is there a link from which I can download the dataset? Thank you very much in advance.

Questions regarding the related paper

Hi,

I read your paper "Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings". The details of the convolutional part were very clear.
But for the two recurrent blocks, how many neurons did you use?

How did you flatten the output of the second recurrent layer to connect it to the fully connected layer?

The fully connected embedding layer is used only for the embeddings, which means it is connected to a further classification layer. For that classification layer, do you mix the classes across the different datasets?

Thank you.

paper details

More precisely, we use activations from the last layer of neural network as speaker embeddings. We aggregate the sigmoid outputs by summing all outputs class-wise over the whole audio excerpt to obtain a total amount of activation for each entry and then normalizing the values by dividing them with the maximum value among classes. The analysis of those embeddings in time allows the system to detect speaker change and identify the newly appearing speakers by comparing the extracted and normalized embedding with those previously seen. If the cosine similarity metric between the embeddings is higher than a threshold, fixed at 0.4 after a set of preliminary experiments, the speaker is considered as new. Otherwise, we map its identity to the one corresponding to the nearest embedding.

Hi @cyrta, can you please elaborate on this paragraph from the paper? This is my understanding; please correct me if I am wrong.

  1. "We use activations from the last layer of neural network as speaker embeddings." This seems odd, because according to the network's loss function the last layer would be a softmax layer. Or did you mean that there is a dense layer with sigmoid activation before the softmax layer, and its activations are used as speaker embeddings? What is the size of the extracted embeddings?
  2. The speaker embeddings are then summed class-wise over the entire audio excerpt and normalized by dividing by the maximum value among all classes. After this step I'm less sure. If the distance between an extracted embedding and the previously obtained normalized embeddings is greater than 0.4, it is treated as a new speaker; otherwise, we map it to the speaker of the nearest (i.e., most similar) previously seen embedding?
  3. The paper also does not discuss how silent regions are handled, for example whether a voice activity detector is used, even though this affects the Diarization Error Rate.

Thanks.
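To make the questions above concrete, here is a minimal Python sketch of the procedure described in the quoted paragraph. It is not the authors' code, and several details are assumptions: the frame-level sigmoid outputs are taken as a NumPy array of shape (frames, classes); following the reading in point 2, the 0.4 threshold is applied to cosine distance (1 minus cosine similarity), because a high similarity marking a new speaker would be inverted; and each speaker is represented by the first embedding assigned to it. The function names are hypothetical.

```python
import numpy as np

# Threshold value reported in the quoted paragraph. ASSUMPTION: it is applied
# to cosine distance, not cosine similarity (see the interpretation in point 2).
THRESHOLD = 0.4


def aggregate_embedding(sigmoid_outputs: np.ndarray) -> np.ndarray:
    """Sum sigmoid outputs class-wise over an excerpt, then divide by the max class total.

    sigmoid_outputs: hypothetical input of shape (num_frames, num_classes).
    Sigmoid values are strictly positive, so the maximum total is never zero.
    """
    totals = sigmoid_outputs.sum(axis=0)
    return totals / totals.max()


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Return 1 minus the cosine similarity of two vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def assign_speaker(embedding: np.ndarray, known: list) -> int:
    """Map an embedding to a previously seen speaker or register a new one.

    known: list of normalized embeddings seen so far (mutated in place).
    ASSUMPTION: a speaker is new only if every stored embedding is farther
    than THRESHOLD; otherwise the nearest stored embedding wins.
    """
    if known:
        distances = [cosine_distance(embedding, k) for k in known]
        nearest = int(np.argmin(distances))
        if distances[nearest] <= THRESHOLD:
            return nearest
    known.append(embedding)
    return len(known) - 1


if __name__ == "__main__":
    # Toy three-class sigmoid outputs for three consecutive excerpts.
    excerpts = [
        np.array([[0.9, 0.1, 0.05], [0.8, 0.2, 0.1]]),
        np.array([[0.1, 0.9, 0.05], [0.2, 0.7, 0.1]]),
        np.array([[0.85, 0.15, 0.1]]),
    ]
    speakers: list = []
    labels = [assign_speaker(aggregate_embedding(x), speakers) for x in excerpts]
    print(labels)  # [0, 1, 0]
```

In this toy run the third excerpt lands very close to the first one (cosine distance of roughly 0.0004), so it is mapped back to speaker 0, while the second excerpt is far from both and opens speaker 1. Whether the paper updates stored speaker embeddings over time is not stated, so the sketch keeps them fixed.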

Access to the dataset

Hi,

Thanks for creating the dataset, and for the very nice explanation of this interesting work. Can you provide a pointer to the database?
