floriankrey / dnc

Discriminative Neural Clustering for Speaker Diarisation

License: Apache License 2.0

Python 83.61% Shell 16.39%

speaker-diarization supervised-clustering machine-learning clustering university-of-cambridge speech-processing

dnc's Issues

Consider directly using the newest version of SpectralCluster

The SpectralCluster library has gone through many releases.

The newest versions offer many more features, including custom distances for K-means, such as cosine.

Please consider importing the newest version of this library directly instead of using a nested fork 😄

Also, you can use the config equivalent to the ICASSP 2018 paper directly, in a few lines:

from spectralcluster import configs

labels = configs.icassp2018_clusterer.predict(X)

How to get d-vectors

Hi,
thank you for your work.

I am trying to replicate your setup with AMI, but I have no idea how to get the d-vectors. You propose an augmentation technique, but when I run those commands, they look for ark files that I do not have.

FileNotFoundError: [Errno 2] No such file or directory: 'data/train/train.00.ark'

I was able to download AMI (IHM) and process the data, but I cannot find anything on how to obtain the d-vectors.

Thanks.
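
A quick way to see which ark files are missing before running the augmentation scripts is to scan the scp file. The sketch below assumes the usual Kaldi scp layout, one `<key> <path>[:<byte offset>]` entry per line; the helper name `missing_arks` is hypothetical and not part of this repository.

```python
import os


def missing_arks(scp_path):
    """Return the sorted list of ark files referenced in a Kaldi scp that do not exist."""
    missing = set()
    with open(scp_path) as scp:
        for line in scp:
            line = line.strip()
            if not line:
                continue
            # Each entry is "<key> <path>" or "<key> <path>:<byte offset>".
            _key, location = line.split(maxsplit=1)
            path, sep, offset = location.rpartition(":")
            ark_path = path if sep and offset.isdigit() else location
            if not os.path.exists(ark_path):
                missing.add(ark_path)
    return sorted(missing)
```

Calling `missing_arks("data/train.scp")` from the directory where the scripts are run lists every ark file the scp points to that is absent. Relative paths are resolved against the current working directory.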

Custom Kmeans

Hi, I think there is an over-simplification in Custom Kmeans in the way the centroids are estimated:

centres[each_center] = np.mean(X[each_center_samples], axis=0)

does not actually yield the point that minimises the average custom distance within a cluster.
The mean is the optimal solution for the (squared) Euclidean distance, but not for an arbitrary distance. For instance, with cosine distance, the mean calculated as above gives the optimal cluster centre only if the rows of X are l2-normalised.

A more general solution would be to use sklearn_extra.cluster.KMedoids.
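
The effect can be checked on a toy cluster. The sketch below uses made-up data and the standard library only. It compares the plain mean with the mean of the l2-normalised rows as candidate centres under cosine distance. All function names are illustrative and do not come from the repository.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(rows):
    """Component-wise mean, the same quantity as np.mean(rows, axis=0)."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def l2_normalise(row):
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]


def average_cosine_distance(rows, centre):
    return sum(cosine_distance(row, centre) for row in rows) / len(rows)


# Toy cluster: one long vector along x, two short vectors along y.
cluster = [[10.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

plain_centre = mean_vector(cluster)
normalised_centre = mean_vector([l2_normalise(row) for row in cluster])

print(average_cosine_distance(cluster, plain_centre))
print(average_cosine_distance(cluster, normalised_centre))
```

The second printed value is smaller than the first: the long vector dominates the plain mean, so that centre points in a direction that is far, in cosine terms, from most of the cluster.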

Is it feasible to use DNC on my own data?

The relevant parts of run.sh (stages -1 to 3) have been removed, and it is unclear to me from section 5.2 of the paper which TDNN was used for the 32-dimensional d-vector embedding.
In issue #2 you mention that any d-vector embedding could be used, but is this really true? Some parameters have to be identical, don't they (window size, overlap, sample rate, ...)?
How would I have to edit the AMI label files (probably the offsets)?
Which files would I have to create for my own data in order to run the decoder on them?

tl;dr: Is it feasible to use DNC on my own data?

Why does data/train.scp have the same features in meetings EN2001a, EN2001d and EN2001e?

I have found that in meetings EN2001a, EN2001d and EN2001e each utterance appears four times in data/train.scp, with identical features.

Why is that?

import numpy as np
from kaldiio import ReadHelper

feats = []
with ReadHelper('scp:data/train.scp') as reader:
    for key, numpy_array in reader:
        #if "0007881_0007913" in key:
        if "0056607_0056837" in key:
            print(key)
            feats.append(numpy_array)

#AMIXXX-00001-1EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837
#AMIXXX-00001-3EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837
#AMIXXX-00001-4EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837
#AMIXXX-00001-5EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837

print(np.array_equal(feats[0], feats[1]))
print(np.array_equal(feats[1], feats[2]))
print(np.array_equal(feats[2], feats[3]))

#True
#True
#True
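
To find every such repeated segment rather than one hand-picked time stamp, the keys can be grouped by meeting and segment boundaries. This is only a sketch. It assumes keys end in `_<start>_<end>` as in the listing above and that the meeting ID looks like `EN2001a` (two capitals, four digits, an optional lowercase letter). Both the regular expression and the helper name are assumptions, not part of the repository.

```python
import re
from collections import defaultdict

# Assumed shape of an AMI meeting ID inside a key, e.g. "EN2001a".
MEETING_RE = re.compile(r"[A-Z]{2}\d{4}[a-z]?")


def repeated_segments(keys):
    """Map (meeting, start, end) to the keys sharing it, keeping only repeats."""
    groups = defaultdict(list)
    for key in keys:
        match = MEETING_RE.search(key)
        meeting = match.group(0) if match else None
        # Assumes the key ends in "_<start>_<end>".
        start, end = key.rsplit("_", 2)[-2:]
        groups[(meeting, start, end)].append(key)
    return {segment: ks for segment, ks in groups.items() if len(ks) > 1}
```

The keys can be collected from the same `ReadHelper` loop as above. Comparing the arrays within each returned group with `np.array_equal` then shows whether the repeated entries carry identical features.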

Issues with training from an init model

Hi,
I tried to replicate the experiment and followed your steps to train a DNC Transformer with the same configuration. The DER of spectral clustering matches the paper, but the DNC results are inconsistent: my DNC DER is 32.52%, whereas the paper reports 13.90%.
My shell commands are as follows:

run.sh

#current path is DNC/
dnc_root=espnet/egs/ami/dnc1
path_to_datadir=data/augment_data
m50_real_augment_path=$path_to_datadir/m50.real.augment
m50_meeting_augment_path=$path_to_datadir/m50.meeting.augment
dvecdict_meeting_path=$path_to_datadir/dvecdict.meeting.split100
m50_real_path=$path_to_datadir/m50.real
SC_scoring_path=scoring/sys_rttm/SC_result
DNC_scoring_path=scoring/sys_rttm/DNC_result
data_path=data	
model_init=exp/mdm_train_pytorch_tag.for.model/results/model.acc.best
resume_path=exp/mdm_train_pytorch_tag.for.model/results/snapshot.ep.50

./path.sh
ln -s $dnc_root dnc1
#To generate training and validation data with sub-meeting length 50 and 1000 random shifts
python3 datapreperation/gen_augment_data.py --input-scps data/train.scp --input-mlfs data/train.mlf \
	--filtEncomp --maxlen 50 --augment 1000 --varnormalise $m50_real_augment_path
python3 datapreperation/gen_augment_data.py --input-scps data/dev.scp --input-mlfs data/dev.mlf \
	--filtEncomp --maxlen 50 --augment 1000 --varnormalise $m50_real_augment_path
	
#To generate training data with sub-meeting length 50 and 1000 random shifts using the meeting randomisation
python3 datapreperation/gen_dvecdict.py --input-scps data/train.scp --input-mlfs data/train.mlf \
	--filtEncomp --segLenConstraint 100 --meetingLevelDict $dvecdict_meeting_path
	
python3 datapreperation/gen_augment_data.py --input-scps data/train.scp --input-mlfs data/train.mlf \
	--filtEncomp --maxlen 50 --augment 100 --varnormalise --randomspeaker  \
	--dvectordict $dvecdict_meeting_path/train.npz $m50_meeting_augment_path
	
#To generate evaluation data
python3 datapreperation/gen_augment_data.py --input-scps data/eval.scp --input-mlfs data/eval.mlf \
	--filtEncomp --maxlen 50 --varnormalise $m50_real_path
	
cd $dnc_root

#To start training, run
CUDA_VISIBLE_DEVICES=1,2,3 ./run.sh --stage 4 --stop_stage 4 --train_json ../../../../$m50_real_augment_path/train.json \
	--ngpu 3 --dev_json ../../../../$m50_real_augment_path/dev.json --tag tag.for.model

#To track the progress of the training, run
tail -f exp/mdm_train_pytorch_tag.for.model/train.log

#Decode a DNC Transformer
#Similar to the command used for training, run
#The decoding results are, by default, stored in multiple json files in exp/mdm_train_pytorch_tag.for.model/decode_dev_xxxxx/data.JOB.json
./run.sh --stage 5 --decode_json ../../../../$m50_real_path/eval.json --tag tag.for.model

cd ../../../../
#Running spectral clustering
#To run spectral clustering on previously generated evaluation data, for example for sub-meeting lengths 50:
python3 scoring/run_spectralclustering.py --p-percentile 0.95 --custom-dist cosine \
	--json-out $SC_scoring_path/eval95k24.1.json  $m50_real_path/eval.json
#Evaluation of clustering results
#First the DNC or SC output has to be converted into the RTTM format: 
#For SC:
python3 scoring/gen_rttm.py --input-scp $data_path/eval.scp --js-dir $SC_scoring_path \
	--js-num 1 --js-name eval95k24 --rttm-name eval95k24
#To score the result the reference rttm has to first be split into the appropriate sub-meeting lengths:
python3 scoring/split_rttm.py --submeeting-rttm $SC_scoring_path/eval95k24.rttm \
	--input-rttm scoring/refoutputeval.rttm --output-rttm $SC_scoring_path/reference.rttm
#Finally, the speaker error rate has to be calculated using:
python3 scoring/score_rttm.py --score-rttm $SC_scoring_path/eval95k24.rttm \
	--ref-rttm $SC_scoring_path/reference.rttm --output-scoredir $SC_scoring_path/result


#For DNC:
\cp $dnc_root/exp/mdm_train_pytorch_tag.for.model/decode_mdm_dev_decode/data* $DNC_scoring_path
python3 scoring/gen_rttm.py --input-scp $data_path/eval.scp --js-dir $DNC_scoring_path \
	--js-num 16 --js-name data --rttm-name evaldnc
#To score the result the reference rttm has to first be split into the appropriate sub-meeting lengths:
python3 scoring/split_rttm.py --submeeting-rttm $DNC_scoring_path/evaldnc.rttm \
	--input-rttm scoring/refoutputeval.rttm --output-rttm $DNC_scoring_path/reference.rttm
#Finally, the speaker error rate has to be calculated using:
python3 scoring/score_rttm.py --score-rttm $DNC_scoring_path/evaldnc.rttm \
	--ref-rttm $DNC_scoring_path/reference.rttm --output-scoredir $DNC_scoring_path/result

I suspect the problem is that the training dataset was augmented in three ways, while I only used the m50_real_augment dataset to train the model.
I then tried adding the parameter --init_model $model_init to train on the m50_meeting_augment dataset, but got an error:

train.log

# asr_train.py --config conf/tuning/train_transformer.yaml --ngpu 3 --backend pytorch --outdir exp/mdm_train_pytorch_tag.for.model/results --tensorboard-dir tensorboard/mdm_train_pytorch_tag.for.model --debugmode 1 --dict data/lang_1char/mdm_train_units.txt --debugdir exp/mdm_train_pytorch_tag.for.model --minibatches 0 --verbose 0 --resume --asr-model exp/mdm_train_pytorch_tag.for.model/results/model.acc.best --train-sample-rate 0.2 --rotate true --seed 1 --train-json ../../../../data/augment_data/m50.real.augment/train.json --valid-json ../../../../data/augment_data/m50.real.augment/dev.json 
# Started at Wed May  5 22:22:08 EDT 2021
#
/work/wj/DNC/venv/lib/python3.7/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
  warnings.warn(warning.format(ret))
2021-05-05 22:22:09,421 (asr_train:322) WARNING: Skip DEBUG/INFO messages
None
Traceback (most recent call last):
  File "/work/wj/DNC/espnet/egs/ami/dnc1/../../../espnet/bin/asr_train.py", line 386, in <module>
    main(sys.argv[1:])
  File "/work/wj/DNC/espnet/egs/ami/dnc1/../../../espnet/bin/asr_train.py", line 374, in main
    train(args)
  File "/work/wj/DNC/espnet/espnet/asr/pytorch_backend/asr.py", line 333, in train
    model = model_class(idim, odim, args, asr_model=asr_model, mt_model=mt_model)
TypeError: __init__() got an unexpected keyword argument 'asr_model'
# Accounting: time=2 threads=1
# Ended (code 1) at Wed May  5 22:22:10 EDT 2021, elapsed time 2 seconds

Which steps did I get wrong?
Thank you very much!
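
The traceback shows asr.py calling the model constructor with asr_model=..., and the constructor rejecting that keyword. One possible reading, not confirmed by the maintainers, is that the model class selected here does not declare this parameter. The sketch below reproduces this class of error with a stand-in class and shows how to inspect a constructor's signature. `StandInModel` and `accepts_keyword` are hypothetical names, not ESPnet or DNC code.

```python
import inspect


class StandInModel:
    """Stand-in for a model class whose constructor has no `asr_model` parameter."""

    def __init__(self, idim, odim, args):
        self.idim, self.odim, self.args = idim, odim, args


def accepts_keyword(cls, name):
    """True if cls.__init__ takes `name` explicitly or through **kwargs."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


try:
    # Mirrors the failing call: an extra keyword the constructor does not know.
    StandInModel(40, 5, None, asr_model=None)
except TypeError as err:
    print(err)

print(accepts_keyword(StandInModel, "asr_model"))
```

Running the same check against the real model class would show whether its constructor can take the pre-trained model arguments at all.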

Trouble training the DNC Transformer

Hello, thanks for sharing!
I'm a graduate student from Taiwan.
I tried to follow your steps to train a DNC Transformer, but when I use
"./run.sh --stage 4 --stop_stage 4 --train_json /home/erichong0318/DNC/gen_data/m50.meeting.augment/train.json --dev_json /home/erichong0318/DNC/gen_data/m50.real.augment/dev.json --tag test"
to train my model, I encounter this problem.

The data preparation steps are all copied from your GitHub code.
What should I do to solve this?
Thanks a lot!
