floriankrey / dnc
Discriminative Neural Clustering for Speaker Diarisation
License: Apache License 2.0
Hi, I think there is an over-simplification in the custom K-means, in the way the centroids are estimated:
centres[each_center] = np.mean(X[each_center_samples], axis=0)
This does not actually yield the point that minimises the average custom distance within a cluster.
The mean is the optimal solution for the squared Euclidean distance, but not for an arbitrary distance. For instance, in the case of cosine distance, the mean calculated as above gives the optimal cluster centre only if the rows of X are L2-normalised.
A more general solution would be to use sklearn_extra.cluster.KMedoids.
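A minimal sketch of this point, in plain Python with made-up 2-D points. The helper names (`cosine_distance`, `mean_point`, `total_distance`, `medoid`) are illustrative and are not taken from the DNC or SpectralCluster code. It compares three candidate centres under cosine distance: the mean of the raw rows, the mean of the L2-normalised rows, and the medoid.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_point(points):
    """Component-wise arithmetic mean, analogous to np.mean(X, axis=0)."""
    return [sum(col) / len(points) for col in zip(*points)]


def l2_normalise(v):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def total_distance(centre, points, dist):
    """Sum of distances from a candidate centre to every point."""
    return sum(dist(centre, p) for p in points)


def medoid(points, dist):
    """Cluster member with the smallest total distance to all members."""
    return min(points, key=lambda c: total_distance(c, points, dist))


# Hypothetical cluster: one long vector and two short ones in another direction.
cluster = [[10.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

raw_mean = mean_point(cluster)
normalised_mean = mean_point([l2_normalise(p) for p in cluster])
cluster_medoid = medoid(cluster, cosine_distance)

cost_raw = total_distance(raw_mean, cluster, cosine_distance)
cost_norm = total_distance(normalised_mean, cluster, cosine_distance)
cost_medoid = total_distance(cluster_medoid, cluster, cosine_distance)
print(cost_raw, cost_norm, cost_medoid)
```

In this toy case the raw mean is pulled towards the long vector and has the highest total cosine distance of the three candidates. The medoid update needs only pairwise distances, so it works for any distance function, at the cost of restricting the centre to an actual data point.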
Hello, thanks for sharing!
I'm a graduate student from Taiwan.
I tried to follow your steps to train a DNC Transformer, but when I use
./run.sh --stage 4 --stop_stage 4 --train_json /home/erichong0318/DNC/gen_data/m50.meeting.augment/train.json --dev_json /home/erichong0318/DNC/gen_data/m50.real.augment/dev.json --tag test
to train my model, I encountered this problem.
The data preparation steps are all copied from your GitHub code.
What should I do to solve this?
Thanks a lot!
The SpectralCluster library has gone through many versions.
The newest versions have much more functionality, including custom distances for K-means, such as cosine.
Please consider importing the newest version of the library directly instead of using a nested fork 😄
Also, you can use the configuration equivalent to the ICASSP 2018 paper in a few lines:
from spectralcluster import configs
labels = configs.icassp2018_clusterer.predict(X)
Hi,
I tried to replicate the experiment and followed your steps to train a DNC Transformer with the same configuration. The DER for spectral clustering matches, but the DNC results are inconsistent: I get a DER of 32.52% for DNC, whereas the paper reports 13.90%.
My shell commands are as follows:
#current path is DNC/
dnc_root=espnet/egs/ami/dnc1
path_to_datadir=data/augment_data
m50_real_augment_path=$path_to_datadir/m50.real.augment
m50_meeting_augment_path=$path_to_datadir/m50.meeting.augment
dvecdict_meeting_path=$path_to_datadir/dvecdict.meeting.split100
m50_real_path=$path_to_datadir/m50.real
SC_scoring_path=scoring/sys_rttm/SC_result
DNC_scoring_path=scoring/sys_rttm/DNC_result
data_path=data
model_init=exp/mdm_train_pytorch_tag.for.model/results/model.acc.best
resume_path=exp/mdm_train_pytorch_tag.for.model/results/snapshot.ep.50
./path.sh
ln -s $dnc_root dnc1
#To generate training and validation data with sub-meeting length 50 and 1000 random shifts
python3 datapreperation/gen_augment_data.py --input-scps data/train.scp --input-mlfs data/train.mlf \
--filtEncomp --maxlen 50 --augment 1000 --varnormalise $m50_real_augment_path
python3 datapreperation/gen_augment_data.py --input-scps data/dev.scp --input-mlfs data/dev.mlf \
--filtEncomp --maxlen 50 --augment 1000 --varnormalise $m50_real_augment_path
#To generate training data with sub-meeting length 50 and 1000 random shifts using the meeting randomisation
python3 datapreperation/gen_dvecdict.py --input-scps data/train.scp --input-mlfs data/train.mlf \
--filtEncomp --segLenConstraint 100 --meetingLevelDict $dvecdict_meeting_path
python3 datapreperation/gen_augment_data.py --input-scps data/train.scp --input-mlfs data/train.mlf \
--filtEncomp --maxlen 50 --augment 100 --varnormalise --randomspeaker \
--dvectordict $dvecdict_meeting_path/train.npz $m50_meeting_augment_path
#To generate evaluation data
python3 datapreperation/gen_augment_data.py --input-scps data/eval.scp --input-mlfs data/eval.mlf \
--filtEncomp --maxlen 50 --varnormalise $m50_real_path
cd $dnc_root
#To start training, run
CUDA_VISIBLE_DEVICES=1,2,3 ./run.sh --stage 4 --stop_stage 4 --train_json ../../../../$m50_real_augment_path/train.json \
--ngpu 3 --dev_json ../../../../$m50_real_augment_path/dev.json --tag tag.for.model
#To track the progress of the training, run
tail -f exp/mdm_train_pytorch_tag.for.model/train.log
#Decode a DNC Transformer
#Similar to the command used for training, run
#The decoding results are, by default, stored in multiple json files in exp/mdm_train_pytorch_tag.for.model/decode_dev_xxxxx/data.JOB.json
./run.sh --stage 5 --decode_json ../../../../$m50_real_path/eval.json --tag tag.for.model
cd ../../../../
#Running spectral clustering
#To run spectral clustering on previously generated evaluation data, for example for sub-meeting lengths 50:
python3 scoring/run_spectralclustering.py --p-percentile 0.95 --custom-dist cosine \
--json-out $SC_scoring_path/eval95k24.1.json $m50_real_path/eval.json
#Evaluation of clustering results
#First the DNC or SC output has to be converted into the RTTM format:
#For SC:
python3 scoring/gen_rttm.py --input-scp $data_path/eval.scp --js-dir $SC_scoring_path \
--js-num 1 --js-name eval95k24 --rttm-name eval95k24
#To score the result the reference rttm has to first be split into the appropriate sub-meeting lengths:
python3 scoring/split_rttm.py --submeeting-rttm $SC_scoring_path/eval95k24.rttm \
--input-rttm scoring/refoutputeval.rttm --output-rttm $SC_scoring_path/reference.rttm
#Finally, the speaker error rate has to be calculated using:
python3 scoring/score_rttm.py --score-rttm $SC_scoring_path/eval95k24.rttm \
--ref-rttm $SC_scoring_path/reference.rttm --output-scoredir $SC_scoring_path/result
#For DNC:
\cp $dnc_root/exp/mdm_train_pytorch_tag.for.model/decode_mdm_dev_decode/data* $DNC_scoring_path
python3 scoring/gen_rttm.py --input-scp $data_path/eval.scp --js-dir $DNC_scoring_path \
--js-num 16 --js-name data --rttm-name evaldnc
#To score the result the reference rttm has to first be split into the appropriate sub-meeting lengths:
python3 scoring/split_rttm.py --submeeting-rttm $DNC_scoring_path/evaldnc.rttm \
--input-rttm scoring/refoutputeval.rttm --output-rttm $DNC_scoring_path/reference.rttm
#Finally, the speaker error rate has to be calculated using:
python3 scoring/score_rttm.py --score-rttm $DNC_scoring_path/evaldnc.rttm \
--ref-rttm $DNC_scoring_path/reference.rttm --output-scoredir $DNC_scoring_path/result
I suspect the problem is that the training data was augmented in three ways, while I only used the m50_real_augment dataset to train the model.
I then wanted to add the parameter --init_model $model_init to train on the m50_meeting_augment dataset, but this produces an error:
# asr_train.py --config conf/tuning/train_transformer.yaml --ngpu 3 --backend pytorch --outdir exp/mdm_train_pytorch_tag.for.model/results --tensorboard-dir tensorboard/mdm_train_pytorch_tag.for.model --debugmode 1 --dict data/lang_1char/mdm_train_units.txt --debugdir exp/mdm_train_pytorch_tag.for.model --minibatches 0 --verbose 0 --resume --asr-model exp/mdm_train_pytorch_tag.for.model/results/model.acc.best --train-sample-rate 0.2 --rotate true --seed 1 --train-json ../../../../data/augment_data/m50.real.augment/train.json --valid-json ../../../../data/augment_data/m50.real.augment/dev.json
# Started at Wed May 5 22:22:08 EDT 2021
#
/work/wj/DNC/venv/lib/python3.7/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
2021-05-05 22:22:09,421 (asr_train:322) WARNING: Skip DEBUG/INFO messages
None
Traceback (most recent call last):
File "/work/wj/DNC/espnet/egs/ami/dnc1/../../../espnet/bin/asr_train.py", line 386, in <module>
main(sys.argv[1:])
File "/work/wj/DNC/espnet/egs/ami/dnc1/../../../espnet/bin/asr_train.py", line 374, in main
train(args)
File "/work/wj/DNC/espnet/espnet/asr/pytorch_backend/asr.py", line 333, in train
model = model_class(idim, odim, args, asr_model=asr_model, mt_model=mt_model)
TypeError: __init__() got an unexpected keyword argument 'asr_model'
# Accounting: time=2 threads=1
# Ended (code 1) at Wed May 5 22:22:10 EDT 2021, elapsed time 2 seconds
Which steps did I get wrong?
Thank you very much!
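The final line of the traceback is Python's generic error for a constructor being called with a keyword it does not declare. A minimal sketch of that failure mode follows, using hypothetical class names that are not taken from ESPnet or DNC:

```python
class ModelWithoutInit:
    """Hypothetical model whose constructor takes no pretrained-model keywords."""

    def __init__(self, idim, odim, args):
        self.idim, self.odim, self.args = idim, odim, args


class ModelWithInit(ModelWithoutInit):
    """Hypothetical variant that accepts (and here merely stores) the keywords."""

    def __init__(self, idim, odim, args, asr_model=None, mt_model=None):
        super().__init__(idim, odim, args)
        self.asr_model, self.mt_model = asr_model, mt_model


def build(model_class):
    # Mirrors the shape of the failing call shown in the traceback.
    return model_class(32, 5, None, asr_model="model.acc.best", mt_model=None)


try:
    build(ModelWithoutInit)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# The same call succeeds once the constructor declares the keywords.
ok_model = build(ModelWithInit)
```

The sketch only illustrates what the message means: the training code passed `asr_model` to a model class whose `__init__` has no such parameter. Whether the DNC model class is meant to support initialisation from a pretrained model can only be settled by the repository code.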
Hi,
thank you for your work.
I am trying to replicate your setup with AMI, but I have no idea how to get the d-vectors. You propose some augmentation techniques, but when I run those commands, they try to find ark files that I do not have:
FileNotFoundError: [Errno 2] No such file or directory: 'data/train/train.00.ark
I was able to download AMI (ihm) and process data, but I am not able to find anything on d-vectors.
Thanks.
The relevant parts of run.sh (stages -1 to 3) have been removed, and it is unclear to me from Section 5.2 of the paper which TDNN was used for the 32-dimensional d-vector embedding.
In issue #2 you mention that any d-vector embedding could be used, but is this really true? Some parameters have to be identical, don't they (window size, overlap, sample rate, ...)?
How would one have to edit the AMI label files (probably the offsets)?
What files would I have to create for my own data in order to run the decoder on it?
TL;DR: Is it feasible to use DNC on my own data?
I have found that utterances from the meetings EN2001a, EN2001d, and EN2001e appear four times in the features, under different keys but with identical feature matrices.
Why is this?
import numpy as np
from kaldiio import ReadHelper

feats = []
with ReadHelper('scp:data/train.scp') as reader:
    for key, numpy_array in reader:
        # if "0007881_0007913" in key:
        if "0056607_0056837" in key:
            print(key)
            feats.append(numpy_array)
# Keys printed:
# AMIXXX-00001-1EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837
# AMIXXX-00001-3EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837
# AMIXXX-00001-4EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837
# AMIXXX-00001-5EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837

# Compare the feature matrices of consecutive matching keys
print(np.array_equal(feats[0], feats[1]))
print(np.array_equal(feats[1], feats[2]))
print(np.array_equal(feats[2], feats[3]))
# Output:
# True
# True
# True
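A sketch of how the same check could be extended to a whole feature set rather than one hand-picked segment. It assumes, as the keys above suggest, that the last two underscore-separated fields of a key are the start and end of the segment. The `features` dict is a made-up stand-in for what `ReadHelper` yields, and `find_duplicate_groups` is an illustrative helper, not part of the repository.

```python
from collections import defaultdict


def find_duplicate_groups(features):
    """Group keys whose segment times and feature matrices are identical.

    `features` maps an utterance key to a matrix (a list of rows of floats).
    Returns the groups of keys that contain more than one member.
    """
    groups = defaultdict(list)
    for key, matrix in features.items():
        # Trailing "<start>_<end>" fields of the key (assumed key layout).
        start, end = key.split("_")[-2:]
        # Tuples are hashable, so identical matrices land in the same group.
        fingerprint = (start, end, tuple(tuple(row) for row in matrix))
        groups[fingerprint].append(key)
    return [sorted(keys) for keys in groups.values() if len(keys) > 1]


# Made-up stand-in data: three keys share a segment and features, one does not.
features = {
    "AMIXXX-00001-1EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837": [[0.1, 0.2], [0.3, 0.4]],
    "AMIXXX-00001-3EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837": [[0.1, 0.2], [0.3, 0.4]],
    "AMIXXX-00001-4EN2001a-XXXXXX-11_XXXXXXX_0056607_0056837": [[0.1, 0.2], [0.3, 0.4]],
    "AMIXXX-00001-1EN2001a-XXXXXX-11_XXXXXXX_0007881_0007913": [[0.5, 0.6]],
}

duplicates = find_duplicate_groups(features)
for group in duplicates:
    print(len(group), "keys share identical features:", group)
```

With NumPy arrays read through kaldiio, the fingerprint could instead be built from `numpy_array.tobytes()` together with the array shape, which avoids converting every row to a tuple.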