pytorch=1.6.0, torchaudio=0.6.0, numpy=1.19.2, scipy=1.4.1, libkmcuda=6.2.3, torch-lfilter=0.0.3, pesq=0.0.2, pystoi=0.3.3
We provide five datasets, namely Spk10_enroll, Spk10_test, Spk10_imposter, Spk251_train and Spk251_test. They cover all the recognition tasks (i.e., CSI-E, CSI-NE, SV and OSI). The code in `dataset/Dataset.py`
will download them automatically. You can also manually download them using the following links:
- Spk10_enroll, 18 MB, MD5: 0e90fb00b69989c0dde252a585cead85
- Spk10_test, 114 MB, MD5: b0f8eb0db3d2eca567810151acf13f16
- Spk10_imposter, 212 MB, MD5: 42abd80e27b78983a13b74e44a67be65
- Spk251_train, 10 GB, MD5: 02bee7caf460072a6fc22e3666ac2187
- Spk251_test, 1 GB, MD5: 182dd6b17f8bcfed7a998e1597828ed6
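After a manual download, you can check the archive against its published MD5 before untarring. The sketch below is a generic checksum helper; the archive filename passed to it is whatever you downloaded, not a name fixed by this repository:

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading in 1 MB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g., compare md5sum("Spk10_enroll.tar.gz") with the value listed above
```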
- Download iv_system and xv_system, and untar them inside the model directory. These contain the pre-trained ivector-PLDA and xvector-PLDA background models.
- Run `enroll_iv.py` and `enroll_xv.py` to enroll the speakers in Spk10_enroll. The information about the enrolled speakers will be stored in `speakers/`.
python defense/natural_train.py -num_epoches 30 -batch_size 128 -model_ckpt 'path to store model' -log 'training log path'
- See `defense/natural_train.py` for more arguments and details.
- Sole FGSM adversarial training:
python defense/adver_train.py -attacker FGSM -epsilon 0.002
- Sole PGD adversarial training:
python defense/adver_train.py -attacker PGD -epsilon 0.002 -max_iter 10
- Combining adversarial training with the input transformation AT (randomized, so EOT should be used during training):
python defense/adver_train.py -attacker PGD -epsilon 0.002 -max_iter 10 -defense AT -defense_param 16 -EOT_size 10 -EOT_batch_size 5
- See `defense/adver_train.py` for more arguments and details.
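For a randomized defense, EOT (Expectation over Transformation) averages the gradient over several random draws of the defense before each attack step, which is what the `-EOT_size`/`-EOT_batch_size` flags control. A minimal sketch of the idea, where `model`, `defense`, and `loss_fn` are hypothetical stand-ins rather than the actual interfaces in this repo:

```python
import torch

def eot_grad(model, defense, x, y, loss_fn, eot_size=10):
    """Average the input gradient over eot_size random draws of the defense."""
    grads = torch.zeros_like(x)
    for _ in range(eot_size):
        x_in = x.clone().detach().requires_grad_(True)
        loss = loss_fn(model(defense(x_in)), y)  # defense is re-randomized each draw
        loss.backward()
        grads += x_in.grad
    return grads / eot_size
```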
python attackMain.py -model_type AudioNet -model_file 'pre-trained model path' -attacker FAKEBOB -epsilon 0.002 -task CSI -root 'root of benign dataset' -name 'name of benign dataset'
python attackMain.py -model_type ivector -model_file 'path of speaker_model file generated by enroll_iv.py' -attacker FAKEBOB -epsilon 0.002 -task CSI -root 'root of benign dataset' -name 'name of benign dataset'
- See `attackMain.py` for more details.
python test_attack.py -model_type AudioNet -model_file 'pre-trained model path' -root 'root of adver dataset' -name 'name of adver dataset'
- See `test_attack.py` for more details.
MC contains three state-of-the-art embedding-based speaker recognition models, i.e., ivector-PLDA, xvector-PLDA and AudioNet. Xvector-PLDA and AudioNet are based on neural networks, while ivector-PLDA is based on a statistical model (i.e., the Gaussian Mixture Model).
The flexibility and extensibility of SEC4SR make it easy to add new models. Just wrap the model as a `torch.nn.Module` and implement the abstract method `make_decision`. See `model/Model.py` for details.
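A hedged sketch of such a wrapper, assuming `make_decision` returns a per-utterance decision together with the raw scores; the exact signature expected by `model/Model.py` may differ:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    """Illustrative speaker classifier; the architecture is a placeholder."""
    def __init__(self, n_feats=40, n_speakers=10):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(n_feats, n_speakers))

    def forward(self, x):
        return self.net(x)  # raw scores (logits), one per enrolled speaker

    def make_decision(self, x):
        scores = self.forward(x)
        decisions = torch.argmax(scores, dim=1)  # predicted speaker index
        return decisions, scores
```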
To add new datasets, one just needs to define a class inheriting from `torch.utils.data.Dataset`, just like `dataset/Dataset.py`.
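A minimal sketch of such a dataset class; the in-memory `(waveform, label)` layout below is a placeholder, not the actual on-disk layout handled by `dataset/Dataset.py`:

```python
from torch.utils.data import Dataset

class MySpeakerDataset(Dataset):
    """Toy dataset: a list of (waveform_tensor, speaker_label) pairs."""
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        waveform, label = self.items[idx]
        return waveform, label
```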
To incorporate new attack algorithms, one just needs to inherit from the class in `attack/Attack.py` and implement the abstract method `attack`. See `attack/Attack.py` for details.
All input transformation methods are implemented as standalone Python functions, making it easy to extend them.
All these techniques are implemented as standalone wrappers, so that they can be easily plugged into attacks to mount adaptive attacks.
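To illustrate the standalone-function style (the function names and quantization defense below are illustrative examples, not this repo's actual API), a transformation can be a plain function and a wrapper can compose several of them into one callable that slots into a model's forward pass or an adaptive attack:

```python
import numpy as np

def quantization(audio, q=512):
    """Round waveform samples in [-1, 1] to q discrete levels."""
    return np.round(audio * q) / q

def compose(*transforms):
    """Chain standalone transformations into a single callable wrapper."""
    def wrapped(x):
        for t in transforms:
            x = t(x)
        return x
    return wrapped
```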