
assert's Issues

problem in main.py

How do I generate the following files referenced in main.py?
data_files = { # training
'train_scp': 'data_reader/feats/la_train_spec_tensor4.scp',
'train_utt2index': 'data_reader/utt2systemID/la_train_utt2index_8',
'dev_scp': 'data_reader/feats/la_dev_spec_tensor4.scp',
'dev_utt2index': 'data_reader/utt2systemID/la_dev_utt2index_8',
'dev_utt2systemID': 'data_reader/utt2systemID/la_dev_utt2systemID',
'scoring_dir': 'scoring/la_cm_scores/',
}
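
For context, a Kaldi-style .scp file is a plain-text index: each line maps an utterance ID to where its feature matrix lives in an archive. A hedged illustration (the utterance IDs and paths below are hypothetical, and the exact archive format used by this repo may differ):

LA_T_1138215 features/logspec/raw_spec_la_train.1.ark:23
LA_T_1271820 features/logspec/raw_spec_la_train.1.ark:58412

These indexes are presumably produced by the feature-extraction pipeline (run_feature.sh) rather than by main.py itself.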

can you give me some guidance on how to use the Bosaris toolkit

Hi,
I'm interested in this challenge and have designed some single anti-spoofing models for the LA dataset. While reading the related papers from the ASVspoof 2019 Challenge, I found that many primary systems are obtained by fusing several single models. I want to use the Bosaris toolkit to fuse my models, but I have no idea how to use it. Could you offer some demo code?
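
Bosaris itself is a MATLAB toolkit; for anyone else reading, its core fusion step is a linear logistic regression over per-system scores. Below is a minimal Python sketch of the same idea using scikit-learn (all scores and labels are random placeholders, not ASSERT outputs, and Bosaris additionally uses a prior-weighted objective, so this is only the general shape of the computation):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder dev-set scores from 3 single systems and binary labels
# (1 = bonafide, 0 = spoof); substitute your real score files.
scores_dev = np.random.randn(1000, 3)
labels_dev = np.random.randint(0, 2, size=1000)

# Fit fusion weights on the dev set (Bosaris fits a similar linear
# logistic regression, with a prior-weighted objective).
fuser = LogisticRegression()
fuser.fit(scores_dev, labels_dev)

# Fuse eval scores into a single log-odds score per trial.
scores_eval = np.random.randn(500, 3)
fused = scores_eval @ fuser.coef_.ravel() + fuser.intercept_[0]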

Parameters/Configurations of the pretrained models

Hi,

I noticed that in assert/main.py there is the following configuration for the ResNet-based models:

model_params = {
    'MODEL_SELECT'     : 1,          # which model
    'NUM_SPOOF_CLASS'  : 7,          # x-class classification
    'FOCAL_GAMMA'      : None,       # gamma for focal loss; None if the objective is not focal loss
    'NUM_RESNET_BLOCK' : 5,          # number of residual blocks in ResNet
    'AFN_UPSAMPLE'     : 'Bilinear', # upsampling method in AFNet: Conv or Bilinear
    'AFN_ACTIVATION'   : 'sigmoid',  # activation function in AFNet: sigmoid, softmaxF, softmaxT
    'NUM_HEADS'        : 3,          # number of heads for multi-head attention in SAFNet
    'SAFN_HIDDEN'      : 10,         # hidden dim for SAFNet
    'SAFN_DIM'         : 'T',        # SAFNet attention dim: T or F
    'RNN_HIDDEN'       : 128,        # hidden dim for RNN
    'RNN_LAYERS'       : 4,          # number of hidden layers for RNN
    'RNN_BI'           : True,       # bidirectional/unidirectional RNN
    'DROPOUT_R'        : 0.0,        # dropout rate
}

but it does not fit all four pretrained models.

To load the pretrained models successfully, could you provide their configurations/parameters? Thanks!
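
One hedged way to recover a checkpoint's hyperparameters without its original config is to inspect the tensor names and shapes in its state_dict; a minimal sketch (the path follows the repo's snapshots/ layout seen in the logs):

import torch

# Load a checkpoint on CPU; the path is one example from this repo's logs.
ckpt = torch.load('snapshots/181/model_best.pth.tar', map_location='cpu')
state = ckpt['state_dict'] if 'state_dict' in ckpt else ckpt

# Layer names and shapes reveal the architecture: counting repeated
# residual-block prefixes suggests NUM_RESNET_BLOCK, and RNN weight
# shapes suggest RNN_HIDDEN / RNN_LAYERS.
for name, tensor in state.items():
    print(name, tuple(tensor.shape))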

post() function for class analysis

@jefflai108 Hi Jeff, is it right that the post() function in main.py only targets multi-class analysis? I trained a binary classifier and used it for the post analysis by setting binary_class=True in the code below. The results seem skewed (only one category is ever recognized) and don't look right.
Thanks!

Data loading code (class analysis for multi-class classification only)

train_data = SpoofDatsetSystemID(data_files['train_scp'], data_files['train_utt2index'], binary_class=False)

example of utt2systemID

@jefflai108 Jeff, would you mind posting an example of what utt2systemID looks like?
Is it the same format as the previous utt2spk?

Thanks

systemID_files = ['la_dev_utt2systemID', 'la_train_utt2systemID',
'pa_dev_utt2systemID', 'pa_train_utt2systemID']
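
For anyone with the same question: Kaldi-style utt2* files put one utterance ID and one label per line, the same layout as utt2spk. Presumably utt2systemID maps each utterance to its ASVspoof 2019 attack ID or to bonafide; a hedged illustration with hypothetical utterance IDs:

LA_T_1138215 bonafide
LA_T_1271820 A01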

Question related to run_feature.sh

Hi @jefflai108 ,

I've successfully created the scp file with your help.

To extract the logspec used in the paper, I changed $stage in run_feature.sh to 0.
The file local/make_spectrogram.sh is missing, so I copied the one from your AFN repo.

But it seems the feature extraction in make_spectrogram.sh is not working properly, since resource usage is almost zero. Am I doing something wrong?

fix_data_dir.sh: kept all 134730 utterances.
fix_data_dir.sh: old files are kept in data/pa_eval/.backup
utils/copy_data_dir.sh: copied data from data/pa_eval to data/pa_eval_spec
utils/validate_data_dir.sh: Successfully validated data-directory data/pa_eval_spec
local/make_spectrogram.sh --fbank-config conf/spec.conf --nj 40 --cmd run.pl data/pa_eval_spec exp/make_spec /home/bill/ASSERT/features/logspec
utils/validate_data_dir.sh: Successfully validated data-directory data/pa_eval_spec
local/make_spectrogram.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
run.pl: 40 / 40 failed, log is in exp/make_spec/make_fbank_pa_eval_spec.*.log
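
Since run.pl reports all 40 jobs failed, the real error message is in the per-job logs it names, not in the console output above. A hedged Python snippet to surface the first one:

import glob

# run.pl writes one log file per job; the actual error from
# make_spectrogram.sh is inside these logs.
logs = sorted(glob.glob('exp/make_spec/make_fbank_pa_eval_spec.*.log'))
with open(logs[0]) as f:
    print(f.read())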

I also have another question: what's the difference between spec and logspec, and is the latter better for this task?

Many thanks in advance

utt2systemID_file?

Thank you for sharing. I don't know what the utt2systemID files are; could you describe them in detail, please?

No such file or directory: u'data_reader/feats/la_dev_spec_tensor4.scp'

Hi @jefflai108 ,

What are the steps to generate this file?

I am running:

$ python main.py 
INFO - asvspoof19 - Running command 'main'
INFO - asvspoof19 - Started run with ID "185"
squeeze-and-excitation network
===> Model total parameter: 1343762
===> loading checkpoint 'snapshots/181/model_best.pth.tar'
===> loaded checkpoint 'snapshots/181/model_best.pth.tar' (epoch 6)
ERROR - asvspoof19 - Failed after 0:00:02!
Traceback (most recent calls WITHOUT Sacred internals):
  File "main.py", line 675, in main
    forward_pass()
  File "main.py", line 302, in forward_pass
    val_data    = SpoofDatsetSystemID(data_files['dev_scp'], data_files['dev_utt2index'], binary_class=True)
  File "/home/infcodey/repos/ASSERT/assert/data_reader/dataset_v1.py", line 154, in __init__
    with open(scp_file) as f:
IOError: [Errno 2] No such file or directory: u'data_reader/feats/la_dev_spec_tensor4.scp'

I'm trying to run the binary SENet34 model (the best performing one) on the dev & eval sets.

Thank you

the missing bash script 'make_spectrogram.sh'

@jefflai108 Hi Jeff, when I run the feature extraction script run_feature.sh, I get 'local/make_spectrogram.sh: No such file or directory'. Is this a bash script that isn't included in Kaldi? If so, can I replace it with the one Kaldi ships with, or is extra configuration needed?

no utils script

Hi,
I can't find any feature-extraction scripts or the utils directory. Could you please release the feature scripts?

How to get the logspec using Python?

I use the following code to extract the logspec, but the resulting feature differs from Kaldi's:

import numpy as np
import librosa

# x is the waveform; n_fft, win_length, hop_length are the STFT settings
s = librosa.core.stft(x, n_fft=n_fft, win_length=win_length, hop_length=hop_length)
a = np.abs(s) ** 2
mag_T = librosa.power_to_db(a)
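
Part of that difference is expected: Kaldi's default pipeline uses 25 ms frames with a 10 ms shift, pre-emphasis of 0.97, no centered frames, and a natural log of the power spectrum, whereas power_to_db returns 10*log10. A hedged sketch that moves librosa closer to Kaldi (it still omits Kaldi's dither, 'povey' window, and snip-edges framing, so small differences will remain):

import numpy as np
import librosa

def kaldi_like_logspec(x, sr=16000):
    # Pre-emphasis with Kaldi's default coefficient 0.97.
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])
    win = int(0.025 * sr)    # 25 ms frames (Kaldi default)
    hop = int(0.010 * sr)    # 10 ms shift (Kaldi default)
    s = librosa.stft(x, n_fft=512, win_length=win, hop_length=hop,
                     window='hamming', center=False)
    # Kaldi takes a natural log of the power spectrum; power_to_db
    # uses 10*log10, which is one source of the mismatch.
    return np.log(np.maximum(np.abs(s) ** 2, 1e-10))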

How to create .scp and utt2spk

I don't know how to create the files above from the raw dataset, and run_feature.sh can't work without them. Could someone tell me how to create them?
Thanks a lot!
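
For anyone stuck on the same step: both files are plain text with one utterance per line. A hedged sketch under the assumption that no speaker labels are needed (the directory path below is hypothetical):

import glob
import os

# Hypothetical location of the ASVspoof flac files.
wav_dir = '/path/to/ASVspoof2019_LA_train/flac'

with open('wav.scp', 'w') as wav_scp, open('utt2spk', 'w') as utt2spk:
    for path in sorted(glob.glob(os.path.join(wav_dir, '*.flac'))):
        utt = os.path.splitext(os.path.basename(path))[0]
        # Kaldi decodes flac through a pipe command ending in '|'.
        wav_scp.write('{} flac -c -d -s {} |\n'.format(utt, path))
        # With no speaker labels, use the utterance ID as its own speaker.
        utt2spk.write('{} {}\n'.format(utt, utt))

Kaldi's utils/fix_data_dir.sh can then sort and validate the resulting data directory.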

Does initialization have a great impact on detection performance?

Dear author,

I really appreciate your code implementing the ASSERT model. It is excellent work!
In practice, though, I have some questions. When I ran the scripts multiple times with exactly the same settings, the results varied over a wide range. For example, across 10 runs of the SE-ResNet34 model on the replay detection task with your default settings, the best run achieved an EER of 0.67% on dev and 1.11% on eval, while the worst had 1.50% on dev and 2.02% on eval. I am curious about the reason. Does parameter initialization really have such a great impact, or is something else at play? How can I avoid this and make training more stable? Could you please give some suggestions?
Thanks a lot!!!
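
Not the author, but run-to-run variance like this usually comes from unseeded weight initialization and data shuffling plus nondeterministic cuDNN kernels. A hedged sketch of the standard PyTorch controls (this pins most, though not all, sources of randomness):

import random
import numpy as np
import torch

def set_seed(seed=42):
    # Pin Python, NumPy, and PyTorch (CPU + all GPUs) random streams.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force cuDNN to pick deterministic kernels (may be slower).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False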
