Randomized Substitution and Vote (RS&V)

This repository contains code to reproduce results from the paper:

Detecting Textual Adversarial Examples through Randomized Substitution and Vote (UAI 2022)

Xiaosen Wang, Yifeng Xiong, Kun He

Datasets and Dependencies

Three datasets are used in our experiments. Download them and place them in the directories ./data/ag_news, ./data/imdb and ./data/yahoo_answers, respectively.

This project has three dependencies. Download the files glove.840B.300d.txt and counter-fitted-vectors.txt into the directory ./data/vectors, and place the directory stanford-postagger-2018-10-16/ in the directory ./data/aux_files.

You can run get_data_and_dependencies.sh to get the test data:

bash get_data_and_dependencies.sh
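
Before training, it can help to confirm that the layout described above is in place. The following is a minimal sketch that only checks for the paths named in this section; the helper name `missing_paths` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Paths named in this README, relative to the data directory.
EXPECTED = [
    "ag_news",
    "imdb",
    "yahoo_answers",
    "vectors/glove.840B.300d.txt",
    "vectors/counter-fitted-vectors.txt",
    "aux_files/stanford-postagger-2018-10-16",
]


def missing_paths(data_dir="./data"):
    """Return the expected paths that do not exist under data_dir."""
    root = Path(data_dir)
    return [str(root / rel) for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Print one line per missing file or directory.
    for path in missing_paths():
        print("missing:", path)
```

An empty result means every path listed above exists; it does not verify the contents of the files.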

File Description

  • ./model: Detailed code for the model architectures.

  • ./utils: Helper functions for training models and processing data.

  • ./adversary: Files for attack methods.

  • ./data: Datasets and GloVe vectors.

  • cnn_classifier.py, bert_classifier.py, robert_classifier.py: Training code for CNN, BERT and RoBERTa.

  • cnn_attack.py: Attacking CNN model.

  • bert_attack.py: Attacking BERT and RoBERTa models.

  • build_embs.py: Generating the dictionary, embedding matrix and distance matrix.

  • synonym_selector.py: Generating the synonym sets.

  • detect_transfer.py: Converting adversarial examples through Randomized Substitution.

  • detect_eval.py: Vote and Detection.

Experiments

  1. Generating the dictionary, embedding matrix and distance matrix:

    python build_embs.py --data_dir ./data/ --task_name ag_news

  2. Training and attacking the models:

    For CNN:

    python cnn_classifier.py --output_dir ./output/model_file/ag_news/cnn --data_dir ./data/ --task_name ag_news --max_seq_length 128 --do_train --do_eval --vGPU 0
    python cnn_attack.py  --output_dir ./output/model_file/ag_news/cnn  --data_dir ./data/ --attack textfooler --task_name ag_news --max_seq_length 128  --max_candidate 50 --save_to_file ./output/adv_example/ag_news_cnn_textfooler --vGPU 0

    For BERT:

    python bert_classifier.py  --output_dir ./output/model_file/ag_news/bert --bert_model bert-base-uncased  --data_dir ./data/  --task_name ag_news --max_seq_length 128  --do_train --do_eval  --vGPU 0
    python bert_attack.py --data_dir ./data/ --task_name ag_news --attack textfooler --output_dir ./output/model_file/ag_news/bert/ --attack_batch 1000 --save_to_file ./output/adv_example/ag_news --bert_model bert-base-uncased  --max_candidate 50 --max_seq_length 128 --vGPU 0

  3. Evaluating the detection performance:

    python detect_transfer.py --task_name ag_news --data_dir ./data/  --votenum 25 --randomrate 0.6 --fixrate 0.02 --advfile ./output/adv_example/ag_news_cnn_textfooler.pkl --out_file ./output/transfer/ag_news_cnn_textfooler.pkl
    python detect_eval.py --task_name ag_news --data_dir ./data/  --max_seq_length 128  --modeltype cnn --output_dir ./output/model_file/ag_news/cnn --eval_file ./output/transfer/ag_news_cnn_textfooler.pkl
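
The two detection commands split the work into a substitution step (detect_transfer.py) and a voting step (detect_eval.py). As a rough illustration of the general idea only, the sketch below combines both steps on a toy problem. Everything in it is an assumption made for illustration: the function names, the synonym dictionary, the toy classifier, and the use of a plain majority vote over predicted labels, which may differ from how the repository aggregates votes. The --fixrate option is not modeled here.

```python
import random
from collections import Counter


def randomized_substitution(words, synonyms, random_rate, rng):
    """Swap each word that has synonyms for a random synonym with probability random_rate."""
    out = []
    for word in words:
        candidates = synonyms.get(word)
        if candidates and rng.random() < random_rate:
            out.append(rng.choice(candidates))
        else:
            out.append(word)
    return out


def detect(words, classify, synonyms, vote_num=25, random_rate=0.6, seed=0):
    """Return True if the majority label over substituted copies differs from the label of the input."""
    rng = random.Random(seed)
    original_label = classify(words)
    votes = Counter(
        classify(randomized_substitution(words, synonyms, random_rate, rng))
        for _ in range(vote_num)
    )
    voted_label = votes.most_common(1)[0][0]
    return voted_label != original_label


if __name__ == "__main__":
    # Toy classifier (hypothetical): positive if any known positive word appears.
    positive = {"good", "great", "fine"}

    def toy_classify(ws):
        return "pos" if positive & set(ws) else "neg"

    # Toy synonym sets (hypothetical).
    toy_synonyms = {"decent": ["good", "great", "fine"], "good": ["great", "fine"]}

    for text in (["a", "good", "movie"], ["a", "decent", "movie"]):
        print(text, "flagged:", detect(text, toy_classify, toy_synonyms))
```

The defaults for `vote_num` and `random_rate` mirror the --votenum 25 and --randomrate 0.6 values in the command above; the intuition is that a label that flips under random synonym substitution suggests the input's prediction is fragile.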
    

Contact

Questions and suggestions can be sent to [email protected].

rsv's Issues

FileNotFoundError : org_dic_ag_news

Even after running get_dependencies.sh, I get an error that ./data/aux_files/org_dic_ag_news_50000.pkl cannot be found. This file must be loaded while training the CNN classifier.
