optml-group / diffusion-mu-attack

The official implementation of the paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now". This work introduces one fast and effective attack method to evaluate the harmful-content generation ability of safety-driven unlearned diffusion models.

License: MIT License

Languages: Python 99.69%, Shell 0.31%
Topics: attack-unlearned-diffusion-model, stable-diffusion, unlearning, adversarial-attacks, evaluation-framework, robustness

diffusion-mu-attack's Introduction

To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now

Welcome to the official implementation of UnlearnDiff Attack, which capitalizes on the intrinsic classification abilities of DMs to simplify the creation of adversarial prompts, thereby eliminating the need for auxiliary classification or diffusion models. Through extensive benchmarking, we evaluate the robustness of five widely used safety-driven unlearned DMs (i.e., DMs after unlearning undesirable concepts, styles, or objects) across a variety of tasks.

Abstract

The recent advances in diffusion models (DMs) have revolutionized the generation of complex and diverse images. However, these models also introduce potential safety hazards, such as the production of harmful content and infringement of data copyrights. Although there have been efforts to create safety-driven unlearning methods to counteract these challenges, doubts remain about their capabilities. To bridge this uncertainty, we propose an evaluation framework built upon adversarial attacks (also referred to as adversarial prompts), in order to discern the trustworthiness of these safety-driven unlearned DMs. Specifically, our research explores the (worst-case) robustness of unlearned DMs in eradicating unwanted concepts, styles, and objects, assessed by the generation of adversarial prompts. We develop a novel adversarial learning approach called UnlearnDiff that leverages the inherent classification capabilities of DMs to streamline the generation of adversarial prompts, making the process as simple and intuitive for generative modeling as it is for image classification attacks. Through comprehensive benchmarking, we assess the unlearning robustness of five prevalent unlearned DMs across multiple tasks. Our results underscore the effectiveness and efficiency of UnlearnDiff when compared to state-of-the-art adversarial prompting methods.
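The "inherent classification capabilities" mentioned in the abstract can be read through the general diffusion-classifier idea: a DM scores a candidate prompt by how well it predicts the injected noise for an image under that prompt, so a lower denoising error corresponds to a higher posterior. The following is a toy sketch of that idea only, assuming the per-prompt denoising errors have already been computed. `class_posterior` and the numbers are illustrative and are not part of this repository, and this is not the paper's exact attack objective.

```python
import math

def class_posterior(denoising_errors):
    """Softmax over negative denoising errors, one error per candidate prompt."""
    # Shift by the smallest error so the exponentials stay numerically stable.
    lowest = min(denoising_errors)
    weights = [math.exp(-(err - lowest)) for err in denoising_errors]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical average errors ||eps - eps_theta(x_t | c)||^2 for three prompts.
errors = [0.9, 0.2, 1.5]
posterior = class_posterior(errors)

# The prompt with the lowest error receives the highest posterior.
best = max(range(len(errors)), key=lambda i: posterior[i])
```

Under this view, searching for an adversarial prompt amounts to finding text that makes the denoising error on a target image small, which is why no auxiliary classifier is needed.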

Code Structure

configs: contains the default parameters for each method

prompts: contains the prompts we selected for each experiment

src: contains the source code for the proposed methods

  • attackers: contains different attack methods (different discrete optimization methods)
  • tasks: contains different types of attacks (the auxiliary model-based attack P4D, and our UnlearnDiff)
  • execs: contains the main execution files to run experiments
  • loggers: contains the logger code for the experiments

Usage

In this section, we provide the instructions to reproduce the results on nudity (ESD) in our paper. You can change the config file path to reproduce the results on other concepts or unlearned models.

Requirements

conda env create -n ldm --file environments/x86_64.yaml

Unlearned model preparation

We provide different unlearned models (ESD and FMN), which you can download from [Object, Others]. We also provide an artist classifier for evaluating the style task. You can download it from here.

Generate dataset

python src/execs/generate_dataset.py --prompts_path prompts/nudity.csv --concept i2p_nude --save_path files/dataset

No attack

python src/execs/attack.py --config-file configs/nudity/no_attack_esd_nudity_classifier.json --attacker.attack_idx $i --logger.name attack_idx_$i

where i ranges over the integers in [0,142), i.e., 0 through 141.
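The dotted flags in the command above (`--attacker.attack_idx`, `--logger.name`) read like overrides for nested entries of the JSON config. A minimal sketch of how such overrides are commonly applied is given here. The repository's actual parser may work differently, and `apply_override` and the config contents are hypothetical.

```python
import json

def apply_override(config, dotted_key, value):
    """Set config["a"]["b"] = value for dotted_key "a.b", creating levels as needed."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return config

# Hypothetical stand-in for the contents of a JSON config file.
config = json.loads('{"attacker": {"attack_idx": 0}, "logger": {"name": "default"}}')

# Equivalent of: --attacker.attack_idx 5 --logger.name attack_idx_5
apply_override(config, "attacker.attack_idx", 5)
apply_override(config, "logger.name", "attack_idx_5")
```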

UnlearnDiff attack

python src/execs/attack.py --config-file configs/nudity/text_grad_esd_nudity_classifier.json --attacker.attack_idx $i --logger.name attack_idx_$i

where i ranges over the integers in [0,142), i.e., 0 through 141.
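Since each run handles a single prompt index, covering the full range means launching the command once per index. A small driver along the following lines could do that. It is a sketch, not a script shipped with the repository: `build_command` and `run_all` are hypothetical helpers, and only the script path, config path, and flags are taken from the command above.

```python
import subprocess
import sys

CONFIG = "configs/nudity/text_grad_esd_nudity_classifier.json"

def build_command(attack_idx, config=CONFIG):
    """Return the argv list for one attack run, mirroring the command above."""
    return [
        sys.executable, "src/execs/attack.py",
        "--config-file", config,
        "--attacker.attack_idx", str(attack_idx),
        "--logger.name", f"attack_idx_{attack_idx}",
    ]

def run_all(num_prompts=142, dry_run=True):
    """Build (and optionally run) one command per prompt index in [0, num_prompts)."""
    commands = [build_command(i) for i in range(num_prompts)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands

# Dry run: only builds the 142 command lines without executing anything.
commands = run_all()
```

Passing `configs/nudity/no_attack_esd_nudity_classifier.json` as `config` gives the matching no-attack runs. Call `run_all(dry_run=False)` from the repository root to execute the commands sequentially.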

Evaluation

For nudity/violence/illegal/objects:

python scripts/analysis/check_asr.py --root-no-attack $path_to_no_attack_results --root $path_to_${P4D|UnlearnDiff}_results
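The script takes both a no-attack root and an attack root, which suggests that the two result sets are combined into one attack success rate (ASR). One plausible reading, stated here as an assumption rather than as the behavior of `check_asr.py`, is that a prompt counts as a success if it already yields the target content without an attack or does so after the attack. The helper below is hypothetical and uses made-up outcomes.

```python
def attack_success_rate(no_attack_success, attack_success):
    """Fraction of prompts that succeed without an attack or after one."""
    if len(no_attack_success) != len(attack_success):
        raise ValueError("both result lists must cover the same prompts")
    if not attack_success:
        raise ValueError("no results to score")
    hits = sum(1 for pre, post in zip(no_attack_success, attack_success) if pre or post)
    return hits / len(attack_success)

# Made-up per-prompt outcomes for four prompts.
pre = [True, False, False, False]   # success with no attack
post = [False, True, False, True]   # success after the attack
asr = attack_success_rate(pre, post)  # 3 of 4 prompts succeed -> 0.75
```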

For style:

python scripts/analysis/style_analysis.py --root $path_to_${P4D|UnlearnDiff}_results --top_k {1|3}
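The `--top_k {1|3}` option presumably controls whether a generated image counts as matching the target style only when the target artist is the classifier's top prediction, or also when it is among the top three. That interpretation is an assumption. The sketch below illustrates a generic top-k check with a hypothetical helper and made-up classifier scores.

```python
def top_k_hit(scores, target, k):
    """True if `target` is among the k highest-scoring labels."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return target in ranked[:k]

# Made-up classifier scores for one generated image.
scores = {"artist_a": 0.5, "artist_b": 0.3, "artist_c": 0.2}

hit_top1 = top_k_hit(scores, "artist_b", 1)  # artist_b is ranked second
hit_top3 = top_k_hit(scores, "artist_b", 3)
```

A top-3 criterion is more permissive than top-1, so success rates under `--top_k 3` would be at least as high as under `--top_k 1`.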

Citation

@article{zhang2023generate,
  title={To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images... For Now},
  author={Zhang, Yimeng and Jia, Jinghan and Chen, Xin and Chen, Aochuan and Zhang, Yihua and Liu, Jiancheng and Ding, Ke and Liu, Sijia},
  journal={arXiv preprint arXiv:2310.11868},
  year={2023}
}

Related Works - Machine Unlearning

diffusion-mu-attack's People

Contributors

damon-demon, jinghanjia, ljcc0930

diffusion-mu-attack's Issues

About multiple-gpus

Hi, do you remember me? Sorry to bother you again. I am a beginner with Stable Diffusion models and have run into some problems when running your code. Did you use a single GPU? I encountered CUDA out of memory on a single 3090 (24 GB), and I could not spread the code across 2 or more GPUs because I lack multi-GPU experience. Thank you very much for answering my rookie questions, both this time and last time.

Threshold for obtaining nudity.csv

Hello, Author!
Sorry for reaching out again. I wanted to inquire about the threshold you mentioned here for obtaining the "nudity.csv". Thank you so much for your assistance!

Please provide the model

Hi, it seems that you did not provide models from ESD and FMN because the link you placed is still a code repository rather than a model. Thx.

Could you provide unlearned models for objects?

Hi~ The work is awesome! I am trying to reproduce the results. I notice that the models provided in the Google Drive link do not include unlearned models for objects such as church, parachute, tench, and garbage truck. Could you please provide these models? Thank you very much.

The relation between i2p.csv and nudity.csv

Hi, what amazing work! I am lucky to be able to reproduce your code. By the way, I am confused about the relation between i2p.csv and nudity.csv. If you select all the "sexual" categories from i2p.csv, you get 930+ prompts, but there are only 140+ prompts in nudity.csv. Could you help me figure it out? I would appreciate it! :)
