chawins / llm-sp

231.0 14.0 13.0 273 KB

Papers and resources related to the security and privacy of LLMs 🤖

Home Page: https://chawins.github.io/llm-sp

License: Apache License 2.0

Python 100.00%

adversarial-machine-learning awesome-list llm llm-privacy llm-security privacy security

llm-sp's People

Contributors

Stargazers

Watchers

Forkers

charliejcj pengfeihepower gregxmhu fro-oo pandora-alias yanmingong zggg1p superf0sh inzy ravensanstete harishgovardhandamodar yoyostudy haozhenzhao

llm-sp's Issues

Kindly Request the Inclusion

Hi Chawin,

Just wanted to say a big thanks for all the awesome stuff you've been doing for the community. Your recent paper on the black-box jailbreaking attack was super interesting, and I really enjoyed reading it! It's really exciting to see that hybrid attacks (combining query-based and proxy models) remain effective at jailbreaking.

I was wondering if you might take a look at our paper and add it to your list: "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models." It's an ICLR 2024 paper and not exactly new, but it's been doing well in some of the open-source benchmarks, like CAIS's HarmBench.

Thanks a ton for considering it. Looking forward to any opportunity to chat more!

Kindly request the inclusion

I'm reaching out to share a recent paper I've co-authored titled "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers". Our research focuses on jailbreaking LLMs through prompt decomposition, and I believe it aligns well with your interest in LLM safety.

You can access the paper here. Our project page and Twitter post are also available for your reference.

Thank you so much for considering my request. I'm also open to any questions or discussions this might spark – I'd love to engage in a meaningful conversation with someone of your expertise.

Best regards,
Xirui Li

Kindly Add a recent work

Hi, I would like to add our completed paper from MSFT Research on defending against adversarial attacks, "Protecting Your LLMs with Information Bottleneck". Thanks!

Add new paper

The paper "Tree of Attacks: Jailbreaking Black-Box LLMs Automatically"
makes minor improvements on the work "Jailbreaking Black Box Large Language Models in Twenty Queries".

Kindly Request the Inclusion of My Work

Hi,

Many thanks for your effort!

This list contains numerous articles that I find appealing, as well as some that I have not read yet but whose titles have caught my interest. May I kindly request the inclusion of my paper in the list? We proposed a new safety training method against jailbreak attacks, named Self-Guard.

Title: Self-Guard: Empower the LLM to Safeguard Itself
Link: https://arxiv.org/abs/2310.15851

We have restructured the paper based on the current preprint version, adding new experimental results, including ablation experiments. Unfortunately, due to the anonymity period, I am unable to update the preprint version in time. Although the publicly available version of our work is not perfect at present, please rest assured that we have made numerous updates, and we believe it will be beneficial to the LLM safety community. We will promptly update our paper once the anonymity period is lifted. My only wish is for my paper to be included in such a splendid list, not to promote it.

Of course, I am merely seeking your consent. Please do not feel pressured, as you are entirely free to decline my request and close this issue. Regardless of the result, we will continue to refine our work.

Once again, I would like to express my gratitude for your efforts and contributions.

Best,
Zezhong
