chawins / llm-sp
Papers and resources related to the security and privacy of LLMs 🤖
Home Page: https://chawins.github.io/llm-sp
License: Apache License 2.0
Hi Chawin,
Just wanted to say a big thanks for all the awesome work you've been doing for the community. Your recent paper on the black-box jailbreaking attack was super interesting; I really enjoyed reading it! It's exciting to see that hybrid attacks (combining query-based attacks with proxy models) remain effective at jailbreaking.
I was wondering if you might take a look at our paper, "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models," and add it to your list. It's an ICLR 2024 paper and not exactly new, but it has been performing well on some open-source benchmarks, such as CAIS's HarmBench.
Thanks a ton for considering it. Looking forward to any opportunity to chat more!
I'm reaching out to share a recent paper I co-authored, "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers". Our research focuses on jailbreaking LLMs via prompt decomposition, and I believe it aligns well with your interest in LLM safety.
You can access the paper here. Our project page and Twitter post are also available for your reference.
Thank you so much for considering my request. I'm also open to any questions or discussions this might spark; I'd love to engage in a meaningful conversation with someone of your expertise.
Best regards,
Xirui Li
Hi, I would like to request adding our completed paper from MSFT Research on defending against adversarial attacks, "Protecting Your LLMs with Information Bottleneck". Thanks!
The paper "Tree of Attacks: Jailbreaking Black-Box LLMs Automatically"
makes minor improvements on the work "Jailbreaking Black Box Large Language Models in Twenty Queries".
Hi,
Many thanks for your effort!
This list contains numerous articles that I also find appealing, as well as some that I have not yet read but whose titles have caught my interest. May I kindly request that my paper be included in the list? We proposed a new safety training method against jailbreak attacks, named Self-Guard.
We have restructured the paper relative to the current preprint and added new experimental results, including ablation studies. Unfortunately, due to the anonymity period, I am unable to update the preprint in time. Although the publicly available version of our work is not perfect at present, please rest assured that we have made numerous updates, and we believe it will be beneficial to the LLM safety community. We will promptly update our paper once the anonymity period is lifted. My only wish is to have my paper included in such a splendid list, not to promote it.
Of course, I am merely seeking your consent. Please do not feel pressured, as you are entirely free to decline my request and close this issue. Regardless of the outcome, we will continue to refine our work.
Once again, I would like to express my gratitude for your efforts and contributions.
Best,
Zezhong