llm-sp's People

Contributors

charliejcj, chawins

llm-sp's Issues

Kindly Request the Inclusion

Hi Chawin,

Just wanted to say a big thanks for all the awesome stuff you've been doing for the community. Your recent paper on the black-box jailbreaking attack was super interesting, and I really enjoyed reading it! It's really exciting to see that hybrid attacks (combining query-based + proxy models) remain effective in jailbreaking.

I was wondering if you might take a look at our paper and add it to your list: "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models." It's an ICLR 2024 paper and not exactly new, but it's been doing well in some of the open-source benchmarks, like CAIS's HarmBench.

Thanks a ton for considering it. Looking forward to any opportunity to chat more!

Kindly request the inclusion

I'm reaching out to share a recent paper I've co-authored titled "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers". Our research focuses on jailbreaking LLMs through prompt decomposition, and I believe it aligns well with your interest in LLM safety.

You can access the paper here. Our project page and Twitter post are also available for your reference.

Thank you so much for considering my request. I'm also open to any questions or discussions this might spark – I'd love to engage in a meaningful conversation with someone of your expertise.

Best regards,
Xirui Li

Kindly Add a recent work

Hi, I would like to add our completed paper from MSFT Research on defending against adversarial attacks, "Protecting Your LLMs with Information Bottleneck". Thanks!

Add new paper

The paper "Tree of Attacks: Jailbreaking Black-Box LLMs Automatically" makes minor improvements on the earlier work "Jailbreaking Black Box Large Language Models in Twenty Queries".

Kindly Request the Inclusion of My Work

Hi,

Many thanks for your effort!

This list contains numerous articles that I find appealing, as well as some I have not read yet whose titles have caught my interest. May I kindly request the inclusion of my paper in the list? We proposed a new safety training method against jailbreak attacks, named Self-Guard.

We have restructured the paper based on the current preprint version, adding new experimental results, including ablation experiments. Unfortunately, due to the anonymity period, I am unable to update the preprint version in time. Although the publicly available version of our work is not perfect at present, please rest assured that we have made numerous updates, and we believe it will be beneficial to the LLM safety community. We will promptly update our paper once the anonymity period is lifted. My only wish is for my paper to be included in such a splendid list, not to promote it.

Of course, I am merely seeking your consent. Please do not feel pressured, as you are entirely free to decline my request and close this issue. Regardless of the outcome, we will continue to refine our work.

Once again, I would like to express my gratitude for your efforts and contributions.

Best,
Zezhong
