- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution. pdf (Google DeepMind) arXiv, 2023.
- See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning. pdf arXiv, 2023.
- Scaling Instruction-Finetuned Language Models. pdf arXiv, 2022.
- Automatic Chain of Thought Prompting in Large Language Models. pdf arXiv, 2023.
- Multimodal Chain-of-Thought Reasoning in Language Models. pdf arXiv, 2023.
- Design of a Chain-of-Thought in Math Problem Solving. pdf arXiv, 2023.
- Large Language Models Are Human-Level Prompt Engineers. pdf ICLR, 2023.
- ReAct: Synergizing Reasoning and Acting in Language Models. pdf ICLR, 2023.
- Prompting Is Programming: A Query Language for Large Language Models. pdf PLDI, 2023.
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs. pdf arXiv, 2023.
- Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine. pdf arXiv, 2023.
- Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning. pdf, 2023.
- RARR: Researching and Revising What Language Models Say, Using Language Models. pdf arXiv, 2023.
- Fundamental Limitations of Alignment in Large Language Models. pdf arXiv, 2023.
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. pdf arXiv, 2023.
- Large Language Model Alignment: A Survey. pdf arXiv, 2023.
- The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risk. pdf arXiv, 2023.
- Identifying and Mitigating the Security Risks of Generative AI. pdf arXiv, 2023.
- The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning. pdf arXiv, 2023.
- Chain-of-Verification Reduces Hallucination in Large Language Models. pdf arXiv, 2023.
- Language Is Not All You Need: Aligning Perception with Language Models. pdf arXiv, 2023.
- GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher. pdf arXiv, 2023.
- Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models. pdf arXiv, 2023.
- Visual Adversarial Examples Jailbreak Aligned Large Language Models. pdf arXiv, 2023.
- Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! pdf website arXiv, 2023.
- JAILBREAKER: Automated Jailbreak Across Multiple Large Language Model Chatbots. pdf NDSS, 2024.
- Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study. pdf arXiv, 2023.
- Multi-step Jailbreaking Privacy Attacks on ChatGPT. pdf arXiv, 2023.
- Jailbroken: How Does LLM Safety Training Fail? pdf arXiv, 2023.
- [workshop] On the Privacy Risk of In-context Learning. pdf arXiv, 2023.
- Jailbreaking Black Box Large Language Models in Twenty Queries. pdf arXiv, 2023.
- Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation. pdf arXiv, 2023.
- Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models. pdf arXiv, 2023.
- AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models. pdf arXiv, 2023.
- Open Sesame! Universal Black Box Jailbreaking of Large Language Models. pdf arXiv, 2023.
- LAMBRETTA: Learning to Rank for Twitter Soft Moderation. pdf S&P, 2023.
- SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice. pdf arXiv, 2023.
- You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content. pdf S&P, 2024.
- Rule By Example: Harnessing Logical Rules for Explainable Hate Speech Detection. pdf ACL, 2023.
- Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning. pdf arXiv, 2023.
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? pdf arXiv, 2023.
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models. pdf arXiv, 2023.
- [website] Jailbreaking Large Language Models: Techniques, Examples, Prevention Methods. link
- Text Embeddings Reveal (Almost) As Much As Text. pdf EMNLP, 2023.