LLM Reading List

Prompt Engineering

  • Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution. pdf (Google DeepMind) arXiv, 2023.

  • See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning. pdf arXiv, 2023.

  • Scaling Instruction-Finetuned Language Models. pdf arXiv, 2022.

  • Automatic Chain of Thought Prompting in Large Language Models. pdf ICLR, 2023.

  • Multimodal Chain-of-Thought Reasoning in Language Models. pdf arXiv, 2023.

  • Design of a Chain-of-Thought in Math Problem Solving. pdf arXiv, 2023.

  • Large Language Models Are Human-Level Prompt Engineers. pdf ICLR, 2023.

  • ReAct: Synergizing Reasoning and Acting in Language Models. pdf ICLR, 2023.

  • Prompting Is Programming: A Query Language for Large Language Models. pdf PLDI, 2023.

  • Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs. pdf arXiv, 2023.

  • Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine. pdf arXiv, 2023.

  • Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning. pdf EMNLP, 2023.
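
Several of the chain-of-thought papers above (e.g., Automatic Chain of Thought Prompting, Multimodal Chain-of-Thought) build prompts by prepending worked reasoning demonstrations to the test question. A minimal sketch of that construction follows; the demonstration and the trigger phrase are illustrative, not taken from any of the listed papers, and no particular API client is assumed.

```python
# Minimal few-shot chain-of-thought (CoT) prompt construction.
# The demonstration below is illustrative, not from the cited papers.

DEMOS = [
    ("Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
     "How many balls does he have now?",
     "Roger starts with 5 balls. 2 cans of 3 balls is 6 more. "
     "5 + 6 = 11. The answer is 11."),
]

def build_cot_prompt(question: str) -> str:
    """Prepend worked demonstrations, then pose the new question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in DEMOS]
    # "Let's think step by step" is the zero-shot CoT trigger phrase;
    # with demonstrations present it simply invites the same format.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

print(build_cot_prompt("A train travels 60 km in 1.5 hours. What is its average speed?"))
```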

Robustness and Safety Alignment

  • RARR: Researching and Revising What Language Models Say, Using Language Models. pdf arXiv, 2023.

  • Fundamental Limitations of Alignment in Large Language Models. pdf arXiv, 2023.

  • DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. pdf arXiv, 2023.

  • Large Language Model Alignment: A Survey. pdf arXiv, 2023.

  • The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risk. pdf arXiv, 2023.

  • Identifying and Mitigating the Security Risks of Generative AI. pdf arXiv, 2023.

  • The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning. pdf arXiv, 2023.

  • Chain-of-Verification Reduces Hallucination in Large Language Models. pdf arXiv, 2023.

  • Language Is Not All You Need: Aligning Perception with Language Models. pdf arXiv, 2023.
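
The Chain-of-Verification paper above reduces hallucination with a four-step pipeline: draft an answer, plan verification questions, answer them independently of the draft, then revise. Below is a minimal sketch under stated assumptions: `llm` is a hypothetical single-turn completion function, and the prompt wording is illustrative, not the paper's.

```python
# Sketch of the Chain-of-Verification (CoVe) pipeline:
# draft -> plan verification questions -> answer independently -> revise.
# `llm` is a hypothetical completion function; swap in any client.
from typing import Callable, List

def chain_of_verification(query: str, llm: Callable[[str], str]) -> str:
    # 1. Draft a baseline response.
    draft = llm("Answer the question.\nQ: " + query + "\nA:")
    # 2. Plan verification questions that fact-check the draft.
    plan = llm("List short questions, one per line, that would verify the "
               "factual claims in this answer:\n" + draft)
    questions: List[str] = [q.strip() for q in plan.splitlines() if q.strip()]
    # 3. Answer each question in isolation (without showing the draft),
    #    so the verifier cannot simply repeat the draft's mistakes.
    evidence = []
    for q in questions:
        answer = llm("Answer concisely.\nQ: " + q + "\nA:")
        evidence.append("Q: " + q + "\nA: " + answer)
    # 4. Revise the draft so it is consistent with the verified facts.
    return llm("Question: " + query + "\n"
               "Draft answer: " + draft + "\n"
               "Verification Q&A:\n" + "\n".join(evidence) + "\n"
               "Rewrite the draft answer so it agrees with the Q&A above.")
```

Any stub that returns a string is enough to exercise the control flow before wiring in a real model.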

Jailbreak

  • GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher. pdf arXiv, 2023.

  • Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models. pdf arXiv, 2023.

  • Visual Adversarial Examples Jailbreak Aligned Large Language Models. pdf arXiv, 2023.

  • Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! pdf website arXiv, 2023.

  • Jailbreaker: Automated Jailbreak Across Multiple Large Language Model Chatbots. pdf NDSS, 2024.

  • Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study. pdf arXiv, 2023.

  • Multi-step Jailbreaking Privacy Attacks on ChatGPT. pdf arXiv, 2023.

  • Jailbroken: How Does LLM Safety Training Fail? pdf arXiv, 2023.

  • [workshop] On the Privacy Risk of In-context Learning. pdf arXiv, 2023.

  • Jailbreaking Black Box Large Language Models in Twenty Queries. pdf arXiv, 2023.

  • Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation. pdf arXiv, 2023.

  • Latent Jailbreak: A Benchmark for Evaluating Text Safety and Output Robustness of Large Language Models. pdf arXiv, 2023.

  • AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models. pdf arXiv, 2023.

  • "Open Sesame! Universal Black Box Jailbreaking of Large Language Models. pdf arXiv, 2023.

Others

  • LAMBRETTA: Learning to Rank for Twitter Soft Moderation. pdf S&P, 2023.

  • SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice. pdf arXiv, 2023.

  • You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content. pdf S&P, 2024.

  • Rule By Example: Harnessing Logical Rules for Explainable Hate Speech Detection. pdf ACL, 2023.

  • Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning. pdf arXiv, 2023.

  • Is ChatGPT a General-Purpose Natural Language Processing Task Solver? pdf arXiv, 2023.

  • Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models. pdf arXiv, 2023.

  • [website] Jailbreaking Large Language Models: Techniques, Examples, Prevention Methods. link

  • Text Embeddings Reveal (Almost) As Much As Text. pdf EMNLP, 2023.
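
You Only Prompt Once (above) tackles toxic-content detection with prompt learning rather than purpose-built classifiers. A zero-shot variant is sketched below; the prompt wording, label set, and fallback are illustrative assumptions, not the paper's method, and `llm` is again a placeholder completion function.

```python
# Zero-shot toxicity classification via prompting, in the spirit of
# "You Only Prompt Once". Prompt wording, labels, and the fallback are
# illustrative assumptions; `llm` is a hypothetical completion function.

def classify_toxicity(text: str, llm) -> str:
    prompt = ("Label the following text as TOXIC or NON-TOXIC. "
              "Reply with the label only.\n"
              "Text: " + text + "\nLabel:")
    label = llm(prompt).strip().upper()
    # Fall back to NON-TOXIC when the model replies off-format.
    return label if label in ("TOXIC", "NON-TOXIC") else "NON-TOXIC"
```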
