Code Monkey home page Code Monkey logo

www-project-top-10-for-large-language-model-applications's Introduction

pages-build-deployment

www-project-top-10-for-large-language-model-applications

OWASP Foundation Web Repository

OWASP Top 10 for Large Language Model Applications

Current version in-flight OWASP Lab Status project License: CC BY-SA 4.0 llmtop10.com

Welcome to the official repository for the OWASP Top 10 for Large Language Model Applications!

Overview and Audience 🗣️

The OWASP Top 10 for Large Language Model Applications is a standard awareness document for developers and web application security. It represents a broad consensus about the most critical security risks to Large Language Model (LLM) applications. There are other ongoing frameworks both inside and outside of OWASP that are not to be confused with this project and is currently scoped towards only LLM Application Security.

Our primary audience is developers, data scientists, and security experts tasked with designing and building applications and plugins leveraging LLM technologies. We aim to provide practical, actionable, and concise security guidance to help these professionals navigate the complex and evolving terrain of LLM application security.

Key Focus 📖

The primary aim of this project is to provide a comprehensible and adoptable guide to navigate the potential security risks in LLM applications. Our Top 10 list serves as a starting point for developers and security professionals who are new to this domain, and as a reference for those who are more experienced.

Mission Statement 🚀

Our mission is to make application security visible, so that people and organizations can make informed decisions about application security risks related to LLMs. While our list shares DNA with vulnerability types found in other OWASP Top 10 lists, we do not simply reiterate these vulnerabilities. Instead, we delve into these vulnerabilities’ unique implications when encountered in applications utilizing LLMs.

Our goal is to bridge the divide between general application security principles and the specific challenges posed by LLMs. The group’s goals include exploring how conventional vulnerabilities may pose different risks or be exploited in novel ways within LLMs and how developers must adapt traditional remediation strategies for applications utilizing LLMs.

Contribution 👋

The first version of this list was contributed by Steve Wilson of Contrast Security. We encourage the community to contribute and help improve the project. If you have any suggestions, feedback or want to help improve the list, feel free to open an issue or send a pull request.

We have a working group channel on the OWASP Slack, so please sign up and then join us on the #project-top10-llm channel.

Please hop over to our wiki page to collaborate on the project and stay up to date with the latest meetings and current roadmap.

License

This project is licensed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.

Star History

Star History Chart

www-project-top-10-for-large-language-model-applications's People

Contributors

bobsimonoff avatar cybershujin avatar dewankpant avatar emmanuelgjr avatar ganggreentempertatum avatar gtklondike avatar hblankenship avatar homeroast avatar jsotiro avatar kenhuangus avatar leondz avatar manjesh24 avatar matowasp avatar mhupfauer avatar nakanaka101 avatar okdt avatar owaspfoundation avatar rfc-st avatar rossja avatar rot169 avatar sclinton avatar shikida avatar sn4kecharmer avatar talesh avatar v4fs avatar vabmalikusa avatar virtualsteve-star avatar willchilcutt avatar xsankar avatar yodahash avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

www-project-top-10-for-large-language-model-applications's Issues

Disambiguate Supply Chain Vulnerability from Training Data Poisoning

I'd like to see some verbiage in the definition of the vulnerability (and possibly the summary) that better separate supply chain vulnerability from training data poisoning. If the training data poisoning occurs with a third-party models or data, it seems supply chain vulnerability is applicable. If it is training data or a model built in the house, training data poisoning makes sense. I think we want to reduce the overlap n the description of vulnerabilities and their examples, else reporting, tracking and maintaining a top 10 will become difficult.

Sample examples that are difficult to singularly categorize (this is a small subset).

LLM03.1 A malicious actor, or a competitor brand intentionally creates inaccurate or malicious documents which are targeted at a model’s training data. The victim model trains using falsified information which is reflected in outputs of generative AI prompts to it's consumers. Vulnerability = Training data poisoning, supply chain vulnerability [if third-party model]

LLM03.2 A model is trained using data which has not been verified by its source, origin or content. Vulnerability = Training data poisoning, supply chain vulnerability [if third-party model]

LLM03A.1 The LLM generative AI prompt output can mislead users of the application which can lead to biased opinions, followings or even worse, hate crimes etc Vulnerability = Training data poisoning, supply chain vulnerability [if third-party model], Overreliance

Recommend changes to overreliance

Please consider adding the following to the introduction

Overreliance differs from Insecure Output Handling in that Insecure Output Handling focuses on the lack of proper validation and sanitization of LLM-generated outputs before they are passed downstream components whereas Overreliance refers more broadly to overdependence on the accuracy and appropriateness of LLM outputs.

Recomend rewording example to not call out a particular company

Also add a little clarification

  1. Clearly separate and label any external or untrusted content used in prompts passed to the LLM. This allows the model to distinguish influencers like user prompts versus unvalidated external sources. Segregating these input sources may limit the ability of malicious external content to manipulate or inject unintended behavior into the LLM's prompt interpretations. This mitigation alone, however, is easily jailbroken, so must be used in conjunction with other mitigations.

Excessive Agency Spelling/Grammar

The root cause of Excessive Agency is typically one or more of: excessive functionality, excessive permissions or excessive autonomy.

(remove :)
The root cause of Excessive Agency is typically one or more of excessive functionality, excessive permissions or excessive autonomy.

Alternatively, a plugin may have been trialled during a development phase and dropped in favour of a better alternative, but the original plugin remains available to the LLM agent.

(spelling and American English?)
Alternatively, a plugin may have been trialed during a development phase and dropped in favor of a better alternative, but the original plugin remains available to the LLM agent.

(e.g., run a shell command, fetch a URL, etc)

(Nitpick etc. not etc)
(e.g., run a shell command, fetch a URL, etc.)

Limit the permissions that LLM plugins/tools are granted to other systems the minimum necessary in order to limit the scope of undesirable actions.

(added 'to' and deleted unnecessary 'in order')
Limit the permissions that LLM plugins/tools are granted to other systems to the minimum necessary to limit the scope of undesirable actions.

For example, an LLM agent that uses a product database in order to make purchase re

(delete unnecessary 'in order')
For example, an LLM agent that uses a product database to make purchase re...

An LLM-based personal assistant app is granted access to an individual’s mailbox via a plugin in order to summarise

(eliminate unnecessary 'in order' and switch to American spelling?)
An LLM-based personal assistant app is granted access to an individual’s mailbox via a plugin in order to summarize

Template and Risk Pages style change

Can we change the title

from

Vulnerability Name

to either

LLMxx: Risk Name

or

LLMxxL Vulnerability Name

Point being I would like to switch from vulnerability nomenclature to risk nomenclature but that's a battle for a different issue. This one is really focusing on getting LLM in the title so we don't always have to go to the PDF to figure it out

Insecure Output Handling Spelling/Grammar

A user utilizes a website summarizer tool powered by a LLM to generate

(an)
A user utilizes a website summarizer tool powered by an LLM to generate

If the crafted query from the LLM is not scrutinized, then all database tables would be deleted.

(recommend will instead of would)
If the crafted query from the LLM is not scrutinized, then all database tables will be deleted.

A malicious user instructs the LLM to return a JavaScript payload back to a user, without sanitization controls.

(I think back is unnecessary)
A malicious user instructs the LLM to return a JavaScript payload to a user, without sanitization controls.

This can occur either through a sharing a prompt, prompt injected website, or chatbot that accepts prompts from a URL parameter.

(recommend parallel list structure)
This can occur either through sharing a prompt, a prompt injected website, or a chatbot that accepts prompts from a URL parameter.

The LLM would then return the unsanitized XSS payload back to the user.

(unnecessary back)
The LLM would then return the unsanitized XSS payload to the user.

Possible addition to prompt injection "mitigations"

The following options will not prevent Prompt Injection, but can help in identifying and resolving problems

  1. Utilize LLM monitoring tools that track user prompts and associate them with backend activity including LLM responses. While not a mitigation, this can provide data needed to detect weaknesses and address them.

Example in Overreliance should be 2 examples

Thing I strongly believe us at the examples should concisely and precisely help educate about the vulnerability. The following seems as though it should be broken into two examples for better clarity:

  1. A news organization heavily uses an AI model to generate news articles. A malicious actor exploits this over-reliance, feeding the AI misleading information, causing the spread of disinformation. The AI unintentionally plagiarizes content, leading to copyright issues and decreased trust in the organization.

Add privacy concern when training model using PII and other sensitive data#LLM06

  1. When any user tries to train the model with professional or personal data could lead to privacy concerns where the model records the data and may breach the privacy of the user.

  2. Data exfiltration: While sharing the model with other users or customers - many presets of data are also shared with the other user which could lead to exfiltration of information so when you share the model make sure you put some human intervention or proper sanitization of data to filter the sensitive information.

  3. https://www.private-ai.com/2023/01/18/addressing-privacy-and-the-gdpr-in-chatgpt-and-large-language-models/

Consider changes/updates to insecure plugin design

Addition to the intro:
Since plugins are, under normal circumstances, accessed only by the LLM, exploitation is typically a result of another vulnerability such as excessive agency or direct or indirect prompt injection. However, plugins are still responsible for protecting themselves since side-channel attacks can still occur.

Possibly updates to Common Examples:

  1. A plugin accepts its input parameters in a single text field instead of distinct input parameters that can be validated and sanitized.
  2. A plugin accepts raw SQL or programming statements, which are more difficult to validate than distinct parameters.
  3. A plugin adheres to inadequate fine grained authorization controls.
  4. A plugin blindly trusts that the LLM output, which is the input to the plugin, correctly represents the expected output for the initial prompt.
  5. A plugin treats all LLM content as being created entirely by the user and performs any requested actions without requiring additional authorization.

Possiblr changes to attack scenarios
2. A plugin used to retrieve embeddings from a vector store accepts configuration parameters as a connection string without any validation. This allows an attacker to experiment and access other vector stores by changing names or host parameters and exfiltrate embeddings they should not have access to.
3. A plugin accepts SQL WHERE clauses as advanced filters, which are then appended to the filtering SQL. This allows an attacker to stage a SQL attack.
4. An attacker uses indirect prompt injection to exploit an insecure code management plugin that has no input validation and weak access control to transfer repository ownership and lock out the user from their repositories.

Website Title: OWASP Top 10 for LLM should be OWASP Top 10 for LLM Applications

I got myself confused because I'm seeing a lot of places where our project is referred to as "OWASP Top 10 for LLM" instead of "OWASP Top 10 for LLM Applications", one of which is the website, which is what I am reporting here.

As we discover other places with an incomplete name, we should fix them because the implications of security issues for LLMs versus security issues for LLM applications is very different.

Clarify the description in LLM02

Background
In the LLM02, the following wording is stated:
"The application is vulnerable to external prompt injection attacks, which could allow an
attacker to gain privilege` access to a target user's environment
"

Suggested change
I assume that what author had in mind was indirect prompt injection, I think that the following can be rephrased to:
"The application is vulnerable to indirect prompt injection attacks, which could allow an
attacker to gain privilege` access to a target user's environment
"

PDF page with summary of vulnerabilities has partial title - it is missing "Applications"

I got myself confused because I'm seeing a lot of places where our project is referred to as "OWASP Top 10 for LLM" instead of "OWASP Top 10 for LLM Applications", one of which is the website, which is what I am reporting here.

As we discover other places with an incomplete name, we should fix them because the implications of security issues for LLMs versus security issues for LLM applications is very different.

image

Clean up of Description and Examples in Excessive Agency

My goal is to better delineate the vulnerabilities from each other and make examples clear and focused on documenting the vulnerability and delineating it from others.

Description

The following description of Excessive Agency makes it appear more like Insecure Output Handling since "Insecure Output Handling is a vulnerability that arises when a downstream component blindly accepts large language model (LLM) output without proper scrutiny":

Excessive Agency is the vulnerability that enables damaging actions to be performed in response to unexpected/ambiguous outputs from an LLM.

These vulnerabilities could clearly distinguish themselves from each other by adhering to the following: Excessive Agency relates to excessive capabilities granted to the LLM regardless of outputs whereas Insecure Output Handling arises from poor scrutiny of LLM outputs specifically.

Examples of Vulnerability

On Example of Vulnerability is the following which seems like it could be confused with Insecure Output Handling since the downstream component is "blindly accepting LLM output without proper scrutiny." This example might be better classified as Insecure Plugin Design. If it were reworded to be a configuration issue, then Excessive Agency may be the better choice.

  1. Excessive Functionality: An LLM plugin with open-ended functionality fails to properly filter the input instructions for commands outside what's necessary for the intended operation of the application. E.g., a plugin to run one specific shell command fails to properly prevent other shell commands from being executed.

Example 5 seems to also align with Insecure Plugin Design.

  1. Excessive Autonomy: An LLM-based application or plugin fails to independently verify and approve high-impact actions. E.g., a plugin that allows a user's documents to be deleted performs deletions without any confirmation from the user.

Example Attack Scenarios

The example attack scenario reads as follows:

An LLM-based personal assistant app is granted access to an individual's mailbox via a plugin in order to summarise the content of incoming emails. To achieve this functionality, the email plugin requires the ability to read messages, however the plugin that the system developer has chosen to use also contains functions for sending messages. The LLM is vulnerable to an indirect prompt injection attack, whereby a maliciously-crafted incoming email tricks the LLM into commanding the email plugin to call the 'send message' function to send spam from the user's mailbox. This could be avoided by: (a) eliminating excessive functionality by using a plugin that only offered mail-reading capabilities, (b) eliminating excessive permissions by authenticating to the user's email service via an OAuth session with a read-only scope, and/or (c) eliminating excessive autonomy by requiring the user to manually review and hit 'send' on every mail drafted by the LLM plugin. Alternatively, the damage caused could be reduced by implementing rate limiting on the mail-sending interface.

This example could also be slotted into Prompt Injection, as is stated in the example. A simpler, more to the point example or even multiple focused examples may be more helpful in explaining this vulnerability and distinguishing it from others.

llmtop10.com: minor issue with menu button on mobile

Hi, thank you for launching this fantastic initiative. Just wanted to report a minor bug on llmtop10.com — hope this is this the right place.

On mobile or narrow screens, the menu button is not visible. It seems that the button is functional, but the font color matches the background making it not visible:

current

What I’d expect:
correct

If the website sources are available somewhere I’m happy to open a PR.

ACCEPTED - LLM10: Consider adding "Use a centralized ML Model Inventory or Registry" as a Prevention under LLM10

Hi Team,

Congrats on the first iteration!

I was hunting for either a "registry" or "inventory" word on the document and could not find one.

Based on my experience in threat modeling internally-developed ML/LLM applications, development teams are prone to storing the actual ML model or bits and pieces of related artifacts (documents, code, and other data that can be used to understand the purpose of the model) on their local machine, personal code repo, etc., contributing to LLM10 - Model Theft.

Under Prevention, I would like to propose adding the following entry:

  • Use a centralized ML Model Inventory or Registry for ML models used in production. Having a centralized model registry prevents unauthorized access to ML Models via access controls, authentication, and monitoring/logging capability which are good foundations for governance. Having a centralized repository is also beneficial for collecting data about algorithms used by the models for the purposes of compliance, risk assessments, and risk mitigation.

Examples of centralized ML model registries include MLFlow, Iterative.ai, GCP Vertex AI Model Registry, and AWS SageMaker Model Registry.

References:

  1. "The Current State of AI Governance" Report by Babl AI, Inc. & The Algorithmic Bias Lab
  2. " Model Management section - Machine Learning for High-Risk Applications: Techniques for Responsible AI" by Hall, Curtis, & Pandey. O'Reilly Press April 2023.
  3. AML.TA0000 ML Model Access - MITRE ATLAS™ (Adversarial Threat Landscape for Artificial-Intelligence Systems) ATT&CK Framework

(2) Website Typos + Proposed Fixes

Issue 1

# Where: https://llmtop10.com/
# What: In LLM06's "feature box" there is a typo inconsistent with other assets / grammar.
# Current:

LLM’s may reveal confidential data...”

# Proposed:

LLMs (no apostrophe) may reveal confidential data...

# Justification:
Grammar / continuity (to match the PDF, Whitepaper, and Google Doc which all contain "LLMs"; the plural of LLM, without an apostrophe, for LLM06.

Issue 2

# Where: https://llmtop10.com/intro/
# What: In the 1st sentence there is a typo.
# Current:

“The frenzy of interest of Large Language Models (LLMs) following of mass-market pre- trained chatbots in late 2022 has been remarkable.”

# Proposed:

The frenzy of interest in Large Language Models (LLMs) following the mass marketing of pretrained chatbots in late 2022 has been remarkable.

# Justification:
Typo impacting clarity.

Notes:

  1. Potential labels: "bug" + "website" —PS: It's my understanding I can't assign a "Label" because I am not a repo "Contributor", or else I'd add the labels; please let me know if that's not the case.
  2. Great project / community, hope this helps. ❤️

Proposed new vulnerability summaries for the pdf file

As wait for find the definitions of the vulnerabilities, I thought it made sense to review the 1 to 2 sentence summary is that we have in the PDF file. I also thought possibly making the structure tone and grammar parallel between the summaries made sense, so here's my attempt for your review

Sensitive Information Disclosure:

Proposal: Unpredictable LLM outputs can reveal sensitive information supplied by users or derived from training data, violating confidentiality.
Original: LLM’s may inadvertently reveal confidential data in its responses, leading to unauthorized data access, privacy violations, and security breaches. It’s crucial to implement data sanitization and strict user policies to mitigate this.

Excessive Agency:

Proposal: Granting an LLM excessive autonomy to take actions without sufficient guardrails enables it to cause harm based on unreliable outputs.
Original: LLM-based systems may undertake actions leading to unintended consequences. The issue arises from excessive functionality, permissions, or autonomy granted to the LLM-based systems.

Insecure Output Handling:

Proposal: Accepting LLM output without scrutiny exposes backend systems, allowing access to functions and data potentially leading to XSS, CSRF, SSRF, privilege escalation, remote code execution, etc.
Original: This vulnerability occurs when an LLM output is accepted without scrutiny, exposing backend systems. Misuse may lead to severe consequences like XSS, CSRF, SSRF, privilege escalation, or remote code execution.

Insecure Plugin Design:

Proposed: LLM plugins may unknowingly process untrusted inputs leading to harmful consequences from code execution to data theft.
Original: LLM plugins can have insecure inputs and insufficient access control. This lack of application control makes them easier to exploit and can result in consequences like remote code execution.

Overreliance:

Proposed: Overdependence on LLMs for critical functions without sufficient oversight risks issues from misinformation to operational failures.
Original: Systems or people overly depending on LLMs without oversight may face misinformation, miscommunication, legal issues, and security vulnerabilities due to incorrect or inappropriate content generated by LLMs.

Prompt Injection:

Proposed: Attackers can manipulate LLMs via crafted inputs to solicit information, influence decisions, or access systems.
Original: This manipulates a large language model (LLM) through crafty inputs, causing unintended actions by the LLM. Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources.

Supply Chain Vulnerability:

Proposed: Compromise of third-party LLM application dependencies such as datasets, models, plugins and agents can undermine integrity and introduce risks.
Original: LLM application lifecycle can be compromised by vulnerable components or services, leading to security attacks. Using third-party datasets, pre-trained models, and plugins can add vulnerabilities.

Training Data Poisoning:

Proposed: Manipulation of training data can impair model capabilities or introduce biases, backdoors and vulnerabilities.
Original: This occurs when LLM training data is tampered, introducing vulnerabilities or biases that compromise security, effectiveness, or ethical behavior. Sources include Common Crawl, WebText, OpenWebText, & books.

Consider updates to insecure output handling

Please consider this adjustment to the intro
Insecure Output Handling refers specifically to insufficient validation, sanitization, and handling of the outputs generated by large language models before they are passed downstream to other components and systems. These downstream components include web browsers, backend services, databases, command shells, scripts, and even other AI systems. The risk arises because downstream entities often blindly accept LLM-generated output without proper scrutiny prior to usage. Since LLM-generated content can be controlled by prompt input, this behavior is similar to providing users indirect access to additional functionality.

Possibly a new condition for increasing impact...

  • 3rd party plugins do not adequately validate inputs.

please consider these modifications of how to prevent:

  1. Encode model output back to users to mitigate undesired code execution by JavaScript or Markdown. OWASP ASVS provides detailed guidance on output encoding.
  2. Treat the model as any other user, adopting a zero-trust approach, and apply proper input validation on responses coming from the model to backend functions. Follow the OWASP ASVS (Application Security Verification Standard) guidelines to ensure effective input validation and sanitization.

please consider these modifications to example attack scenarios

  1. An application utilizes an LLM plugin to generate responses for a chatbot feature. However, the application directly passes the LLM-generated response without proper output validation directly into an internal function responsible for executing system commands. This allows an attacker to manipulate the LLM output to execute arbitrary commands on the underlying system, leading to unauthorized access or unintended system modifications.
  2. A user utilizes a website summarizer tool powered by an LLM to generate a concise summary of an article. The website includes a prompt injection instructing the LLM to capture sensitive content from either the website or from the user's conversation. From there the LLM can encode the sensitive data and send it, without any output validation or filtering, to an attacker-controlled server.
  3. An application allows users to generate SQL queries through conversational interaction with an LLM. The LLM-generated queries are passed directly to the backend database without sufficient validation or sanitization. An attacker could exploit this by manipulating the LLM to output an unsafe query that deletes tables or extracts sensitive data, leading to unintentional data loss or exposure.
  4. A malicious user instructs the LLM to return a JavaScript payload back to a user, without sanitization controls. This can occur either through sharing a prompt, a prompt injected website, a side-channel attack, or a chatbot that accepts prompts from a URL parameter. The LLM would then return the unsanitized XSS payload.

Missing app layers

One of the key ways LLMs/ generative aI is used is through chaining/ agents. Agents put in one of the biggest risks for applications based on LLM/ Generative AI. There are issues with users allowing machine access to agents to perform tasks and agents seem to work autonomously creating security holes.

LLM03: Establish a clear definition of "model contamination" before the term gets away from us

OVERVIEW:
This is a proposal to add a clear definition of the term "model contamination" to v1.1 of LLM03 because it is loosely defined in the community right now and not having a clear definition of the term will muddy future discussion on the safety of LLMs

PROBLEM:
The term "contamination" has become a contentious topic within community discussions about LLM security. The ambiguity surrounding its definition has led to inconsistent actions, including the deranking of open-source models by multiple groups. The absence of a standardized definition for "contamination" has resulted in fragmented approaches to evaluating and addressing potential security threats in LLMs. Different groups are using varied criteria for what constitutes "contamination," leading to inconsistent evaluations and potentially unwarranted actions against specific models and datasets.

GOAL:
Establish a clear and universally accepted definition of "contamination" in the context of data training poisoning, promoting consistent evaluation criteria and ensuring the protection and integrity of LLMs. We need to distinguish "model contamination" from "training data poisoning", acknowledging the varying degrees of model insecurity can happen due to wilful or poor selection of training data. We need to define "model contamination" before the term gets away from use and should be context-driven and not merely based on subjective disagreements or preferences regarding training content.

REFERENCES:
Contaminated models have been removed from HuggingFace leaderboard
OpenPlatypus claims they've cleaned up contaminated HuggingFace datasets

Consider Removal of 2 examples of vulnerability

Consider removing these examples from Examples of vulnerability because as written they do not involved the LLM, they seem to be insecure plug in issues
4. A user enables a plugin linked to an e-commerce site. A rogue instruction embedded on a visited website exploits this plugin, leading to unauthorized purchases.
5. A rogue instruction and content embedded on a visited website exploits other plugins to scam users.

insecure output handling ambiguities

I feel like the description of insecure output handling causes it to easily be confused with insecure plug-in design. The beginning of insecure output handling says that it

arises when a downstream component blindly accepts large language model output without proper scrutiny such as passing LLM output directly to backend, privileged or client side functions.

Feels like it is easily confused with insecure plug-in design which also talks about input validation.

Change from Vulnerabilities to Risks

With the realization that the top 10 focuses on the risks that vulnerabilities, I recommend changing our template and the content of the Top 10 to match.

The TL;DR

Fo to the OWASP Top 10 page and the first 2 sentences read:

The OWASP Top 10 is a standard awareness document for developers and web application security. It represents a broad consensus about the most critical security risks to web applications.

My detailed reasoning is in this document: risks-vs-vulnerabilities.md

ACCEPTED - LLM03 clarify the example case

LLM03: Training Data Poisoning
Common Examples of Vulnerability

The model itself when situated within infrastructure has unrestricted access or inadequate sandboxing to gather datasets to be used as training data which has negative influence on outputs of generative AI prompts as well as loss of control from a management perspective.

Proposal:
Suppose the model itself is located within the infrastructure. In that case, unrestricted or inadequate access to the sandbox to collect data sets to be used as training data will adversely affect the output of the generative AI prompts and result in loss of control from a management perspective.

Training Data Poisoning Example is really Excessive agency?

I am unsure if this example is really Training Data Poisoning or if it represents Excessive Agency

  1. The model itself when situated within infrastructure has unrestricted access or inadequate sandboxing to gather datasets to be used as training data which has negative influence on outputs of generative AI prompts as well as loss of control from a management perspective.

Excessive Agency Overlaps with other Vulnerabilities

I think we should work on the description and the examples a little so that it is better distinguished from other vulnerabilities. I feel that the parts below are drifting away from the purpose of this vulnerability and are encroaching on other vulnerabilities.

Description
Excessive Agency may be confused with Insecure Output Handling since "Excessive Agency is the vulnerability that enables damaging actions to be performed in response to unexpected/ambiguous outputs from an LLM" but "Insecure Output Handling is a vulnerability that arises when a downstream component blindly accepts large language model (LLM) output without proper scrutiny". So, both can result in unintended actions due to uncontrolled LLM outputs. These vulnerabilities could clearly distinguish themselves from each other by adhering to the following: Excessive Agency relates to excessive capabilities granted to the LLM regardless of outputs whereas Insecure Output Handling arises from poor scrutiny of LLM outputs specifically.

Examples of Vulnerability
The second Example of vulnerability, “2. Excessive Functionality: An LLM plugin with open-ended functionality fails to properly filter the input instructions for commands outside what's necessary for the intended operation of the application. E.g., a plugin to run one specific shell command fails to properly prevent other shell commands from being executed.” Seems like it could be confused with Insecure Output Handling since the downstream component is “blindly accepting LLM output without proper scrutiny.” This example could also be classified as Insecure Plugin Design as well as Overreliance.

Example “5. Excessive Autonomy: An LLM-based application or plugin fails to independently verify and approve high-impact actions. E.g., a plugin that allows a user's documents to be deleted performs deletions without any confirmation from the user.” Seems to also align with Insecure Plugin Design, Overreliance.

Example Attack Scenarios
The example attack scenario reads as follows:
An LLM-based personal assistant app is granted access to an individual’s mailbox via a plugin in order to summarise the content of incoming emails. To achieve this functionality, the email plugin requires the ability to read messages, however the plugin that the system developer has chosen to use also contains functions for sending messages. The LLM is vulnerable to an indirect prompt injection attack, whereby a maliciously-crafted incoming email tricks the LLM into commanding the email plugin to call the 'send message' function to send spam from the user's mailbox. This could be avoided by: (a) eliminating excessive functionality by using a plugin that only offered mail-reading capabilities, (b) eliminating excessive permissions by authenticating to the user's email service via an OAuth session with a read-only scope, and/or (c) eliminating excessive autonomy by requiring the user to manually review and hit 'send' on every mail drafted by the LLM plugin. Alternatively, the damage caused could be reduced by implementing rate limiting on the mail-sending interface.

This example could also be slotted into Prompt Injection, as is stated in the example, but could also possibly be Overreliance. A simpler, more to the point example or even multiple focused examples may be more helpful in explaining this vulnerability and distinguishing it from others.

Insecure Plugin Design Spelling/Grammar

This allows a potential attacker to construct a malicious request to the plugin, which could result in a wide range of undesired behaviours, up to and including remote code execution.

(American spelling?)
This allows a potential attacker to construct a malicious request to the plugin, which could result in a wide range of undesired behaviors, up to and including remote code execution.

The harm of malicious inputs often depends on insufficient access controls and the failure to track authorisation across plugins. Inadequate access control allows a plugin to blindly trust other plugins and assume that the end user provided the inputs.

(American spelling?)
The harm of malicious inputs often depends on insufficient access controls and the failure to track authorization across plugins. Inadequate access control allows a plugin to blindly trust other plugins and assume that the end user provided the inputs.

When this is not possible, a second layer of typed calls should be introduced, parsing requests and applying validation and sanitisation.

(American spelling?)
When this is not possible, a second layer of typed calls should be introduced, parsing requests and applying validation and sanitization.

Plugin developers should apply OWASP’s recommendations in ASVS (Application Security Verification Standard) to ensure effective input validation and sanitisation.

(American spelling?)
Plugin developers should apply OWASP’s recommendations in ASVS (Application Security Verification Standard) to ensure effective input validation and sanitization.

Plugins should be designed to minimise the impact of any insecure input parameter exploitation following the OWASP ASVS Access Control Guidelines.

(American spelling?)
Plugins should be designed to minimize the impact of any insecure input parameter exploitation following the OWASP ASVS Access Control Guidelines.

Additionally, API Keys should be used to provide context for custom authorisation decisions which reflect the plugin route rather than the default interactive user.

(American spelling? and maybe change which reflect to that reflect)
Additionally, API Keys should be used to provide context for custom authorization decisions that reflect the plugin route rather than the default interactive user.

Require manual user authorisation and confirmation of any action taken by sensitive plugins.

(American spelling?)
Require manual user authorization and confirmation of any action taken by sensitive plugins.

Plugins are, typically, REST APIs, so developers should apply the recommendations found in OWASP Top 10 API Security Risks – 2023 to minimise generic vulnerabilities

(American spelling?)
Plugins are, typically, REST APIs, so developers should apply the recommendations found in OWASP Top 10 API Security Risks – 2023 to minimize generic vulnerabilities

Prompt Injection Vulnerability, AttackVector, or Both

The prompt injection section, in general is very clear. The primary question really is should we be separating the idea of an attack vector from a vulnerability. Several of the examples treat prompt injection as the way to exploit one of the other vulnerabilities. For example, is the vulnerability in the following Prompt Injection or is Prompt Injection the attack vector for exploiting the Insecure Plugin, Excessive Agency, or Overreliance.

  1. A user enables a plugin linked to an e-commerce site. A rogue instruction embedded on a visited website exploits this plugin, leading to unauthorized purchases.

The same question could be asked about most of the examples.

Another way to word my concern is stated in the latest OWASP Top 10

We spent several months grouping and categorizing CWEs and could have continued for additional months. We had to stop at some point. There are both root cause and symptom types of CWEs, where root cause types are like "Cryptographic Failure" and "Misconfiguration" contrasted to symptom types like "Sensitive Data Exposure" and "Denial of Service." We decided to focus on the root cause whenever possible as it's more logical for providing identification and remediation guidance. Focusing on the root cause over the symptom isn't a new concept; the Top Ten has been a mix of symptom and root cause. CWEs are also a mix of symptom and root cause; we are simply being more deliberate about it and calling it out.

Multiple spelling errors in vulns

Scope:

  • PromptInjection.md
  • ExcessiveAgency.md
  • insecurePluginDesign.md
  • modelDos.md
  • ModelTheft.md
  • Overreliance.md
  • SupplyChainVulnerabilities.md

ACCEPTED - LLM03: division of threats into those related to pre-training and those related to fine-tuning

Background
When working with LLMs, data can be poisoned at various stages of model development - when building the pre-training dataset, when performing fine-tuning, when developing the system prompt for the LLM application, etc.

Suggested change
In my opinion, the training data poisoning vulnerability description should somehow pay specific attention to the data used at different stages of model development.
Perhaps it should distinguish between data used in pre-training, embedding and fine-tuning? I.e. if embedding or fine-tuning uses the company's knowledge base, then it brings another vector for training data poisoning.

ACCEPTED - LLM03 :Contents Injection Attacks explanation

An example repository of this would be Autopoison testing, including both attacks such as Content Injection Attacks (“how to inject your brand into the LLM responses”) and Refusal Attacks (“always making the model refuse to respond”) that can be accomplished with this approach.

“how to inject your brand into the LLM responses” doesn't make sense.
It might be better "(attempting to promote a brand name in model responses)".

Model DoS Grammar/Spelling

An attacker interacts with a LLM in a method

(change a to an)
An attacker interacts with an LLM in a method

Sending queries that are unusually resource-consuming, perhaps because they use unusual orthography or sequences.

(somewhat shorter)
Sending unusually resource-consuming queries, perhaps because they use unusual orthography or sequences.

An attacker repeatedly sends multiple requests to a hosted model that are difficult and costly for it to process, leading

(wording improvement?)
An attacker repeatedly sends multiple difficult and costly requests to a hosted model leading

This flood of inputs puts excessive load on the LLM's resources,

(add an?)
This flood of inputs puts an excessive load on the LLM's resources,

Text to disambiguate prompt injection from other vulnerabilities

I think prompt injection really concentrates on subverting the large language model’s natural language processing, which then makes it a confused deputy, acting on behalf of the attacker. In some cases, the subversion happens through tricking the LLM into ignoring or modifying the system prompt. Prompt injection does not target training data, output handling, or other model or application components. I think adding something like this would help disambiguate prompt injection from other vulnerabilities. Vulnerability and attack examples, the way they are worded, actually add to the confusion with other vulnerabilities.

Example Attack Scenarios miscategorized?

Sensitive information disclosure, the following two attacks scenarios are provided

I feel like the following one is really prompt injection.

User A targets a well crafted set of prompts to bypass input filters and sanitization from the LLM to cause it to reveal sensitive information (PII) about other users of the application.

I feel this one is better categorized as training data poisoning

Personal data such as PII is leaked into the model via training data due to either negligence from the user themselves, or the LLM application. This case could increase risk and probability of scenario 1 or 2 above.

Clean up of Excessive Agency

My goal is to better delineate the vulnerabilities from each other and make examples clear and focused on documenting the vulnerability and delineating it from others.

Description

The following description of Excessive Agency makes it appear more like Insecure Output Handling since "Insecure Output Handling is a vulnerability that arises when a downstream component blindly accepts large language model (LLM) output without proper scrutiny":

Excessive Agency is the vulnerability that enables damaging actions to be performed in response to unexpected/ambiguous outputs from an LLM.

These vulnerabilities could clearly distinguish themselves from each other by adhering to the following: Excessive Agency relates to excessive capabilities granted to the LLM regardless of outputs whereas Insecure Output Handling arises from poor scrutiny of LLM outputs specifically.

Examples of Vulnerability

One Example of Vulnerability is the following which seems like it could be confused with Insecure Output Handling since the downstream component is "blindly accepting LLM output without proper scrutiny." This example might be better classified as Insecure Plugin Design. If it were reworded to be a configuration issue, then Excessive Agency may be the better choice.

Excessive Functionality: An LLM plugin with open-ended functionality fails to properly filter the input instructions for commands outside what's necessary for the intended operation of the application. E.g., a plugin to run one specific shell command fails to properly prevent other shell commands from being executed.
Example 5 seems to also align with Insecure Plugin Design.

Excessive Autonomy: An LLM-based application or plugin fails to independently verify and approve high-impact actions. E.g., a plugin that allows a user's documents to be deleted performs deletions without any confirmation from the user.
Example Attack Scenarios

The example attack scenario reads as follows:

An LLM-based personal assistant app is granted access to an individual's mailbox via a plugin in order to summarise the content of incoming emails. To achieve this functionality, the email plugin requires the ability to read messages, however the plugin that the system developer has chosen to use also contains functions for sending messages. The LLM is vulnerable to an indirect prompt injection attack, whereby a maliciously-crafted incoming email tricks the LLM into commanding the email plugin to call the 'send message' function to send spam from the user's mailbox. This could be avoided by: (a) eliminating excessive functionality by using a plugin that only offered mail-reading capabilities, (b) eliminating excessive permissions by authenticating to the user's email service via an OAuth session with a read-only scope, and/or (c) eliminating excessive autonomy by requiring the user to manually review and hit 'send' on every mail drafted by the LLM plugin. Alternatively, the damage caused could be reduced by implementing rate limiting on the mail-sending interface.

This example could also be slotted into Prompt Injection, as is stated in the example. A simpler, more to the point example or even multiple focused examples may be more helpful in explaining this vulnerability and distinguishing it from others.

Recommended Updates to Excessive Agency

Recommend
trialed instead of trialled

favor instead of favour

etc. instead of etc

Add 'to' to this phrase: Limit the permissions that LLM plugins/tools are granted to other systems the minimum necessary
So it reads: Limit the permissions that LLM plugins/tools are granted to other systems to the minimum necessary

Consider reworking the following:
Excessive Agency is the vulnerability that enables damaging actions to be performed in response to unexpected/ambiguous outputs from an LLM (regardless of what is causing the LLM to malfunction; be it hallucination/confabulation, direct/indirect prompt injection, malicious plugin, poorly-engineered benign prompts, or just a poorly-performing model). The root cause of Excessive Agency is typically one or more of: excessive functionality, excessive permissions or excessive autonomy.

To be:
Excessive Agency is the vulnerability that enables damaging actions to be performed in response to unexpected/ambiguous outputs from an LLM (regardless of what is causing the LLM to malfunction; be it hallucination/confabulation, direct/indirect prompt injection, malicious plugin, poorly-engineered benign prompts, or just a poorly-performing model). Excessive agency is a vulnerability of excessive functionality, permissions, and/ or autonomy. This differs from Insecure Output Handling which is concerned with insufficient scrutiny of LLM outputs.

Consider these adjustments to Common Examples of Vulnerability

  1. Excessive Functionality: An LLM agent has access to plugins which include functions that are not needed for the intended operation of the system. For example, a developer needs to grant an LLM agent the ability to read documents from a repository, but the 3rd-party plugin they choose to use also includes the ability to modify and delete documents.
  2. Excessive Functionality: A plugin that was trialed during the development phase was dropped in favor of a better alternative, but the original plugin remains available to the LLM agent.
  3. Excessive Permissions: An LLM plugin has permissions on other systems that are not needed for the intended operation of the application. E.g., a plugin intended to read data connects to a database server using an identity that not only has SELECT permissions, but also UPDATE, INSERT and DELETE permissions.
  4. Excessive Permissions: An LLM plugin that is designed to perform operations on behalf of a user accesses downstream systems with a generic high-privileged identity. E.g., a plugin to read the current user's document store connects to the document repository with a privileged account that has access to all users' files.
  5. Excessive Autonomy: An LLM-based application or plugin fails to independently verify and approve high-impact actions. E.g., a plugin that allows a user's documents to be deleted performs deletions without any confirmation from the user.

Sensitive Information Disclosure

In summary, this “vulnerability” is problematic because it mostly doesn’t represent a root cause, but a result or symptom. In the 2021 OWASP Top 10, they reoriented from symptoms to root causes. They actually renamed sensitive information exposure to cryptographic failure to focus on root causes.

There is, however, one aspect of this vulnerability that seems to be a true root cause. Regardless of the input it gets, large language model outputs are unpredictable. That unpredictability doesn’t necessarily come from training data poisoning, third-party data sources, or prompt injection. It comes from the stochastic nature and natural language processing capabilities of the large language model as well as ambiguities and inaccuracies inherent in natural language. I would argue that this aspect is a key point mentioned in this section that should be carried forward, however, I believe it makes more sense in Overreliance.

Coverage of Alignment Issues

Given Overreliance is defined as:

Overreliance occurs when systems or people depend on LLMs for decision-making or content generation without sufficient oversight

and covers confabulation:

generate content that is factually incorrect, inappropriate or unsafe. This is referred to as hallucination or confabulation and can result in misinformation, miscommunication, legal issues, and reputational damage

I think it is a small step to also cover alignment issues. Specifically I am talking about the business or person employing an LLM for the user of their end users.

Project title should be :Top 10 for Generative AI" versus just LLM

We do not want a separate project for Diffusion (for images), just like we have LLM for text. In my view the Project could be named "OWASP Top 10 for Generative AI". My view is the Top 10 mentioned apply to all other modalities like video/ image/ powerpoint etc too.

ACCEPTED - LLM03 Training Data Poisoning - Add Additional References

Setting this if anything as a reminder for myself to include this resource ("Poisoning Web-Scale Training Datasets is Practical") within v1.1 release under LLM03.
Introduce new vulnerability, vulnerabilities as well as mitigation methods and strategies:

  1. Split-View Data Poisoning
  2. Frontrunning Poisoning

Additional references:

Remove LLM05: Supply Chain Vulnerabilities in Favor of LLM03: Training Data Poisoning and LLM07: Insecure Plugin Design

Background:

  • As per published v1.0 of the OWASP Top 10 for Large Language Model Applications project, I believe we have a duplicate entry within LLM05: Supply Chain Vulnerabilities

    • The LLM-05 LLM05: Supply Chain Vulnerabilities entry largely overlaps with LLM-03 LLM03: Training Data Poisoning
    • The LLM-05 LLM05: Supply Chain Vulnerabilities entry also overlaps with the topic of plugins in LLM07: Insecure Plugin Design LLM07: Insecure Plugin Design
  • This is causing confusion, inconsistency and leaves us not fully utilizing the power of the Top 10 list to include any new vulnerabilities surfaced or if we decide to look at some of the vulnerables < v1.0 which didn't make the cut when we revise v.1.0.1 of the project

  • If we also take the foundation of the OWASP Top 10 lists - OWASP Top 10 API Security Risks – 2023 as an example, then it is clear (or at least ot a security professional) that even though Supply Chain is not listed as a vulnerability but an organization referencing and basing their Web Application Security stack on the OWASP Top 10, certainly still need to take the whole concept of Supply Chain (via SBOM, however) is still extremely important to that software stack.

  • When we threat model an LLM Application, we have to ensure that are some pre-requisites taken into consideration which make them out of scope for us to focus specifically on risks and vulnerabilities present in the area we are assessing. These may be explicitly stated or inherently understood and is not listed as a top risk during the exercise.

Proposed Changes

  • Whilst I agree Supply Chain is an important topic in any security considerations, only the important true top 10 vulnerabilities require each subset referencing the supply-chain if it is valid in that specific area - I.E, Training Data Poisoning, Plugins etc.
    • I do not believe we need an entry soley dedicated to Supply Chain itself and each entry that has a Supply Chain as an example attack scenario should reference the supply chain topic here

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.