LLM APIs Vulnerabilities

Discover Pynt's documentation on security tests for LLM (Large Language Model) injections! Learn how Pynt fortifies your APIs against LLM-based vulnerabilities.

At a Glance: LLM vulnerabilities, such as Prompt Injection and Insecure Output Handling, occur when untrusted data is sent to or from a language model without proper validation or sanitization. Attackers can exploit these vulnerabilities to manipulate the behavior of the LLM, extract sensitive data, or execute malicious commands in downstream systems.


What are the common mistakes made by developers?

LLM-based attacks happen when input from an API request is used directly in an LLM prompt or when outputs from an LLM are used directly in operations such as rendering HTML content, executing system calls, or interfacing with other services, without validating whether the input or output includes unwanted content.

How can I fix LLM Injection issues?

Prompt Injection

To prevent prompt injection attacks, consider the following measures:

  1. Input Sanitization: Always sanitize and validate user input before incorporating it into LLM prompts. Remove or escape any tokens or phrases that could alter the intended behavior of the LLM.

  2. Use Contextual Prompts: Structure prompts to minimize the impact of injected content. For example, clearly separate system instructions from user-provided content.

  3. Limit LLM Instructions: Avoid including system-level instructions in prompts that can be manipulated by user input.

Insecure Output Handling

Insecure Output Handling refers to the lack of adequate validation, sanitization, and management of outputs generated by large language models before they are passed downstream to other components and systems.

Exploiting an Insecure Output Handling vulnerability can lead to XSS, CSRF, and Markdown injection attacks in web browsers, as well as SSRF, privilege escalation, or remote code execution on backend systems.

Example of Insecure Output Handling:

Consider an API that uses an LLM to generate HTML content based on user input:

from flask import Flask, request, render_template_string
from some_llm_library import generate_text

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    topic = request.form.get('topic')
    prompt = f"Write an article about {topic}"
    content = generate_text(prompt)
    html = f"<html><body>{content}</body></html>"
    return render_template_string(html)

if __name__ == '__main__':
    app.run()

If the LLM generates malicious JavaScript code within content, it could lead to an XSS attack when rendered in the user's browser.

How to Fix Insecure Output Handling:

  • Sanitize LLM Outputs: Use libraries to escape or remove potentially harmful content before rendering.

  • Set Content Security Policies: Implement CSP headers to restrict the execution of scripts and other potentially dangerous content.

Fixed Code Example:

from flask import Flask, request, render_template_string
from markupsafe import escape
from some_llm_library import generate_text

app = Flask(__name__)

@app.route('/generate', methods=['POST'])
def generate():
    topic = request.form.get('topic')
    prompt = f"Write an article about {escape(topic)}"
    content = generate_text(prompt)
    safe_content = escape(content)
    html = f"<html><body>{safe_content}</body></html>"
    return render_template_string(html)

if __name__ == '__main__':
    app.run()

Markdown Injection

Markdown content generated by LLMs can include embedded HTML or JavaScript, leading to XSS attacks when rendered.

Vulnerable Example:

from flask import Flask, request, Markup
import markdown
from some_llm_library import generate_text

app = Flask(__name__)

@app.route('/post', methods=['POST'])
def post():
    topic = request.form.get('topic')
    prompt = f"Write a detailed post about {topic}"
    markdown_content = generate_text(prompt)
    html_content = markdown.markdown(markdown_content)
    return f"<html><body>{html_content}</body></html>"

if __name__ == '__main__':
    app.run()

An attacker could manipulate the LLM to generate malicious Markdown that includes scripts.

Fixing Markdown Injection:

  • Use Safe Markdown Renderers: Utilize Markdown libraries that sanitize HTML content.

  • Sanitize After Rendering: Escape or remove any HTML tags after converting from Markdown.

Fixed Code Example:

from flask import Flask, request
import markdown
from bleach import clean
from some_llm_library import generate_text

app = Flask(__name__)

@app.route('/post', methods=['POST'])
def post():
    topic = request.form.get('topic')
    prompt = f"Write a detailed post about {topic}"
    markdown_content = generate_text(prompt)
    html_content = markdown.markdown(markdown_content)
    safe_html = clean(html_content)
    return f"<html><body>{safe_html}</body></html>"

if __name__ == '__main__':
    app.run()

Server-Side Request Forgery (SSRF) via LLM

An LLM might generate URLs or network requests based on user input, potentially leading to SSRF attacks.

Vulnerable Example:

import requests
from flask import Flask, request, jsonify
from some_llm_library import generate_text

app = Flask(__name__)

@app.route('/fetch', methods=['POST'])
def fetch():
    query = request.json.get('query')
    prompt = f"Provide the URL for {query}"
    url = generate_text(prompt)
    response = requests.get(url)
    return jsonify({'data': response.text})

if __name__ == '__main__':
    app.run()

An attacker could manipulate the LLM to generate internal URLs, causing the server to make requests to internal services.

Preventing SSRF:

  • Validate and Sanitize URLs: Ensure that the generated URLs point to allowed domains.

  • Implement Network Policies: Restrict the server's network access to prevent unauthorized requests.

Fixed Code Example:

import requests
from flask import Flask, request, jsonify
from urllib.parse import urlparse
from some_llm_library import generate_text

app = Flask(__name__)

ALLOWED_DOMAINS = ['example.com']

def is_allowed(url):
    domain = urlparse(url).netloc
    return domain in ALLOWED_DOMAINS

@app.route('/fetch', methods=['POST'])
def fetch():
    query = request.json.get('query')
    prompt = f"Provide the URL for {query} on example.com"
    url = generate_text(prompt).strip()
    if not is_allowed(url):
        return jsonify({'error': 'Disallowed domain'}), 400
    response = requests.get(url)
    return jsonify({'data': response.text})

if __name__ == '__main__':
    app.run()

By implementing these measures, you can significantly reduce the risk of LLM-related vulnerabilities in your APIs. Always treat both the input to and output from LLMs with the same caution as you would with any untrusted data.

  • Test cases in this category

These test cases detect LLM APIs vulnerabilities:

Test caseOWASPCWE

[LLM001] Direct prompt injection

[LLM002] Prompt injection, alignment

[LLM003] Insecure output handling, type: XSS

[LLM004] Insecure output handling, type: SSRF

[LLM005] Insecure output handling, type: Markdown

Last updated