May 27, 2026 · security · 38 min read · 9200 words

AI Threat Modelling — How to Assess the Attack Surface Traditional Frameworks Miss.

security ai threat-modelling mitre-atlas owasp adversarial-ml

AI systems are no longer in the pilot phase. Language models handle customer support queues. Fraud detection engines make real-time authorization decisions affecting millions of transactions daily. Recommendation systems personalize content for hundreds of millions of users. Autonomous agents browse the web, execute code, and write to databases — with limited human oversight.

Behind every one of these deployments is an attack surface that most security teams have never been trained to assess. The 2025 AI Threat Landscape Report found that 61% of organizations deploying AI still lack a dedicated security strategy for it.

Traditional threat modelling provides a strong foundation. STRIDE has helped defenders systematically identify security threats for over two decades. But AI systems introduce assets, behaviours, and failure modes that those frameworks weren't designed to handle. The attack surface isn't just bigger — it's structurally different.

This article covers the complete methodology: inventory AI-specific assets, adapt STRIDE for the AI context, enrich findings with MITRE ATLAS technique IDs, and map OWASP LLM Top 10 risks directly to architectural components.

Why Your Existing Threat Models Are Incomplete

If you've modelled traditional web applications, you're used to a familiar inventory: databases, API keys, configuration files, user credentials, source code. You know where they live and how attackers reach them.

AI systems expand that inventory in ways most organizations haven't accounted for. Three structural differences drive this.

Non-determinism changes how you reason about failure. Traditional applications are deterministic — same input, same output. AI models, especially LLMs, produce different outputs for the same input across runs. Testing, auditing, and incident reproduction become fundamentally harder. There's no stack trace for a hallucination.

The black-box problem limits inspection. Most ML models lack the explainability of traditional application logic. You can't step through a neural network the way you'd trace a code path. Threat modelling must shift from code-level inspection to thinking in terms of input-output behaviour, access boundaries, and observed failure modes.

The data supply chain introduces delayed, invisible compromise. A compromised npm package can be detected and reverted within hours. A poisoned training dataset may not surface its effects for weeks — only after the model is retrained, validated, and deployed. The window between compromise and detection is orders of magnitude longer.

These aren't theoretical concerns. MITRE ATLAS documents 52 real-world case studies of AI attacks. The ShadowRay incident (AML.CS0023) demonstrated that attackers actively target AI training infrastructure using vulnerabilities in production ML frameworks. The Morris II worm (AML.CS0024) showed that prompt injection can propagate autonomously between AI agents through RAG-based pipelines — without any user interaction.

Step 1 — Inventory Your AI Assets

Before you can model threats, you need to know what you're protecting. AI systems introduce asset categories that don't appear in traditional application inventories.

Training Data

The datasets used to teach the model its behaviour. This isn't just "data" in the traditional sense — it's the substrate from which the model's intelligence emerges. Poisoning training data doesn't trigger an alert. It quietly teaches the model to make incorrect decisions, and those decisions are baked into the weights until the model is retrained from scratch.

The attack bar is lower than most assume. Poisoning as little as 0.04% of a training corpus can yield a 98.2% attack success rate. In RAG systems, injecting just five malicious documents into a knowledge base containing millions of records can achieve a 90% attack success rate against targeted queries.

Model Weights and Parameters

The numerical values that define what the model has learned. These are the model. Stealing model weights is qualitatively different from stealing a database — you can't rotate a credential and move on. The stolen asset is a functional replica of months of compute investment and potentially millions in training cost.

Model extraction attacks (AML.T0024) don't require direct access to weights. An attacker systematically queries a public-facing API and uses the input-output pairs to reconstruct a functionally equivalent copy. Adversarially trained models are documented to be more vulnerable to extraction than naturally trained ones, achieving higher clone accuracy with fewer queries.

Embedding Vectors

Numerical representations of text or data used for similarity computation, retrieval, and as inputs to downstream models. Used extensively in RAG pipelines, recommendation engines, and fraud detection systems.

The ALGEN framework (February 2025) demonstrated that as few as 1,000 alignment samples are sufficient to mount a partially successful inversion attack on black-box encoders. Sharing embeddings with third-party services is, in practical terms, sharing approximations of the original source documents. This isn't widely understood by teams deploying RAG at scale.

System Prompts

The instructions that define the model's behaviour, constraints, tone, and capabilities. Leaking system prompts hands attackers a roadmap: they can see exactly what the model is restricted from doing and what guardrails are in place, then craft inputs specifically designed to circumvent them. Credentials or API keys embedded in system prompts rather than secure vaults are a common misconfiguration that prompt extraction attacks directly exploit.

Feature Stores

Preprocessed data repositories that feed real-time model inputs — particularly relevant in fraud detection and recommendation systems. Tampering with feature values changes what the model "sees" at inference time without touching the model itself. The manipulation surface is separate from both training data and model weights, which means it's often overlooked in threat assessments focused on the model.

Model Registry and Artifacts

Stored and versioned trained models ready for deployment. A compromised registry is a particularly dangerous attack path: an attacker can swap a validated model for a backdoored one (AML.T0018), and the backdoored model passes standard validation checks because the trigger patterns are absent from the validation dataset. Everything looks clean until those triggers appear in production traffic.

Step 2 — Map the Data Supply Chain

Each stage in the journey from raw data to production prediction is a potential compromise point. The critical difference from traditional software supply chains: effects of early-stage compromise only surface at late stages — sometimes months later.

Stage 1: Data Collection

Training data is gathered from web scraping, purchased datasets, internal databases, user-generated content, and third-party providers. An attacker with influence over any of these sources has a foothold. Insider threats with access to annotation tooling, malicious federated clients, and supply chain poisoning through third-party datasets are all documented attack vectors.

Stage 2: Cleaning and Labelling

Raw data is preprocessed, filtered, and labelled. In some pipelines this involves external annotation teams; in others, labels are derived implicitly from outcomes (chargebacks, investigation results). Mislabelled data doesn't look corrupted — it teaches the model the wrong associations silently. A fraud detection model trained on mislabelled transaction data will produce systematically incorrect decisions with no visible anomaly at the data level.

Stage 3: Model Training

The model learns patterns over days or weeks of compute. Any poison that survived the first two stages is now embedded in the weights. Unlike a compromised library you can patch, a poisoned model may require full retraining at significant time and cost.

Stage 4: Validation and Packaging

The trained model is evaluated, versioned, and stored in a model registry. Standard validation passes because backdoor triggers are specifically crafted to be absent from validation sets. This is where model registry integrity controls become critical — registry signing, artifact provenance tracking, access controls on who can push to the registry.

Stage 5: Inference

The model serves predictions in production. For LLM-based systems, this stage typically includes a retrieval pipeline pulling additional context from vector databases or document stores at query time. This introduces an injection surface with no equivalent in traditional applications: attackers who can influence retrieved content can influence model behaviour without touching the model itself.

Concrete example: a fraud detection system retrains monthly on new transaction data. An attacker injecting crafted transactions into that pipeline over several months gradually shifts the model's decision boundaries — making specific fraud patterns progressively invisible. The degradation is slow enough that standard performance monitoring may not flag it until significant damage is done.

Step 3 — Apply STRIDE with AI-Specific Context

STRIDE remains the best starting point for structured threat identification. The six categories are still valid lenses — they just manifest differently when the system under review includes AI components.

S — Spoofing → Data Source Impersonation

Traditional: An attacker forges credentials to impersonate a legitimate user or service.

AI manifestation: In RAG architectures, models retrieve context from external sources and treat that context as authoritative. An attacker who injects content into a knowledge base, vector database, or document store effectively spoofs the model's knowledge. The model then serves attacker-controlled information with the confidence of a trusted internal source.

Other AI-specific spoofing vectors: deploying a look-alike API endpoint that mimics a legitimate AI service; adversarial inputs designed to fool AI-based identity verification systems (facial recognition, voice authentication).

T — Tampering → Data Poisoning

Traditional: Modifying data in transit or at rest — database records, API responses, configuration files.

AI manifestation: Injecting malicious data into the training pipeline causes the model to learn incorrect patterns. Effects are delayed and diffuse — embedded during training, only surfacing during inference. Poisoning can be targeted (forcing specific misclassifications) or untargeted (degrading overall performance).

ATLAS AML.T0020 (Data Poisoning) and AML.T0018 (Backdoor ML Model) are the references here. Prompt injection (AML.T0051) also maps to Tampering: manipulating instructions or context the model processes at inference time — either directly through user input or indirectly through retrieved content.

R — Repudiation → Unexplainable Model Decisions

Traditional: A user denies performing an action because the system lacks adequate audit trails.

AI manifestation: When an AI model makes a consequential decision — approving a loan, flagging a transaction, denying a claim — can you reconstruct why? Most ML models lack built-in explainability. Without robust logging of inputs, outputs, model versions, retrieval context, and inference parameters, reproducing a specific decision after the fact may be impossible.

This has direct regulatory implications. GDPR Article 22 gives individuals the right to explanations for automated decisions. Deploying AI in regulated contexts without decision audit trails creates both security and compliance exposure simultaneously.

I — Information Disclosure → Model Extraction

Traditional: Sensitive data exposed through breaches, insecure APIs, or verbose error responses.

AI manifestation: Model extraction (AML.T0024) — systematically querying a model's API to reconstruct a functional copy — requires no access to internals. Only the public-facing endpoint is needed. The stolen asset represents the organization's full AI capability: months of training compute, proprietary architecture decisions, and the competitive advantage encoded in the weights.

Other AI-specific disclosure vectors: training data extraction (queries that cause the model to regurgitate memorized training content, including PII); system prompt leakage (prompt extraction revealing internal constraints and credentials); embedding inversion (reversing vectors to reconstruct source documents). ATLAS AML.T0025 covers membership inference — determining whether specific data was used in training, which can reveal sensitive organizational information.

D — Denial of Service → Inference Cost Exploitation

Traditional: Flooding a system with traffic to exhaust resources and degrade availability.

AI manifestation: AI inference is orders of magnitude more expensive than traditional API calls. In cloud-deployed models billed per token, an attacker can inflict financial damage without taking the system offline. A chatbot endpoint flooded with crafted long-context prompts designed to trigger maximum-length responses never goes down — availability metrics stay green while the monthly inference bill multiplies 10–12x. This is often called "Denial of Wallet."

Beyond financial attacks: sponge examples (adversarial inputs crafted to maximize compute per inference call), GPU resource exhaustion, and training pipeline disruption via high-volume junk data injection all fall under this category.

E — Elevation of Privilege → Jailbreaking and Excessive Agency

Traditional: Gaining access or capabilities beyond what's permitted — unprivileged user getting admin rights.

AI manifestation: An attacker crafts inputs that cause a model to ignore its safety guidelines, content policies, or behavioural restrictions. The model was designed to refuse specific requests — the attacker's input "elevates" their access to the model's full unrestricted capabilities. Conceptually equivalent to privilege escalation: the attacker doesn't get root on a server, but they gain access to capabilities the system was explicitly configured to deny.

For agentic systems, the scope of "privilege" has expanded dramatically. AI agents in 2026 can open pull requests, query databases, send emails, execute code, and trigger automated workflows. A jailbroken agent isn't just a chatbot with fewer content restrictions — it's an entity with access to whatever integrations and tools were provisioned for it. Compromised agents in production have been documented executing unauthorized commands, exfiltrating data, and moving laterally across connected systems.

MITRE ATLAS added 14 new techniques in late 2025 specifically targeting agentic AI, covering prompt injection, memory manipulation, and tool misuse vectors that don't exist in non-agentic deployments.

STRIDE-AI Consolidated Mapping

STRIDE AI Manifestation Key ATLAS Technique
Spoofing Data source impersonation via RAG injection
Tampering Data poisoning, prompt injection AML.T0020, AML.T0051
Repudiation Lack of model decision audit trails
Info Disclosure Model extraction, training data leakage AML.T0024, AML.T0025
Denial of Service Inference cost exploitation (Denial of Wallet)
Elevation of Privilege Jailbreaking, excessive agency, tool abuse AML.T0015, AML.T0018

Step 4 — Enrich with MITRE ATLAS

STRIDE gives you threat categories. ATLAS gives you the specific techniques attackers use and the mitigations that stop them. Think of ATLAS as MITRE ATT&CK's AI-focused counterpart: the same tactic-technique-mitigation hierarchy, applied to ML systems.

ATLAS currently contains 16 tactics, 155 techniques, 35 mitigations, and 52 real-world case studies. MITRE maintains it with contributions from industry, academia, and government. Always check atlas.mitre.org for current counts — a late 2025 collaboration with Zenity Labs added 14 new agent-focused techniques alone.

Using ATLAS in practice: for each threat identified through STRIDE, look up the corresponding technique. The technique page provides the attack method, required attacker access, real-world case studies, and recommended mitigations. This moves your assessment from "tampering risk exists" to a specific, actionable finding with a documented technique ID, a defensive playbook, and references to real incidents.

Five ATLAS techniques every defender should know:

Data Poisoning (AML.T0020): Injecting malicious data into training pipelines. Effects are delayed and persist until full retraining. Mitigations: data provenance tracking, anomaly detection on training inputs, model performance drift monitoring.

Model Extraction (AML.T0024): Systematically querying a public API to reconstruct a functional model copy. Mitigations: query rate limiting, output confidence score suppression, monitoring for systematic query patterns.

LLM Prompt Injection (AML.T0051): Direct injection via user input; indirect injection via content the model retrieves or processes. For RAG systems, indirect injection is the primary vector — an attacker who can write to a knowledge base injects instructions that execute when the model retrieves that content. Mitigations: input/output validation, instruction hierarchy enforcement, privilege separation between user and system channels.

Backdoor ML Model (AML.T0018): Hidden triggers embedded during training cause specific malicious behaviour when activated. The model passes all standard validation — the backdoor is invisible until the trigger pattern appears in production traffic. Mitigations: training data provenance controls, model scanning tools, anomaly detection on inference outputs.

Evade ML Model (AML.T0015): Crafting inputs that cause systematic misclassification — adversarial examples in the classical sense. Used to evade malware detection, bypass content filters, and cause misclassification in downstream pipelines. This threat spans multiple STRIDE categories (Tampering, Spoofing, Elevation of Privilege) simultaneously, which is a key reason STRIDE alone falls short.

Step 5 — Map OWASP LLM Top 10 to Architecture

OWASP gives you the component-level view. The 2025 edition reflects the reality that RAG architectures went mainstream, agentic AI moved into production, and inference costs became large enough to serve as an attack vector on their own.

The critical skill this framework builds is bidirectional risk mapping:

  • Risk → Component: "Where does prompt injection live?" Trace it to the inference endpoint and the RAG pipeline.
  • Component → Risk: "We're deploying a vector database — what risks does it inherit?" Find every OWASP entry where vector database appears.
# Risk Vulnerable Components
LLM01 Prompt Injection LLM inference endpoint, vector DB / RAG pipeline, any component feeding text to the model
LLM02 Sensitive Info Disclosure Inference endpoint (model memorization), training pipeline, system prompt config
LLM03 Supply Chain Training pipeline (third-party datasets, base models), model registry, plugin integrations
LLM04 Data and Model Poisoning Training pipeline, model registry, feature store
LLM05 Improper Output Handling Web frontend (XSS risk), API gateway, any system consuming model responses
LLM06 Excessive Agency LLM inference endpoint, tool integrations (DB, code exec, email), agentic orchestration layer
LLM07 System Prompt Leakage LLM inference endpoint, system prompt configuration
LLM08 Vector and Embedding Weaknesses Vector database, RAG pipeline, embedding generation process
LLM09 Misinformation LLM inference endpoint (hallucination), vector DB (stale sources), user-facing output channels
LLM10 Unbounded Consumption LLM inference endpoint, API gateway, training pipeline

LLM Inference Endpoint appears in seven of the ten entries (LLM01, LLM02, LLM05, LLM06, LLM07, LLM09, LLM10). This is the component requiring the most comprehensive hardening: input validation, output sanitization, rate limiting, guardrail enforcement, system prompt security, tool permission scoping.

Vector Database / RAG Pipeline appears in three entries (LLM01, LLM08, LLM09). Security focus: access controls on the vector store, input validation for indexed content, freshness monitoring for source documents, and treating the retrieval layer as a trust boundary — not a trusted internal component. A 2025 study found that poisoning just five documents in a corpus of millions achieves 90% attack success against targeted queries.

Training Pipeline is the primary entry point for LLM03 and LLM04. This is where third-party models, datasets, and fine-tuning data enter the system. Mitigations are largely supply chain controls: data provenance, dataset scanning, base model verification, registry signing, separation of training and production environments.

The Three-Layer Methodology

STRIDE, ATLAS, and OWASP aren't competing frameworks. They're layers of the same assessment, operating at different zoom levels.

STRIDE-AI provides the wide-angle view: structured threat categories applied to each AI component. Walk each component through the six categories. Output: a list of threat types with initial context.

MITRE ATLAS provides the technical zoom: for each identified threat, look up the corresponding technique. Get the specific attack method, required access, real-world examples, and defensive mitigations. Output: a finding with a technique ID, documented case studies, and an actionable defensive playbook.

OWASP LLM Top 10 provides the component map: where does each risk actually live in your architecture, and how critical is it? Use bidirectional mapping — risk to component for scoping, component to risk for new components being introduced.

Run them in sequence. STRIDE first, ATLAS enrichment second, OWASP component mapping third. The combination gives you what none of the three provides alone: threat categories (STRIDE), technical specificity (ATLAS), and architectural location with prioritization (OWASP).

The Agentic AI Frontier

The threat landscape above covers 2026 AI deployments as they exist today. But the fastest-growing segment — agentic AI — introduces a new tier of risk.

By 2026, more than 80% of enterprises have deployed some form of autonomous AI agent in production. These agents don't just answer questions — they execute multi-step workflows, call external APIs, write and run code, interact with internal databases, and coordinate with other agents. The meaning of "excessive agency" has expanded from "chatbot with unrestricted content" to "autonomous system with access to production infrastructure."

Prompt injection is to agentic AI what SQL injection was to early web applications. A fundamental flaw from mixing untrusted data with trusted instructions.

In 2025, documented prompt injection attempts against enterprise AI systems increased 340% year-over-year. Indirect attacks — malicious instructions embedded in content the agent retrieves rather than in direct user input — now account for over 55% of observed incidents and achieve 20–30% higher success rates than direct attacks.

Memory poisoning is an agent-specific vector with no traditional equivalent. Agents maintain persistent memory across sessions. Injecting adversarial content into that memory influences all future behaviour — the compromise persists and compounds rather than ending when the session terminates.

Cross-agent escalation allows attackers to compromise one agent's input and affect other agents in the same pipeline. In multi-agent architectures where agents hand off context, a single injection point can propagate through the entire system.

The OWASP Top 10 for Agentic AI Applications — a companion framework released in late 2025 — covers these risks specifically: uncontrolled autonomy, delegated identity abuse, cross-agent prompt injection, and memory manipulation. For organizations deploying autonomous systems, this is a necessary complement to the LLM Top 10.

Practical Threat Assessment Checklist

When conducting a threat assessment on an AI deployment, use this as a starting structure.

Asset Inventory (before you model threats):

  • Training data sources identified and provenance documented
  • Model weights and artifacts in a version-controlled registry with access controls
  • Embedding generation pipeline and vector database access controls reviewed
  • System prompts stored in secure vaults (not hardcoded in application code)
  • Feature stores identified and input validation controls documented
  • Third-party model dependencies and base models verified for integrity

Per-Component STRIDE Pass:

  • Each component walked through all six STRIDE categories
  • AI-specific manifestations applied (data poisoning under T, model extraction under I, etc.)
  • ATLAS technique IDs attached to each identified threat
  • OWASP LLM Top 10 entry mapped for each component

Supply Chain Controls:

  • Training data integrity monitoring in place
  • Model registry signing and artifact verification configured
  • Third-party dataset scanning before ingestion
  • Separation between training and production environments enforced

Inference-Time Controls:

  • Input validation at inference endpoint
  • Output sanitization before downstream consumption (LLM05)
  • Rate limiting and cost monitoring to detect Denial of Wallet (LLM10)
  • Tool permissions scoped to minimum required access (LLM06)
  • System prompt stored separately from user input channels (LLM07)

RAG-Specific Controls (if applicable):

  • Knowledge base access controls — who can write to the vector store?
  • Input validation for indexed content
  • Source freshness monitoring
  • Retrieval layer treated as untrusted input, not trusted internal context

Agentic-Specific Controls (if applicable):

  • Agent permissions reviewed against OWASP Agentic AI Top 10
  • Memory persistence boundaries defined and monitored
  • Cross-agent trust boundaries enforced
  • Human-in-the-loop checkpoints for high-impact actions

Monitoring and Response:

  • Decision logging for consequential model outputs (addresses Repudiation)
  • Model performance drift monitoring to detect delayed poisoning effects
  • Anomaly detection on query volume and patterns to detect extraction attempts
  • Incident response playbooks updated for AI-specific compromise scenarios

Key Takeaways

AI systems aren't traditional applications with a model bolted on. They have different assets, a separate data supply chain, failure modes that are delayed and non-deterministic, and a black-box problem that prevents code-level inspection.

STRIDE gives you the categories. ATLAS gives you the techniques. Together they provide the shared vocabulary — STRIDE tells you what type of threat you're looking at; ATLAS tells you how attackers execute it and what mitigations to apply.

OWASP tells you where to point the camera. The LLM Top 10 maps risks directly to architectural components, enabling the bidirectional lookup that turns a reference document into an active assessment tool. The question "we're deploying a vector database — what risks does that inherit?" should have a specific, actionable answer before that database goes to production.

The frameworks evolve — ATLAS adds new techniques as new attack classes are documented, OWASP updates its list as the deployment landscape shifts — but the methodology stays consistent. Inventory your assets, map your supply chain, apply STRIDE with AI-specific context, enrich with ATLAS, locate with OWASP. Run this process every time your organization deploys a new AI system, updates a model, or introduces agentic capabilities.

The risk is real. The techniques are documented. The frameworks exist. The gap is in applying them.
← prev: dependency supply chain security