Landscape map

The security boundary for agentic AI is the execution system around the model: the prompts it receives, the context it retrieves, the tools it can call, the credentials it can use, the memory it can update, the code it can write or run, the approvals it can request, and the downstream systems it can affect.

This landscape map gives readers a common frame before moving into the threat model. It treats agentic systems as execution environments, not as isolated model endpoints.

The diagram below shows the agentic execution system as five stacked layers, with the control posture wrapping every layer.

  1. Control posture: Observe → Interpret → Constrain → Audit
  2. Inputs: instructions, retrieved context, memory
  3. Agent reasoning
  4. Action layer: tools, MCP, code, credentials, approvals
  5. Downstream systems and assurance evidence

The Protected Object

In a model-centred system, the protected object is often the prompt, the completion, or the data sent to and from the model. In an agentic system, the protected object is broader: it is the system of action that forms around language, tools, state, identity, and authority.

| Component | Security question |
| --- | --- |
| Instructions | Which messages, prompts, policies, and delegated goals can shape behaviour? |
| Context | Which retrieved documents, data sources, and conversation state influence decisions? |
| Tools | Which functions, APIs, systems, files, and workflows can the agent invoke? |
| Credentials | Which user, service, or delegated authority does the action use? |
| Memory | Which facts, preferences, summaries, and learned state can persist across turns or sessions? |
| Code execution | Which generated scripts, shell commands, notebooks, or automation paths can change systems? |
| Approvals | Which actions require human review, and what evidence does the reviewer see? |
| Downstream systems | Which repositories, SaaS platforms, cloud resources, data stores, and communication channels can be changed? |

The central question is therefore not only whether a model output is safe. It is what the agentic system can do, under whose authority, with which context, through which tools, and under what controls.
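That central question can be made concrete by modelling each proposed action as a record that carries all of the components above, so a policy check can reason about instruction source, authority, and side effects together rather than inspecting the model output alone. The following is a minimal sketch; the field names, trust tiers, and policy rule are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class ProposedAction:
    """One record per action the agent wants to take (illustrative fields)."""
    instruction_source: str   # e.g. "system", "user", "retrieved"
    context_labels: list      # provenance labels on retrieved context
    tool: str                 # function or API the agent wants to invoke
    credential_scope: str     # identity or delegated authority the call would use
    writes_memory: bool       # does the action persist state across sessions?
    executes_code: bool       # does the action run generated code?
    downstream_targets: list = field(default_factory=list)

def requires_review(action: ProposedAction) -> bool:
    """Example policy: persistence, code execution, downstream change, or an
    externally retrieved instruction source all trigger human review."""
    return (
        action.writes_memory
        or action.executes_code
        or bool(action.downstream_targets)
        or action.instruction_source == "retrieved"
    )
```

The design point is that the policy input is the whole system of action, not the text of the completion: the same tool call can be routine under one credential scope and reviewable under another.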

Language As An Execution Layer

Language becomes part of the execution layer when instructions can change action. A prompt, retrieved document, issue comment, ticket, email, chat message, web page, or tool response may influence whether an agent reads data, calls an API, writes code, updates memory, opens a pull request, sends a message, or changes configuration.

This does not mean language is executable in the same way as a binary or script. It means language can participate in execution paths by steering systems that have authority. Security therefore needs to inspect more than text safety. It needs to understand the relationship between instruction, intent, authority, tool choice, data sensitivity, and outcome.

Useful control questions include:

  • Which instructions are trusted, untrusted, system-owned, user-owned, or retrieved from external sources?
  • Which instructions can override, redirect, or reinterpret the user’s goal?
  • Which tool calls or memory writes can be triggered by language alone?
  • Which actions require policy checks, approval gates, sandboxing, or credential brokering before execution?
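The control questions above can be expressed as a gate in front of the action layer: only trusted instruction sources may trigger tool calls at all, and high-risk tools additionally require an approval to have been granted. This is a sketch under assumed trust tiers; the tool names and source labels are hypothetical.

```python
# Illustrative trust tiers and risk classes (assumptions for this sketch).
TRUSTED_SOURCES = {"system", "user"}                 # system- and user-owned
HIGH_RISK_TOOLS = {"send_email", "deploy", "delete_repo"}

def allow_tool_call(tool: str, instruction_source: str, approved: bool) -> bool:
    """Gate a tool call on instruction provenance and an approval flag."""
    if instruction_source not in TRUSTED_SOURCES:
        # Retrieved or external text is treated as data, never as instruction.
        return False
    if tool in HIGH_RISK_TOOLS:
        # Approval gate must have fired before execution, not after.
        return approved
    return True
```

Note that the gate keys on where the instruction came from, not on what it says: a retrieved document that "asks politely" is still data.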

Main Risk Surfaces

Agentic risk appears wherever language, authority, state, and action meet.

| Surface | Failure focus | Control question |
| --- | --- | --- |
| Instruction flow | Untrusted instructions influence goals, policies, or tool choices. | Which instruction sources are allowed to steer behaviour, and which must be treated as data? |
| Retrieved context | External or stale context changes interpretation, priorities, or decisions. | How is context sourced, labelled, filtered, and bounded before use? |
| Tool interfaces | Tools expose unsafe actions, broad parameters, weak validation, or risky composition. | What can each tool do, and how are intent, authority, input, and output checked? |
| Credentials and tokens | Agents act with excessive, unclear, or poorly scoped authority. | Which identity is used for each action, and can credentials be limited per task? |
| Memory | Persistent state stores manipulated facts, preferences, summaries, or instructions. | What may be written to memory, who can influence it, and how is it reviewed or expired? |
| Code and automation | Generated code, scripts, or file operations create side effects beyond the reviewed output. | Where can code run, what can it touch, and what evidence is preserved? |
| MCP, skills, and extensions | Tool servers or packaged capabilities become authority-bearing execution boundaries. | How are capabilities discovered, trusted, configured, monitored, and revoked? |
| Human approvals | Reviewers approve actions without enough context, risk signal, or diff visibility. | What must a reviewer see before approving an action? |
| Multi-agent communication | One compromised agent influences another agent, queue, workflow, or shared memory. | How are messages authenticated, scoped, and constrained across agent boundaries? |
| Monitoring and evaluation | Logs, traces, benchmarks, and tests miss multi-step behaviour and downstream effects. | Can the organisation observe action paths, not only final responses? |

These surfaces overlap. A retrieved document can influence a tool call. A tool response can update memory. A memory entry can affect a later approval request. A token can turn a weak instruction into an organisational action.

How Failures Compose

Most agentic failures are not single events. They are paths across the execution system.

  1. Untrusted instruction → Compromised intent
  2. Compromised intent → Risky tool selection
  3. Risky tool selection → Delegated authority use
  4. Delegated authority use → Persistent state or downstream change
  5. Persistent state or downstream change → Broader organisational impact

Different systems will expose different paths, but the pattern is consistent: a weak boundary in one place becomes more serious when it is connected to tools, credentials, memory, automation, or other agents.

Common composition patterns include:

  • An instruction attack changes how the agent interprets the user’s goal, then tool access turns that interpretation into action.
  • Poisoned context changes the evidence base, then memory makes the change persistent across sessions.
  • Broad credentials allow a narrow task to affect systems outside the user’s intended scope.
  • A weak approval path lets a human approve an action without seeing the instruction source, tool parameters, or likely impact.
  • A compromised tool, skill, or MCP server becomes a bridge between language-level influence and system-level side effects.
  • In a multi-agent workflow, one agent’s manipulated output becomes another agent’s trusted input.

Phase 5 will expand these paths into detailed breach-chain models. This phase establishes the shared landscape and vocabulary.

Control Posture

Securing agentic systems requires controls that operate across the execution environment:

| Control posture | What it must cover |
| --- | --- |
| Observe | Prompts, retrieved context, tool calls, credential use, memory reads and writes, approvals, outputs, and downstream actions. |
| Interpret | Intent, instruction source, authority, data sensitivity, tool risk, policy fit, and likely outcome. |
| Constrain | Tool permissions, credential scopes, memory writes, code execution, data movement, approval gates, and autonomous actions. |
| Audit | Evidence for review, incident response, assurance, governance, and continuous improvement. |
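The four-part posture can be sketched as a wrapper around tool execution, so that every call is observed, interpreted against a policy, constrained when the policy says so, and audited either way. The risk rule and record fields are assumptions for illustration; a real system would interpret far richer signals.

```python
audit_log = []  # in a real system: durable, tamper-evident storage

def run_tool(tool: str, args: dict, instruction_source: str) -> str:
    """Observe → Interpret → Constrain → Audit around one tool call."""
    record = {"tool": tool, "args": args, "source": instruction_source}  # Observe
    risk = "high" if instruction_source == "retrieved" else "low"        # Interpret
    if risk == "high":                                                   # Constrain
        record["outcome"] = "blocked"
        audit_log.append(record)                                         # Audit
        return "blocked: untrusted instruction source"
    record["outcome"] = "executed"
    audit_log.append(record)                                             # Audit
    return f"executed {tool}"
```

The key property is that the audit record is written on both branches: blocked actions are evidence too, and they are often the evidence an incident responder needs first.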

The threat model turns this landscape into specific failure modes and defender questions.