Defense architecture
Agentic systems need a defence architecture because their risk is not confined to a model response. Prompts, retrieved context, tool calls, credentials, memory, code execution, approvals, and downstream systems can all participate in an action.
This document builds on the attack surface map and agentic attack chains. It organises the controls that help teams observe, interpret, constrain, and audit behaviour as it unfolds.
The aim is not to describe a single product pattern. It is to define the control layers a secure agentic execution system should make explicit.
Architecture Principle
The core architectural principle is:
No meaningful action should happen without enough context to understand the intent, authority, risk, policy fit, and expected outcome.
That principle applies before a tool call, during execution, after a result returns, and when the system writes memory or affects a downstream system.
The defence architecture should answer five questions:
| Question | Why it matters |
|---|---|
| What is influencing the agent? | Prompts, retrieved context, tool results, memory, and other agents can all shape behaviour. |
| What can the agent do? | Tools, workflows, code execution, and downstream systems define the action surface. |
| Under whose authority? | User sessions, service accounts, delegated tokens, and approvals determine blast radius. |
| Which policy applies? | Risk depends on intent, data sensitivity, tool capability, identity, and likely outcome together. |
| What evidence remains? | Governance, incident response, assurance, and improvement require a reconstructable action path. |
Control Loop
The runtime security model has four connected capabilities.
| Capability | What it does | Evidence it should preserve |
|---|---|---|
| Observe | Captures prompts, context, memory reads and writes, tool calls, approvals, outputs, and downstream actions. | Source labels, trace identifiers, selected context, tool parameters, effective identity, action result, and downstream change. |
| Interpret | Assesses user intent, instruction source, data sensitivity, tool risk, delegated authority, policy fit, and likely impact. | Risk factors, matched rules, confidence or uncertainty, policy decision, and explanation shown to reviewers. |
| Constrain | Limits action through policy decisions, tool brokers, credential brokers, sandboxing, approval gates, and outcome controls. | Allow, deny, revise, require approval, narrow scope, or rollback decision with reason. |
| Audit | Preserves the chain from influence to outcome for review, assurance, incident response, and continuous improvement. | Linked prompt, context, decision, credential, approval, tool, memory, output, and downstream records. |
These capabilities should not be isolated systems. Observation without interpretation becomes storage. Interpretation without constraint becomes advice. Constraint without audit becomes hard to trust. Audit without runtime control arrives too late.
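The four capabilities can be sketched as a single loop. This is a minimal illustration, not a prescribed implementation: the `TraceEvent` structure, the `irreversible` flag, and the approval heuristic are all assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class TraceEvent:
    """One observed step: what happened, what was decided, and why."""
    kind: str            # e.g. "tool_call", "memory_write"
    payload: dict
    decision: str = ""   # allow / deny / require_approval
    reason: str = ""


class ControlLoop:
    """Observe -> Interpret -> Constrain -> Audit for a single step."""

    def __init__(self) -> None:
        self.audit_log = []

    def observe(self, kind: str, payload: dict) -> TraceEvent:
        # Observe: capture the step with enough detail to replay it later.
        return TraceEvent(kind, payload)

    def interpret(self, event: TraceEvent) -> str:
        # Interpret (illustrative heuristic): irreversible steps escalate.
        return "require_approval" if event.payload.get("irreversible") else "allow"

    def constrain(self, event: TraceEvent) -> TraceEvent:
        # Constrain: attach the decision and a reviewer-facing reason.
        event.decision = self.interpret(event)
        event.reason = ("irreversible action" if event.decision == "require_approval"
                        else "within policy")
        return event

    def audit(self, event: TraceEvent) -> None:
        # Audit: preserve the decided event for later reconstruction.
        self.audit_log.append(event)


loop = ControlLoop()
event = loop.constrain(loop.observe("tool_call",
                                    {"tool": "send_email", "irreversible": True}))
loop.audit(event)
```

Because interpretation and constraint run inside the same path that records evidence, no decision can reach the audit log without a reason attached.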
AI Defense Plane
The AI Defense Plane organises controls into three operating layers: Discover, Protect, and Govern.
Each layer operates on the same agentic execution system — agents · context · tools · memory · authority · downstream — and the stack produces audit and assurance evidence:

- Discover — inventory and ownership.
- Protect — runtime decisions and controls.
- Govern — authority, evidence, and accountability.
| Layer | Responsibility | Examples |
|---|---|---|
| Discover | Find and classify agents, tools, prompts, data flows, credentials, memory, workflows, owners, and downstream systems. | Agent catalogue, tool registry, data-flow map, memory inventory, authority map, ownership record. |
| Protect | Control inputs, retrieved context, memory writes, tool calls, credentials, code execution, approvals, and autonomous action. | Runtime guardrails, policy decisions, tool brokers, credential brokers, sandboxing, approval gates, outcome controls. |
| Govern | Manage delegated authority, policy exceptions, audit trails, assurance evidence, accountability, and compliance obligations. | Review cadence, risk acceptance, audit evidence, evaluation evidence, exception handling, incident records. |
Discover shows what exists. Protect decides and enforces what can happen. Govern makes authority, evidence, accountability, and improvement durable.
Layered Control Model
A secure agentic architecture needs several control layers that share context. Each layer should be designed as part of the action path, not added only at the final output.
| Layer | Responsibility | Control question | Evidence to keep |
|---|---|---|---|
| Identity and access | Identify the user, agent, service, tool, workflow, and downstream system involved in an action. | Is this identity allowed to request this action for this task? | Effective identity, role, scope, owner, delegated authority, and session or token lifetime. |
| Policy decision | Evaluate intent, source trust, data sensitivity, tool risk, authority, and expected impact. | Should this action be allowed, denied, revised, or escalated for approval? | Matched rule, risk factors, decision, reason, exception, and reviewer-facing explanation. |
| Runtime guardrail | Inspect inputs, context, generated plans, tool parameters, outputs, and memory writes while execution is happening. | Is the current step still aligned with the approved task and policy? | Detection, transformation, blocked content, revised plan, and trace link. |
| Tool broker | Mediate tool, API, workflow, MCP server, skill, extension, file, and command execution. | Is this tool call valid, scoped, and expected for this task? | Tool schema, parameters, caller, policy result, result, side effect, and retry or rollback path. |
| Credential broker | Issue or bind credentials for a specific task, action, tool, and approval boundary. | Is the authority narrower than the approved outcome requires? | Credential type, scope, lifetime, binding, secret-handling decision, and revocation record. |
| Memory and context | Control retrieval, summarisation, context injection, memory reads, and memory writes. | Can this context or memory safely influence future action? | Source, provenance, freshness, sensitivity, trust level, owner, expiry, and write reason. |
| Observability and audit | Link prompts, context, decisions, tools, credentials, approvals, memory, outputs, and downstream effects. | Can a reviewer reconstruct what happened and why? | End-to-end trace, decision log, approval record, state change, outcome, and incident evidence. |
| Human approval | Require informed review for sensitive, irreversible, ambiguous, or high-impact actions. | Does the reviewer see enough evidence to approve the action responsibly? | Source context, risk summary, parameters, identity, expected effect, approver, timestamp, and decision. |
| Outcome control | Limit, verify, reverse, or contain downstream effects after action is attempted. | Did the result match the approved intent and acceptable impact? | Final state, validation result, notification, rollback, containment, and business owner record. |
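The tool broker row above can be made concrete with a small validation sketch. The registry entries, scope names, and check order are assumptions for illustration, not a fixed schema.

```python
# Hypothetical tool registry: tool names, parameter schemas, and scopes
# are illustrative stand-ins for a real catalogue.
TOOL_REGISTRY = {
    "files.read":   {"params": {"path"}, "scope": "read"},
    "files.delete": {"params": {"path"}, "scope": "write"},
}


def broker_check(tool, params, task_scopes):
    """Validate one tool call: a registered tool, parameters that match
    the declared schema, and a scope no wider than the task's authority."""
    entry = TOOL_REGISTRY.get(tool)
    if entry is None:
        return False, "unknown tool"
    if set(params) != entry["params"]:
        return False, "parameters do not match declared schema"
    if entry["scope"] not in task_scopes:
        return False, "tool scope exceeds task authority"
    return True, "ok"


# A read-only task may read files but not delete them.
allowed, _ = broker_check("files.read", {"path": "/tmp/report"}, {"read"})
denied, why = broker_check("files.delete", {"path": "/tmp/report"}, {"read"})
```

The returned reason string is what the audit layer preserves alongside the decision.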
The reference architecture below shows the same layered control model as seven stacked components. The agent, tool, and memory attack flow sequence diagram already covers ordering.
1. Inputs: user goal, retrieved context, memory
2. Agent reasoning
3. Policy decision and approval gate
4. Tool broker and credential broker
5. Tool runtime, MCP, or workflow
6. Outcome control and downstream system
7. Observability and audit
How The Layers Work Together
The layers are strongest when they operate as one action path:
- A user goal or delegated task enters with source, identity, and scope.
- Context and memory are retrieved with provenance, sensitivity, freshness, and trust labels.
- The agent proposes a plan or tool call.
- The policy layer interprets intent, authority, data sensitivity, tool risk, and likely impact.
- The runtime guardrail checks whether the proposed step still matches the approved task.
- The tool broker validates the tool, parameters, schema, and expected side effect.
- The credential broker issues only the authority needed for the approved action.
- A human approval gate receives source context, risk, identity, parameters, and expected effect when the action is sensitive or ambiguous.
- The downstream action is executed, constrained, verified, and logged.
- Memory writes and future context changes are reviewed as state changes, not as harmless notes.
- Observability and audit link the full path from influence to outcome.
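The steps above can be sketched as a pipeline in which every stage must return both a decision and a reason. The stage names and heuristics here are illustrative assumptions, not the document's prescribed set.

```python
# Sketch of one action path: each stage returns (decision, reason).
# A stage that cannot explain itself halts the run; a deny stops
# further stages but keeps the evidence gathered so far.
def run_action_path(stages, action):
    evidence = []
    for name, stage in stages:
        decision, reason = stage(action)
        if not reason:
            raise RuntimeError(f"{name} produced a decision without a reason")
        evidence.append((name, decision, reason))
        if decision == "deny":
            break
    return evidence


stages = [
    ("policy",    lambda a: ("allow", "intent matches approved task")),
    ("guardrail", lambda a: ("allow", "step aligned with plan")),
    ("broker",    lambda a: ("deny", "parameters outside schema")
                  if a.get("bad_params") else ("allow", "schema ok")),
]

trail = run_action_path(stages, {"bad_params": True})
```

The evidence list is what the audit layer would bind to the trace: one explained decision per stage, up to the point where the action was stopped.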
If any step cannot explain its decision, scope, or evidence, the system is difficult to govern.
Control Escalation
Not every action needs the same control strength. The architecture should escalate controls as risk rises.
| Risk signal | Useful escalation |
|---|---|
| Untrusted or mixed-trust instruction source | Source labelling, instruction separation, and goal alignment check. |
| Sensitive retrieved context | Provenance, freshness, sensitivity labels, and stronger evidence requirements. |
| Write, delete, send, deploy, approve, or purchase capability | Policy decision, tool broker validation, scoped credential, and audit trace. |
| Broad or reusable authority | Credential brokering, short lifetime, task binding, and revocation path. |
| Durable memory or shared state write | Provenance, owner, reason, expiry, reviewer visibility, and deletion path. |
| Cross-agent hand-off | Origin, recipient, delegated scope, trust label, and linked trace. |
| Irreversible or high-impact outcome | Human approval, outcome verification, rollback or containment plan, and business owner record. |
The important design choice is to evaluate risk from the relationship between intent, authority, data, tool, and outcome rather than from the model output alone.
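One way to express that design choice is a mapping from risk signals to required controls, where an action accumulates the union of controls for every signal present. The signal and control names below follow the table but are illustrative.

```python
# Illustrative escalation table: keys are risk signals, values are the
# controls the table above associates with them.
ESCALATIONS = {
    "untrusted_source":     {"source_labelling", "goal_alignment_check"},
    "sensitive_context":    {"provenance", "sensitivity_labels"},
    "write_capability":     {"policy_decision", "scoped_credential", "audit_trace"},
    "irreversible_outcome": {"human_approval", "rollback_plan"},
}


def required_controls(signals):
    """Union of controls for every risk signal on the action."""
    controls = set()
    for signal in signals:
        controls |= ESCALATIONS.get(signal, set())
    return controls


# A write that is also irreversible needs both sets of controls.
needed = required_controls({"write_capability", "irreversible_outcome"})
```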
Identity And Delegation
Authority in an agentic system rarely flows in a straight line from the user. A user delegates a task scope to an agent; the agent narrows that scope into sub-tasks for sub-agents or tool calls; a credential broker issues short-lived authority bound to each step. At every hand-off, the effective identity and the credential scope should be narrower than what came before — never broader.
The delegation chain runs User → Agent → Sub-agent → Credential broker → Tool:

- User → Agent: effective identity = Agent-on-behalf-of-User; authority is pinned to the task scope.
- Agent → Sub-agent: effective identity = Sub-agent-on-behalf-of-User; the sub-scope is a subset of the task scope.
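The narrowing rule at each hand-off reduces to a subset check. A minimal sketch, assuming scopes are modelled as string sets (the scope names are invented for the example):

```python
def delegate(parent_scope, requested):
    """Hand off authority only when the requested scope is a subset of
    the parent's scope: delegation narrows, never broadens."""
    if not requested <= parent_scope:
        raise PermissionError("delegated scope exceeds parent scope")
    return set(requested)


user_scope = {"repo:read", "repo:write"}
agent_scope = delegate(user_scope, {"repo:read", "repo:write"})  # task scope
sub_scope = delegate(agent_scope, {"repo:read"})                 # narrower sub-task
```

Any attempt to broaden at a hand-off — for example, requesting `repo:write` from a read-only sub-scope — raises rather than silently widening authority.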
Human Approval Gates
Human approval is a control only when the reviewer can see the evidence needed to make a decision. A button after a confident summary is not enough.
Approval prompts should show:
- The original user goal or delegated task.
- The source and trust level of influential context.
- The proposed action, parameters, and downstream system.
- The effective identity and credential scope.
- The matched policy, risk factors, and uncertainty.
- The expected effect, rollback path, and business owner where relevant.
Approval should be required for actions that are sensitive, irreversible, outside normal scope, ambiguous, high-impact, or dependent on weak evidence.
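The evidence list above can be enforced as a completeness check before any approve button is shown. The field names mirror the list but are assumptions; a real system would use its own record structure.

```python
# Hypothetical evidence fields, mirroring the approval prompt list above.
REQUIRED_EVIDENCE = (
    "goal", "context_sources", "proposed_action", "parameters",
    "effective_identity", "credential_scope", "matched_policy",
    "risk_factors", "expected_effect",
)


def missing_evidence(request):
    """Evidence still absent before a reviewer can decide responsibly."""
    return [field for field in REQUIRED_EVIDENCE if not request.get(field)]


request = {
    "goal": "refund order 1234",
    "proposed_action": "payments.refund",
    "parameters": {"order": "1234"},
    "effective_identity": "agent-on-behalf-of-alice",
}
gaps = missing_evidence(request)  # not yet safe to show an approve button
```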
The lifecycle of an approval request moves through seven explicit states.
- Requested — the agent proposes a sensitive action.
- Evidence assembled — source context, risk summary, parameters, identity, and expected effect are attached.
- Under review — the approval prompt is shown to the reviewer.
- Approved — the reviewer signs off, with approver identity and timestamp.
- Denied — the reviewer rejects, with a reason.
- Revised — the reviewer requests changes; the agent re-proposes within scope, returning the request to Requested.
- Logged — the approval or denial record is linked to the trace.
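The lifecycle can be enforced as an explicit transition table, so any step outside the declared flow is rejected. This is a sketch; state names are lowercased versions of those above.

```python
# Every legal (from_state, to_state) pair; anything else is rejected.
TRANSITIONS = {
    ("requested", "evidence_assembled"),
    ("evidence_assembled", "under_review"),
    ("under_review", "approved"),
    ("under_review", "denied"),
    ("under_review", "revised"),
    ("revised", "requested"),
    ("approved", "logged"),
    ("denied", "logged"),
}


class Approval:
    def __init__(self):
        self.state = "requested"

    def advance(self, new_state):
        if (self.state, new_state) not in TRANSITIONS:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
        return self.state


a = Approval()
a.advance("evidence_assembled")
a.advance("under_review")
a.advance("approved")
a.advance("logged")
```

Encoding the table explicitly means a request can never reach Approved without passing through evidence assembly and review.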
Outcome Control
Access control decides whether an actor may attempt an action. Outcome control checks whether the resulting state is acceptable.
Agentic systems need both. A tool call may be authorised but still produce an unsafe result because the context was stale, the parameters were wrong, a downstream system behaved unexpectedly, or the action interacted badly with other steps.
Outcome controls can include:
- Dry runs, previews, diffs, and staged execution.
- Post-action validation against the approved intent.
- Rate limits, spending limits, blast-radius limits, and data-transfer limits.
- Rollback, quarantine, or compensating actions.
- Alerts when the observed result differs from the expected result.
This is where the architecture connects access control with organisational impact.
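A minimal sketch of post-action validation, assuming expected and observed state are expressed as comparable dictionaries and that a compensating action exists for mismatches (all names here are illustrative):

```python
# Compare the observed downstream state with the approved intent and
# trigger a compensating action (rollback, quarantine, alert) on mismatch.
def verify_outcome(expected, observed, compensate):
    mismatched = {key for key in expected if observed.get(key) != expected[key]}
    if mismatched:
        compensate(mismatched)
        return False
    return True


rolled_back = []
ok = verify_outcome(
    expected={"tickets_closed": 1, "emails_sent": 1},
    observed={"tickets_closed": 1, "emails_sent": 40},  # blast radius exceeded
    compensate=rolled_back.append,
)
```

The tool call here was authorised; only the outcome check catches that the observed effect diverged from the approved one.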
Observability And Audit Trail
The audit layer is what makes an agentic system reconstructable. A single trace identifier should bind the full set of stage records produced during a task — prompt, context, decision, credential, approval, tool call, memory, output, and downstream effect — so a reviewer can follow the chain from influence to outcome without stitching logs across systems.
- TRACE binds PROMPT_RECORD
- TRACE binds CONTEXT_RECORD
- TRACE binds DECISION_RECORD
- TRACE binds CREDENTIAL_RECORD
- TRACE binds APPROVAL_RECORD
- TRACE binds TOOL_CALL_RECORD
- TRACE binds MEMORY_RECORD
- TRACE binds OUTPUT_RECORD
- TRACE binds DOWNSTREAM_RECORD
Architecture Review Questions
Use these questions when reviewing an agentic system:
- Can the system distinguish trusted control instructions from untrusted data?
- Can reviewers see which context, memory, or tool result influenced the action?
- Is every tool call bound to a user goal, identity, policy decision, and expected outcome?
- Are credentials scoped to the task, short-lived, and auditable?
- Are memory writes treated as state changes with provenance, owner, reason, and expiry?
- Do approval gates show source context, risk, parameters, identity, and expected effect?
- Can cross-agent messages preserve origin, trust level, delegated scope, and downstream action?
- Can the system deny or revise actions when intent, authority, policy, or outcome is unclear?
- Can the organisation reconstruct the path from influence to outcome after an incident?
- Do evaluations test multi-step behaviour across tools, memory, approvals, and downstream systems?
If these answers are weak, the architecture may appear controlled at the model boundary while remaining exposed across the execution system.
Engineering Patterns Per Control Layer
Each control layer above is the architectural placeholder for one or more secure engineering patterns. The patterns describe the same controls at the level engineers can build to: boundaries, decision points, audit evidence, and deny-or-revise paths. Use the secure engineering patterns overview for the full map; the table below is the quick lookup.
| Layer | Pattern(s) that implement it |
|---|---|
| Identity and access | Credential And Token Boundaries |
| Policy decision | Secure Agent Runtime, Secure Tool Calling |
| Runtime guardrail | Secure Agent Runtime |
| Tool broker | Secure Tool Calling, Secure MCP |
| Credential broker | Credential And Token Boundaries |
| Memory and context | Memory Security |
| Observability and audit | Secure Agent Runtime; audit evidence sections in all five patterns |
| Human approval | Secure Agent Runtime, Secure Tool Calling, Credential And Token Boundaries |
| Outcome control | Secure Tool Calling, Secure Agent Runtime |
The secure engineering patterns overview diagram shows the same composition in a single picture: where this layered model places the decision points, the patterns name the engineering boundary that owns them.
Relationship To The Field Guide
This architecture turns the earlier risk map into a defence model. The landscape map explains why the boundary has moved, the threat model names the failure modes, the attack surface map shows where risk enters, and agentic attack chains show how local weaknesses compose.
The defence architecture explains where controls should sit so those chains can be interrupted, investigated, and improved.