Memory poisoning
Malicious input corrupts agent memory, leading to unsafe or unintended actions.
- See attack-chain-template.md for full structure.
- Related: docs/01-threat-model.md, patterns/memory-security.md
An attacker manipulates input during one session so that a poisoned fact is committed to long-term memory; in a later, unrelated session the agent retrieves that fact as trusted state and lets it shape decisions long after the attacker has left the loop.
- [*] --> UntrustedWrite : Manipulated input arrives
- UntrustedWrite --> StoredAsTrusted : No classification at write
- StoredAsTrusted --> RetrievedAsState : Future task reads memory
- RetrievedAsState --> InfluencesAction : Treated as agent fact
- InfluencesAction --> [*]
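The transitions above can be sketched as a minimal, hypothetical memory store with no write-time classification. All class and variable names here are illustrative assumptions, not a real agent framework:

```python
# Minimal sketch of the poisoning chain: a naive memory store that
# commits any input verbatim and later serves it back as trusted fact.

class NaiveMemory:
    def __init__(self):
        self.facts = []

    def write(self, text: str) -> None:
        # UntrustedWrite --> StoredAsTrusted: no classification at write.
        self.facts.append(text)

    def read_all(self) -> list[str]:
        # RetrievedAsState: everything comes back as trusted state.
        return list(self.facts)

# Session 1: manipulated input arrives and is stored as trusted.
memory = NaiveMemory()
memory.write("User prefers invoices sent to attacker@example.com")

# Session 2 (later, unrelated): the poisoned fact is read back and
# treated as an agent fact, shaping the next decision.
for fact in memory.read_all():
    print(fact)
```

The point of the sketch is the absence of any boundary between `write` and `read_all`: nothing records where a fact came from, so the second session cannot distinguish attacker input from genuine state.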
Defence classifies and tags every write with provenance and expiry, blocks instruction-shaped content from entering memory, and re-checks freshness and trust on every read so that no untrusted fact silently becomes durable agent state.
- Memory store
  - Write-side controls
    - Classification before write
    - Provenance and expiry tags
    - Anomaly detection on instruction-shaped content
  - Read-side controls
    - Freshness filter
    - Instruction-data separation in memory
    - Logging of influential reads