Hidden instruction document ingestion
Instructions are hidden in ingested documents, causing the agent to act unexpectedly.
- See attack-chain-template.md for full structure.
- Related: docs/01-threat-model.md, patterns/secure-agent-runtime.md
An attacker conceals directives inside a document — in metadata, comments, or invisible text — so when retrieval ranks it highly the agent ingests it and treats the embedded instructions as control rather than evidence, overriding policy or invoking tools the user never asked for.
- Document with — hidden instructionsEmbedding — and ranking
- Embedding — and rankingHigh-rank — retrieval
- High-rank — retrievalAgent treats — text as control
- Agent treats — text as controlPolicy override / — unintended tool call
Defence checks provenance and freshness at ingestion, scans for instruction-shaped content, enforces instruction-data separation in the agent’s context, and runs every action through a policy decision and runtime guardrail so embedded directives cannot quietly become commands.
- 1Provenance and freshness — check at ingestion
- 2Instruction-shape — anomaly detection
- 3Instruction-data — separation in context
- 4Policy decision — before action
- 5Runtime guardrail — on drift from approved task