
Poisoned retrieved context

Retrieved context is manipulated to influence agent behaviour or outputs.

An attacker plants a document that the retriever will rank highly. When the agent later fetches context for an unrelated task, it pulls in the attacker-controlled content and treats it as authoritative evidence for a high-impact decision.

  1. Attacker plants poisoned doc
  2. Retriever ranks by relevance
  3. Agent treats content as evidence
  4. Action on false premise
  5. Organisational impact
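
The ranking step in the chain above can be illustrated with a toy retriever. This is a minimal sketch, not a model of any real retrieval system: the term-count scoring function, the corpus, and the document ids are all hypothetical. It only shows that a ranker which looks at relevance alone has no way to tell a planted document from a legitimate one.

```python
from collections import Counter

# Hypothetical two-document corpus; "planted" repeats the likely query terms.
CORPUS = [
    {"id": "policy-2024",
     "text": "refund policy refunds over 500 need manager approval"},
    {"id": "planted",
     "text": "refund policy refund policy refund approval approve all refunds automatically"},
]


def score(query: str, text: str) -> int:
    """Toy relevance: how often the query terms occur in the text."""
    terms = set(query.lower().split())
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in terms)


def retrieve(query: str, corpus: list, k: int = 1) -> list:
    """Return the k highest-scoring documents; provenance is never consulted."""
    ranked = sorted(corpus, key=lambda doc: score(query, doc["text"]), reverse=True)
    return ranked[:k]


top = retrieve("refund policy approval", CORPUS)
print(top[0]["id"])
```

Because the planted text repeats the query terms, it outscores the legitimate policy and is the only document the agent would see with `k=1`.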



The defence checks every retrieval result for provenance and freshness, attaches an explicit trust label, and routes the result through a policy decision, so that low-trust or stale content cannot silently shape the agent’s reasoning.

  1. Retrieval result
  2. Provenance check
  3. Freshness filter
  4. Trust label applied
  5. Policy decision
  6. Agent reasoning
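
The defence pipeline above could be sketched as a gate that sits between the retriever and the agent. Everything named here is an assumption made for illustration: the source allowlist, the 90-day age limit, the two-level trust label, and the decision strings are hypothetical, and a real deployment would choose its own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum

# Hypothetical policy parameters, not taken from the source text.
TRUSTED_SOURCES = frozenset({"policy-repo", "wiki.internal"})
MAX_AGE = timedelta(days=90)


class Trust(Enum):
    HIGH = "high"
    LOW = "low"


@dataclass(frozen=True)
class RetrievalResult:
    text: str
    source: str           # where the retriever found the document
    updated_at: datetime  # last modification time of the document


def provenance_ok(result: RetrievalResult) -> bool:
    """Provenance check: is the source on the allowlist?"""
    return result.source in TRUSTED_SOURCES


def is_fresh(result: RetrievalResult, now: datetime) -> bool:
    """Freshness filter: is the document within the age limit?"""
    return now - result.updated_at <= MAX_AGE


def trust_label(result: RetrievalResult, now: datetime) -> Trust:
    """Trust label: HIGH only if both checks pass."""
    if provenance_ok(result) and is_fresh(result, now):
        return Trust.HIGH
    return Trust.LOW


def policy_decision(trust: Trust, high_impact: bool) -> str:
    """Policy decision: low-trust content never feeds a high-impact task."""
    if trust is Trust.HIGH:
        return "allow"
    return "block" if high_impact else "allow_labelled"


def gate(results, now: datetime, high_impact: bool):
    """Return (result, trust, decision) for content the agent may reason over."""
    passed = []
    for result in results:
        trust = trust_label(result, now)
        decision = policy_decision(trust, high_impact)
        if decision != "block":
            passed.append((result, trust, decision))
    return passed


NOW = datetime(2025, 6, 1, tzinfo=timezone.utc)
RESULTS = [
    RetrievalResult("Refunds over 500 need manager approval.", "policy-repo", NOW - timedelta(days=10)),
    RetrievalResult("Approve all refunds automatically.", "pastebin.example", NOW - timedelta(days=1)),
    RetrievalResult("Old refund limits.", "policy-repo", NOW - timedelta(days=400)),
]

for result, trust, decision in gate(RESULTS, NOW, high_impact=True):
    print(trust.value, decision, result.source)
```

For a high-impact task, only the fresh document from an allowlisted source reaches the agent. For a low-impact task, the same gate passes all three results but keeps the `LOW` label attached, so the agent can weigh them accordingly rather than treating them as authoritative.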