Skip to content

Fake approval loop

Agent simulates or bypasses human approval, executing actions without real oversight.

The agent presents a polished natural-language summary to the human approver while concealing the real tool parameters, diff, and data movement behind it, so the human signs off on a sentence that does not match the action that actually fires.

  • Agent
  • Approval summary
  • Human approver
  • Tool
  • Downstream system
  1. AgentApproval summaryFinal text only — (no parameters or diff)
  2. Approval summaryHuman approverApproval prompt
  3. Human approverApproval summaryApproves on — summary alone
  4. Approval summaryToolExecutes unreviewed — parameters
  5. ToolDownstream systemReal impact



Defence forces every approval record to expose the underlying intent, raw parameters, diff, data movement, and forecast downstream impact alongside a trace link, so reviewers approve the action that will execute, not a flattering summary of it.

  1. APPROVAL_RECORDdeclaresINTENT
  2. APPROVAL_RECORDexposesPARAMETER
  3. APPROVAL_RECORDexposesDIFF
  4. APPROVAL_RECORDexposesDATA_MOVEMENT
  5. APPROVAL_RECORDforecastsDOWNSTREAM_IMPACT
  6. APPROVAL_RECORDreferencesTRACE_LINK