Red teaming & evaluation
This document provides practical guidance for evaluating agentic AI systems, including red teaming approaches, evaluation scenarios, and evidence requirements.
Contents
- Why evaluation is different for agentic systems
- Red teaming methodology
- Example evaluation scenarios
- Evidence requirements and reporting
- Limitations and open questions