Writing Outline

Draft Outline

  • Measure retrieval quality, generation quality, and agent task completion separately
  • Track source coverage, unsupported claims, tool-use errors, and recovery behavior
  • Use golden sets, adversarial questions, traces, and human review rubrics
  • Connect evaluation to deployment gates, monitoring, and regression testing
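A minimal sketch of per-layer metrics computed from one logged trace. The `Trace`, `Claim`, and `ToolCall` structures and every function name here are hypothetical. "Supported" is approximated as "cites at least one retrieved source." That is a cheap proxy that does not check whether the source actually entails the claim, which still needs an entailment model or the human review rubric. Recovery is defined, as an assumption, as a failed tool call followed by a later successful call in a run that still completes the task.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    cited_ids: list[str]  # source IDs the answer attributes this claim to


@dataclass
class ToolCall:
    name: str
    ok: bool  # whether the call succeeded


@dataclass
class Trace:
    gold_source_ids: set[str]   # sources a correct answer must draw on (from the golden set)
    retrieved_ids: list[str]    # ranked retriever output
    claims: list[Claim]         # claims extracted from the generated answer
    tool_calls: list[ToolCall] = field(default_factory=list)
    task_completed: bool = False  # judged against the golden set or a rubric


def retrieval_recall_at_k(t: Trace, k: int) -> float:
    """Retrieval layer: share of gold sources that appear in the top-k results."""
    if not t.gold_source_ids:
        return 1.0
    hits = t.gold_source_ids & set(t.retrieved_ids[:k])
    return len(hits) / len(t.gold_source_ids)


def source_coverage(t: Trace) -> float:
    """Generation layer: share of gold sources the answer actually cites."""
    if not t.gold_source_ids:
        return 1.0
    cited = {sid for c in t.claims for sid in c.cited_ids}
    return len(t.gold_source_ids & cited) / len(t.gold_source_ids)


def unsupported_claim_rate(t: Trace) -> float:
    """Generation layer: share of claims that cite no retrieved source (proxy only)."""
    if not t.claims:
        return 0.0
    retrieved = set(t.retrieved_ids)
    unsupported = [c for c in t.claims if not retrieved.intersection(c.cited_ids)]
    return len(unsupported) / len(t.claims)


def tool_metrics(t: Trace) -> dict[str, float]:
    """Agent layer: tool error rate, plus whether the run recovered from a failure."""
    n = len(t.tool_calls)
    errors = sum(not c.ok for c in t.tool_calls)
    # Index of the first failed call, or None if every call succeeded.
    first_fail = next((i for i, c in enumerate(t.tool_calls) if not c.ok), None)
    recovered = (
        first_fail is not None
        and any(c.ok for c in t.tool_calls[first_fail + 1:])
        and t.task_completed
    )
    return {"tool_error_rate": errors / n if n else 0.0, "recovered": float(recovered)}


# Example trace: the retriever found both gold sources, but one claim has no
# citation and another cites a source that was never retrieved.
trace = Trace(
    gold_source_ids={"a", "b"},
    retrieved_ids=["a", "c", "b"],
    claims=[Claim("x", ["a"]), Claim("y", []), Claim("z", ["d"])],
    tool_calls=[ToolCall("search", False), ToolCall("search", True)],
    task_completed=True,
)
print(retrieval_recall_at_k(trace, 2))  # -> 0.5
print(retrieval_recall_at_k(trace, 3))  # -> 1.0
print(source_coverage(trace))           # -> 0.5
print(unsupported_claim_rate(trace))    # two of three claims lack a retrieved citation
print(tool_metrics(trace))              # -> {'tool_error_rate': 0.5, 'recovered': 1.0}
```

Keeping the three layers in separate functions means a drop in answer quality can be traced to the retriever (recall falls) or to the generator (recall holds but coverage falls or unsupported claims rise).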
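A deployment gate can compare a candidate build against the current baseline, with both run on the same golden and adversarial sets. The sketch below is a hypothetical illustration. The metric names, the `LOWER_IS_BETTER` set, and the 0.02 default tolerance are assumptions. Comparing plain means also ignores sampling noise, and a paired bootstrap or similar test would be more robust on small sets.

```python
# Hypothetical metric names; every other metric is treated as higher-is-better.
LOWER_IS_BETTER = {"unsupported_claim_rate", "tool_error_rate"}


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


def gate(baseline: dict[str, list[float]],
         candidate: dict[str, list[float]],
         tolerance: float = 0.02) -> list[str]:
    """Return the metrics on which the candidate regresses by more than `tolerance`.

    Each dict maps a metric name to non-empty per-example scores on the same
    golden set. An empty result means the candidate passes the gate.
    """
    failures = []
    for name, base_scores in baseline.items():
        b, c = mean(base_scores), mean(candidate[name])
        # A regression is a move in the "worse" direction for this metric.
        delta = (c - b) if name in LOWER_IS_BETTER else (b - c)
        if delta > tolerance:
            failures.append(f"{name}: baseline={b:.3f} candidate={c:.3f}")
    return failures


baseline = {"source_coverage": [1.0, 0.5, 1.0], "unsupported_claim_rate": [0.0, 0.2, 0.1]}
candidate = {"source_coverage": [1.0, 0.5, 1.0], "unsupported_claim_rate": [0.1, 0.3, 0.2]}
print(gate(baseline, candidate))  # flags unsupported_claim_rate only
print(gate(baseline, baseline))   # -> []
```

As a regression test, CI could fail the pipeline whenever `gate` returns a non-empty list. The same per-example scores, computed on sampled production traces, could feed monitoring so that drift after release is caught by the same definitions used before it.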