Tag: evals
All the articles with the tag "evals".
- Breaking Down Agent Evals: A Practitioner's Guide
Part 1 of a 3-part series. Covers why traces (not code) are the source of truth for agents, the three observability primitives, run types, the metrics that matter at each level, the pass^k reliability metric, a four-step methodology for building an eval suite, and a filter-funnel view of why no single eval method is enough.