Tag: code

All the articles with the tag "code".

Breaking Down Agent Evals (Part 1A): Building the Eval Suite, Hands-On

The code companion to Part 1: the same five-step methodology, walked through file by file. It covers the toy agent, the eval-case schema, the JSONL dataset, an exact-match grader, an LLM judge, and the runner that ties it all together and exits non-zero on regression.

Published: 12 Mar, 2026
· agents / evals / code