Tag: rl

All the articles with the tag "rl".

GEPA: How an LLM Can Write a Better Prompt Than RL Can Train One

A walkthrough of GEPA (Agrawal et al., ICLR 2026), the reflective prompt optimiser that beats GRPO with up to 35× fewer rollouts by reading its own trace logs in plain English. Covers the four-step loop, a worked iteration on a multi-hop QA system, the Pareto trick that keeps the candidate pool diverse, and where 98% of the rollout budget actually goes.

Published: 20 Apr, 2026 · llm / dspy / prompt-optimization