Schema A · answer-first

The model commits to a value before it has done any work. Anything in the reasoning field is post-hoc rationalization.

class Solution(BaseModel):
    answer: int
    reasoning: str
Sample generation {
  "answer": 7,
  "reasoning": "because 3+4 equals 7"
} ↑ reasoning has to justify whatever number was guessed first

Schema B · reasoning-first

The model writes its scratchpad in the JSON, then the answer is conditioned on the scratchpad it just produced.

class Solution(BaseModel):
    reasoning: str
    answer: int
Sample generation {
  "reasoning": "3+4=7, then ×2 = 14",
  "answer": 14
} ↑ answer is conditioned on the reasoning that just streamed
4.5%answer-first accuracy[reported]
95%reasoning-first accuracy[reported]
1field re-ordered