LLMs playing Just One: Why Same-Model LLM Ensembles Mode-Collapse
Ask four Claude Haiku instances independently for a clue for 'toast' and they will often all reply 'bread'. Four Sonnets collapse like this more often; four Opuses, more often still. I built a tiny benchmark around the board game Just One to measure when LLM ensembles collapse into identical answers and what makes them stop. The best configuration, a mixed-family ensemble with an anti-correlation prompt, scores 3.25× the single-model baseline.
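The core measurement follows directly from Just One's rules: clues submitted by more than one player are eliminated, so identical answers from an ensemble score nothing. A minimal sketch of that scoring (the function names and the `collapse_rate` metric are my illustration, not necessarily the benchmark's exact scoring):

```python
from collections import Counter

def surviving_clues(clues):
    """Just One rule: clues given by more than one player are
    eliminated; only clues that appear exactly once survive."""
    counts = Counter(c.strip().lower() for c in clues)
    return [c for c in clues if counts[c.strip().lower()] == 1]

def collapse_rate(clues):
    """Fraction of clues eliminated by duplication: a simple
    proxy for ensemble mode collapse."""
    return 1 - len(surviving_clues(clues)) / len(clues)

# Four same-model clues for 'toast' that all collide:
print(collapse_rate(["bread", "bread", "Bread", "bread"]))        # → 1.0
# A mixed ensemble where two clues survive:
print(collapse_rate(["bread", "butter", "champagne", "bread"]))   # → 0.5
```

Under this scoring, a fully collapsed ensemble contributes zero usable clues, which is what makes the game a natural probe for sampling diversity.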