P1
+1

Jul 1, 2026
•
5 min read
One prompt, two model sizes, the same token budget. The 9B looped on an empty tool call 39 times and never wrote the file. The 35B wrote the whole thing on the first try. Here is what broke, why the model size was the cause, and what a 9B is actually good for.
