AI-Assisted Engineering: When a Better Model Becomes a Worse Tool

I tested Fable 5 on two coding tasks, running it side by side with Claude 4.8 and ChatGPT. In those tests, Fable 5 came out slightly ahead. It understood the work, produced strong results, and looked like a genuine step forward.

But two successful tasks measure capability, not operational reliability. I have also seen repeated reports of Fable 5 degrading during real workflows: wasting tokens, breaking pipelines, and consuming time without producing usable results. Some users eventually rolled back to 4.8. For them, the newer model was not merely the weaker option in practice. It was worse than useless, because its output created additional work.