This article examines the reliability of increasingly large LLMs (such as GPT, LLaMA, etc.) with respect to their correctness and their ability to solve more complex problems. A priori, one would expect larger, more powerful, and "better"-trained models to improve and become more reliable. The study shows that this is not quite the case: although models do get better at solving more complex problems as they grow, they also become less reliable, that is, they make more mistakes.