Interesting article by Anthropic; it seems there is still a lot to understand before we reach “Explainable AI”. Quoting: “our results point to the fact that advanced reasoning models very often hide their true thought processes, and sometimes do so when their behaviors are explicitly misaligned.”