Generative AI: Creativity vs. Knowledge

Anthropic published an interesting blog article, “Designing AI-resistant technical evaluations,” that is well worth reading. Quoting from the article: “I needed a problem where human reasoning could win over Claude’s larger experience base: something sufficiently out of distribution. […] I implemented one medium-hard puzzle and tested it on Claude Opus 4.5. It failed. I filled out more puzzles and had colleagues verify that people less steeped in the problem than me could still outperform Claude.”

It should come as no surprise that current generative AI models have a larger knowledge base than any single human, but when creativity and deep reasoning are required to solve a puzzle, we can still beat them.

“VoidLink: Evidence That the Era of Advanced AI-Generated Malware Has Begun” by Check Point Research

I quote only the first Key Point from this Check Point Research blog article:

“Check Point Research (CPR) believes a new era of AI-generated malware has begun. VoidLink stands as the first evidently documented case of this era, as a truly advanced malware framework authored almost entirely by artificial intelligence, likely under the direction of a single individual.”

Though the article is quite technical, I recommend it to anyone involved in, or simply interested in, the interplay of cybersecurity and artificial intelligence.

Are AI Coding Assistants Declining in Quality?

“AI Coding Assistants Are Getting Worse” is an intriguing article in IEEE Spectrum. According to the author (Jamie Twiss, CEO of Carrington Labs), the quality of AI coding assistants is declining, contrary to what it may seem at first sight.

Indeed, the author noticed that some of the most recent AI models, more often than earlier ones, produce code that runs but fails to perform as intended, even when given flawed instructions that cannot lead to working code. In the author's opinion, this may be due to the quality of the large volumes of training data needed by the latest models, and to the direct interaction of the AI assistants with users, which can “push” the models to produce code that runs.