A New Open Source Competitor in the Large Language AI Models Arena

This Chinese Startup Is Winning the Open Source AI Race” is an interesting article from Wired on Yi-34B, from Chinese AI Startup 01.AI, which is currently leading many leaderboards comparing the power of AI models. Moreover, together with Meta’s Llama 2 from which it borrows part of its architecture, Yi-34B is one of the few top LLM to be Open Source. Yi-34B adopts a new approach to model training which seems better than what used by many competitors and possibly part of the reason of its current success.

A lot has changed in the AI arena in the last couple of years, and one notable fact is that most of the leading models now are Closed Source. Possible advantages of being Open Source are that it is easier to make external contributions to the model’s development (mostly from university researchers), and that there should be a lower barrier to build an “app” ecosystem around it.

Writing (in-) Secure Code with AI Assistance

This is an interesting research article on the security of code written with AI Assistance; the large-scale user study shows that code written with an AI Assistant is usually less secure, that is contains more vulnerabilities, than code written without AI support.

Thus, at least as of today, relying on an AI Assistant to write better and more secure code could work out badly. But AI is changing very rapidly, soon it could learn math and to write secure and super efficient code. We’ll see…

Is the “Turing Test” Dead?

This is a very good question in these times of Generative and Large Language Artificial Intelligence models, which some researchers answered in the affirmative, see here and here for their proposals to replace the Turing Test.

But… other researchers still believe in the Turing Test and applied it with somehow surprising results: Humans 63%, GPT-4 41%, ELIZA 27% and GPT-3.5  14%. We, humans, are still better than GPT-4, but the surprise is the third position by ELIZA, a chatbot from the ’60s, ahead of GPT-3.5 (see here and here).

More Weaknesses of Large Language Models

Interesting scientific article (here) on new techniques to extract training data from Large Language models such as LLaMA and ChatGPT. The attack on ChatGPT is particularly simple (and for sure by now blocked by OpenAI): it was enough to ask it to “repeat forever” a word like “poem” and in the output, after some repetition of the word, it appeared random data and also a small fraction of training data, as for example a person’s email signature. This is a “divergence” attack on LLMs, in the sense that after the initial response, the model output starts to diverge from what is expected.

We still know too little about these models, their strengths and weaknesses, so we should take much care when adopting and using them.

New US Executive Order on Artificial Intelligence

Due to the US leading role in AI/ML development, it is of interest that President Biden issued an Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence (here a Fact Sheet). By quickly glancing at it, the order requires that:

  • Developers of the most powerful AI systems share their safety test results and other critical information with the U.S. government;
  • Standards, tools, and tests to help ensure that AI systems are safe, secure, and trustworthy are developed;
  • There are protections against the risks of using AI to engineer dangerous biological materials by developing strong new standards for biological synthesis screening;
  • Americans are protected from AI-enabled fraud and deception by establishing standards and best practices for detecting AI-generated content and authenticating official content;
  • An advanced cybersecurity program to develop AI tools to find and fix vulnerabilities in critical software is established.

These are very high-level goals, and we need that they are achieved not only in the US but worldwide (see eg. also the upcoming EU AI Act).

AI Transparency not doing so well

Stanford University researchers just released a report in which a “Foundation Model Transparency Index” (here) is presented. The first evaluation did not go so well, since the highest score is 54 out of 100. Comments by reviewers and experts in the field point out that “transparency is on the decline while capability is going through the roof” as Stanford CRFM Director Percy Liang told Reuters in an interview (see also here).

Compliance of Foundation AI Model Providers with Draft EU AI Act

Interesting study by the Center for Reasearch on Foundation Models, Stanford University, Human-Centered Artificial Intelligence, on compliance of Foundation Model Providers, such as OpenAI, Google and Meta, with the Draft EU AI Act. Here is the link to the study, and the results indicate that the 10 providers analysed “largely do not” comply with the draft requirements of the EU AI Act.

AI and the Extinction of the Human Race

The Center for AI Safety (here) has just published the following “Statement on AI Risk” (here):

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

The list of signatures is impressive (just look at the first 4) and should make us think more deeply about us and AI & ML.

On Large Language Models (and AI Models) Explainability

Researchers at OpenAI have recently released a scientific paper (here) entitled “Language models can explain neurons in language models“. The paper is quite technical, but it is interesting to quote from the Introduction:

Language models have become more capable and more widely deployed, but we do not understand how they work. Recent work has made progress on understanding a small number of circuits and narrow behaviors, but to fully understand a language model, we’ll need to analyze millions of neurons. This paper applies automation to the problem of scaling an interpretability technique to all the neurons in a large language model. Our hope is that building on this approach of automating interpretability will enable us to comprehensively audit the safety of models before deployment.

and to read the concluding Discussion section.