This article studies the reliability of increasingly larger LLM models (such as GPT, LLaMA, etc.) with respect to their correctness and ability to solve more complex problems. A priori it would seem that more powerful, larger, and “better” trained models would improve and become more reliable. The study instead shows that it doesn’t really seem so: even if the models become better at solving more complex problems as they grow, they also become less reliable, that is they make more mistakes.
Tag Archives: Large Language Models
Latest AI Models can Autonomously Hack Websites
This research article is quite interesting and at the same time scary. It shows how the latest Large Language Models (LLMs) could be used to autonomously attack and hack Internet websites without human feedback or support.
The study shows that an AI model which
- can reach websites in Internet through tools and/or API
- can use the response of the websites as an input to itself to plan further actions
- can read documents provided a priori by humans as a support library of possible use
has in principle (and for GPT4, in practice) the capability to interact with the target website, identify vulnerabilities like SQL Injection, XSS, etc., and build and perform a successful attack.
The study also shows that, as of today, almost all AI models lack the three features to the maturity level required. Nonetheless, with the current speed of development of AI models, these features will become standard in very little time.
Due to the (future) ease and low cost of employing an AI model to hack a website, AI service providers face the critical task of preventing this type of abuse of their services, but owners of websites will need anyway to improve their security since sooner or later “AI hacking as a service” offerings will appear.
A New Open Source Competitor in the Large Language AI Models Arena
“This Chinese Startup Is Winning the Open Source AI Race” is an interesting article from Wired on Yi-34B, from Chinese AI Startup 01.AI, which is currently leading many leaderboards comparing the power of AI models. Moreover, together with Meta’s Llama 2 from which it borrows part of its architecture, Yi-34B is one of the few top LLM to be Open Source. Yi-34B adopts a new approach to model training which seems better than what used by many competitors and possibly part of the reason of its current success.
A lot has changed in the AI arena in the last couple of years, and one notable fact is that most of the leading models now are Closed Source. Possible advantages of being Open Source are that it is easier to make external contributions to the model’s development (mostly from university researchers), and that there should be a lower barrier to build an “app” ecosystem around it.
On Deceptive Large Language AI Models
Interesting research article about how to remove backdoors and deceptive behaviour from Large Language AI models (LLM). The simplest example is a model trained to write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. The result of the study is that it can be very difficult to remove such behaviour using current standard safety training techniques. Deceptive behaviour can be introduced intentionally in the LLM during the training but it could also happen by poor training. Applying current techniques to identify and remove backdoors, such as Adversarial training, can actually fail and end up providing a false sense of security. Another result of the study is that the larger LLM seem more prone to be “Deceptive”.
Is the “Turing Test” Dead?
This is a very good question in these times of Generative and Large Language Artificial Intelligence models, which some researchers answered in the affirmative, see here and here for their proposals to replace the Turing Test.
But… other researchers still believe in the Turing Test and applied it with somehow surprising results: Humans 63%, GPT-4 41%, ELIZA 27% and GPT-3.5 14%. We, humans, are still better than GPT-4, but the surprise is the third position by ELIZA, a chatbot from the ’60s, ahead of GPT-3.5 (see here and here).
More Weaknesses of Large Language Models
Interesting scientific article (here) on new techniques to extract training data from Large Language models such as LLaMA and ChatGPT. The attack on ChatGPT is particularly simple (and for sure by now blocked by OpenAI): it was enough to ask it to “repeat forever” a word like “poem” and in the output, after some repetition of the word, it appeared random data and also a small fraction of training data, as for example a person’s email signature. This is a “divergence” attack on LLMs, in the sense that after the initial response, the model output starts to diverge from what is expected.
We still know too little about these models, their strengths and weaknesses, so we should take much care when adopting and using them.
On Large Language Models (and AI Models) Explainability
Researchers at OpenAI have recently released a scientific paper (here) entitled “Language models can explain neurons in language models“. The paper is quite technical, but it is interesting to quote from the Introduction:
Language models have become more capable and more widely deployed, but we do not understand how they work. Recent work has made progress on understanding a small number of circuits and narrow behaviors, but to fully understand a language model, we’ll need to analyze millions of neurons. This paper applies automation to the problem of scaling an interpretability technique to all the neurons in a large language model. Our hope is that building on this approach of automating interpretability will enable us to comprehensively audit the safety of models before deployment.
and to read the concluding Discussion section.