Interesting scientific article (here) on new techniques to extract training data from Large Language models such as LLaMA and ChatGPT. The attack on ChatGPT is particularly simple (and for sure by now blocked by OpenAI): it was enough to ask it to “repeat forever” a word like “poem” and in the output, after some repetition of the word, it appeared random data and also a small fraction of training data, as for example a person’s email signature. This is a “divergence” attack on LLMs, in the sense that after the initial response, the model output starts to diverge from what is expected.
We still know too little about these models, their strengths and weaknesses, so we should take much care when adopting and using them.