“A weird phrase is plaguing scientific papers – and we traced it back to a glitch in AI training data” is an interesting article about what can go wrong when training machine learning models. A scanning error in digitized old printed scientific papers, together with a similar error in translating from Farsi to English, turned the nonsensical phrase “vegetative electron microscopy” into part of the training data of many current advanced AI models, and the phrase has since started appearing in published scientific papers.
The problem now is how to get rid of this and other similar errors in AI training data.
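As a rough illustration of what such cleanup might involve, here is a minimal sketch in Python that flags corpus documents containing a known contaminated phrase before they reach a training set. The phrase list and the corpus layout (one plain-text file per document) are assumptions made for this example, not the method used by any particular AI lab.

```python
# Minimal sketch: flag documents that contain known "digital fossil" phrases
# so they can be reviewed or excluded from a training corpus.
from pathlib import Path

# Hypothetical list of phrases known to be corpus contamination.
KNOWN_FOSSILS = [
    "vegetative electron microscopy",  # the nonsense phrase traced in the article
]

def flag_contaminated(corpus_dir: str) -> list[Path]:
    """Return paths of plain-text documents containing any known fossil phrase."""
    flagged = []
    for doc in Path(corpus_dir).glob("*.txt"):
        text = doc.read_text(encoding="utf-8", errors="ignore").lower()
        if any(phrase in text for phrase in KNOWN_FOSSILS):
            flagged.append(doc)
    return flagged

if __name__ == "__main__":
    for path in flag_contaminated("corpus/"):
        print(f"contaminated: {path}")
```

Of course, this only catches errors we already know about; the harder part is finding the ones we don't.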
Are these errors going to become our future “digital fossils”?