

Completely irrelevant. The title and the posted article are about unintentionally training LLM text-generation models on the prior output of other AI models. Not having enough training data for other types of models is a completely different problem and not what the article is about.
Nobody is going to "trawl the web for new data to train their next models" (to quote the article) for a model trying to cure diseases.
In the third paragraph you mentioned "tux", but I'm guessing you meant "tmux". Just a clarification for readers who might not be familiar with it and want to look it up.