Using model-generated content in training causes irreversible defects, a team of researchers says. “The tails of the original content distribution disappear,” writes co-author Ross Anderson from the University of Cambridge in a blog post. “Within a few generations, text becomes garbage, as Gaussian distributions converge and may even become delta functions.”
Here’s the study: http://web.archive.org/web/20230614184632/https://arxiv.org/abs/2305.17493
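To get a feel for the "Gaussians become delta functions" part of the quote, here is a toy sketch. It is not the study's experiment, just an illustration I put together under simple assumptions: each "generation" fits a Gaussian to a small sample drawn from the previous generation's fitted Gaussian. The function name `simulate_collapse` and its parameters (`generations`, `sample_size`, `seed`) are made up for this example.

```python
import random
import statistics


def simulate_collapse(generations=500, sample_size=10, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from the previous fit.

    Toy illustration only (assumed setup, not the paper's method).
    Returns the fitted standard deviation at each generation,
    starting with the "real data" value of 1.0.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the original distribution
    history = [sigma]
    for _ in range(generations):
        # "Train" on data produced by the previous generation's model.
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # Maximum-likelihood fit: sample mean and population std dev.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    # Print the fitted spread every 50 generations.
    for gen in range(0, len(history), 50):
        print(f"generation {gen:3d}: sigma = {history[gen]:.3e}")
```

Each refit loses a little of the spread on average, and since nothing ever puts it back, the fitted standard deviation drifts toward zero. The tails go first, and eventually every sample is essentially the same value, which is the delta function the quote mentions.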
This reminds me of a saying from my programming classes: garbage in, garbage out. It refers to how feeding a program bad data will make it produce bad data in turn.