Synthetic data now makes up the majority of the Internet, having surpassed Peak Data in 2024 (a phenomenon reflecting the saturation of the high-quality data available to train LLMs), reaching a ...