
Hachette and Cengage are joining the lawsuit against Google for scraping books to train Gemini – this could rewrite AI training rules overnight.
Boom – publishers like Hachette and Cengage just crashed Google’s AI party, asking to join a lawsuit over the use of their books to train Gemini. Calling it ‘one of the most prolific infringements in history,’ they’re gunning for massive damages.[4]
As devs, this isn’t abstract drama. Google’s been hoovering up data like it’s free candy, but if courts side with creators, fine-tuning on scraped public datasets gets legally dicey. I’ve relied on open corpora for my own models; suddenly, everything from textbooks to code repos could need licenses. Meta’s DeepConf and others might dodge via confidence tricks, but this tests the whole ecosystem.[1][4]
My take: Good. Unchecked scraping fueled the boom but bred entitlement. Practical move? Start auditing your training data now, and pivot to licensed sets like those Wikipedia might sell post-deal.[2] Winners will be synthetic data generators. What datasets are you using, and how locked in are you?
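If "audit your training data" sounds vague, here's a rough Python sketch of what a first pass could look like. Everything in it is an assumption for illustration: the `DatasetRecord` fields, the `ALLOWED_LICENSES` allow-list, and the example dataset names are all hypothetical, and which licenses actually count as safe is a question for a lawyer, not a script.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical allow-list. Which licenses are acceptable is a legal
# decision; these identifiers are placeholders for illustration only.
ALLOWED_LICENSES: Set[str] = {"cc0-1.0", "cc-by-4.0", "mit", "apache-2.0"}


@dataclass(frozen=True)
class DatasetRecord:
    """One entry in a (hypothetical) training-data manifest."""
    name: str
    source: str
    license: Optional[str] = None


def audit(
    records: Iterable[DatasetRecord],
    allowed: Set[str] = ALLOWED_LICENSES,
) -> List[Tuple[str, str]]:
    """Return (dataset name, reason) pairs for entries that need review."""
    flagged: List[Tuple[str, str]] = []
    for record in records:
        # Normalize so "CC-BY-4.0" and " cc-by-4.0 " compare equal.
        lic = (record.license or "").strip().lower()
        if not lic:
            flagged.append((record.name, "no license recorded"))
        elif lic not in allowed:
            flagged.append((record.name, f"license not on allow-list: {lic}"))
    return flagged


if __name__ == "__main__":
    # Made-up manifest entries, not real datasets.
    manifest = [
        DatasetRecord("open-corpus", "https://example.org/open", "CC-BY-4.0"),
        DatasetRecord("scraped-books", "internal crawler"),
        DatasetRecord("vendor-textbooks", "vendor feed", "proprietary"),
    ]
    for name, reason in audit(manifest):
        print(f"REVIEW {name}: {reason}")
```

It only catches missing or unlisted license labels; it can't tell you whether a label is accurate or whether the data was scraped in the first place, so treat it as a way to build the review queue, not to clear it.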
Source: Insurance Journal