M.G. Siegler •

Meta's Only Slightly Better Than Burning Books Plan...

How Tech Giants Cut Corners to Harvest Data for A.I.
Meta debated buying a publisher like Simon & Schuster for AI training data...

There is a lot in this NYT report (bylined by five reporters: Cade Metz, Cecilia Kang, Sheera Frenkel, Stuart A. Thompson and Nico Grant) about how various companies are using data to build their AI models, so much so that the publication has already posted a few of its own summaries of the story. But one bit stands out to me above all else:

At Meta, which owns Facebook and Instagram, managers, lawyers and engineers last year discussed buying the publishing house Simon & Schuster to procure long works, according to recordings of internal meetings obtained by The Times. They also conferred on gathering copyrighted data from across the internet, even if that meant facing lawsuits. Negotiating licenses with publishers, artists, musicians and the news industry would take too long, they said.

I mean what the literal – and I do mean literal, literally – fuck? I sure hope this was some offhand comment made in a meeting and was not taken seriously. But apparently the NYT has a recording of at least one meeting where this was mentioned. And, well, it sounds like there was an actual discussion of this proposal – and/or of paying $10 a book for the full licensing rights to new titles in order to ingest them into their AI systems.

Ladies and gentlemen, we have now reached peak farcical dystopia. Just imagine Meta buying one of the largest book publishers, not as a money-making endeavor, or even as some sort of misguided goodwill effort after years of pulling the football away from various professional writers, but purely to get access to the data in said books. Not beautiful prose. Data. I mean, if there's beautiful prose, it probably helps Meta because good writing teaches AI to write better. But they also likely want some crap writing too, for people who wish to write that way.

Next, perhaps Sherwin-Williams will buy the Louvre in order to access some paint found in the stuff hanging on the walls there.

Perhaps Meta was just looking around and noticing that they've been riding high, both in the stock market and in public perception, in recent months while rivals like Apple are the ones being dragged. "Hold my beer," says Meta, and hatches a plan that will immediately vault them back into the negative headlines...¹

It's hard to come up with a more innovative way to create bad publicity than this idea of buying a book publisher to use its library to train AI. Can Operation: Kick Puppies be far behind?


¹ Yeah, I mean, practically speaking, we get why this idea was in someone's head. All of these systems need more data, and the "better" the data – meaning higher quality in terms of writing, because it would have been edited and published – the better. But come on, no company, let alone a tech company, can utter such a proposal with a straight face in this environment, even in an internal meeting. Beyond the regulatory concerns, there might well be riots around this notion. Optically, it would be perhaps the worst proposed plan in all of human history.