Mark Zuckerberg: publishers ‘overestimate the value’ of their work for training AI
Source: The Verge
Meta CEO Mark Zuckerberg says there are complex copyright questions around scraping data to train AI models, but he suggests the individual work of most creators isn’t valuable enough for it to matter. In an interview with The Verge deputy editor Alex Heath, Zuckerberg said Meta will likely strike “certain partnerships” for useful content. But if others demand payment, then — as it has done with news outlets — the company would prefer to walk away.
“I think individual creators or publishers tend to overestimate the value of their specific content in the grand scheme of this,” Zuckerberg said in the interview, which coincides with Meta’s annual Connect event. “My guess is that there are going to be certain partnerships that get made when content is really important and valuable.” But if creators are concerned or object, “when push comes to shove, if they demanded that we don’t use their content, then we just wouldn’t use their content. It’s not like that’s going to change the outcome of this stuff that much.”
Meta, like nearly every major AI company, is currently embroiled in litigation over the limits of scraping data for AI training without permission. Last year, the company was sued by a group of authors including Sarah Silverman, who claimed its Llama model was unlawfully trained on pirated copies of their work. (The case currently isn’t going great for those authors; last week, a judge castigated their legal team for being “either unwilling or unable to litigate properly.”)
The company — again, like nearly every major AI player — argues that this kind of unapproved scraping should be allowed under US fair use law. Zuckerberg elaborates on the question:
I think that in any new medium in technology, there are the concepts around fair use and where the boundary is between what you have control over. When you put something out in the world, to what degree do you still get to control it and own it and license it? I think that all these things are basically going to need to get relitigated and rediscussed in the AI era.
The history of copyright is indeed a history of deciding what control people have over their own published works. Fair use is designed to let people transform and build on each other’s creations without permission or compensation, and that’s very frequently a good thing. That said, some AI developers have interpreted it far more broadly than most courts. Microsoft’s AI CEO, for instance, said earlier this year that anything “on the open web” was “freeware” and “anyone can copy it, recreate with it, reproduce with it.” (This is categorically false as a matter of law: content posted publicly online is no less protected by copyright than work in any other medium, and to the extent you can copy or modify it under fair use, you can also copy or modify a book, movie, or paywalled article.)
Meanwhile, some artists have turned to unofficial tools meant to prevent their work from being used for AI training. But especially for anything posted on social media before the rise of generative AI, they’re sometimes stymied by terms of service that let these companies train on their work. Meta has stated that it trains its AI tools on public Instagram and Facebook posts.
Zuckerberg said Meta’s future AI content strategy would likely echo its blunt response to proposed laws that would add a fee for links to news stories. The company has typically responded to these rules by blocking news outlets in countries like Australia and Canada. “Look, we’re a big company,” he said. “We pay for content when it’s valuable to people. We’re just not going to pay for content when it’s not valuable to people. I think that you’ll probably see a similar dynamic with AI.”
We’ve known for some time that news isn’t particularly valuable to Meta, in part because moderation of it invites controversy and (according to Meta) it makes users feel bad. (“If we were actually just following what our community wants, we’d show even less than we’re showing,” Zuckerberg said in the interview.) The company’s generative AI products are still nascent, and it’s not clear anyone has figured out what people want from these tools. But whatever it is, most creators probably shouldn’t expect that it will get them paid.