Meta Wins Case Over Its Use of Copyright-Protected Content to Train AI

One of the most significant (yet less flashy) considerations of the new wave of generative AI tools is their copyright implications, both in terms of usage (can you own the rights to an AI-generated work?) and generation (are AI projects stealing artists’ work?).

And both, at least at present, fall into somewhat awkward legal territory, because copyright laws, as they exist, weren’t designed to cater to AI content. That means that, technically, infringement remains difficult to prosecute on either front.

Today, Meta has had a big court win on this front, with a federal judge ruling that Meta did not violate copyright law in training its AI models on original works.

Back in 2023, a group of authors, including high-profile comedian Sarah Silverman, launched legal action against both Meta and OpenAI over the use of their copyrighted works to train their respective AI systems. The authors were able to show that these AI models were capable of reproducing their work in highly accurate form, which they claim demonstrates that both Meta and OpenAI used their legally protected material without consent. The lawsuit also alleges that both Meta and OpenAI removed the copyright information from their books to hide this infringement.

In his assessment, Judge Vince Chhabria ruled that Meta’s use of these works was “transformative,” in that the purpose of Meta’s process is not, necessarily, to re-create competing works, but to facilitate all new uses of their language.

As per the judgment:

“The purpose of Meta’s copying was to train its LLMs, which are innovative tools that can be used to generate diverse text and perform a wide range of functions. Users can ask Llama to edit an email they have written, translate an excerpt from or into a foreign language, write a skit based on a hypothetical scenario, or do any number of other tasks. The purpose of the plaintiffs’ books, by contrast, is to be read for entertainment or education.”

As such, the judge ruled that because the re-use of the works was not intended to create a competing market for these works, “fair use” applies in this case.

But there are a lot of provisos in the ruling.

First, the judge notes that the plaintiffs “presented no meaningful evidence on market dilution at all,” and without that element spelled out in the arguments, Meta’s fair use defense stands.

The judge also notes that:

“In cases involving uses like Meta’s, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant’s use. No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books. And some cases might present even stronger arguments against fair use.”

So essentially, the judge is saying that while the intention of use in this case is not to facilitate the creation of competing works, thereby harming the copyright holders and their capacity to generate income from their work, it’s inarguable that AI models will facilitate the creation of such works. But in this instance, the case against Meta did not state this element clearly enough to find in the plaintiffs’ favor.

So while it may seem like a blow for artists, enabling generative AI projects to essentially steal their work for their own purpose, the judge is really saying that there is likely a legal case that would apply, and would potentially enable artists to argue that such use is in violation of copyright. But this particular case hasn’t made it.

Though that’s still not great for artists seeking legal protection against generative AI projects, and unlicensed usage of their work.

Just last week, a federal judge ruled in favor of Anthropic in a similar case, which essentially enables the company to continue training its models on copyright-protected content.

The sticking point here is the argument of “fair use,” and what constitutes “fair” in the context of re-use for an alternative purpose. Fair use law is generally designed to apply to journalists and academics, in reporting on material that serves an educational purpose, even if the copyright holder may disagree with that usage.

Do LLMs, and AI projects, fall into that same category? Well, under the legal definition, yes, because the intent is not to re-create such work, but to facilitate new usage based on elements of it.

I guess, in that sense, an individual artist may be able to win a case where an AI work has clearly replicated theirs, though that replication would have to be indisputably clear, and there would also, presumably, have to be a level of benefit gleaned by the AI creator to justify such.

And also, people can’t copyright AI-generated works, so that’s another wrinkle in the AI legality puzzle.

There’s also a whole other element in both of these cases which relates to how both Meta and Anthropic accessed these copyright-protected materials in the first place, amid claims that these have been stolen off dark web databases for mass-training. None of those claims have been proven as yet, though that’s a separate factor which relates to a different type of content theft.

So where do we stand on legal use of generative AI content?

Yeah, it’s pretty unclear, and the judge in this case is saying that there may be a different legal argument that could win in such a case.

But this isn’t it, and because the laws haven’t been designed with AI in mind, what exactly the legal case needs to be is not entirely clear. But we haven’t established a precedent to stop AI training on copyright-protected works as yet.
