OpenAI’s Smackdown by a German Court Hints at What’s Next for AI and Art

Late last week, a German musicians’ organization scored a pretty crushing legal victory against OpenAI. The court found that the training of the GPT-4 and GPT-4o models involved copyright infringement, and that some outputs of the models are themselves infringing. A pretty comprehensive win for the “it’s just a plagiarism machine” crowd.

Seasoned OpenAI haters will agree, I think, with at least some of the recent legal analysis of the ruling by intellectual property law scholar Andres Guadamuz of the University of Sussex. Guadamuz points out that the decision and its implications are a bit messy, but may truly benefit copyright holders in the long term.

That likely means copyright big fish—pop stars, Hollywood actors, and bestselling authors—should now be getting a sense of how this technology might benefit them monetarily, even if small-time creators might not be so lucky.

The context: GEMA is a German organization with no American equivalent, a copyright collective representing the interests of composers, lyricists, and publishers. It sued OpenAI on behalf of stakeholders related to nine famous and uncontroversial German songs. This would be like suing on behalf of the composers and lyricists of nine American songs that run the gamut from “Soak Up the Sun” by Sheryl Crow to “Happy” by Pharrell Williams.

In other words, these aren’t lyrics that OpenAI dug up once from a garage band’s website and turned into training data. Instead, they are inescapable cultural touchstones that would have appeared in training data again and again in multiple, potentially altered, or parodied forms, and as fragments, excerpts, and snippets.

The basis of the suit was that with ChatGPT’s web-browsing ability turned off, users could feed it queries like “What is the second verse of [the German equivalent of “No Scrubs” by TLC]?” And ChatGPT would reply with a sometimes fragmented or flawed, but largely correct, answer.

The ruling is from the Munich Regional Court, and naturally it’s in German, but a Google Translated version gave me the following broad-strokes interpretation of what the court determined:

The model itself stored illegal reproductions of the lyrics to those songs. When it regurgitated the lyrics in response to prompts, even in incomplete form or with hallucinated wrong lines, that was a further act of infringement. Importantly, the hypothetical user trying to coax lyrics out of ChatGPT is not the copyright infringer; OpenAI is. And because ChatGPT outputs have shareable links, OpenAI was making this infringing material available to the public without permission.

OpenAI must at some point now disclose how often the texts of these lyrics were used as training data, and when, if ever, it made money from them. It also has to stop storing them, and must not output them again. Monetary damages may be determined at some point later.

Earlier this month, a somewhat similar court case in the UK went precisely the other way: Getty Images lost its case against Stability AI, because, the judge in that case wrote, “An AI model such as Stable Diffusion which does not store or reproduce any copyright works (and has never done so) is not an ‘infringing copy’.”

Guadamuz’s analysis is interesting on this point, because it gets at what the court was thinking here. The German court, Guadamuz notes, relied on research about machine “memorization,” something a model can more easily and obviously do with lyrics than with, say, a Getty Images photo it was trained on.

So in contrast to the Getty ruling, this new ruling is consistent with a lot of existing intellectual property legal thought in the digital era—the idea that the same copyright rules apply to, say, a playable music CD and a CD-ROM.

So as long as the copyrighted material can be made perceptible again, it’s a monetizable copy of the artwork. That’s also the case with lyrics “contained” within an LLM.

Guadamuz takes issue, however, with how the ruling further treats this “memorization” concept, seemingly attempting to make training without memorization the legal norm by way of an EU data-mining law. Narrowly, Guadamuz finds this a problem because it assumes a condition that doesn’t match what the law actually says. But more importantly, it seems to suggest that memorization always occurs when training on a given work, which Guadamuz says isn’t the case.

That legal sloppiness could be a problem as companies interpret this case in the coming years, but the takeaway for Guadamuz is this: we will most likely “eventually end up with some form of licensing market.”

As with Sora 2’s treatment of copyright and likeness, which many actors and copyright holders eventually approved of, a framework is slowly materializing aimed at sharing revenue (theoretical, future AI revenue) with the owners of copyrighted texts. OpenAI shocked the world’s copyright holders by creating a whole new universe of perceived copyright infringement. Artists and creators understandably felt robbed.

But slowly, powerful stakeholders are warming up to the idea of generative AI, because they’re starting to envision how they’ll get their beaks wet, and just how wet their beaks might eventually be. You can see this with major U.S. record labels now teaming up with companies they had once sued, like Udio.

But as for the dry, chapped beaks of powerless copyright stakeholders—small-time artists, writers, and creators—concerned that their work will simply be made redundant or irrelevant in this weird new content universe, it’s still not at all clear how those beaks benefit from any of this.
