Generative artificial intelligence models such as OpenAI’s ChatGPT are trained by being fed giant amounts of data, but what happens when this data is copyrighted?
Well, the plaintiffs in a variety of lawsuits currently making their way through the courts claim that the process infringes their copyrights.
For example, on Feb. 3, stock photo provider Getty Images sued artificial intelligence firm Stability AI, alleging that Stability AI copied more than 12 million photos from Getty’s collections as part of an effort to build a competing business. Getty Images stated in the filing:
“On the back of intellectual property owned by Getty Images and other copyright holders, Stability AI has created an image-generating model called Stable Diffusion that uses artificial intelligence to deliver computer-synthesized images in response to text prompts.”
While the European Commission and regulators in other regions are scrambling to develop rules that keep up with the rapid development of AI, the question of whether training AI models on copyrighted works qualifies as infringement may be decided in court cases such as this one.
The question is a hot topic, and in a May 16 Senate Judiciary Committee hearing, United States Senator Marsha Blackburn grilled OpenAI CEO Sam Altman about the issue.
While Altman noted that “creators deserve control over how their creations are used,” he stopped short of committing not to train ChatGPT on copyrighted works without consent, instead suggesting that his firm was working with creators to ensure they are compensated in some way.
AI companies argue “transformative use”
AI companies generally argue that their models do not infringe copyright because they transform the original work and therefore qualify as fair use, at least under U.S. law.
“Fair use” is a doctrine in the U.S. that allows for limited use of copyrighted material without the need to acquire permission from the copyright holder.
Some of the key factors considered when determining whether the use of copyrighted material qualifies as fair use include the purpose of the use (particularly whether it is for commercial gain) and whether it threatens the livelihood of the original creator by competing with their works.
The Supreme Court’s Warhol opinion
On May 18, the Supreme Court of the United States, considering these factors, issued an opinion that may play a significant role in the future of generative AI.
The ruling in Andy Warhol Foundation for the Visual Arts v. Goldsmith found that famous artist Andy Warhol’s 1984 work “Orange Prince” infringed on the rights of rock photographer Lynn Goldsmith, as the work was intended to be used commercially and, therefore, could not be covered by the fair use exemption.
While the ruling doesn’t change copyright law, it does clarify how transformative use is defined.
Mitch Glazier, chairman and CEO of the Recording Industry Association of America (RIAA), a music advocacy organization, welcomed the decision in a statement the RIAA posted on Twitter on May 18, noting that “claims of ‘transformative use’ cannot undermine the basic rights given to all creators under the Copyright Act.”
Given that many AI companies are selling access to their AI models after training them using creators’ works, the argument that they are transforming the original works and therefore qualify for the fair use exemption may have been rendered ineffective by the decision.
It is worth noting that there is no clear consensus, however.
In a May 23 article, Jon Baumgarten — a former general counsel at the U.S. Copyright Office who participated in the formation of the Copyright Act — said the case highlights that the question of fair use depends on many factors and argued that the current general counsel’s blanket assertion that generative AI is fair use “is over-generalized, oversimplified and unduly conclusory.”
A safer path?
The legal question marks surrounding generative AI models trained using copyrighted works have prompted some firms to heavily restrict the data going into their models.
For example, on May 23, software firm Adobe announced the launch of a generative AI tool called Generative Fill, which allows Photoshop users to “create extraordinary imagery from a simple text prompt.”
While the product is similar to Stability AI’s Stable Diffusion, the AI model powering Generative Fill is trained using only stock photos from Adobe’s own database, which, according to Adobe, helps ensure it “won’t generate content based on other people’s work, brands, or intellectual property.”
This may be the safer path from a legal perspective, but AI models are only as good as the data fed into them, so ChatGPT and other popular AI tools would not be as accurate or useful as they are today if they had not scraped vast amounts of data from the web.
So, while creators might be emboldened by the recent Warhol decision — and there is no question that their works should be protected by copyright law — it is worth considering what its broader effect might be.
If generative AI models can only be trained using copyright-free data, what kind of effect will that have on innovation and productivity growth?
After all, productivity growth is considered by many to be the single most significant contributor to raising the standard of living for a country’s citizens, as highlighted in a famous quote from prominent economist Paul Krugman in his 1994 book The Age of Diminished Expectations:
“Productivity isn't everything, but in the long run it is almost everything. A country’s ability to improve its standard of living over time depends almost entirely on its ability to raise its output per worker.”