AI and Copyright Law: Understanding the Challenges

July 12, 2023

Artificial intelligence (AI) and copyright law appear to be on a collision course, with court cases and both executive and legislative activity poised to shape how copyright law will apply to AI development and uses. Recent developments provide some insights that can help guide companies as they create internal policies to deal with copyright issues that may arise in the use of AI.

Recent Court Developments.

Recent court cases could have important implications for application of the copyright fair use defense in the context of developing generative AI tools.

On May 18, 2023, the U.S. Supreme Court issued its decision in the closely watched case Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith. As discussed in a recent client alert, the Court found that “if an original work and a secondary use share the same or highly similar purposes, and the secondary use is of a commercial nature, the first factor is likely to weigh against fair use, absent some other justification for copying.” At a high level, the decision continues the Court’s trend of eschewing bright-line rules in the fair use context and favoring a more context-specific approach, but it appears to move away from the notion articulated by lower courts that transformative use alone can be dispositive in analyzing fair use.

Additionally, a group of plaintiffs recently filed a class action suit against OpenAI, the operator of ChatGPT, alleging that OpenAI committed copyright infringement and violated the Digital Millennium Copyright Act (DMCA) by scraping textual material from the internet to train OpenAI’s language models.

This class action adds to several recent lawsuits raising similar issues surrounding generative AI’s use of copyright-protected material. In February, a lawsuit was filed by Getty Images against Stability AI, a generative AI start-up, claiming that Stability AI used copyrighted photos to train its image-generating tool. In January, a group of artists also filed a class action suit against Stability AI based on similar allegations.

In November 2022, another class action suit was filed against OpenAI, GitHub, and GitHub’s parent company, Microsoft, alleging copyright infringement based on the scraping of licensed code to train and create GitHub’s AI-powered Copilot software, a tool that offers autocomplete-type recommendations as a user writes software code. Copilot is alleged to have ignored copyright notices and license restrictions on the use of code, and reproduced the code in training the model while omitting attribution, copyright notices, and license requirements in violation of open-source licenses attached to the code. OpenAI, GitHub, and Microsoft contend that Copilot does not actually copy anything, but instead generates suggestions afresh based on what it has discerned from training materials. Thus, defendants argue that even if the code may have the same function as plaintiffs’ code, it has a different expression of that code and does not give rise to liability under the Copyright Act.

In all these lawsuits, courts will likely be tested on the novel issues surrounding how the fair use defense applies in the context of using copyrighted materials when creating generative AI tools. In doing so, they will apply the Supreme Court’s recent holding in Warhol when analyzing the fair use defense.

Recent Initiatives and Guidance from the Executive and Legislative Branches.

The Copyright Office has spent much of 2023 trying to wrap its arms around how to address AI-generated content. In February, the Office partially canceled the registration for a work containing human-created text but AI-generated images, finding that while the text was subject to registration, the images were not. The Office is also facing a lawsuit over its denial of a registration for a visual work created by an AI tool. In March, the Office formally launched an initiative to examine copyright law and policy issues raised by AI, which kicked off with community outreach events to obtain stakeholder input. Based on that input, the Office issued new registration guidance specifically focusing on AI-generated works. The guidance explains how applicants should disclose the inclusion of AI-generated content in works submitted for registration, including how to update or correct previously registered copyright claims that omitted this information.

Congress has similarly expressed an interest in clarifying whether AI-generated works can receive copyright protection. In May 2023, the Congressional Research Service (CRS) published a report discussing the Copyright Office’s policies that arguably make it difficult to obtain copyright protections for AI-generated content, and analyzing whether the AI training process can infringe others’ copyrights. CRS published a subsequent report providing a general primer on generative AI and data privacy.

Just last month, the General Services Administration (GSA) issued an interim security policy that directed employees and contractors to limit their use of AI tools on GSA networks and government furnished equipment. This rule was motivated, at least in part, by security concerns over “federal data [] be[ing] leaked to a platform that is not authorized to house or transact it” and the publication of sensitive data that is used to train the models. Other agencies, including the Environmental Protection Agency and the Administration for Children and Families Division of the Department of Health and Human Services, have adopted similar rules in recent months.

How This Will Impact Companies.

Issues that have arisen in the above-referenced court cases and agency initiatives are not merely theoretical; businesses that create generative AI tools run the risk of allegations of infringing copyrights and other forms of intellectual property. To create a generative AI tool, developers must train the system using large amounts of data so that it can provide the proper outputs for a given application. In many instances, datasets are built by scraping publicly available material found on the internet without permission from copyright owners. Even when scraped data is publicly available to use, businesses may find themselves facing challenges if the method for acquiring the data or the way a business uses the data is contrary to the terms of use. As demonstrated in ongoing litigation, developers may be alleged to infringe on copyrights or other intellectual property by using unlicensed data to train AI systems. But it is not clear that this use of third-party materials to train AI systems constitutes infringement, as courts have not yet addressed whether these programs are actually violating the exclusive rights conferred under the Copyright Act, or merely producing similar code based on the AI’s lawful use of third-party materials for training.

Government Contracting and Procurement

For businesses that contract with federal, state, or local government agencies, other issues may arise as the use of AI becomes more ubiquitous. While there is some generalized information technology (IT) guidance in the Federal Acquisition Regulation (FAR) that applies broadly to the acquisition of AI tools, the FAR does not address many of the AI-specific issues that businesses and government agencies will encounter in the future.

In the longer term, the government will likely impose limits on the procurement of AI to address national security risks. Much like the way that third-party code creates vulnerability concerns, intentionally or accidentally biasing training data would introduce vulnerabilities into AI systems. The provenance of the training data used for federal AI applications will likely become heavily scrutinized as a result.

With these issues in mind, there are numerous best practices that companies can incorporate into their internal policies to mitigate risk and minimize legal exposure, including:

Reviewing how AI products that are either used or produced by a company are trained, including whether data sources are licensed, and evaluating the company’s own risk related to copyright issues.
Designing policies for employee use of generative AI tools, to ensure that companies have a broad view and exercise oversight over institutional risks, including copyright-related risks.
Regularly reviewing legal developments to ensure that company AI use complies with all federal, state, and local requirements.

***

Wiley’s cross-functional AI advisory team is helping clients prepare for the future, drawing on decades of work in emerging tech, cybersecurity, privacy, consumer protection, and federal procurement. All of these areas are intersecting to shape the future of AI. Please reach out to the authors with any questions.