The legal battle over artificial intelligence training data intensified in September 2025 when authors filed a federal lawsuit against Apple, alleging the tech giant illegally used copyrighted books to train its AI systems. The proposed class action, filed in Northern California federal court by authors Grady Hendrix and Jennifer Roberson, claims Apple utilized a dataset containing pirated books without permission, consent, or compensation from the copyright holders.
The Apple lawsuit represents the latest chapter in expanding litigation challenging how tech companies acquire training data for AI development. The case specifically alleges that Apple used the Books3 dataset—described as containing nearly 200,000 copyrighted books—to train its OpenELM language models. This legal action follows similar suits against Microsoft, Meta, and OpenAI, highlighting the growing tension between AI innovation and intellectual property rights.
As AI companies face mounting lawsuits over their training practices, the fair use doctrine has emerged as their primary legal defense, fundamentally challenging traditional notions of how copyrighted works can be used without permission. The intersection of artificial intelligence and copyright law has become one of the most contentious battlegrounds in modern intellectual property disputes.
Understanding Fair Use in the AI Context
Fair use is a legal doctrine codified in Section 107 of the U.S. Copyright Act that permits limited use of copyrighted material without authorization under specific circumstances. According to the U.S. Copyright Office, fair use analysis considers four statutory factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the market for the original work. AI companies argue that their training processes satisfy these criteria by creating transformative new works that don’t compete with or substitute for the original copyrighted materials.
The transformative use argument centers on how machine learning algorithms process datasets. AI developers contend that when algorithms process books, articles, and other creative works, they extract statistical patterns and relationships rather than copying content directly. This process, they argue, transforms the original works into trained models capable of generating novel outputs. Legal scholars note that transformative uses under copyright law are those that add new purpose or character rather than merely substituting for the original work.
Courts have recognized fair use for technologies that process large volumes of copyrighted content for new purposes: Google Books’ indexing for search, thumbnails for image search results, and reverse engineering for interoperability. These precedents suggest that automated, large-scale processing can qualify as transformative when serving distinct functions. Even substantial copying may be fair use when it’s an intermediate step toward a transformative end product. AI developers argue that training fits this pattern: content is processed temporarily to extract patterns, much as search engines must copy web pages to index them, and the final model contains learned relationships rather than stored copies.
Recent Legal Developments
Recent court decisions have provided some clarity on AI training and fair use. In 2025, federal district judges ruled that certain AI training practices could qualify for fair use protection, though each case depends on its specific facts and circumstances. These rulings have examined whether AI training processes create sufficiently transformative uses and whether they harm the market for original works.
The legal analysis has focused on whether AI training differs substantially from traditional copying because the final models generate new content based on learned patterns rather than reproducing original works directly. This distinction has become central to fair use arguments in AI cases, though courts continue to evaluate these claims on a case-by-case basis.
For example, in June 2025, two federal district judges in the Northern District of California delivered significant victories for AI companies. On June 23, Judge William Alsup ruled in Bartz v. Anthropic PBC that using copyrighted books to train AI models constitutes “fair use” under copyright law, describing the process as “exceedingly transformative.”
The court found that Anthropic’s AI models did not reproduce the original works’ creative elements for public consumption. However, Judge Alsup drew a critical line: while training on legitimately obtained books was fair use, Anthropic’s creation of a digital library using pirated copies was not protected and would proceed to trial.
Two days later, Judge Vince Chhabria also ruled for the AI developer in Kadrey v. Meta Platforms, holding that, on the record before him, Meta’s use of copyrighted books from shadow libraries to train its LLMs was fair use. Judge Chhabria expressed more concern about potential market harm, however, noting that LLMs can rapidly create “literally millions of secondary works” that could lead to market dilution.
Legal experts note that these decisions turn heavily on specific facts, particularly whether AI systems can reproduce content substantially similar to the original works. Both California judges emphasized that their rulings might have been different if presented with evidence of content replication.
The courts have focused on several critical factors:
- Source of training data: Legitimately purchased books versus pirated copies can determine fair use
- Market substitution: Whether AI outputs directly compete with or substitute for original works
- Transformation level: How substantially the AI training process changes the purpose and character of the original works
Market Impact Analysis
The fourth factor of fair use analysis examines market harm to original works. The U.S. Copyright Office explains that this factor considers the effect of unauthorized use on the potential market for the copyrighted work. AI companies argue that their models serve different purposes than the original training materials, potentially reducing market harm concerns.
When users interact with AI systems like ChatGPT or Claude, they typically seek AI-generated responses, creative assistance, or analytical insights rather than access to specific copyrighted books or articles. The AI output, while informed by training data, represents a different product category that may not directly substitute for traditional publishing markets, though this remains a subject of ongoing legal debate.
Legal Precedent for Transformative Technologies
Courts have consistently grappled with how copyright law applies to emerging technologies, often recognizing that innovation may require access to existing copyrighted works to provide transformative services. This pattern has established important precedents that now inform AI copyright disputes.
The landmark Authors Guild v. Google case established crucial precedent for transformative use defenses in the digital age. When Google digitized millions of books to create a searchable database, the Second Circuit Court of Appeals ruled in 2015 that this mass digitization constituted fair use. The court found that Google’s use was transformative because it served fundamentally different purposes than the original books—enabling search functionality, providing limited previews, and facilitating academic research rather than replacing the reading experience.
This decision built on earlier cases involving technological innovation. The Supreme Court’s Sony v. Universal Studios (1984) recognized that new technologies with substantial non-infringing uses deserved protection, even if they could facilitate copyright infringement. Similarly, thumbnail image cases like Kelly v. Arriba Soft and Perfect 10 v. Amazon established that displaying reduced-size images for search purposes was transformative fair use.
Modern search engines operate under well-established fair use principles, regularly indexing and displaying brief excerpts of copyrighted content. Courts have consistently found that the transformative purpose of helping users locate information distinguishes these uses from the original works’ intended purposes. Search engines create value by organizing and providing access to information rather than substituting for the original content.
This principle extends beyond text to other media. Image search engines can display thumbnail versions of copyrighted photographs, and news aggregators can show headlines and brief excerpts, all under fair use protections. The key factor is that these uses serve different functions than the originals—discovery and navigation rather than consumption.
The fair use debate involves competing perspectives about innovation and creative rights. Technology advocates argue that restrictive copyright interpretations could limit AI development and reduce access to beneficial AI applications. They reference historical patterns where new technologies initially faced copyright challenges before courts recognized legitimate fair use applications.
Authors’ organizations, publishers, and other content creators maintain that AI training represents large-scale commercial exploitation of intellectual property without proper authorization or compensation. They argue that the scale and commercial nature of AI training distinguishes it from earlier fair use cases.
Rights holders contend that AI companies should negotiate licensing agreements for copyrighted works used in training, regardless of claimed transformative purposes. They point out that many AI systems are developed by highly profitable corporations that can afford to compensate creators, and that fair use should not excuse commercial exploitation simply because new technology is involved.
Publishers also express concern about market substitution, arguing that AI systems trained on their content could eventually compete with or replace demand for original works. They worry that sophisticated AI models might generate content similar enough to original works to harm creators’ markets without providing any direct compensation.
Legal Framework and Analysis
The four-factor fair use test provides courts with analytical tools for evaluating AI training practices. Factor one examines purpose and character, including commercial versus noncommercial use and whether the use is transformative. Factor two considers the nature of copyrighted works used. Factor three analyzes the amount and substantiality of portions used. Factor four evaluates market harm potential.
Courts must weigh these factors together rather than relying on any single factor. The Copyright Alliance notes that fair use determinations are made case-by-case, with no bright-line rules for automatic application. This analytical framework allows courts to consider both technological innovation and creators’ rights in developing AI-specific precedent.
Future Legal Development
As courts continue addressing these issues, the fair use doctrine’s flexibility allows them to balance competing interests in intellectual property law. The traditional four-factor analysis provides structure for evaluating AI training practices while considering both innovation benefits and creator protections. Future court decisions will likely establish clearer boundaries for AI training practices and inform industry guidelines.
The ongoing legal proceedings surrounding AI copyright and fair use will influence how artificial intelligence develops and integrates into society. These cases are establishing precedents that will affect current AI companies and future transformative technologies. The outcomes may determine whether AI development continues at current rates or faces additional legal constraints that could alter the industry’s trajectory.
Conclusion
The intersection of AI technology and copyright law presents complex questions that courts are actively resolving through fair use analysis. While the legal landscape continues evolving, the established four-factor framework provides structure for evaluating these novel challenges. As litigation proceeds, clearer guidelines will emerge for balancing technological innovation with intellectual property rights in the AI era.
This analysis is based on publicly available legal information and court decisions. For specific legal advice regarding copyright and fair use issues, consult qualified legal counsel.
