
TRAINING THE MACHINE: FAIR USE OR COPYRIGHT INFRINGEMENT IN GENERATIVE AI
Author: Aniya Vijayvergiya

ABSTRACT
Legal conflicts over the use of copyrighted works in AI training have escalated with the rapid development of generative artificial intelligence. This article investigates whether large-scale scraping of copyright-protected content qualifies as infringement or fair use. It highlights the tension between technological innovation and creators' rights by examining court rulings, academic commentary, and policy viewpoints. It contends that unclear legislative guidance has produced legal uncertainty, underscoring the need for a well-balanced regulatory framework.
INTRODUCTION
The swift advancement of generative artificial intelligence has transformed the production and distribution of knowledge, raising intricate problems at the intersection of copyright law and technology. Modern AI models are trained on large datasets of copyright-protected text and artistic works. Developers, who regard such training as transformative and lawful, and creators, who allege unauthorised exploitation, are engaged in a global dispute over this practice. At the heart of the controversy is whether large-scale scraping for AI training constitutes fair use or copyright infringement. This article evaluates the adequacy of current copyright regimes by examining policy approaches, scholarly debate, and judicial reasoning.
AI TRAINING AS FAIR USE: THE TRANSFORMATIVE USE ARGUMENT
According to proponents of generative AI, the doctrine of transformative use supports a finding of fair use when AI models are trained on protected text and artistic works. Under U.S. copyright law, fair use is evaluated by weighing four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for the work. Because AI training does not aim to replicate or profit from the expressive content of the original works, supportive academics, and some courts, reason that it serves a fundamentally different purpose.
According to this viewpoint, AI models analyse massive datasets to identify statistical patterns, linguistic relationships, and stylistic elements rather than "consuming" copyrighted works in the conventional sense. Academics frequently draw comparisons to digitisation cases such as Authors Guild v. Google and Authors Guild v. HathiTrust, in which large-scale copying was permitted because it enabled new technological functions, such as search and analysis, without substituting for the original works. In a similar vein, recent court rulings involving AI developers, such as Anthropic and Meta, have acknowledged that training may be transformative when it allows machines to learn general representations instead of reproducing protected material.
Additionally, proponents contend that because copyrighted works are not made publicly available during the training process, AI training does not inherently harm the market. They therefore propose that infringement analysis should focus on the outputs produced by AI rather than on internal training methods. Policy analyses, such as those conducted by the OECD, warn that unduly stringent interpretations of copyright may impede innovation. These analyses reinforce the case for recognising AI training as fair use, which proponents present as a practical balance between intellectual property protection and technological growth.
CONCERNS OF INFRINGEMENT: MARKET HARM, CONSENT & SCALE
Despite the persuasive force of the transformative use argument, a sizable body of scholarship disputes the idea that AI training should automatically be considered fair use. Critics contend that the scale and commercial nature of data harvesting distinguish generative AI from earlier digitisation initiatives. Unlike search engines or digital archives, generative AI systems can produce outputs that closely resemble existing works, which raises substantial concerns about market substitution and economic harm.
The remaining fair use factors raise a significant issue. Highly creative works, such as novels, artwork, and photographs, make up the majority of AI training datasets, and the training process entails reproducing complete works rather than selected passages. Academics argue that industrial-scale duplication of entire works by profit-driven organisations weighs strongly against a finding of fair use. Furthermore, models may internalise stylistic and expressive components essential to a work's originality, which undercuts the notion that AI training involves only "non-expressive" use.
Market harm remains the most contentious issue. Critics note that in writing, design, and the visual arts, AI-generated outputs increasingly compete with human creators, threatening both current and future markets for original works. The lack of consent, attribution, or payment compounds the legal and ethical concerns. This mistrust is reflected in the preference of European academics and policymakers for explicit legislative provisions and data-mining exceptions with opt-out mechanisms over broad judicial interpretations of fair use. Taken together, these concerns suggest that expanding fair use too widely in the context of generative AI could weaken copyright protection and undermine incentives for creative effort.
CONCLUSION
Whether the use of copyrighted material to train AI models is lawful remains contested. Courts have acknowledged the transformative nature of AI training in a few instances, but this recognition is highly fact-specific and inconsistent. The scale, commercial purpose, and potential market impact of generative AI continue to challenge traditional copyright concepts. In the absence of explicit statutory guidance, a balanced legal framework is necessary to support innovation while protecting the rights of creators.
REFERENCES
https://www.law.cornell.edu/uscode/text/17/107 - Fair Use Doctrine (Statutory basis)
https://harvardlawreview.org/print/vol-103/toward-a-fair-use-standard/ - Transformative Use Theory
https://law.justia.com/cases/federal/appellate-courts/ca2/13-4829/13-4829-2015-10-16.html - Authors Guild v. Google
https://law.justia.com/cases/federal/appellate-courts/ca2/12-4547/12-4547-2014-06-10.html - Authors Guild v. HathiTrust
https://www.oecd.org/innovation/intellectual-property/issues-in-ai-trained-on-scraped-data.htm - OECD, Intellectual Property Issues in Artificial Intelligence Trained on Scraped Data (2023)
https://texaslawreview.org/fair-learning/ - Market Harm & AI Learning





