Microsoft Faces Lawsuit Over Alleged Copyright Infringement in AI Training
Microsoft is facing a new legal challenge after a group of prominent authors filed a lawsuit in New York federal court accusing the tech giant of using pirated versions of their books to train its Megatron AI model without permission. The plaintiffs, who include Kai Bird, Jia Tolentino, and Daniel Okrent, allege that Microsoft violated U.S. copyright law by using a dataset containing nearly 200,000 pirated books to train the AI system.
Authors Say Megatron AI Mimics Their Copyrighted Work
The complaint alleges that Microsoft built a text-generating algorithm capable of producing content that mimics the style, voice, and themes of the copyrighted works it was trained on. According to the lawsuit, the Megatron model was trained to respond to human prompts using expressive language derived from authors’ stolen content. “Microsoft used the pirated dataset to create a computer model that is not only built on the work of thousands of creators and authors but also built to generate a wide range of expression that mimics the syntax, voice, and themes of the copyrighted works,” the authors claim.
Legal Backdrop: Wave of Lawsuits Against AI Firms
This lawsuit is part of a growing wave of litigation targeting tech companies over the use of copyrighted content in generative AI training. Microsoft joins a list that includes Meta, Anthropic, and Microsoft-backed OpenAI, all of which have been accused of using proprietary works without authorisation to develop artificial intelligence systems.
The filing comes just one day after a California federal judge ruled that Anthropic’s use of copyrighted materials could qualify as fair use, though the company might still be liable for piracy. That decision marked the first major ruling in the U.S. concerning generative AI’s reliance on copyrighted data.
Microsoft Remains Silent Amid Rising Tensions
As of Wednesday, Microsoft has not commented on the lawsuit. An attorney representing the authors also declined to provide a statement. The case raises urgent questions about AI ethics, copyright law, and the future of AI model training.
What the Authors Are Demanding
The authors are seeking:
- A court injunction to block Microsoft from using their works in AI training
- Statutory damages of up to $150,000 per infringed work
Their argument centres on the claim that their books were used without license or compensation, a practice they say could severely undercut the financial stability of professional writers and creators.
Tech Industry Defends AI Training Practices
In response to this and similar lawsuits, tech companies have maintained that their AI models make “fair use” of publicly available content. They argue that forcing payment for training data could stifle innovation in the emerging AI sector.
However, critics and rights holders contend that transformative use claims don’t justify training models on stolen or copyrighted material, particularly when those models produce content that closely resembles the original works.
High-Stakes Legal Battle Could Shape AI’s Future
The outcome of this case could set a precedent for how generative AI systems are trained going forward. As courts begin to weigh in on copyright protections in the age of AI, the Microsoft lawsuit may become a landmark moment in the clash between creators’ rights and technological progress.