OpenAI is under intensifying scrutiny for its practice of scraping vast amounts of copyrighted online content to train its popular AI systems, such as ChatGPT and DALL-E. The New York Times (NYSE:NYT) thrust the issue into the spotlight last month by filing a lawsuit against OpenAI and Microsoft, alleging copyright infringement over OpenAI’s use of Times articles.
OpenAI responded by claiming its use of public data is protected fair use and that the Times misrepresented the potential for ChatGPT to regurgitate full articles. But the Times maintained that OpenAI competes directly with its journalism using copied content. This back-and-forth comes as AI copyright disputes multiply, with groups of authors, programmers, and stock photo agencies also suing OpenAI and rivals.
The Argument
At the core of the mounting tensions is a disagreement over whether scraping copyrighted works to build AI models constitutes transformative fair use. OpenAI argues that AI needs broad access to “human knowledge” to function properly, and that no model can be built without incorporating some copyrighted data.
OpenAI has licensing deals with some publishers, such as Axel Springer and The Associated Press, but the payments remain modest. Estimates suggest OpenAI pays news outlets only $1 million to $5 million annually, even as the company brings in more than $1.6 billion in revenue. Talks with the Times about a partnership involving real-time use of its content broke down shortly before the lawsuit was filed.
This speaks to a broader power imbalance. Generative AI stands to disrupt multiple creative industries built on copyright protection, from journalism to photography. Yet compensation for the use of these works remains minimal compared to the soaring value of AI startups like OpenAI.
OpenAI’s astronomical rise relied heavily on expansive access to public training data, which enabled systems like DALL-E to thrive. But the high-profile Times lawsuit underscores growing calls for compensation when copyrighted materials are used to train AI. As innovation continues apace, tensions over rights protections highlight the need to balance technological progress against the rights of creators.
As Backlash Grows, Legal Clarity Lags
The wave of lawsuits highlights the legal gray zone around applying copyright law to rapidly evolving AI systems that consume vast quantities of creative work. While outrage grows among creators, judges have yet to issue definitive rulings on whether current law accommodates these technological shifts.
Absent legislative action, this uncertainty may fuel more costly lawsuits. In the meantime, public opinion appears to be turning against AI companies that train models on copyrighted data without clear licensing agreements. Still, legal experts say current fair use definitions likely permit most non-commercial personal uses of AI systems, even if those systems indirectly rely on scraped copyrighted source material.
Even so, creators seeking fair compensation when their works power profitable AI apps will likely keep pressing OpenAI to tighten its practices. Striking the right balance between AI innovation and creative rights remains a complex challenge, touching on ethics, economics, and the regulation of emerging technology. The stakes are high on all sides, ensuring that conflicts over the commercial use of copyrighted source material will persist as generative AI’s reach expands across industries.