Tech companies fight for their right to violate copyright
Open season on data scraping is, apparently, vital for the AI industry to survive.
OpenAI and Google submitted their thoughts on how the White House could give U.S. companies a leg up in the international AI industry, and one of the main points was that their ability to copy other people’s work should be codified in law.
In addition to expressing a general distaste for regulations, the companies’ formal suggestions for the U.S.’s “AI Action Plan” both said that being able to scrape copyrighted material was vital for AI development. OpenAI credited the “ability to learn from copyrighted material” for attracting so much AI investment to the U.S., while Google said fair use and text-and-data mining exceptions are “critical” to development.
Some background: Shortly after he was inaugurated, U.S. President Donald Trump scrapped a Biden-era executive order that aimed to create guardrails for safe and responsible AI development, replacing it with a call for an “action plan” that prioritized making U.S. businesses leaders in the sector.
Lawless: Most countries don’t have specific rules on what data AI can and cannot be trained on, so most parties have been applying their own interpretations of existing copyright laws (which developers say puts their data mining under fair use exceptions). There are also several civil cases that could end up setting precedent, if clearer laws don’t come.
Last month, Canadian AI developer Cohere was sued for copyright infringement by a group of news publishers that includes the Toronto Star, The Atlantic, Condé Nast, Forbes, The Guardian, Insider, and Vox Media.
What would happen: If the White House decides data mining is fair game, a lot of people stand to lose. Artists, news outlets, and other creatives will have less recourse if they believe their work has been used to build a machine that churns out copies of it. And AI developers outside of the U.S. may find themselves at a serious disadvantage, since they are more limited in the data they can use.
There have been suggestions that AI advancement is stagnating because companies are running out of data to train models on, although some research shows pumping more data into AI doesn’t always make it work better.
International copyright enforcement is messy: In most cases, particularly criminal ones, the laws of the country where the infringement took place are what applies, so one would assume a company like OpenAI would be subject to U.S. law. But courts can weigh other jurisdictional factors, which is one reason many civil suits are filed in the country where the complainant feels the damages took place.
When a group of Canadian news outlets sued OpenAI for copyright infringement in November, they filed their suit in Ontario Superior Court. GEMA, a German music rights organisation, filed its suit against OpenAI in Munich Regional Court.
In Canada: In January, the federal government wrapped up consultations on potential changes to the Copyright Act that could clarify whether data mining for AI training is permitted, and what the rules around it should be.
In its submission, Cohere mostly asked for more certainty around what kind of data scraping is permissible, but one of the clarifications it requested was that text and data mining does not infringe copyright.
Google claimed that AI training falls under fair dealing exceptions for research and private study, an argument some intellectual property experts are skeptical of.