Canada Rules OpenAI Violated Privacy Laws in ChatGPT Training Data Collection

In a landmark ruling that has sent ripples through the global AI industry, Canadian privacy regulators formally concluded on May 6, 2026 that OpenAI violated federal and provincial privacy laws when it collected personal data to train ChatGPT — marking the first time a national privacy authority has ruled that AI training data collection constitutes a privacy violation.

The joint investigation, led by the Office of the Privacy Commissioner of Canada (OPC) and provincial counterparts in Quebec, British Columbia, and Alberta, found that OpenAI's practices in developing its GPT-3.5 and GPT-4 models were "overbroad" and non-compliant across multiple dimensions: the company scraped vast quantities of personal information from the public internet without obtaining meaningful consent, failed to apply proportionality in what it collected, and did not put adequate safeguards in place to prevent sensitive data — including health information, political opinions, and children's personal details — from being swept into its training pipeline.

Four Regulators, One Damning Conclusion

The investigation, which was launched in 2023 following complaints from Canadian users and advocacy groups, examined how OpenAI sourced the data underlying its flagship chatbot. Investigators scrutinized three main data streams: publicly scraped web content, licensed third-party datasets, and direct user interactions with ChatGPT.

On all three fronts, regulators found fault. "The manner in which OpenAI initially collected personal information from publicly accessible websites and licensed third-party sources to train the models was overbroad and therefore inappropriate," the OPC stated in its published findings. The federal commissioner and provincial regulators in BC and Alberta further concluded that OpenAI should have obtained express consent for the practice — a bar that regulators say the company simply never cleared.

The findings also cited a failure to build privacy-by-design principles into the development of ChatGPT from the outset. Rather than embedding privacy governance before launching, OpenAI proceeded without a mature framework, resulting in inadequate oversight over what personal data entered its training corpus and how long it was retained.

A Split Decision With Consequences

While the headline finding was unanimous, the four regulators parted ways on enforcement. The federal OPC resolved its portion of the investigation conditionally — accepting OpenAI's cooperation and a package of corrective commitments — but three provincial authorities are continuing active enforcement proceedings.

British Columbia and Alberta proved the most uncompromising. Their commissioners declined to close the file, taking the position that consent for already-scraped data simply cannot be obtained retroactively. That distinction matters: it signals that any remediation OpenAI offers going forward cannot undo the original violation in the eyes of those provinces. Quebec's regulator has similarly kept its enforcement track open.

OpenAI, for its part, cooperated throughout the investigation and has agreed to a remediation timeline of three to six months, with quarterly progress reports due to regulators. The company has already taken some steps: retiring earlier ChatGPT models trained under the contested practices, introducing filters to identify and mask personal and sensitive data in new training sources, and committing to clearer notices about training practices. OpenAI has also significantly limited the use of personal and sensitive information in training its newest ChatGPT models.

Why This Ruling Is Different

Privacy complaints against AI companies are not new. But Canada's investigation stands apart for one critical reason: it is the first completed formal privacy investigation by a national authority to conclude, with binding regulatory findings, that the act of collecting data for AI model training — not just its downstream use — constitutes a privacy violation.

That framing matters enormously. Until now, much of the debate around AI training data and privacy has played out in civil litigation and legislative negotiation. Regulators in the European Union, the United States, and the United Kingdom have opened proceedings or issued guidance, but none had reached a final, published determination on the training data question. Canada now has.

The Information Technology and Innovation Foundation (ITIF) pushed back on the ruling in a May 12 analysis, arguing that it "sets a bad precedent" by treating publicly accessible information as private, potentially chilling AI development. But privacy advocates counter that the public availability of data has never been the legal test under PIPEDA or Quebec's Law 25; the question is always whether individuals would have reasonably expected their information to be used in this way. On that question, Canadian regulators answered clearly: they would not have.

The Global Ripple Effect

Every major AI developer that trained models on Common Crawl, Reddit archives, social media posts, or other large web scrapes is watching closely. The legal logic applied in the Canadian findings — that scale, sensitivity, and lack of consent make scraping non-compliant regardless of the data's public status — is directly portable to how the European GDPR and other frameworks treat "legitimate interest" arguments for training data.

The OPC's published backgrounder notes that "individuals would not have reasonably expected their public data to be used to train AI systems," a formulation that undermines the industry-standard defense that publicly available data is fair game.

For OpenAI specifically, the ruling adds regulatory pressure in a jurisdiction that is neither a fringe market nor a regulatory outlier. Canada is a close trading partner of the United States, a member of the Five Eyes intelligence alliance, and a country whose federal private-sector privacy law, PIPEDA, holds an adequacy decision from the European Commission. A finding of non-compliance here carries credibility in Brussels and London in ways that a smaller jurisdiction's ruling might not.

What Comes Next

OpenAI's quarterly reporting obligations will keep the company under Canadian regulatory scrutiny for at least the next year. The three provincial enforcement tracks, in Quebec, BC, and Alberta, could result in additional corrective orders or, in Quebec's case, potentially significant penalties under the province's updated Law 25 framework, which allows the Commission d'accès à l'information to impose administrative monetary penalties of up to C$10 million or 2% of worldwide turnover and provides for penal fines of up to C$25 million or 4% of worldwide turnover.

The federal OPC, operating under PIPEDA, lacks the same penalty powers — a limitation that privacy advocates have long argued makes Canadian federal privacy law a weak enforcement instrument. But the conditional resolution the OPC negotiated, with its built-in reporting requirements, at least ensures ongoing accountability at the national level.

For the broader AI industry, the signal is unmistakable: the era of training large language models on whatever data was technically accessible, and asking questions about consent later, is ending. Canada has drawn a line. Other jurisdictions are sharpening their pencils.