CMOtech Canada - Technology news for CMOs & marketing decision-makers

Report finds LLMs train on Canadian news, rarely attribute

Thu, 19th Mar 2026

Researchers at McGill University have published an audit of how leading generative AI chatbots use and reproduce Canadian journalism. The audit found low levels of source attribution and frequent answers that could reduce the need to visit original news outlets.

The study tested ChatGPT, Gemini, Claude and Grok against thousands of prompts based on real Canadian news stories in English and French. It also examined responses with web search enabled, when the systems were asked about specific recent articles from Canadian publishers.

When asked about Canadian news events likely drawn from training data, the four systems provided no source attribution in 82 per cent of responses, according to the report.

The results point to a growing mismatch between Canada's media and copyright frameworks and the way AI products ingest, synthesise and distribute news reporting, the researchers said.

Two-part audit

The first part tested the models on 2,267 Canadian news stories in both official languages, running 18,134 queries. It measured what the systems appeared to have absorbed from training data and how often they credited sources.

The second part examined 140 recent articles from seven Canadian outlets. With web search enabled, the researchers asked the same models about those stories across 3,360 experimental conditions to assess whether the answers could stand in for the original reporting and whether they credited the source.

In the web-enabled tests, the models included links to Canadian websites in 29 to 69 per cent of responses. They named the originating outlet in the response text in only 1 to 16 per cent of cases.

When users named an outlet and asked for citations, attribution rose sharply to 74 to 97 per cent. That suggests the systems can provide attribution more consistently, but do not do so by default, the researchers said.

Visibility skew

AI-generated visibility favoured a small group of large national outlets, particularly free-to-access brands.

CBC, CTV and the Globe and Mail captured most of the attention in the systems' outputs, the researchers reported.

Regional and paywalled outlets received less visibility than their share of original reporting would suggest. Across more than 18,000 responses, the Toronto Star was named as a source 11 times and the Montreal Gazette once.

The study also found weaker attribution for French-language journalism. French stories appeared to be absorbed into training data at rates comparable to English ones, but French outlets appeared in citations only 10 per cent of the time. Radio-Canada and La Presse accounted for most of the French citations that did appear.

The Journal de Montréal was "nearly invisible" to the AI systems tested, despite its wide readership in Quebec, the researchers said.

Policy tension

The audit argues that AI companies interact with journalism in three stages: ingestion into training data, production of synthetic answers and distribution through chat interfaces. It says value is transferred at each stage without a corresponding return to publishers.

The researchers contrasted the emerging AI model with the earlier platform era. Social networks and search engines concentrated advertising around news distribution. Generative AI systems, by contrast, absorb reporting and present its substance directly in responses, reducing the need for audiences to click through to the originating outlet.

The Online News Act established a bargaining process for large technology platforms that profit from news content, but it focuses on services that index and display news. Generative AI does not fit that definition, the researchers said, because it synthesises material into answers rather than presenting original content.

The audit also points to copyright uncertainty. It notes that the Copyright Act's fair dealing doctrine applies to enumerated purposes, and whether large-scale commercial AI training qualifies as "research" has not been tested in Canadian courts.

The paper also cites a Canadian news media lawsuit against OpenAI in Ontario as a possible route to judicial decisions on specific disputes, while noting that courts would not design a broader framework for the sector.

In November 2024, a coalition of Canadian news media companies, including the Toronto Star, The Globe and Mail, Postmedia, and the CBC, filed that lawsuit against OpenAI, alleging the company used their content to train its models without permission or compensation.

Attribution standard

The audit identifies attribution as a nearer-term intervention than compensation. It argues that the higher attribution rates seen when users explicitly request citations show that the technical means already exist to identify sources in responses.

Any standard would also need to address the disparity for French-language outlets, where citation rates lagged despite similar levels of apparent ingestion into training data, the paper says.

"The rules governing how these companies use journalism (who gets credited, who gets compensated, and what obligations attach to those who profit), are being set right now, by default, through inaction," the report stated. "Canada has tools and precedent to act responsibly."