Mapping 10 years of French broadcast news: INA’s use of AI to make media coverage visible
INA's public platform, data.ina.fr, uses AI to analyse nearly 2 million hours of French radio and television archives. It gives journalists and audiences structured indicators on topics such as gender balance, geographic coverage, vocabulary trends, and media attention.
Hanna Barakat & Cambridge Diversity Fund / Better Images of AI / CC BY 4.0
data.ina.fr is a platform designed to help users understand how French audiovisual media treat news events, personalities and places over time. The tool provides access to ten years of content from evening news bulletins, rolling news channels and radio morning shows, presented through graphs and indicators that reflect editorial dynamics. Camille Pettineo, editorial lead of data.ina.fr, describes it as “the site for objectivising French media”, built to support journalistic inquiry and help the public decode how topics rise or fall in the news cycle.
Inspiration and problem
The project was born from a simple constraint: the sheer volume of audiovisual archives. INA stores tens of millions of hours through legal deposit. When data.ina.fr launched, only 700,000 hours had been processed, but a year later, the total had reached nearly 2 million. Manual exploration was impossible, so AI became the only way to extract patterns from a decade of daily broadcasts.
For Pettineo, a long-time data journalist, the aim was to “make the complexity of the world more readable”. She saw the potential of applying AI to metadata and transcripts to reveal how media pick up societal issues, which personalities dominate airtime, and which territories are repeatedly overlooked.
Roadmap to prototyping
The team defined a clear scope of broadcasters, then identified four lenses through which the data would be explored: personalities, words, places and gender balance. They tested combinations of AI tools to generate structured metrics at scale. Once the models produced reliable outputs, the editorial work began. Monthly checks, relevance reviews and manual inspection of samples ensured that the data met journalistic standards. The team built the interface to allow comparisons across channels, formats and time periods, and added a monthly barometer to highlight emerging trends.
The platform integrates three main AI systems:
Automated transcription to produce text.
Named entity recognition to extract people and places.
An in-house audio classifier called INA Speech Segmenter, capable of distinguishing speech, music, noise and silence, as well as identifying male and female voices.
These outputs are reviewed, normalised and editorialised before publication.
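To illustrate how an audio classifier's output could feed a gender-balance indicator, here is a minimal Python sketch. It assumes segments arrive as (label, start, end) tuples with times in seconds and labels such as "male", "female", "music" and "noise". The sample data, the function names and the share calculation are hypothetical and are not INA's published method.

```python
from collections import defaultdict

# Hypothetical segments in (label, start_seconds, end_seconds) form.
# Real classifier output would replace this list.
SEGMENTS = [
    ("music", 0.0, 12.0),
    ("male", 12.0, 72.0),
    ("female", 72.0, 102.0),
    ("noise", 102.0, 105.0),
    ("male", 105.0, 135.0),
    ("female", 135.0, 165.0),
]


def speech_time_by_gender(segments):
    """Sum the duration of segments labelled as male or female speech."""
    totals = defaultdict(float)
    for label, start, end in segments:
        if label in ("male", "female"):
            totals[label] += end - start
    return dict(totals)


def female_speech_share(segments):
    """Return female speech time as a fraction of all speech time.

    Music, noise and silence are excluded from the denominator.
    Returns None when the segments contain no speech at all.
    """
    totals = speech_time_by_gender(segments)
    female = totals.get("female", 0.0)
    male = totals.get("male", 0.0)
    speech = female + male
    if speech == 0:
        return None
    return female / speech


if __name__ == "__main__":
    print(speech_time_by_gender(SEGMENTS))
    print(female_speech_share(SEGMENTS))
```

Excluding non-speech segments from the denominator keeps the indicator comparable between a music-heavy radio show and a talk-only news bulletin. That is one possible design choice, and a different definition would give different figures.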
The team
The project is led by INA’s editorial department, supported by INA’s research department and an editorial team responsible for interpreting, validating and explaining the data. Pettineo leads the editorial use of the dataset and collaborates closely with researchers, developers and documentalists. Journalists across INA use the dataset for investigations, social formats and the monthly barometer. The work requires data journalism skills, editorial judgement, statistical literacy and familiarity with the limitations of AI.
Challenges ahead
Several AI limitations required careful mitigation. Entity recognition often confused homonyms, such as attributing mentions of the French Navy’s aircraft carrier Charles de Gaulle (R91) to the former president. As Pettineo notes, this type of error reflects “context window” limitations, which the team flags through orange warning icons.
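One way to picture this kind of warning flag is a simple rule that looks for cue words near an ambiguous name. The Python sketch below is illustrative only: the cue list, the window size and the function name are assumptions, and nothing here describes how INA's entity recognition or its warning icons are actually implemented.

```python
# Hypothetical cue words suggesting that a mention refers to something
# other than the person (here, the aircraft carrier).
AMBIGUOUS_ENTITIES = {
    "Charles de Gaulle": ("porte-avions", "aircraft carrier"),
}


def needs_warning(entity, sentence, window=60):
    """Return True if a cue word appears within `window` characters
    of the entity mention, meaning the mention may be a homonym."""
    cues = AMBIGUOUS_ENTITIES.get(entity)
    if not cues:
        return False  # entity is not on the ambiguity list
    lowered = sentence.lower()
    position = lowered.find(entity.lower())
    if position == -1:
        return False  # entity is not mentioned in this sentence
    start = max(0, position - window)
    end = position + len(entity) + window
    context = lowered[start:end]
    return any(cue in context for cue in cues)


if __name__ == "__main__":
    print(needs_warning(
        "Charles de Gaulle",
        "Le porte-avions Charles de Gaulle a quitté Toulon.",
    ))
    print(needs_warning(
        "Charles de Gaulle",
        "Charles de Gaulle a prononcé un discours en 1940.",
    ))
```

A rule like this only surfaces candidates for human review. It cannot decide on its own which Charles de Gaulle is meant, which is why a visible warning suits the case better than a silent correction.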
Differences between channels also had to be contextualised. Some increases in topic coverage reflected staff specialisation rather than news events. Ensuring accuracy meant verifying AI output with manual archive checks and expert interviews. Scaling the platform remains costly, with older decades requiring heavy digitisation and computation.
The opportunities
The platform already supports journalistic investigations. The team sees potential to expand to web media, printed press archives or international broadcasters. The public uptake of the tool indicates value beyond newsrooms: a way for audiences to understand editorial choices and AI behaviours.
Three takeaways
AI unlocks insights only when paired with rigorous editorial validation.
Transparent documentation of model errors helps users interpret data responsibly.
Large archive datasets can reveal editorial patterns only when journalists combine quantitative outputs with qualitative fieldwork.
This case study was produced as part of the 2025 edition of the JournalismAI Discovery course in French.
