Tackling dirty data: Building an AI-powered metadata classification tool

Briana Smith is a Senior Digital Analyst for NPR based in the United States. Learn how the JournalismAI Skills Lab helped her build a prototype that demonstrates automated content classification and enables taxonomy testing across the organisation.

For a major news organisation like NPR, metadata might seem like a behind-the-scenes concern. Briana Smith, a Senior Digital Analyst working on digital analytics within NPR's audience insights team, saw it differently. To her, the lack of structured, unified metadata was creating real barriers to understanding how audiences interact with content.

“Like most organisations, we have dirty data,” Smith explains. NPR had metadata, but it lacked the structure and centralisation needed to serve multiple use cases effectively. Creating insightful reports about audience behaviour was difficult when working with what Smith describes as “a completely flat horizontal taxonomy”: essentially a collection of terms rather than a meaningful classification system.

The solution

Smith's initial goal was to build an AI-powered classification system that could automatically apply structured taxonomies to content. However, the JournalismAI Skills Lab, a programme supported by the Google News Initiative, helped her achieve something more valuable: a pivot that addressed both technical and organisational challenges simultaneously.

The breakthrough came when she understood that she didn't need to train an entirely new model. "That concept completely changed what I was able to get done," she reflects. Instead, she could use an existing AI model as an agent within her tool, achieving surprisingly accurate classification results.

The resulting prototype serves dual purposes: it demonstrates automated classification in action, and it allows the team to test different taxonomies in real time. When NPR's taxonomists develop new classification structures, stakeholders across the organisation can now provide input mid-process rather than waiting until implementation.

Smith was the driving force behind building the tool, though executive sponsorship from NPR's new head of AI labs proved crucial for organisational buy-in. The prototype will be deployed as a microservice to a small working group including the taxonomy team, the head of AI labs, and the audience insights team – the people actively working on metadata infrastructure.

Key takeaways

Concepts matter more than code. For Smith, the most valuable lessons from the Skills Lab programme were conceptual rather than purely technical. Learning about vector databases, semantic versus keyword search, and when to buy versus build transformed her approach. Understanding that she could use an AI model as an agent rather than training one from scratch made the project achievable.

Navigating organisations requires strategy. The one-on-one consultation sessions with the Skills Lab instructors provided invaluable advice on stakeholder mapping and building internal buy-in. Smith's role sits between technical and editorial teams, and the programme helped her communicate value to both sides more effectively.

Know when AI helps – and when it doesn't. With new AI tools emerging constantly, Smith gained clarity on when AI genuinely supports a project and when it is unnecessary. This discernment now shapes how she advises her organisation on AI implementation.

Smith's confidence has "grown exponentially." She now feels like a subject matter expert, equipped with a holistic view she can share across the organisation. Her message is clear: metadata is one of the safer ways to implement AI in newsrooms, delivering largely invisible benefits that improve the underlying systems.
