El Surti: Building understandable common knowledge
Project: AI KUAA
Newsroom size: 10 - 20
Solution: A Guaraní-speaking chatbot that boosts community engagement and promotes sustainable journalism models.
El Surti, a Paraguayan media collective known for its visual and community-driven journalism, has spent over a decade creating content that reflects the realities of underrepresented populations, content that is particularly relevant to speakers of Guaraní and of Jopará, its hybrid with Spanish. These are primarily oral languages that remain largely invisible to mainstream digital tools and AI systems because of a lack of training data and linguistic representation in Large Language Models (LLMs).
To tackle this challenge, El Surti launched AI KUAA, an initiative whose name comes from the Guaraní word kuaa, meaning "knowledge." The project aims to bridge the gap between AI language models and non-centralised, oral languages, and to improve the technological representation and usability of Guaraní through a three-pronged approach:
Improving Guaraní representation in Mozilla’s Common Voice database
Developing a chatbot capable of understanding Guaraní audio inputs
Designing a toolkit for other media outlets to build similar tools and reach underserved audiences
The problem: A digital divide for oral languages
The inspiration for AI KUAA stemmed from El Surti's previous project, "EVA," a chatbot about women in prison for micro-trafficking in Paraguay. During the development of EVA, the team discovered a crucial limitation: AI tools and transcription services struggled to recognise and process Guaraní, particularly the hybrid language known as Jopará, which blends Guaraní and Spanish.
As El Surti’s Director, Alejandro Valdez Sanabria, explains, "AI KUAA seeks to solve a gap between users of non-centralised oral languages and their recognition in the large languages used by artificial intelligence and the most popular chatbots." This difficulty is exacerbated in digital spaces, as there isn't enough documented data to train AI models in these languages. The team realised that to truly serve their community, they needed to make AI understandable and usable in their native tongue.
Building the solution: Roadmap to prototyping
The project's roadmap is a collaborative and iterative process, balancing technological development with community engagement. The goal is not just to build a chatbot but to create a sustainable and evolving solution that empowers the Guaraní-speaking community.
The team
El Surti assembled a diverse team for the project, to "balance both the technical side and the community side." Sebastián Auyanet is the project manager and also oversees the business model and the community-oriented aspects of the project. For editorial and community work, the collective hired Leila Bareiro, a Guaraní-speaking community coordinator, while Valdez retained editorial and general leadership.
The technical team is composed of Sebastián Hacher and Axel Marazzi, conversational and UX designers with prior experience on the EVA project, along with a developer.
Tools and process
The team used a mix of established and open-source tools.
A key part of the project involves training AI to understand Guaraní using Mozilla's Common Voice dataset. This platform allows volunteers to donate their voices, creating a repository of spoken language data. The team organised "mingas" (community gatherings) where people could record their voices for the dataset, with the aim of gathering several hours of Guaraní voice data and increasing the dataset's validation percentage.
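To make the idea of a validation percentage concrete, here is a minimal sketch of how such a share could be computed from a table of clips and their votes. It is not El Surti's tooling. It assumes a tab-separated file with `up_votes` and `down_votes` columns, and it assumes that a clip counts as validated once it has at least two up-votes and more up-votes than down-votes. The sample rows are hypothetical, and the share is counted per clip rather than per hour of audio.

```python
import csv
import io


def validation_share(tsv_text: str, min_up_votes: int = 2) -> float:
    """Return the fraction of clips that count as validated.

    Assumption: a clip is validated when it has at least `min_up_votes`
    up-votes and more up-votes than down-votes.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    total = 0
    validated = 0
    for row in reader:
        total += 1
        up = int(row["up_votes"])
        down = int(row["down_votes"])
        if up >= min_up_votes and up > down:
            validated += 1
    # An empty table has no clips, so report a share of zero.
    return validated / total if total else 0.0


# Hypothetical sample rows, not real Common Voice data.
SAMPLE = (
    "path\tup_votes\tdown_votes\n"
    "clip_001.mp3\t2\t0\n"
    "clip_002.mp3\t1\t0\n"
    "clip_003.mp3\t2\t1\n"
    "clip_004.mp3\t0\t2\n"
)

print(f"{validation_share(SAMPLE):.0%}")
```

In this sample, two of the four clips meet the assumed rule, so each recording session adds to the total while each listening session moves clips into the validated group.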
The conversational flows and chatbot logic are being built using platforms like Voiceflow and Botmaker. These tools are robust enough to handle millions of conversations, providing a strong foundation for the project. They connect through an API that El Surti built, which uses open-source tools like Transformers and PyTorch to process and transcribe audio.
A key challenge has been adapting these tools for an oral language. The team is working on a voice-first approach, aiming to create a chatbot that can recognise and transcribe voice messages in Guaraní, a feature that has become more readily available in recent months.
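A voice-first flow of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not El Surti's API: it wraps the speech-recognition pipeline from the Transformers library (which runs on PyTorch) behind a small function and returns a payload that a chatbot platform could consume. The function names, the payload fields and the `model_id` parameter are hypothetical, and no specific Guaraní model is implied.

```python
from typing import Callable

# A transcriber takes a path to an audio file and returns the recognised text.
Transcriber = Callable[[str], str]


def load_transcriber(model_id: str) -> Transcriber:
    """Build a transcriber backed by a Transformers speech-recognition pipeline.

    `model_id` is a placeholder for whichever speech model is chosen.
    """
    # Imported lazily so the rest of the module works without the library.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)

    def transcribe(audio_path: str) -> str:
        # The pipeline returns a dict with the recognised text under "text".
        return asr(audio_path)["text"].strip()

    return transcribe


def handle_voice_message(audio_path: str, transcribe: Transcriber) -> dict:
    """Turn one voice message into a payload for the conversational layer."""
    text = transcribe(audio_path)
    # Flag empty transcripts so the bot can ask the user to try again.
    return {"audio": audio_path, "transcript": text, "is_empty": not text}
```

Passing the transcriber in as an argument keeps the conversational logic independent of any one model, which helps when the underlying voice recognition tools change often.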
Challenges faced
Technical limitations: Training a model to handle Guaraní and Jopará required starting from scarce datasets. Models had to process hybrid sentences with two languages in the same audio. “The difficulty is to detect, translate separately, and make sense of it all in real time without delay,” explains Hacher.
Community engagement: Mozilla’s Common Voice platform requires the ability to read Guaraní and Spanish to participate, which excluded many fluent Guaraní speakers who were unable to read the language. The team explored new APIs to enable voice-only contributions through WhatsApp.
Shifting technologies: “Dashboards change every week,” says Auyanet. Constant updates in voice recognition tools forced the team to repeatedly adjust their workflows.
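The hybrid-sentence problem listed under technical limitations can be illustrated with a toy example. The sketch below works on text that has already been transcribed and splits a sentence into runs of Guaraní ("gn") and Spanish ("es") so that each run could be handled separately. It is a simplified illustration, not the team's method: the two word lists are tiny, hand-picked samples, and a real system would need a proper lexicon or a trained language-identification model working on the audio itself.

```python
from itertools import groupby

# Toy word lists for illustration only; not a real lexicon.
GUARANI = {"che", "nde", "ha", "porã", "kuaa", "óga"}
SPANISH = {"la", "el", "es", "mi", "casa", "muy"}


def tag_token(token: str) -> str:
    """Label one token as Guaraní ("gn"), Spanish ("es") or unknown ("unk")."""
    word = token.lower().strip(".,;:!?¿¡")
    if word in GUARANI:
        return "gn"
    if word in SPANISH:
        return "es"
    return "unk"


def segment_by_language(sentence: str) -> list[tuple[str, str]]:
    """Split a sentence into consecutive runs that share one language tag."""
    tokens = sentence.split()
    tags = [tag_token(t) for t in tokens]
    # Unknown tokens inherit the language of the previous token, so a single
    # out-of-lexicon word does not break up a run.
    for i, tag in enumerate(tags):
        if tag == "unk" and i > 0:
            tags[i] = tags[i - 1]
    runs = groupby(zip(tags, tokens), key=lambda pair: pair[0])
    return [(lang, " ".join(tok for _, tok in group)) for lang, group in runs]


print(segment_by_language("che casa es porã"))
```

Even this toy version shows why the task is hard: every switch point creates a new segment, and each segment has to be recognised before the sentence as a whole can be interpreted.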
The opportunities: AI KUAA beyond language recognition
Strengthening the presence of Guaraní in the digital sphere reinforces cultural identity and offers new ways to connect with audiences, especially in low-connectivity areas where WhatsApp is a lifeline. The bot could help deliver hyperlocal, personalised information and enable collaborative storytelling.
By documenting their methodology and sharing their code, El Surti hopes to support other organisations facing similar challenges. With half of the world’s languages underrepresented in AI, the model could have far-reaching applications.
Lessons for newsrooms
Community is the foundation: The project's success is deeply tied to its community-driven approach. By engaging the Guaraní-speaking population in "mingas" to build the dataset, El Surti is not just creating a technical solution but also fostering a sense of belonging and representation. This collaborative model, as Hacher notes, is the "design layer between technology and the community."
Iterate and adapt: The AI landscape is constantly changing, with new tools and models emerging every week. El Surti's team has embraced an iterative approach, constantly testing and fine-tuning their solutions. This flexibility is crucial for developing a product in a rapidly evolving technological environment. The first version of their chatbot, while still in Spanish, was used to sign up subscribers, providing valuable insights into user behaviour and community needs.
Prioritise narrative over automation: The team's prior experience with the EVA project taught them the importance of human-centred storytelling. They use AI as a tool for interaction and for understanding user intent, but the core narrative remains crafted by human journalists. As Marazzi and Hacher discussed, the challenge is to use these new generative tools to enhance, not replace, human creativity and journalistic integrity.
Explore Previous Grantees' Journeys
Find our 2024 Innovation Challenge grantees, their journeys and the outcomes here. This grantmaking programme enabled 35 news organisations around the world to experiment and implement solutions to enhance and improve journalistic systems and processes using AI technologies.
The JournalismAI Innovation Challenge, supported by the Google News Initiative, is organised by the JournalismAI team at Polis, the journalism think-tank at the London School of Economics and Political Science.
