
Google has joined efforts to localise artificial intelligence for Africa by collaborating with leading universities and research institutions across the continent to launch WAXAL, a large open-source speech database designed to accelerate the development of voice-based AI for African languages.
The initiative brings together African institutions including Makerere University in Uganda, the University of Ghana, Digital Umuganda in Rwanda, and the African Institute for Mathematical Sciences (AIMS). Together, they contributed speech data covering 21 Sub-Saharan African languages, among them Hausa, Luganda, Yoruba and Acholi.
WAXAL provides foundational data for building speech recognition systems, voice assistants, text-to-speech tools and other voice-enabled applications, with potential use across education, healthcare, agriculture and public services.
“This dataset provides a critical foundation for students, researchers and entrepreneurs to build technology on their own terms, and in their own languages,” said Aisha Walcott-Bryant, Head of Google Research Africa.
Closing the language gap in AI
The launch of WAXAL comes amid growing momentum across Africa to develop language technologies that reflect local cultures and realities. In September 2025, the Nigerian government introduced N-ATLAS, an open-source language model capable of recognising and generating speech in Yoruba, Hausa, Igbo and Nigerian-accented English.
Private sector innovation is also gaining ground. South African startup Lelapa AI, for instance, has developed Vulavula, a tool offering speech recognition, translation and sentiment analysis for African languages.
By making its speech dataset openly accessible, WAXAL aims to fuel a new wave of homegrown AI solutions and reduce the continent’s dependence on imported language technologies.
Although Sub-Saharan Africa is home to more than 2,000 languages, studies suggest that fewer than five per cent have the data resources required for Natural Language Processing (NLP)—the technology that enables computers to understand and respond to human language. This shortage has long limited the accuracy of speech recognition and text-to-speech systems for African users.
Built in Africa, for Africa
Developed over three years with funding and technical support from Google, WAXAL seeks to address this gap. The dataset contains over 11,000 hours of speech drawn from nearly two million individual recordings, covering languages such as Fulani (Fula), Hausa, Igbo, Ikposo (Kposo), Swahili and Yoruba.
Under the project’s partnership model, contributing institutions retain ownership of the data they collect while making it freely available to researchers and developers worldwide.
“For AI to have real impact in Africa, it must speak our languages and understand our contexts,” said Joyce Nakatumba-Nabende, Senior Lecturer at Makerere University’s School of Computing and Information Technology.
“The WAXAL dataset gives our researchers access to high-quality data needed to build speech technologies that truly reflect our communities.”