The EGRA-AI Consortium

Learning by Doing: Research & Development Project


The consortium is exploring whether AI models can be used to automatically assess children’s early reading abilities (letter-sounds and word-reading) in African languages.


The aim of the exploration is to use AI to lower the cost of implementing reliable reading assessments through self-supervised assessment, moving from time-consuming and expensive one-on-one assessments to one-to-many assessments. In particular, we are interested in its applicability to under-researched languages on the African continent.

We are using the grant funding to expand the project to cover the isiXhosa language, specifically for the following:

  • App development to make improvements based on our learnings in the first round of deployment, including integrating noise detection and converting from a swiping to a tap-to-speak interface.
  • Using local language experts to design the isiXhosa assessment, label the data and quality-assure the marking.
  • Data collection using both the one-on-one EGRA and the isiXhosa EGRA-AI.
  • Platform development to create an interface that local mother-tongue human markers can use to label the recordings captured in the field.
  • Custom AI model development.

Why is this important?

70% of children in LMICs cannot read; in sub-Saharan Africa that rate is 87%. The lack of reading assessments that are easy to administer makes this problem difficult to fix. Between 2015 and 2020, just over 40% of countries in Africa collected at least one year’s worth of data on foundational mathematics and reading outcomes; however, fewer than 6% collected this data more than once (UNESCO, 2023). Large-scale representative data on foundational reading skills are essential for identifying gaps and monitoring progress, as are evaluations to identify successful interventions to improve reading. However, these are expensive because they currently rely on one-on-one assessments. In the classroom, formative assessment is essential for targeting instruction at the level of the learner, but it is extremely challenging to implement in large-class contexts with existing reading assessment tools. By using AI we can standardise and reduce the cost of conducting EGRA at scale, reduce bias in the data and provide deeper insight into mother-tongue reading programs.



We intend to produce a report, based on the two pilots we are currently implementing, that will include details on accuracy, lessons learnt and recommendations.

We also intend to produce an app and the underlying AI infrastructure required to add new languages and conduct EGRA-AI in mother-tongue, in any country, in any language.

Project timeline

7 months

Current progress

To date we have collected recordings in two South African languages, Sepedi and isiXhosa, from around 3000 Grade 2 and 3 children. Using the Sepedi data, we have used Meta’s wav2vec to transcribe the audio into IPA and have produced multiple automated marking AI models. We have had humans mark the responses multiple times, both to provide training data for fine-tuning and to allow us to verify the results and measure accuracy.
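As a rough illustration of how an IPA transcription could be turned into a mark, one simple baseline compares the transcribed phoneme sequence with the expected one using edit distance. This is a sketch under stated assumptions, not a description of the consortium's marking models: the function names, the `max_error_rate` tolerance and the example phoneme lists are all hypothetical.

```python
def phoneme_edit_distance(expected, heard):
    """Levenshtein distance between two phoneme sequences (lists of strings)."""
    previous = list(range(len(heard) + 1))
    for i, exp in enumerate(expected, start=1):
        current = [i]
        for j, got in enumerate(heard, start=1):
            cost = 0 if exp == got else 1
            current.append(min(
                previous[j] + 1,         # expected phoneme was not heard
                current[j - 1] + 1,      # extra phoneme was heard
                previous[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        previous = current
    return previous[-1]


def mark_response(expected, heard, max_error_rate=0.0):
    """Return True if the heard phonemes are close enough to the expected ones.

    max_error_rate is a hypothetical tolerance; 0.0 demands an exact match.
    """
    if not expected:
        raise ValueError("expected phoneme sequence must not be empty")
    error_rate = phoneme_edit_distance(expected, heard) / len(expected)
    return error_rate <= max_error_rate


# Illustrative phoneme lists only, not taken from the project's data.
target = ["b", "a", "l", "a"]
exact_match = mark_response(target, ["b", "a", "l", "a"])
one_substitution = mark_response(target, ["p", "a", "l", "a"])
```

With the default tolerance, `exact_match` is true and `one_substitution` is false. A real marker would need to decide how much deviation counts as a reading error rather than a transcription error, which is one reason repeated human marking is useful as a reference.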

The isiXhosa responses are in the process of being marked.

Meet the team

Cally Ardington is a Professor in the Southern Africa Labour and Development Research Unit (SALDRU) at the University of Cape Town. She is the principal investigator of the project.

What are you hoping to learn from the AI exploration?

We are hoping to learn:

  1. What is required to deploy an application like this into the field, along with training and support materials.
  2. What current state-of-the-art AI models are capable of when it comes to recognising speech in mother-tongue at a phonemic level.
  3. Where these models fall short when it comes to supporting children’s mother-tongue speech recognition.
  4. Efficient methods for capturing and processing data, as well as having humans in the loop to verify the data.
  5. How to fine-tune models and develop new models fit-for-purpose for these types of assessments.

What is the biggest challenge you are currently facing?

Producing a model that has high accuracy and low numbers of false positives and false negatives.
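To make these terms concrete, the sketch below compares model marks with human reference marks and reports accuracy alongside false positive and false negative rates. It assumes that a "false positive" means the model marked a response as correct when the human marker marked it incorrect. The function name and the sample marks are hypothetical, not project results.

```python
def marking_metrics(human_marks, model_marks):
    """Compare model marks with human reference marks (True = marked correct)."""
    if len(human_marks) != len(model_marks):
        raise ValueError("both lists must have the same length")
    if not human_marks:
        raise ValueError("at least one marked response is required")

    tp = fp = tn = fn = 0
    for human, model in zip(human_marks, model_marks):
        if model and human:
            tp += 1  # both say the child read the item correctly
        elif model and not human:
            fp += 1  # model accepted a response the human rejected
        elif human:
            fn += 1  # model rejected a response the human accepted
        else:
            tn += 1  # both say the response was incorrect

    return {
        "accuracy": (tp + tn) / len(human_marks),
        # Rates are None when there are no reference items of that kind.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else None,
        "counts": {"tp": tp, "fp": fp, "tn": tn, "fn": fn},
    }


# Made-up marks for five responses, purely for illustration.
human = [True, True, False, False, True]
model = [True, False, True, False, True]
metrics = marking_metrics(human, model)
```

Reporting the two error rates separately matters because accuracy alone can look high when most responses fall into one class, for example when most children read an item correctly.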

What are you hoping to get from the community?

We are hoping to learn from the experience and expertise in the community to improve and further refine our project. In future, we would also like to extend our work into the broader assessment umbrella (e.g. comprehension) and partner with projects working in other under-researched languages that are further ahead on different aspects.

How are you ensuring your AI tool is pedagogically sound?

Our Principal Investigator has conducted evaluations of foundational learning programs in South Africa and Ghana and has worked closely with the South African National Department of Basic Education (DBE) in establishing grade-specific reading benchmarks in African languages. The team also comprises an experienced neuropsychologist and local language experts.

What measures are you taking to lower barriers to accessing AI in LMICs?

We are collecting real data from underserved children and investing in native speakers to label the data, thereby providing unbiased labelling.

We are making it possible to produce AI that can understand the nuances of localised language and does not exclude children because of their dialect or accent.

Expanding the EGRA-AI assessment project into the isiXhosa language.

Pilot information
Audio: ASR
Large Language Models
ML: Classification
ML: Clustering
ML: Dimensionality Reduction
NLP: Generation

Based in South Africa and Australia


Developing products for South Africa

Contact – University of Cape Town

Cally Ardington



Learning By Doing

We are providing small grants to support the development of AI products & components in LMICs. We know that innovation investment is high-risk. Our aim is that our community can benefit from the lessons learned in these pilots – what works and what doesn’t.

We will be following each project and reporting on key learnings.

Set up by Fab Inc. in partnership with Team4Tech. We are grateful to the Bill & Melinda Gates Foundation and the Jacobs Foundation for their support.
