Summary of GAIED Papers

We summarised papers from the Generative AI for Education (GAIED) workshop at NeurIPS’23.

News • January 2024

In December, Fab, in partnership with Jun Ho Choi and Daniel Björkegren, presented our TheTeacher.AI work at the NeurIPS’23 workshop Generative AI for Education (GAIED). The workshop aimed “to bring together researchers, educators, and practitioners to explore the potential of Generative AI for enhancing Education.”

As part of this, 33 papers on AI in education were presented – and we read them all. This is part of our wider work tracking and cataloguing the evidence on Generative AI use in education. We are currently building our scouting network – let us know if you want to be involved.

We wanted to share what we took from the GAIED papers, including some of the trends we saw.

First up: (almost) everyone is engaging with ChatGPT.

Out of the 33 papers, 27 involved building or applying new technologies. Of these, 20 (74%) used OpenAI models – either GPT-3.5 Turbo or GPT-4 – most via prompt engineering (13 papers), some via fine-tuning (4 papers) or Retrieval-Augmented Generation (RAG, 2 papers), with one paper comparing GPT-4 to other NLP methods.

Our takeaway here was that the bulk of the new advances involve plenty of text generation, with ChatGPT the platform of choice. It also shows that there’s plenty of mileage in humble prompt engineering – though the improvements from the more complex methods were interesting.

Overall, we think that about half the papers could have direct applications in FLN* – the majority of those that weren’t applicable focused on Computer Science education, which isn’t surprising given the networks the conference is aimed at.

A quarter of the papers had potential applications to Student Learning through one-to-one support and the personalisation of that support. However, most of these involve training chatbots to answer learners’ specific queries, rather than adaptive learning algorithms.

One paper focused on using AI to teach, rather than to give away answers. For example, Code Soliloquies for Accurate Calculations in Large Language Models fine-tuned a model (LLaMA, Meta’s Large Language Model) to give students hints that help them answer questions by themselves, rather than answering the questions directly.

Another prominent application of AI is Assessment Tools & Processes in terms of content creation and adaptation. Most of these are aimed at teachers and use text-based LLMs.

Some papers focus on generating questions and answers from learning materials (Angel: A New Generation Tool for Learning Material based Questions and Answers and Evaluating ChatGPT-generated Textbook Questions using IRT). Others focus specifically on multiple choice questions, either on generating the questions (Towards AI-Assisted Multiple Choice Question Generation and Quality Evaluation at Scale: Aligning with Bloom’s Taxonomy and Small Generative Language Models for Educational Question Generation) or the distractors (Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning).

*FLN: Foundational Literacy and Numeracy

One thing that was mostly missing was multi-modal AI.

The ability to process and/or generate images and voice can help with many aspects of teaching and learning, from personalisation to the creation of high-quality materials. Multi-modal AI came up a lot in our recent discussions on potential AI ideas. However, of the 33 papers, only four utilise multimodal generative AI. The two with direct applications to foundational literacy and numeracy (FLN) involved generating language learning games using text-to-image technology (WordPlay: An Agent Framework for Language Learning Games) and classifying types of educational videos using audio and visual cues (Detecting Educational Content in Online Videos by Combining Multimodal Cues).

To reach younger children, we need more research on AI applications using voice and image beyond the two presented above. We are aware of some – for example, automating early grade assessments using voice recognition (part of the Bill and Melinda Gates Foundation Grand Challenge, with the University of Cape Town as principal investigator) and developing language learning games using textless natural language processing (we have given a small grant to Stellenbosch University and Trackosaurus to develop this).

Much more work is needed on applications in low-resource contexts.

Only two papers directly targeted users in low-resource contexts (Transforming Healthcare Education: Harnessing Large Language Models for Frontline Health Worker Capacity Building using Retrieval-Augmented Generation and Are LLMs Useful in the Poorest Schools? TheTeacher.AI in Sierra Leone).

One is our work on the TheTeacher.AI chatbot in Sierra Leone. The other is health-related, focusing on educating frontline health workers in India on pregnancy-related knowledge.

As low-resource contexts face more barriers in terms of access to internet and devices, as well as gaps in language and user knowledge, there is work to be done on adapting cutting-edge technologies and bringing them to LMICs. Part of this might be developing and testing small language models that can run on smaller devices with lower data requirements (Small Generative Language Models for Educational Question Generation). We are exploring this with Beekee through a small grant.

One big lesson is that Generative AI can be helpful with prompt engineering alone.

Many papers adapted ChatGPT with prompt engineering alone (such as Generative Agent for Teacher Training: Designing Educational Problem-Solving Simulations with Large Language Model-based Agents for Pre-Service Teachers and Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning). This often resulted in better-quality outputs than basic prompts. The former received positive feedback from teachers, and the latter found that outputs from careful prompting were more similar to items created by human experts. This makes the technology more accessible, as people with no programming background can use AI effectively.
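To illustrate the kind of prompt engineering these papers rely on, here is a minimal sketch contrasting a basic prompt with an engineered one for generating maths distractors. The prompt wording, the few-shot example, and the function names are our own illustrative assumptions, not taken from any of the papers; the message format is the standard role-based one used by chat-model APIs.

```python
# Illustrative sketch: basic vs engineered prompts for distractor
# generation. All wording and examples below are hypothetical.

def basic_prompt(question: str, answer: str) -> list[dict]:
    """A naive single-instruction prompt."""
    return [{
        "role": "user",
        "content": f"Write 3 wrong answers for: {question} (correct: {answer})",
    }]

def engineered_prompt(question: str, answer: str) -> list[dict]:
    """Adds a role, explicit constraints, and one worked example
    (few-shot / in-context learning) to steer the model's output."""
    system = (
        "You are an experienced maths teacher writing multiple-choice "
        "questions. Distractors must be plausible, reflect common student "
        "misconceptions, and match the format of the correct answer."
    )
    return [
        {"role": "system", "content": system},
        # One worked example showing the desired output style.
        {"role": "user", "content": "Question: What is 1/2 + 1/4?\nCorrect answer: 3/4"},
        {"role": "assistant", "content": "2/6 (added numerators and denominators); 1/8; 2/4"},
        # The actual item we want distractors for.
        {"role": "user", "content": f"Question: {question}\nCorrect answer: {answer}"},
    ]

messages = engineered_prompt("What is 2/3 - 1/6?", "1/2")
print(len(messages))  # 4 messages: system, example pair, target question
```

No model is called here; the message list would simply be passed to a chat-completion endpoint. The point is that everything that improves the output – the teacher persona, the constraints, the worked example – is plain text that a non-programmer could write.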

However, there is a need for evidence on how generative AI results can be judged.

The papers mainly judge the performance of AI tools using output accuracy, and sometimes using human experts to judge output quality. This works fine with outputs related to the creation of materials with clearly defined ‘correct answers’ (such as video categorisation or the creation of mathematical graphs).

In applications related to teaching, however, factual correctness is not sufficient to determine an AI tool’s quality, and we should think more carefully about what matters. Ruffle&Riley: Towards the Automated Induction of Conversational Tutoring Systems, for example, piloted conversational tutoring systems with students and evaluated both students’ performance (no change) and learning experience (better than no tutor). In the coming months, in partnership with the Jacobs Foundation, we will be doing some thinking on what evidence is needed to evaluate AI models in teaching tasks.

Overall, we found plenty of exciting research that helps with Student Learning, with a focus on text-based AI, aimed at older students in high-income countries with access to devices and internet. This reinforces the need to ensure that those in LMICs do not get left behind in this age of rapid AI innovations.


Article by Dr Sirin Tangpornpaiboon, Fab Inc.

