Analyzing Bias in Large Language Model Solutions for Assisted Writing Feedback Tools: Lessons from the Feedback Prize Competition Series
Perpetual Baffour, Tor Saxberg, Scott Crossley
18th Workshop on Innovative Use of NLP for Building Educational Applications Paper
TLDR:
This paper analyzes winning solutions from the Feedback Prize competition series hosted from 2021-2022. The competition sought to improve Assisted Writing Feedback Tools (AWFTs) by crowdsourcing Large Language Model (LLM) solutions for evaluating student writing. The winning models are freely availa
You can open the
#paper-BEA_34
channel in a separate window.
Abstract:
This paper analyzes winning solutions from the Feedback Prize competition series hosted from 2021-2022. The competition sought to improve Assisted Writing Feedback Tools (AWFTs) by crowdsourcing Large Language Model (LLM) solutions for evaluating student writing. The winning models are freely available for incorporation into educational applications, but the models need to be assessed for performance and other factors. This study reports the performance accuracy of Feedback Prize-winning models based on demographic factors such as student race/ethnicity, economic disadvantage, and English Language Learner status. Two competitions are analyzed. The first, which focused on identifying discourse elements, demonstrated minimal bias based on students' demographic factors. However, the second competition, which aimed to predict discourse effectiveness, exhibited moderate bias.