
Prompts De-Biasing Augmentation to Mitigate Gender Stereotypes in Large Language Models

EasyChair Preprint 15854

14 pages
Date: February 21, 2025

Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities in natural language processing (NLP), particularly in generating and understanding human language. However, training LLMs is a double-edged sword: on the one hand, models learn to understand and generate text from human-written corpora; on the other, they inevitably inherit the negative, stereotyped, and biased semantics present in those corpora. Mitigating bias and stereotypes in generative LLMs is therefore essential for building a healthy, ethical, and fair environment for real-world use. Previous studies have proposed fine-tuning strategies to mitigate gender stereotypes, but labelling and generating high-quality, de-biased data for fine-tuning is costly. Although Counterfactual Data Augmentation (CDA) and sentence templates offer low-cost alternatives, they may also introduce new biases. In this work, we introduce Prompts De-Biasing Augmentation (PDA), a new method that augments neutral sentences for fine-tuning LLMs to mitigate gender stereotypes. Compared with CDA, which reverses gender-attributed words in sentences, fine-tuning LLMs on data augmented by PDA reduces gender stereotypes more effectively while preserving the generative ability of the pre-trained model. In addition, we propose three metrics to quantify gender inclusiveness on an unlabelled gender-stereotype benchmark. Experimental results show that the neutral sentences augmented by PDA achieve better de-biasing performance under parameter-efficient fine-tuning (PEFT), across six evaluation metrics and three test benchmarks, than the gender-reversed sentences augmented by CDA.
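The abstract contrasts CDA, which reverses gender-attributed words, with PDA, which produces neutral sentences via prompting. A minimal sketch of that contrast is shown below; the swap list, prompt wording, and function names are illustrative assumptions, not the exact procedure described in the paper.

```python
import re

# CDA-style reversal: swap gendered attribute words in place.
# Note: "her" is ambiguous (him/his); real CDA pipelines typically use
# POS tagging or curated word-pair lists to resolve such cases.
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "his": "her", "her": "his",
    "man": "woman", "woman": "man",
}

def cda_augment(sentence: str) -> str:
    """Produce a counterfactual sentence by reversing gendered words."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        repl = GENDER_SWAPS.get(word.lower(), word)
        return repl.capitalize() if word[0].isupper() else repl
    pattern = r"\b(" + "|".join(GENDER_SWAPS) + r")\b"
    return re.sub(pattern, swap, sentence, flags=re.IGNORECASE)

def pda_style_prompt(sentence: str) -> str:
    """Sketch of the prompt-based idea: ask an LLM to rewrite the sentence
    in gender-neutral form, then use the neutral output as fine-tuning data.
    The prompt text here is a placeholder, not the paper's prompt."""
    return (
        "Rewrite the following sentence so it avoids gender stereotypes "
        f"and uses gender-neutral wording:\n{sentence}"
    )

print(cda_augment("He is a doctor and she is a nurse."))
# -> "She is a doctor and he is a nurse."
```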

Keyphrases: De-Biasing Augmentation, Fairness of LLMs, Gender Bias & Stereotype, LoRA & QLoRA fine-tuning, large language models
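The keyphrases name LoRA and QLoRA as the PEFT techniques used for fine-tuning. A minimal sketch of how such a setup is commonly assembled with the Hugging Face `peft` library follows; the base model name, target modules, and hyperparameters are assumptions for illustration, not the configuration reported in the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# QLoRA variant: load the frozen base model in 4-bit precision,
# then attach LoRA adapters on top of the quantized weights.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=quant_config)

# LoRA: only small low-rank adapter matrices are trained while the
# pre-trained weights stay frozen, which keeps fine-tuning cheap and
# helps preserve the model's original generative ability.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of parameters are trainable
```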

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15854,
  author    = {Jinyuan Chen and Sebastian Binnewies and Bela Stantic},
  title     = {Prompts De-Biasing Augmentation to Mitigate Gender Stereotypes in Large Language Models},
  howpublished = {EasyChair Preprint 15854},
  year      = {EasyChair, 2025}}