
Effectiveness of Privacy-Preserving Algorithms for Large Language Models: a Benchmark Analysis

EasyChair Preprint 14540

8 pages · Date: August 26, 2024

Abstract

Recently, several privacy-preserving algorithms for NLP have emerged. These algorithms can be suitable for LLMs, as they can protect both training and query data. However, no benchmark exists to guide the evaluation of these algorithms when applied to LLMs. This paper presents a benchmark framework for evaluating the effectiveness of privacy-preserving algorithms applied to training and query data when fine-tuning LLMs under various scenarios. The proposed benchmark is designed to be transferable, enabling researchers to assess other privacy-preserving algorithms and LLMs. We evaluated the SANTEXT+ algorithm on the open-source Llama2-7b LLM using a sensitive medical transcription dataset. The results demonstrate the algorithm's effectiveness while highlighting the importance of considering the specific deployment scenario when choosing algorithm parameters. This work aims to facilitate the development and evaluation of effective privacy-preserving algorithms for LLMs, contributing to the creation of trusted LLMs that mitigate concerns about the misuse of sensitive information.
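To illustrate the kind of mechanism the benchmark evaluates, below is a minimal sketch of a SANTEXT+-style sanitization step: each token is replaced by a vocabulary word sampled via the exponential mechanism, so that semantically closer words are exponentially more likely and the privacy parameter epsilon controls the privacy/utility trade-off. This is not the authors' implementation; the toy vocabulary, the 2-D random embeddings, the euclidean distance, and the chosen epsilon are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with random 2-D embeddings; in practice the embeddings
# would come from a pretrained model over the dataset vocabulary.
vocab = ["patient", "doctor", "hospital", "clinic", "nurse"]
emb = {w: rng.normal(size=2) for w in vocab}

def sanitize_token(token: str, epsilon: float) -> str:
    """Replace `token` with a vocabulary word sampled via the exponential
    mechanism: closer embeddings get exponentially higher probability."""
    dists = np.array([np.linalg.norm(emb[token] - emb[w]) for w in vocab])
    scores = -0.5 * epsilon * dists        # utility = negative distance
    probs = np.exp(scores - scores.max())  # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(vocab, p=probs)

# Sanitize a toy "medical" query token by token.
query = ["patient", "hospital"]
print([sanitize_token(t, epsilon=2.0) for t in query])

With a small epsilon the sampled replacements are close to uniform (stronger privacy, lower utility); with a large epsilon the original or a near-synonym is almost always kept, which is the parameter trade-off the paper's benchmark is designed to expose in concrete fine-tuning scenarios.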

Keyphrases: benchmarks, privacy-preserving algorithms, differential privacy, large language models

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:14540,
  author    = {Jinglin Sun and Basem Suleiman and Imdad Ullah},
  title     = {Effectiveness of Privacy-Preserving Algorithms for Large Language Models: a Benchmark Analysis},
  howpublished = {EasyChair Preprint 14540},
  year      = {EasyChair, 2024}}