Training Data Poisoning Lab

Details

Goals & objectives

Hardware & software

Solution overview

Training Data Poisoning is a critical concern and refers to attackers deliberately corrupting the training data that LLMs learn from or utilize in the instances of retrieval augmented generation (RAG). This manipulation aims to introduce vulnerabilities, backdoors and/or biases to the models compromising not only the security but also the effectiveness of a model.

This lab will showcase an online forum that has decided to get ahead of the competition and implement a state-of-the-art AI that will function like a RAG using recipes on the website to provide users with a better experience. However, a disgruntled recipe author doesn't like this idea and decides to conduct some training data poisoning. The lab will walk through poisoning the RAG training data and take a look at potential security techniques to eliminate the risk of training data poisoning.

Lab diagram

Goals and objectives

The goal of this lab is to introduce users to the risks of training data poisoning to Large Language Models (LLM) and Retrieval Augmented Generation (RAG) systems. We will demonstrate the dangers of training data poisoning by taking a look at an online coffee shop forum where users have posted their favorite coffee recipes.

The lab walks the user through accomplishing the following:

Lab Architecture, explanation of concepts, key terms, and technologies.
Taking a look at an example online forum leverage a RAG chatbot.
Poison the RAG chatbot to see potential repercussions of training data poisoning.
Take a look at potential methods for protecting against training data poisoning.

Hardware and software

- Windows 11 Jumpbox
- Ollama LLM Server running Llama3-8B using Nvidia L40S provided by RunAI in AI Proving Ground
- LanceDB for Vector DB