Unlocking the Power of Retrieval Augmented Generation with Added Privacy: A Comprehensive Guide

RAG is a popular approach that improves the accuracy of LLMs by utilizing a knowledge base. In this blog post, we illustrate how to implement RAG without compromising the privacy of your data.

What is RAG?

Large language models, such as OpenAI’s gpt-4-turbo and Anthropic’s Claude-3, are very powerful assistants that help us carry out different tasks like summarization, translation, or answering our most intriguing questions. However, while they sometimes nail the answer to our questions, other times they regurgitate random facts from their training data. They may also hallucinate perfectly plausible, yet completely false statements.

Retrieval-augmented generation (RAG) is an AI framework for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM’s internal representation of information.

Using a RAG workflow for an LLM-based application has the following benefits: 

  1. Improved accuracy: By retrieving relevant information from a knowledge base or external sources, the LLM can provide more accurate and factual answers to user queries. This is particularly useful when the LLM’s internal knowledge is outdated or lacks specific details.
  2. Enhanced context awareness: RAG enables the LLM to incorporate context from retrieved information, making it more effective at answering questions that require background knowledge or understanding of specific domains.
  3. Reduced hallucination: LLMs sometimes generate plausible but incorrect information, a phenomenon known as hallucination. By incorporating retrieved information, RAG can help reduce this issue and provide more reliable answers.
  4. Scalability: RAG allows for the easy integration of new knowledge sources, making it easier to scale the system as new information becomes available.
  5. Interpretability: With RAG, you can trace back the source of the information used to generate an answer, which can help improve the interpretability and trustworthiness of the system.

The basic workflow for a RAG pipeline is as follows:

  1. A user asks a question (we call this the query)
  2. Given the query, we retrieve the relevant documents and paragraphs from our knowledge base.
  3. We feed the query and relevant documents to an LLM and ask it to generate a response.
  4. The response is sent to the user (Optionally, we can also send the relevant documents to the user).
A diagram of a basic RAG pipeline.
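The four steps above can be sketched end-to-end with plain Python stubs. Here, `retrieve` and `generate` are hypothetical placeholders standing in for the vector search and LLM call that the rest of this tutorial implements for real:

```python
# A toy sketch of the four RAG steps; retrieve() and generate() are
# hypothetical placeholders, not part of any library.

KNOWLEDGE_BASE = {
    "group a": "Ecuador beat Qatar 2-0 in the opening match.",
    "group f": "Morocco and Croatia drew 0-0 in Group F's first match.",
}

def retrieve(query):
    """Step 2: return documents whose key appears in the query (toy matcher)."""
    q = query.lower()
    return [doc for key, doc in KNOWLEDGE_BASE.items() if key in q]

def generate(prompt):
    """Step 3: stand-in for an LLM call; here it just echoes the context line."""
    return prompt.splitlines()[-1]

def rag_answer(query):
    docs = retrieve(query)                                 # step 2
    prompt = f"QUERY: {query}\nCONTEXT: {' '.join(docs)}"
    return generate(prompt)                                # steps 3 and 4
```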

Knowledge base and retrieval

The knowledge base is built by chunking and embedding the source data into vectors. The embedding is done using an embedding model such as OpenAI’s text-embedding-3-small.

These embeddings are then stored in a vector store where each embedding vector is linked to its chunk. These vector stores allow us to efficiently find the most similar vectors given a query vector.

To retrieve relevant documents when given a query, we do the following:

  1. Embed the query into a vector.
  2. Get the most similar vectors to our query vector.
  3. Retrieve the chunks corresponding to these vectors.
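These three steps can be illustrated with a self-contained toy example. The `embed` function below is a hypothetical keyword-count stand-in for a real embedding model, but the cosine-similarity search works the same way at scale:

```python
import math

def embed(text):
    """Toy 'embedding': counts of a few keywords (a stand-in for a real model)."""
    vocab = ["goal", "draw", "penalty", "group"]
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

chunks = [
    "the match ended in a goalless draw",
    "a penalty was scored to win the group",
]
vectors = [embed(c) for c in chunks]                  # index time: embed the chunks

query_vector = embed("which game was a draw")         # step 1: embed the query
scores = [cosine(query_vector, v) for v in vectors]   # step 2: find similar vectors
best_chunk = chunks[scores.index(max(scores))]        # step 3: retrieve the chunk
```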

Where are the privacy risks in RAG?

RAG has proved to be a very effective method to increase the accuracy and robustness of LLM responses. However, it also has its own risks and challenges.

From a data privacy perspective, it has two main risks:

  1. Source data and embeddings
  2. Prompt data and LLMs

1. Source data & embeddings

If you’re building a RAG pipeline to answer questions about your company’s internal docs and guides, you’ll need to chunk and embed these documents. Embedding them often means sending the raw text to a third-party embedding provider like OpenAI or Cohere. This carries a significant privacy risk, as you’re potentially exposing confidential data such as dates, product names, specifications, source code or project briefs. These docs may also contain personal information about your clients or employees, such as names, dates of birth, social security numbers or salaries. Finally, the user query itself needs to be embedded, since the query embedding is used to retrieve the relevant chunks; this is another area with a potential data privacy risk.

2. Prompt data sent to LLM

The second main step of a RAG pipeline is utilizing an LLM to generate a response given the query and relevant documents. This means that if we’re using an LLM provider such as OpenAI or Anthropic, we need to send the user query and all retrieved documents as a prompt to the LLM. This is another source of privacy risk in the RAG pipeline.

Setting Up (llama-index)

Note: We have an end-to-end Colab notebook for this tutorial. You can find it here.

For this tutorial, we will use our World Cup data, which was also used in previous tutorials:

source_doc = \
"""Main article: 2022 FIFA World Cup Group A
The first match of the tournament was held between Qatar and Ecuador in Group A. Ecuador had a disallowed goal in the opening minutes, but eventually won 2–0 with two goals from Enner Valencia. Qatar became the first host nation to lose their opening match at a World Cup. Many Qatar natives were seen leaving the game before the end, with ESPN reporting that two-thirds of the attendance had left. The other starting match in group A was won by the Netherlands 2–0 over Senegal. Cody Gakpo scored the opening goal in the 84th minute and Davy Klaassen added a second in stoppage time. Senegal faced Qatar in the third match of the group; Boulaye Dia capitalised on a slip by Boualem Khoukhi to put Senegal 1–0 ahead. Famara Diédhiou scored a second with a header, before Mohammed Muntari scored Qatar's first-ever goal at a World Cup to reduce the deficit back to one. Senegal eventually won the match 3–1 after an 84th-minute goal by Bamba Dieng. With this result, Qatar became the first team to be eliminated from the tournament, as well as becoming the first host nation to ever be knocked out of the tournament after two games. Gakpo scored his second goal of the tournament as the Netherlands led Ecuador; however, Valencia scored an equaliser in the 49th minute. The Netherlands won 2–0 against Qatar following goals by Gakpo and Frenkie de Jong to win the group, while Qatar attained the distinction of being the first home nation to lose all three group matches. Senegal faced Ecuador to determine the second knockout round qualifier. At the end of the first half, Ismaïla Sarr scored a penalty kick to put Senegal ahead. In the 67th minute, Moisés Caicedo scored an equaliser, but shortly after, Kalidou Koulibaly gave Senegal the victory. The win was enough to qualify Senegal as the runners-up of Group A.


Main article: 2022 FIFA World Cup Group B
England completed a 6–2 victory over Iran. Iranian keeper Alireza Beiranvand was removed from the game for a suspected concussion before England scored three first-half goals. Mehdi Taremi scored in the second half after which England defender Harry Maguire was also removed for a concussion. Timothy Weah, of the United States, scored a first-half goal against Wales; however, the match finished as a draw after a penalty kick was won and scored by Gareth Bale. Iran defeated Wales 2–0 following a red card to Welsh goalkeeper Wayne Hennessey after he committed a foul outside of his penalty area. Substitute Rouzbeh Cheshmi scored the first goal eight minutes into stoppage time, followed by Ramin Rezaeian scoring three minutes later. England and the United States played to a 0–0 draw, with only four shots on target between them. England won the group following a 3–0 win over Wales with a goal by Phil Foden and two by Rashford. Christian Pulisic scored the winning goal as the United States defeated Iran 1–0 to qualify for the round of 16.


Main article: 2022 FIFA World Cup Group C
Argentina took an early lead against Saudi Arabia after Lionel Messi scored a penalty kick after ten minutes; however, second-half goals by Saleh Al-Shehri and Salem Al-Dawsari won the match 2–1 for Saudi Arabia, a result described as "the biggest upset in the history of the World Cup." The match between Mexico and Poland ended as a goalless 0–0 draw after Guillermo Ochoa saved Robert Lewandowski's penalty kick attempt. Lewandowski scored his first career World Cup goal in a 2–0 win over Saudi Arabia four days later. Argentina defeated Mexico 2–0, with Messi scoring the opener and later assisting teammate Enzo Fernández who scored his first international goal. Argentina won their last game as they played Poland with goals by Alexis Mac Allister and Julián Álvarez which was enough to win the group; Poland qualified for the knockout stage on goal difference.


Main article: 2022 FIFA World Cup Group D
The match between Denmark and Tunisia ended as a goalless draw; both teams had goals disallowed by offside calls. Danish midfielder Christian Eriksen made his first major international appearance since suffering a cardiac arrest at the UEFA Euro 2020. Defending champions France went a goal behind to Australia, after a Craig Goodwin goal within ten minutes. France, however, scored four goals, by Adrien Rabiot, Kylian Mbappé and two by Olivier Giroud to win 4–1. The goals tied Giroud with Thierry Henry as France's all-time top goalscorer. Mitchell Duke scored the only goal as Australia won against Tunisia. This was their first World Cup win since 2010. Mbappé scored a brace as France defeated Denmark 2–1. This was enough for France to qualify for the knockout round – the first time since Brazil in 2006 that the defending champions progressed through the opening round. Mathew Leckie scored the only goal as Australia defeated Denmark 1–0, qualifying for the knockout round as runners-up with the win. Wahbi Khazri scored for Tunisia against France in the 58th minute. Although Antoine Griezmann equalised in stoppage time it was overturned for offside. Tunisia finished third in the group, as they required a draw in the Denmark and Australia game.


Main article: 2022 FIFA World Cup Group E
Group E began with Japan facing 2014 champions Germany. After an early penalty kick was converted by Germany's İlkay Gündoğan, Japan scored two second-half goals by Ritsu Dōan and Takuma Asano in a 2–1 upset win. In the second group match, Spain defeated Costa Rica 7–0. First-half goals by Dani Olmo, Marco Asensio, and Ferran Torres were followed by goals by Gavi, Carlos Soler, Alvaro Morata, and a second by Torres. This was the largest defeat in a World Cup since Portugal's victory over North Korea in the 2010 event by the same scoreline. Costa Rica defeated Japan 1–0, with Keysher Fuller scoring with Costa Rica's first shot on target of the tournament. Germany and Spain drew 1–1, with Álvaro Morata scoring for Spain and Niclas Füllkrug scoring for Germany. Morata scored the opening goal for Spain against Japan as they controlled the first half of the match. Japan equalised on Ritsu Doan before a second goal by Kaoru Mitoma was heavily investigated by VAR for the ball being out of play. The goal was awarded, and Japan won the group following a 2–1 win. Serge Gnabry scored on ten minutes for Germany against Costa Rica and they led until half-time. Germany required a win, and for Japan to not win their match, or for both teams to win their matches by a combined goal difference of at least 9 goals, to qualify. In the second half, goals by Yeltsin Tejeda and Juan Vargas gave Costa Rica a 2–1 lead, which would have qualified them into the knockout stages ahead of Spain. Germany scored three further goals—two by Kai Havertz and a goal by Niclas Fullkrug, ending in a 4–2 win for Germany—which was not enough to qualify them for the final stages. Japan won the group ahead of Spain.


Main article: 2022 FIFA World Cup Group F
Group F's first match was a goalless draw between Morocco and Croatia. Canada had a penalty kick in the first half of their match against Belgium which was saved by Thibaut Courtois. Belgium won the match by a single goal by Michy Batshuayi. Belgium manager Roberto Martínez confirmed after the game that he believed Canada to have been the better team. Belgium lost 2–0 to Morocco, despite Morocco having a long-range direct free kick goal by Hakim Ziyech overturned for an offside on another player in the lead up to the goal. Two second-half goals from Zakaria Aboukhlal and Romain Saïss helped the Morocco win their first World Cup match since 1998. The match sparked riots in Belgium, with residents fires and fireworks being set off. Alphonso Davies scored Canada's first World Cup goal to give Canada the lead over Croatia. Goals by Marko Livaja, Lovro Majer, and two by Andrej Kramarić for Croatia completed a 4–1 victory. Morocco scored two early goals through Hakim Ziyech and Youssef En-Nesyri in their game against Canada and qualified following a 2–1 victory. Canada's only goal was an own goal by Nayef Aguerd. Croatia and Belgium played a goalless draw which eliminated Belgium, whose team was ranked second in the world, from the tournament.


Main article: 2022 FIFA World Cup Group G
Breel Embolo scored the only goal in Switzerland's 1–0 defeat of Cameroon. Richarlison scored two goals as Brazil won against Serbia, with star player Neymar receiving an ankle injury. Cameroon's Jean-Charles Castelletto scored the opening goal against Serbia, but they were quickly behind as Serbia scored three goals by Strahinja Pavlović, Sergej Milinković-Savić, and Aleksandar Mitrović either side of half time. Cameroon, however, scored goals through Vincent Aboubakar and Eric Maxim Choupo-Moting, completing a 3–3 draw. An 83rd-minute winner by Casemiro for Brazil over Switzerland was enough for them to qualify for the knockout stage. Having already qualified, Brazil were unable to win their final group game, as they were defeated by Cameroon 1–0 following a goal by Vincent Aboubakar. He was later sent off for removing his shirt in celebrating the goal. Cameroon, however, did not qualify, as Switzerland defeated Serbia 3–2.


Main article: 2022 FIFA World Cup Group H
Uruguay and South Korea played to a goalless draw. A goalless first half between Portugal and Ghana preceded a penalty converted by Cristiano Ronaldo to give Portugal the lead. In scoring the goal, Ronaldo became the first man to score in five World Cups. Ghana responded with a goal by André Ayew before goals by João Félix, and Rafael Leão by Portugal put them 3–1 ahead. Osman Bukari scored in the 89th minute to trail by a single goal, while Iñaki Williams had a chance to equalise for Ghana ten minutes into stoppage time, but slipped before shooting. The match finished 3–2 to Portugal. Ghanaian Mohammed Salisu opened the scoring against South Korea, with Mohammed Kudus following it up. In the second half, Cho Gue-sung scored a brace for South Korea, levelling the score. Mohammed Kudus scored again in the 68th minute, winning the match 3–2 for Ghana. Portugal defeated Uruguay 2–0 with two goals from Bruno Fernandes, advancing them to the knockout stage. The game's first goal appeared to have been headed in by Ronaldo, but the ball just missed his head. A controversial penalty decision was called late in the game, with a suspected handball from José María Giménez. Portugal led South Korea through Ricardo Horta after 10 minutes. However, goals by Kim Young-gwon and Hwang Hee-chan won the match 2–1 for South Korea. Giorgian de Arrascaeta scored two goals as Uruguay defeated Ghana 2–0. However, with South Korea winning, Uruguay required another goal to progress as they finished third on goals scored. Several Uruguay players left the pitch after the game surrounding the referees and followed them off the pitch."""

Let’s set up a basic RAG workflow using llama-index, a popular framework for building production-grade RAG pipelines.

Our pipeline will have the following steps:

  1. The documents are chunked, embedded and stored in our vector store.
  2. The vector store is used as an index; when a new query comes in, we embed it and search the index for the relevant documents.
  3. The query and the relevant docs are fed into the LLM, which will generate the response.

We will use llama-index to orchestrate all these steps, so the first step is to install it:

pip install llama-index

Vector Store

We will use ChromaDB as our vector store. ChromaDB is an open-source vector store framework that allows us to efficiently store and query multidimensional embeddings. We can leverage the llama-index connector for ChromaDB to set it up easily. Let’s install the connector (this also installs ChromaDB itself):

pip install llama-index-vector-stores-chroma

Let’s set up a ChromaDB instance and create a collection for our data. We will then create our vector store index:

import chromadb
from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Create chroma db client
chroma_client = chromadb.EphemeralClient()
# Create collection for our worldcup data
chroma_collection = chroma_client.create_collection("worldcup")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Document Loading and Chunking

Now we need to load and chunk our document to generate nodes that will be embedded and stored in the vector store index:

from llama_index.core.schema import TextNode
# Use double newlines as our delimiter and split the text based on this
delimiter = "\n\n"
nodes = [TextNode(text=chunk) for chunk in source_doc.split(delimiter)]
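To see what this delimiter-based split produces, here is a tiny standalone example on made-up sample text:

```python
# Standalone illustration of delimiter-based chunking (the sample text is made up).
sample = "Main article: Group A\nEcuador won 2-0.\n\nMain article: Group B\nEngland won 6-2."
delimiter = "\n\n"
chunks = sample.split(delimiter)
# Each paragraph separated by a blank line becomes its own chunk.
```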

Embedding Source Documents

Let’s embed these nodes and store them in our index. We will be using OpenAI’s embedding model, so make sure to create a valid API key and store it in the variable OPENAI_API_KEY:

from llama_index.embeddings.openai import OpenAIEmbedding
embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY)
index = VectorStoreIndex(nodes, storage_context=storage_context, embed_model=embed_model)
retriever = index.as_retriever()

LLM Assistant

Let’s create an LLM assistant that will answer the user’s questions. We will be using an OpenAI model, specifically gpt-3.5-turbo, with a low temperature to make its responses more deterministic and less prone to hallucination:

from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.2, api_key=OPENAI_API_KEY)

RAG Pipeline

Let’s now put all these pieces together and create our RAG pipeline. We will implement it as a function that takes the user’s query as input and returns the response as its output:

def get_response(query):
    # A concise RAG prompt template instructing the LLM to use the provided context
    prompt = """Generate a suitable response to the following QUERY given CONTEXT
    QUERY: {query}
    CONTEXT: {context}"""
    # Retrieve the relevant document chunks based on the user's query
    results = retriever.retrieve(query)
    # Concatenate the retrieved chunks and insert them into the prompt as context
    context = "\n".join([res.text for res in results])
    llm_input = prompt.format(query=query, context=context)
    # Send the input to the LLM and return its response
    response = llm.complete(llm_input).text
    return response

Let’s now test this RAG pipeline by asking a question and inspecting the answer:

response = get_response("Did any groups have a goalless starting match?")
print("Response:\n", response)
# Response:
# Yes, Group F's first match between Morocco and Croatia ended in a goalless draw.

Great! Our RAG pipeline is working as intended. However, we’re sending all our queries and document chunks to a third party provider. This poses a serious privacy risk. One way to handle this is to redact sensitive information. Let’s see how we can do this.

Prompt-only privacy

Our first approach will be concerned with the prompt sent to the LLM. If you’re using a third party API provider such as OpenAI or Anthropic, any data included in the prompt will be shared with these providers and this poses a huge risk. One way to mitigate this is to redact the prompt; this masks out PII data before sending it to the API provider.

Redacting the prompt

PrivateAI is built with ease-of-use in mind. We can conveniently redact the input to the LLM by simply invoking the PrivateAI API to pseudonymize the retrieved context and sending the pseudonymized prompt to the LLM instead.

Let’s first set up our PrivateAI client. If you’re using the hosted public PrivateAI endpoint, make sure to store the endpoint URL in the PRIVATEAI_URL variable and your API key in the PRIVATEAI_API_KEY variable.

from privateai_client import PAIClient
from privateai_client import request_objects

pai_client = PAIClient(url=PRIVATEAI_URL, api_key=PRIVATEAI_API_KEY)

We have the ability to customize how the Private AI engine handles entities. For example, we can instruct it to block or allow certain entities based on a regular expression.

Now let’s create a function that redacts a piece of text. In this function, we will use a regex pattern to detect and block percentage values. This can be helpful if, for example, we’re dealing with financial data and the growth numbers are confidential.

def redact_with_regex_filter(raw_text):
    # A regex pattern that detects percentage values (e.g. "Revenue went up by 50.2%")
    pattern = "(?:\\d+([\\.,]\\d+)?\\s?%)|(?:%\\s?\\d+([\\.,]\\d+)?)"
    # A filter that blocks entities matching our pattern (percentage values)
    filter_obj = request_objects.filter_selector_obj(type="BLOCK", pattern=pattern, entity_type="PERCENTAGE")
    entity_detection = request_objects.entity_detection_obj(filter=[filter_obj])
    request_obj = request_objects.process_text_obj(text=[raw_text], entity_detection=entity_detection)
    response_obj = pai_client.process_text(request_obj)
    return response_obj
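As a quick sanity check, the percentage pattern above can be exercised locally with Python’s `re` module, independently of the PrivateAI API:

```python
import re

# The same percentage pattern used in the filter above, checked locally with re.
pattern = "(?:\\d+([\\.,]\\d+)?\\s?%)|(?:%\\s?\\d+([\\.,]\\d+)?)"

assert re.search(pattern, "Revenue went up by 50.2%")   # digits before the sign
assert re.search(pattern, "a margin of %15")            # sign before the digits
assert not re.search(pattern, "no percentages here")    # plain text is untouched
```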

We will now create a new function that will add a pseudonymization step:

def get_response_redacted_prompt(query):
    prompt = """Generate a suitable response to the following QUERY given CONTEXT
    QUERY: {query}
    CONTEXT: {context}"""
    results = retriever.retrieve(query)
    context = "\n".join([res.text for res in results])
    ##########################################
    ######## REDACT DATA #####################
    ##########################################
    response_obj = redact_with_regex_filter(context)
    deidentified_context = response_obj.processed_text[0]
    llm_input = prompt.format(query=query, context=deidentified_context)
    response = llm.complete(llm_input).text
    return response

Let’s test our new RAG pipeline:

response = get_response_redacted_prompt("Did any groups have a goalless starting match?")
print("Response:\n", response)
# Response:
# Yes, Group F's first match was a goalless draw between [ORGANIZATION_4] and [ORGANIZATION_5].

Great! Our new pipeline is working as expected and is redacting sensitive information, such as organizations, from our prompts.

However, this output isn’t very helpful to the user since they don’t know which teams were involved in a goalless draw. Let’s see how we can handle this.

Re-identifying the prompt

We can go one step further and re-identify the redacted entities in the LLM’s response. This makes the responses much more informative for the user.

To do this, we will add another step to our pipeline that takes the LLM’s redacted response, sends it to PrivateAI, and retrieves the re-identified text:

def get_response_redacted_prompt_reidentify(query):
    prompt = """Generate a suitable response to the following QUERY given CONTEXT
    QUERY: {query}
    CONTEXT: {context}"""
    results = retriever.retrieve(query)
    context = "\n".join([res.text for res in results])
    ##########################################
    ######## REDACT DATA #####################
    ##########################################
    response_obj = redact_with_regex_filter(context)
    deidentified_context = response_obj.processed_text[0]
    # Get the list of entities from our redacted text
    entity_list = response_obj.get_reidentify_entities()
    llm_input = prompt.format(query=query, context=deidentified_context)
    de_identified_response = llm.complete(llm_input).text
    ##########################################
    ######## RE-IDENTIFY DATA ################
    ##########################################
    request_obj = request_objects.reidentify_text_obj(
        processed_text=[de_identified_response], entities=entity_list
    )
    response_obj = pai_client.reidentify_text(request_obj)
    re_identified_response = response_obj.body[0]
    return re_identified_response

For more details about re-identifying redacted text, please take a look at our detailed example.

Redacting the source data

Our second approach will be concerned with the source data that is chunked and sent to the embedding model. If you’re using a third party API provider such as OpenAI or Cohere, any data included in the chunks will be shared with these providers and this poses a huge risk. One way to mitigate this is to redact the source data; this masks out PII data before sending it to the API provider.

We can use the same function to redact the chunks before creating the nodes:

import chromadb
from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Create chroma db client
chroma_client = chromadb.EphemeralClient()
# Create collection for our redacted worldcup data
chroma_redacted_collection = chroma_client.create_collection("redacted_worldcup")

vector_store = ChromaVectorStore(chroma_collection=chroma_redacted_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# The embedding model must be defined before we embed the redacted chunks
embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY)

# Use double newlines as our delimiter and split the text based on this
delimiter = "\n\n"
nodes = []
for chunk in source_doc.split(delimiter):
    response_obj = redact_with_regex_filter(chunk)
    deidentified_chunk = response_obj.processed_text[0]
    # Embed the redacted chunk, but keep the original text in the node
    nodes.append(TextNode(text=chunk, embedding=embed_model.get_text_embedding(deidentified_chunk)))

index = VectorStoreIndex(nodes, storage_context=storage_context, embed_model=embed_model)
retriever = index.as_retriever()

We have now created a new vector store index where the embeddings are based on the redacted document chunks.

Prompt flow with redacted data

Since our embeddings are based on the redacted source document chunks, we also need to redact the query before embedding it. Otherwise, our vector search might fail to retrieve the relevant documents, since there would be a mismatch between entities in the query and entities in the source. In addition, as we mentioned earlier, user queries themselves can contain sensitive information which requires redaction.

We can easily achieve this as follows:

def get_response_redacted_embedding(query):
    # A concise RAG prompt template instructing the LLM to use the provided context
    prompt = """Generate a suitable response to the following QUERY given CONTEXT
    QUERY: {query}
    CONTEXT: {context}"""
    # Redact the query, then retrieve the relevant chunks using the redacted query
    response_obj = redact_with_regex_filter(query)
    deidentified_query = response_obj.processed_text[0]
    results = retriever.retrieve(deidentified_query)
    # Concatenate the retrieved chunks and insert them into the prompt as context
    context = "\n".join([res.text for res in results])
    llm_input = prompt.format(query=query, context=context)
    # Send the input to the LLM and return its response
    response = llm.complete(llm_input).text
    return response

Comparing the two methods

We’ve now covered two approaches in detail. Let’s compare the two methods:

| | Prompt-only Privacy | Source Documents Privacy |
| --- | --- | --- |
| Operates on | The input to the LLM | The input to the embedding model |
| Supported by PrivateAI? | Yes | Yes |
| Batch / Online | Online only, as we don’t know the inputs in advance | Can be done in offline batch mode, as we may already have the documents in advance |
| When to use | When using a third-party LLM API provider such as OpenAI or Anthropic | When using a third-party embedding API provider such as OpenAI or Cohere |

First party data risk

First-party data risk refers to the potential harm or liability that an organization faces when collecting, storing, and using its own customers’, users’ or employees’ personal data. This type of data is typically collected directly from the individual, often through interactions with the organization’s website, mobile app, or other digital platforms when onboarding new users and employees.

There are different forms of first-party data. Examples are:

  • Personal identifiable information (PII) such as names, addresses, phone numbers, and email addresses
  • Behavioral data, like browsing history, search queries, and purchase history
  • Device information, including IP addresses, device IDs, and location data
  • Sensitive information, like health data, financial information, or political affiliations

This type of data can be compromised in different ways such as:

  1. Data breaches: A company’s database is hacked, exposing millions of customers’ personal data, including credit card numbers and addresses.
  2. Unintended data sharing: An organization inadvertently shares confidential information such as NDAs, SSNs and salaries with all users.
  3. Insufficient data anonymization: A company fails to properly anonymize customer data, allowing individuals to be re-identified and compromising their privacy.

For our RAG pipeline, points 2 and 3 are especially important. An employee from the engineering team shouldn’t have access to the salaries or SSNs of other employees. On the other hand, a member of the HR team might need this information in their day-to-day tasks and responsibilities. 

A potential solution for the second point, unintended data sharing, is to create role-based access levels for the vector store. This ensures that confidential information will be retrieved for queries from the HR team but not for other teams.
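As an illustration, role-based filtering at retrieval time can be sketched in a few lines. This toy example uses a plain list with access labels rather than a real vector store; a production version would combine such a filter with vector similarity search (vector stores like ChromaDB support metadata filtering for this purpose):

```python
# A toy sketch of role-based access control at retrieval time.
# Each chunk carries a set of roles allowed to see it; retrieval only
# returns chunks accessible to the caller's role. (A real implementation
# would rank the allowed chunks by vector similarity first.)

DOCS = [
    {"text": "Quarterly engineering roadmap", "roles": {"engineering", "hr"}},
    {"text": "Employee salary bands", "roles": {"hr"}},
]

def retrieve_for_role(query, role):
    """Return only the chunks accessible to the caller's role."""
    # query is unused in this toy; a real retriever would embed and rank it
    return [d["text"] for d in DOCS if role in d["roles"]]
```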

For the third point, insufficient data anonymization, properly redacting the source data before it is embedded is an effective way to prevent the accidental re-identification of users’ data.

Wrap up and summary

In this comprehensive guide, we explored the concept of Retrieval Augmented Generation (RAG), a powerful approach to improving the accuracy of Large Language Models (LLMs) by grounding them on external sources of knowledge. We demonstrated how to implement RAG without compromising data privacy, a critical concern when working with sensitive information.

We discussed two main privacy risks in RAG pipelines: (1) source data and embeddings, and (2) prompt data sent to LLMs. To mitigate these risks, we presented two approaches: (1) prompt-only privacy, which redacts sensitive information from the input prompt sent to the LLM, and (2) source documents privacy, which redacts sensitive information from the source documents before embedding them.

We implemented a basic RAG pipeline using llama-index, a popular framework for building production-grade RAG pipelines, and demonstrated how to easily and conveniently add privacy features to the pipeline using PrivateAI, a privacy-enhancing technology. We also compared the two approaches, highlighting their differences in terms of operation, support, and use cases.

By following this guide, developers and organizations can unlock the power of RAG while ensuring the privacy and security of their data.
