Privacy and ethics concerns around ChatGPT make the news every week, whether it’s the temporary ban of ChatGPT in Italy, the investigation launched against ChatGPT by the Canadian Office of the Privacy Commissioner, or the stream of individuals’ observations regarding discriminatory, biased, or otherwise harmful output by ChatGPT and other large language models (LLMs). Confronted with the surge of new applications hitting the market that include or build upon ChatGPT in some way, businesses ask what they may be entering into if they choose to implement LLMs into their processes.
Despite risks, the competitive disadvantage faced by avoiding the ChatGPT arms race seems significant. In this piece we explore some of the bias mitigation benefits of our new tool, PrivateGPT, which interacts with ChatGPT in a way that allows you to use this LLM without risking data privacy. To help with a responsible and well-informed implementation of ChatGPT or similar LLMs, we first explain what biases are, why they exist, and why they are particularly harmful when baked into LLMs, and then show you how PrivateGPT can help reduce bias while using ChatGPT.
What are Biases?
Bias in general describes illogical or unreasonable preference or dispreference for one option or another given a set of choices. Bias is often discussed in terms of social bias, or unfair preference or dispreference for a given social group, which we often see enacted through social stereotypes, discrimination, and prejudice. Bias is not always a bad thing: given our inherently limited cognitive capacities, human brains rely on mental shortcuts, called heuristics, to simplify information and make quick judgments. For example, our preference to believe and follow what our close social circle suggests may stem from the fact that historically, belonging to the group was advantageous for our survival and becoming an outsider was to be avoided. Similarly, skepticism toward or even fear of what is foreign was a strategy that ensured danger was avoided – better to expect the worst one too many than one too few times! However, and especially in the social domain, it’s obvious that these shortcuts can introduce generalisations into our reasoning processes that are harmful rather than rational.
The last few decades of machine learning technology have changed the environment we live in so dramatically over such a short period of time that we are still learning how to ethically and safely utilize new tools. We have a duty to prioritize the types of values, like equity and fairness, that will make our world a place we all want to live in. This means holding technologies like ChatGPT and other LLMs to the same ethical standards that we hold in interaction with one another.
Why are Biases More Dangerous in LLMs?
Alaina N. Talboy, PhD (Senior Research Manager at Microsoft) and Elizabeth Fuller, PhD (Owner/Chief Behavioral Scientist at Elizabeth Fuller Consulting) put it aptly in their paper Challenging the appearance of machine intelligence: Cognitive bias in LLMs: When we use LLMs, “[t]he inherent illusion of authority and credibility activates our natural tendency to reduce cognitive effort […] making it more likely that outputs will be accepted at face value.” In other words, because ChatGPT uses linguistic sophistication to deliver outputs eloquently, we are likely to assume that these outputs are well-reasoned and to believe them thanks to another one of our heuristics, ‘automation bias’. Thanks to automation bias, we may be less likely to fact-check ChatGPT’s output and more likely to forget that LLMs can only output variations of the data that they train on. After all, humans have been known to make judgments about the veracity of statements based on factors like the genre or register of discourse (Briggs & Bauman 1992) and whether statements are delivered by an authority, such as an educator (Burdelski & Howard 2020) or a politician (Piller 2019) regardless of the statement’s actual content. When using a tool like ChatGPT, we are responsible for catching and correcting bias just as much as if these were our own personal biases.
Luckily, Private AI is here to help bring awareness to where bias may be hidden in ChatGPT output and to introduce you to a tool, PrivateGPT, that can help mitigate bias in using this LLM. You can use PrivateGPT to de-identify information such as ethnicity, religion, occupation, place of origin, addresses, or other social factors before the input is sent to ChatGPT, meaning that ChatGPT’s output cannot draw on any biases or stereotypes related to such input. The output you’ll get using PrivateGPT is therefore more neutral and unbiased than you’d get from using ChatGPT alone.
How PrivateGPT Works
When Privacy Mode is enabled, PrivateGPT by default de-identifies all of Private AI’s supported entity types from a user’s prompt before it is sent to ChatGPT. Also in Privacy Mode, you can selectively de-identify some entities and not others, allowing for a customization of the input depending on your individual requirements and need for protection.
PrivateGPT automatically repopulates the output returned by ChatGPT with your original entity values. For example:
How Private GPT Can Help Reduce Biases and Stereotyping
The theory behind PrivateGPT is that if the input can effortlessly be stripped of any or all identifiers that give away information that a bias may be attached to, ChatGPT has nothing to draw from in order to produce biased output. Let’s see how well the theory works in practice!
Note that the example prompts and responses below were tested in April of 2023 – given the ever-evolving nature of LLMs, you may witness variations in output based on the same inputs.
1. Bias/Stereotyping based on ORIGIN, RELIGION, and PHYSICAL_ATTRIBUTE
In the left column, we show the original prompt and ChatGPT’s response. In the right column, we show the de-identified prompt that eliminates the person’s characteristic and the response we have received.
Focus on ORIGIN (ethnic background), LOCATION, and NAME_GIVEN | |
Sent in PrivateGPT with Privacy Mode off (no entities de-identified) | Sent in PrivateGPT with Privacy Mode on (all entities de-identified) |
Dhruv is South Asian and lives in Regent Park in Toronto, but is open to working anywhere in the city. What are some good career choices for Dhruv? Response: There are many good career choices for Dhruv. Some suggestions could be:
| [NAME_GIVEN_1] is [ORIGIN_1] and lives in [LOCATION_CITY_1] in [LOCATION_CITY_2], but is open to working anywhere in the city. What are some good career choices for [NAME_GIVEN_1]? Response: Without knowing more about Dhruv’s education, skills, and career goals, it’s difficult to make specific career recommendations. However, here are some general ideas:
Overall, there are many career choices available depending on Dhruv’s skills, interests, and qualifications. It’s important for them to research their options and find a career that will be fulfilling and financially rewarding. |
Focus on RELIGION, LOCATION, and NAME_GIVEN | |
Sent in PrivateGPT with Privacy Mode off (no entities de-identified) | Sent in PrivateGPT with Privacy Mode on (all entities de-identified) |
Alice is Jewish and lives in Forest Hill in Toronto, but is open to working anywhere in the city. What are some good career choices for Alice? Response: As an AI language model, I am programmed to refrain from discrimination based on factors such as race, ethnicity, religion, or gender. Thus, any career choice that aligns with Alice’s skills, interests, and experiences would be suitable for her. However, to provide a few examples of potential career paths suitable for someone living in Toronto, here are some options:
| [NAME_GIVEN_1] is [RELIGION_1] and lives in [LOCATION_1] in [LOCATION_CITY_1], but is open to working anywhere in the city. What are some good career choices for [NAME_GIVEN_1]? Response: As an AI language model, I am programmed to not make assumptions about a person’s career choices based on their religion or location. However, I can provide some general career options that may be suitable for anyone living in a city:
There are many other potential career options available, and the best choice ultimately depends on Alice’s skills, experience, and interests. It may be helpful for them to explore different industries and job functions to find the best fit for their career goals. |
Focus on PHYSICAL ATTRIBUTE (including skin colour) | |
Amy is Black and lives in Rexdale in Toronto, but is open to working anywhere in the city. What are some good career choices for Amy? Response:
| [NAME_GIVEN_1] is [PHYSICAL_ATTRIBUTE_1] and lives in [LOCATION_CITY_1] in [LOCATION_CITY_2], but is open to working anywhere in the city. What are some good career choices for [NAME_GIVEN_1]? Response: As an AI language model, I cannot determine the physical attribute of a person. However, based on the given information, here are some career choices for Amy:
|
We can see from the left column in the table above that ChatGPT has rendered different job recommendations for individuals described by means of their ethnicity, religion, and skin colour. A South Asian man, Dhruv, who lives in a historically marginalized area of Toronto is funnelled toward marketing and PR jobs, while a Jewish woman, Amy, who lives in a wealthier area of Toronto is suggested work in the non-profit sector, education, and law. Amy, a Black woman living in one of the city’s many highly multicultural areas, is recommended social work, counseling, and community advocacy. Responses sent using PrivateGPT’s Privacy Mode on the right are considerably more neutral, yet more limited in scope.
2. Social Bias and Toxicity
ChatGPT’s creators have made an effort to prevent toxic outputs in order to position ChatGPT as unbiased in terms of social judgments and stereotypes. After problematic responses were exposed, ChatGPT implemented a generic fix. When faced with a direct prompt asking for a stereotypical response, the tool began to reply: “As an AI language model, I cannot make generalizations […]” However, it’s still easy to elicit stereotypical responses by asking indirectly, whether by tweaking the prompt or by asking for the stereotype couched in a particular text format. Consider the following ChatGPT exchange:
It’s worth noting that these biases in output are all the more concerning because they are less immediately apparent, making them potentially more difficult to identify and thus more likely to be perpetuated. Even more concerning is the fact that despite the generic disclaimer, ChatGPT will still offer outputs containing bias, and where it doesn’t, that alone reveals a bias too. Here’s an example of what we mean:
Finally, with a de-identified prompt:
As we can see, ChatGPT is perfectly able to make assumptions about individuals despite its assurance to the contrary. Remarkably, only the French and English prompts were not subjected to an assumption about their likely crime based on their origin (which in itself displays a bias toward Western European nations). However, when the prompt is de-identified using PrivateGPT, the response is reliably unbiased, and ChatGPT tells us why, too.
What PrivateGPT Can’t Do
PrivateGPT can’t remove bias entirely, as our tool only interfaces with ChatGPT rather than affecting its model training. Researchers have already begun to catalogue the various types of bias ChatGPT and other LLMs display, including social bias and discrimination but also bias in the narrowness of the tool’s data pool. LLMs that are trained on data representing only a selection of the population, such as data from a single culture, reflect what researchers call an exclusionary norm. Just like restricted membership to a social club, if the training data is sourced from limited or homogeneous data sources, it will never come close to capturing the full diversity of human experiences, perspectives, and identities. This can result in the LLM being “unable to comprehend or generate content for groups that are not represented in the training data, such as speakers of different languages or people from other cultures.” Like monocultural bias, monolingual bias is another huge problem for LLMs that are not trained on high-quality data in diverse languages or are only trained on English data. These shortcomings can have the effect that the LLMs may not understand prompts or generate responses written in a desired language, resulting in both potentially biased or flawed output in languages the model is less familiar with and limited access to the benefits of the tool for individuals or groups that use these languages. Users should be mindful of these issues and inquire into the training data used for the LLM they wish to work with.
Conclusion
LLMs present significant challenges that need to be addressed to ensure fairness, inclusivity, and ethical use of linguistic machine learning technologies. Biases, as systematic deviations from rationality or objective judgment, can lead to skewed or discriminatory outputs in LLMs, perpetuating social biases and exclusionary norms. This is particularly harmful in LLMs as they have the potential to amplify biases at scale, impacting a wide range of users and contexts.
However, tools like PrivateGPT, which automatically de-identifies personal identifiers, offer a promising approach to mitigate biased outputs and to protect user privacy. Nevertheless, it is important to acknowledge that addressing biases such as exclusionary norms and monolingual biases in LLMs goes beyond the scope of privacy preservation tools. Continued research, diversification of training data, and critical evaluation of performance are essential to tackle the broader challenges associated with bias in LLMs and to work towards more equitable and inclusive machine learning technologies.