What is PII?

Kathrin Gardhouse
Apr 17, 2023

With data privacy becoming an increasingly hot topic as major data breaches make headlines around the globe, the biggest question typically is: “What PII was exposed?” But what is PII? PII stands for “Personally Identifiable Information.” It generally refers to any information that can be used to identify a particular person, such as a name, credit card number, email address, or Social Insurance Number (SIN). The definition of PII varies depending on the jurisdiction, the agency, and the context. In this article we cover the origins of the term PII and whether and how it is used and defined in US federal laws, selected US state laws, Europe’s GDPR, and Canada’s federal and provincial private sector privacy laws.

Examples of PII

Examples of Personally Identifiable Information listed by the Department of Homeland Security (DHS) include: “name, date of birth, mailing address, telephone number, Social Security number (SSN), email address, zip code, account numbers, certificate/license numbers, vehicle identifiers including license plates, uniform resource locators (URLs), static Internet protocol addresses, biometric identifiers (e.g., fingerprints), photographic facial images, or any other unique identifying number or characteristic, and any information where it is reasonably foreseeable that the information will be linked with other information to identify the individual.” PII can include both direct identifiers and indirect (or quasi-) identifiers. Learn more about direct and indirect identifiers.
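To make the list above more concrete, here is a minimal sketch of how a few of the DHS examples (email address, Social Security number, telephone number) might be flagged in free text. The regular expressions, the `PATTERNS` table, and the `find_identifiers` function are illustrative assumptions, not a DHS method or any vendor's implementation. They cover only simple US-style formats and will miss many real-world variations, and they say nothing about indirect identifiers such as zip codes.

```python
import re

# Simplified, illustrative patterns (assumptions). Real detection needs far
# more robust rules and surrounding context.
PATTERNS = {
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "Social Security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "telephone number": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Return every match for each pattern, keyed by identifier type."""
    found = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


# Made-up sample text for illustration only.
sample = "Contact Jane at jane.doe@example.com or 555-867-5309. SSN: 078-05-1120."
print(find_identifiers(sample))
```

Pattern matching of this kind only catches identifiers with a predictable shape. Names, addresses, and most indirect identifiers need context to be recognized.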

Origins

The origins of the term PII are difficult to trace. It may have originated in the US government’s use of the term “sensitive but unclassified.” PII is not consistently used, even within the US. Several states simply use ‘personal information’ or ‘personal data.’

One of the first definitions of PII can be found in the Office of Management and Budget (OMB) Memorandum M-06-19 (Reporting Incidents Involving Personally Identifiable Information and Incorporating the Cost for Security in Agency Information Technology Investments): “For purposes of this policy, the term Personally Identifiable Information means any information about an individual maintained by an agency, including, but not limited to, education, financial transactions, medical history, and criminal or employment history and information which can be used to distinguish or trace an individual's identity, such as their name, social security number, date and place of birth, mother’s maiden name, biometric records, etc., including any other personal information which is linked or linkable to an individual.”

A shortened version of this definition was provided by the OMB in Memorandum M-07-16 (Safeguarding Against and Responding to the Breach of Personally Identifiable Information): “The term ‘personally identifiable information’ refers to information which can be used to distinguish or trace an individual's identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother’s maiden name, etc.”

This definition is used, for example, on the website of the US Office of Privacy and Open Government. A very similar definition can be found on the website of the IAPP as well.

DHS in its Privacy Incident Handling Guide from 2012 defined PII in section 1.4.9 as “Any information that permits the identity of an individual to be directly or indirectly inferred, including any other information which is linked or linkable to that individual regardless of whether the individual is a United States citizen, legal permanent resident, or a visitor to the U.S. PII includes any item, collection, or grouping of information about an individual that is maintained by an agency, including, but not limited to, education, financial transactions, medical history, and criminal or employment history.”

We will see in the following sections, perhaps surprisingly, that the term PII is not used in any of the data privacy legislation we compare. However, whatever terminology is used, the general aim of all laws considered is to protect information that is likely to identify an individual. Yet, notable differences remain.

United States Laws – Federal 

US Privacy Act 1974, §552a.(a)(4)

Term: ‘record’

Definition: any item, collection, or grouping of information about an individual that is maintained by an agency, including, but not limited to, his education, financial transactions, medical history, and criminal or employment history and that contains his name, or the identifying number, symbol, or other identifying particular assigned to the individual, such as a finger or voice print or a photograph

E-Government Act of 2002, § 208(d)

Term: information in ‘identifiable form’

Definition: any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means.

(Proposed) American Data Privacy and Protection Act

Term: ‘covered data’

Definition: information that identifies or is linked or reasonably linkable, alone or in combination with other information, to an individual or a device that identifies or is linked or reasonably linkable to an individual, and may include derived data and unique persistent identifiers.

Excluded:

(i) de-identified data;

(ii) employee data;

(iii) publicly available information; or

(iv) inferences made exclusively from multiple independent sources of publicly available information that do not reveal sensitive covered data with respect to an individual.

United States – State Specific Definitions

There is currently no comprehensive federal law in place that uniformly applies to data collected by private organizations in the US. There are instead several federal laws that apply to specific kinds of personal information, for example in the healthcare sector.

Several states have started to enact comprehensive privacy laws that apply regardless of the industry. The following is a selection of enacted or proposed laws showcasing the different approaches taken by the states in this regard.

California: California Consumer Privacy Act of 2018 (CCPA), section 1798.140, as amended by the CPRA

Term: ‘personal information’

Definition: information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.

Included (examples from the definition):

  • Identifiers such as a real name, alias, postal address, unique personal identifier, online identifier, Internet Protocol address, email address, account name, social security number, driver’s license number, passport number.
  • Commercial information, including records of personal property, products or services purchased, obtained, or considered, or other purchasing or consuming histories or tendencies.
  • Internet or other electronic network activity information, including, but not limited to, browsing history, search history, and information regarding a consumer’s interaction with an internet website, application, or advertisement.
  • Biometric information
  • Geolocation data

Excluded:

  • publicly available information or lawfully obtained, truthful information that is a matter of public concern
  • consumer information that is deidentified or aggregate consumer information

New York (Proposed): New York Privacy Act § 1100(16)

Term: ‘personal data’

Definition: any data that identifies or could reasonably be linked, directly or indirectly, with a specific natural person, household, or device.

Excluded:

  • Deidentified data

Arkansas: Personal Information Protection Act § 4-110-103(7)

Term: ‘personal information’

Definition: an individual’s first name or first initial and his or her last name in combination with any one (1) or more of the following data elements when either the name or the data element is not encrypted or redacted:

(A) Social Security number;

(B) Driver’s license number or Arkansas identification card number;

(C) Account number, credit card number, or debit card number in combination with any required security code, access code, or password that would permit access to an individual’s financial account;

(D) Medical information; and

(E)(i) Biometric data.

Excluded:

  • Encrypted or redacted data
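Unlike the broad California and New York definitions, the Arkansas definition reads like a rule: a name plus at least one listed data element, where the name or that element is not encrypted or redacted. The following is a minimal sketch of that rule under a simplified reading of the text quoted above. The `Element` and `Record` classes and the `is_personal_information` function are hypothetical names introduced for illustration, and the sketch does not model the security-code condition attached to account numbers in item (C).

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One stored value and whether it is encrypted or redacted."""
    present: bool = False
    protected: bool = False  # True if encrypted or redacted


@dataclass
class Record:
    # First name or first initial plus last name.
    name: Element = field(default_factory=Element)
    # Listed data elements keyed by a hypothetical label such as "ssn".
    data_elements: dict[str, Element] = field(default_factory=dict)


def is_personal_information(record: Record) -> bool:
    """Simplified reading: a name plus at least one listed data element,
    where either the name or that element is not encrypted or redacted."""
    if not record.name.present:
        return False
    for element in record.data_elements.values():
        both_protected = record.name.protected and element.protected
        if element.present and not both_protected:
            return True
    return False


# Name in clear text, SSN encrypted: still covered under this reading.
clear_name = Record(Element(True, False), {"ssn": Element(True, True)})
# Name and SSN both encrypted: not covered under this reading.
all_protected = Record(Element(True, True), {"ssn": Element(True, True)})

print(is_personal_information(clear_name))     # True
print(is_personal_information(all_protected))  # False
```

The contrast with the California definition is the point: a rule like this can be checked field by field, whereas “reasonably capable of being associated with” a consumer calls for a judgment about context.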

Europe

The General Data Protection Regulation (GDPR) of the European Union does not use the term PII, but rather ‘personal data.’ Personal data is defined in Article 4(1) as

“any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”

This definition is substantively similar to many of the US definitions of personal information. Its examples capture well the breadth of information that can now be used to build a profile of and to identify individuals even in the absence of direct identifiers.

Canada

Canadian legislation does not use the US terminology of PII. The federal data protection law, the Personal Information Protection and Electronic Documents Act (PIPEDA), defines ‘personal information’ as “information about an identifiable individual.” This definition remains unaltered in the proposed Consumer Privacy Protection Act (CPPA).

There are three provincial privacy laws that apply in the private sector instead of PIPEDA, namely the Alberta, British Columbia, and Québec private sector privacy laws. All three acts use the term ‘personal information.’

Alberta: Personal Information Protection Act

Definition: information about an identifiable individual

British Columbia: Personal Information Protection Act

Definition: information about an identifiable individual and includes employee personal information but does not include:

  1. contact information, or
  2. work product information

Québec: Act respecting the protection of personal information in the private sector

Definition: any information which relates to a natural person and allows that person to be identified either directly or indirectly

One difficulty that arises when interpreting the Canadian definitions of personal information is the little word “about” (or “relates to” in Québec). The Supreme Court of Canada (SCC), for example, has interpreted the term “about” expansively, justifying many categories of information to be considered personal, whether they are sensitive, private, or well-known. Note, however, that the SCC made this interpretation in the context of the federal Privacy Act, which is important because the protection of information held by the government may be regarded in a different light than information held by private organizations. Yet, in the same context the Federal Court of Canada interpreted the definition more narrowly, capturing only information that connotes concepts of intimacy, identity, dignity, and integrity of the individual. The Alberta Privacy Commissioner regards the term as a “highly significant restrictive modifier” and quite different from “relates to.” For example, if an individual makes a complaint or a suggestion and information is gathered or created as a result, this information is not necessarily “about” that individual, although it is in some way connected to the individual.

An example of a surprisingly narrow interpretation of personal information is the Court of Appeal of Alberta’s position that neither a licence plate number nor a street address is personal information, because neither is about an individual. The court said:

“The fact that the vehicle is owned by somebody does not make the licence plate number information about that individual. It is “about” the vehicle. The same reasoning would apply to vehicle information (serial or VIN) numbers of vehicles. Likewise a street address identifies a property, not a person, even though someone may well live in the property. The licence plate number may well be connected to a database that contains other personal information, but that is not determinative. The appellant had no access to that database, and did not insist that the customer provide access to it.”

This position seems to be an outlier. We have seen above that most jurisdictions require only indirect identifiers and a reasonable possibility of identification, even if only in combination with other information that is accessible, to meet the definition of personal information.

Canada’s Privacy Commissioner, too, takes a broader view, giving the following examples of personal information in the context of PIPEDA:

  • age, name, ID numbers, income, ethnic origin, or blood type;
  • opinions, evaluations, comments, social status, or disciplinary actions; and
  • employee files, credit records, loan records, medical records, existence of a dispute between a consumer and a merchant, intentions (for example, to acquire goods or services, or change jobs).

Given that individuals retain their licence plate number when selling their cars and are supposed to surrender it when taking up residence in a new province, it seems reasonable to treat the licence plate number as a number that can be used to identify the individual.

Summary

As we can see, the different approaches taken in the jurisdictions considered in this article lead to significantly different definitions of PII (or equivalent terms). Not all authorities even agree that personal information is, at a minimum, likely able to identify (alone or in combination with other data) an individual person. It is therefore very important to establish the context in which the term PII (or equivalent) is being used and to consult the actual text of the legislation and authoritative guidance.

Most commonly, though, PII seems to refer to information that can likely be used to identify an individual, if only in connection with other available information. Two trends are widening that scope. Powerful AI makes identification that would have been deemed impossible a short while ago ‘likely’ today, and the pool of ‘other available information’ is growing at unprecedented rates. As a result, more and more information will have to be considered PII. Most of the additional information available is made up of indirect identifiers, to be sure, but indirect identifiability suffices for the most common definitions of the term.
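A small sketch can illustrate why indirect identifiers matter. It counts how many records in a dataset share the same combination of a few indirect identifiers. Any combination that occurs only once singles out one record, even though no name or ID number is present. The field names, the `QUASI_IDENTIFIERS` tuple, the `unique_records` function, and the sample data are all assumptions made up for this illustration. None of the laws discussed above prescribes such a test.

```python
from collections import Counter

# Hypothetical indirect identifiers chosen for illustration.
QUASI_IDENTIFIERS = ("zip_code", "birth_year", "gender")


def unique_records(records, keys=QUASI_IDENTIFIERS):
    """Return the records whose combination of values for `keys` occurs only once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return [r for r in records if combos[tuple(r[k] for k in keys)] == 1]


# Made-up sample data containing no direct identifiers.
records = [
    {"zip_code": "90210", "birth_year": 1985, "gender": "F"},
    {"zip_code": "90210", "birth_year": 1985, "gender": "F"},
    {"zip_code": "90210", "birth_year": 1985, "gender": "M"},
    {"zip_code": "10001", "birth_year": 1990, "gender": "F"},
]

# The last two records are each the only one with their combination.
print(len(unique_records(records)))  # 2
```

The more attributes are available to combine, the more records become unique in this sense, which is one way to picture how growing volumes of available information turn ordinary data into PII.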

How Private AI Can Help With Compliance

Knowing what PII exists in your organization and where it lives will allow you to determine what compliance with the applicable legislation or industry standard entails.

Private AI can help you make that determination, even in unstructured data and across 47 languages. Using the latest advancements in machine learning, Private AI can minimize the time needed to identify and categorize your data and thereby facilitate compliance. It can identify over 50 different entities of PII, PCI, and PHI. To see the tech in action, try our web demo, or request an API key to try it yourself on your own data.
