Privacy Enhancing Data De-Identification Framework - ISO/IEC 27559:2022(E)

May 17, 2023

If your organization ever finds itself in the position of wishing, or having, to disclose personally identifiable information (PII), e.g., to third parties for processing purposes, to researchers for scientific purposes, or to the public as a result of access-to-information obligations, you have to ensure that the privacy of the individuals to whom the data pertains is adequately protected. The new ISO framework provides guidance on how to do so.

Proper protection of PII contained in datasets that are to be disclosed, however widely, requires an assessment of the context of disclosure and of the data itself, in particular its identifiability, mitigation of that identifiability, and de-identification governance both before and after the disclosure. The new ISO framework addresses these steps in detail, and we summarize the highlights in this blog.

While ISO standards are voluntary, this framework can be a useful supplement to the requirements of many privacy laws and regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) and the GDPR, which make the de-identification of data mandatory before certain kinds of disclosure.

Context Assessment

Context assessment refers to the determination of a data recipient’s privacy and security posture, the intended use of the data, the level of transparency with which the data will be disclosed, and the additional knowledge the data recipient presumably has about individuals contained in the dataset. The context assessment also requires an examination of potential threats.

Threat Modeling

Potential threats can be categorized into three kinds: deliberate, accidental, and environmental. A deliberate attack refers to an attempt to identify individuals in the dataset by an insider to the recipient’s infrastructure, such as an employee. An accident, on the other hand, describes the unintentional disclosure of PII, for example where the data recipient happens to recognize an individual whose data are included in the dataset. An environmental threat refers to loss or theft of data when all IT security controls fail.

The data recipient’s ability to prevent and mitigate each of these threats should be assessed. For consistency and efficiency, it is recommended that third-party audits be undertaken and relevant certifications considered.

Where data are shared non-publicly, a prudent mitigation strategy involves imposing contractual obligations upon the data recipient that require it, for example, to conduct awareness training, limit use cases, prohibit further sharing of the data, and permit recurrent compliance audits.

Transparency and Impact Assessment

In addition to assessing the data recipient, a context assessment should also include engaging other stakeholders, such as the individuals represented in the data, organizations that disclose similar data, or privacy regulators. Disclosing one’s collection, use, and disclosure practices fosters trust and enables stakeholders to voice concerns, which can then lead to an appropriate balance between the risks and benefits of disclosure.

Privacy impact assessments are a useful, and often mandatory, mechanism by which privacy risks are identified and mitigation strategies surfaced. The earlier such an impact assessment is undertaken, the better, as an early stage offers the most opportunities for implementing privacy by design.

Data Assessment

The purpose of the data assessment is to understand the features of the data to be disclosed and what external background information is likely available to an adversary, that is, a person attempting to re-identify the individuals represented in a dataset. These insights inform the decision of which data points need to be de-identified to protect privacy and which can remain in support of the use case.

Data Features

The ISO framework provides a helpful categorization that can be used to structure the data assessment: a dataset is composed of population units (usually represented as the rows of a structured dataset) and the information about those population units (called attributes).

At the level of the population unit, the organization that intends to disclose the data should consider whether the data represent a single unit, i.e., a person or a household, or an aggregate of several units; whether particularly vulnerable individuals are included; and whether the entire dataset is disclosed or just a sample, the latter leaving an adversary uncertain as to who is represented in the data.

Considering the attributes, it needs to be determined whether they constitute direct or indirect identifiers, their level of uniqueness and sensitivity, whether they can be linked with other available data, and what an adversary could learn about them through targeted searches. This assessment will show the value of the data to an adversary.

Attack Modelling

The data assessment, in a second step, requires a quantification of the risk. For this purpose, the framework considers only deliberate attacks and divides them into three scenarios: (1) prosecutor risk, where the adversary knows that the individual they are targeting is included in the dataset; (2) journalist risk, where the adversary does not know whether the targeted individual’s data are included in the dataset, e.g., because only a sample of the complete dataset is made available; and (3) marketer risk, where the attack is not targeted and the objective is instead to identify as many individuals as possible.

The metrics associated with these attacks are the maximum risk and the average risk. The former is the metric of choice when no additional security measures are applied, such as when the data are made publicly available. The maximum risk is calculated by considering the risk to the individual who is most likely to be identified in an attack. This level of prudence is necessary because it must be expected that an attacker will attempt identification even if just for the purpose of demonstrating that it is possible.

The average risk metric can be applied if there are additional controls in place to protect the data, so that such a demonstration attack is prevented. As the name suggests, in this case the average identifiability across the entire dataset is calculated.
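To make the maximum- and average-risk metrics concrete, here is a minimal sketch in Python assuming a simple equivalence-class model, in which a record that shares its quasi-identifier values with k − 1 other records is re-identified with probability 1/k under prosecutor risk. The records and field names are hypothetical, and the standard does not mandate this particular model.

```python
from collections import Counter

def equivalence_classes(records, quasi_identifiers):
    """Group records by their combination of quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return Counter(keys)

def prosecutor_risks(records, quasi_identifiers):
    """Maximum and average prosecutor risk under the assumed 1/k model:
    a record in an equivalence class of size k is re-identified with
    probability 1/k."""
    classes = equivalence_classes(records, quasi_identifiers)
    sizes = classes.values()
    max_risk = 1 / min(sizes)               # driven by the most unique record
    avg_risk = len(classes) / len(records)  # equals the mean of 1/k over records
    return max_risk, avg_risk

# Hypothetical toy dataset for illustration only.
records = [
    {"age": 34, "zip": "10115", "diagnosis": "A"},
    {"age": 34, "zip": "10115", "diagnosis": "B"},
    {"age": 51, "zip": "80331", "diagnosis": "C"},
]
print(prosecutor_risks(records, ["age", "zip"]))  # (1.0, 0.666...)
```

In this toy example, the third record is unique on its quasi-identifiers, so the maximum risk is 1.0, which is why the maximum-risk metric is the prudent choice for public releases, while the average risk across all records is lower.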

Identifiability Assessment and Mitigation

In the identifiability assessment, the results of the context and data assessments are brought together to quantify identifiability. The probability of identification of an individual is the probability that identification will occur given a threat, e.g., a deliberate attack, times the probability that the threat occurs:

P(identification) = P(identification | threat) × P(threat)

Assessing Identifiability

The result of this formula is a number which can then be compared to the well-established identifiability thresholds set out in Annex B of the ISO framework. Depending on whether an attack against an entity or a group is being modeled, and depending on how likely and impactful the attack would be, the threshold lies between 0.0005 and 0.1.

Subjective as well as objective factors may be considered when evaluating the impact a successful attack would have on an individual whose data are disclosed. The higher the impact, the lower the threshold should be. However, legitimate benefits of the disclosure can also be taken into account, as well as the reasonable expectation of privacy of the individuals whose information is contained in the dataset.

Assumptions made about the identifiability of the data should be subjected to adversarial testing, a simulation using a friendly adversary with reasonable competence and common external resources. While good practice, the framework concedes that this approach is resource intensive and cannot represent all the possible threats the data may be exposed to.

Mitigation

Mitigation efforts can be directed at the context of disclosure, at the data itself, or both. Where applicable, the data recipient may be contractually required to enhance its security practices, to limit access on a strict need-to-know basis, to permit the analytical output to be checked by the disclosing organization, etc.

Modifying the data itself will often impact the utility of the data; however, it may be the only mitigation strategy available, depending on the data recipient. The framework thus recommends eliminating all direct identifiers or replacing them with values that are not linked in any way to the original information. For high-risk data, this process should be irreversible.

Indirect identifiers and sensitive information that are not required for the data analysis should also be eliminated or otherwise modified to achieve an acceptable identifiability risk level. The simplest modifications that are at the same time quite effective are generalization and sampling. Generalization describes the reduction of the level of detail, either by enlarging numerical intervals, e.g., providing an age range rather than an exact age, or by combining several categories of data into one. Sampling means leaving out the information pertaining to some individuals, introducing uncertainty for the adversary as to whether a particular individual’s data are contained in the dataset.

Following the mitigation efforts, it is recommended to re-evaluate the data’s identifiability against the chosen threshold.
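To make the two data-side mitigations concrete, here is a minimal sketch that generalizes exact ages into ranges and releases only a random sample of records; the record fields, bin width, and sampling fraction are hypothetical choices for illustration, not values prescribed by the standard.

```python
import random

def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a range, e.g., 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def sample_records(records: list, fraction: float, seed: int = 42) -> list:
    """Release only a random fraction of records, so an adversary cannot be
    sure a given individual is in the release (introducing journalist risk)."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]

# Hypothetical toy dataset for illustration only.
records = [{"age": 34, "zip": "10115"}, {"age": 51, "zip": "80331"},
           {"age": 29, "zip": "10115"}]
released = [{**r, "age": generalize_age(r["age"])}
            for r in sample_records(records, fraction=0.8)]
print(released)
```

And to close the loop on the re-evaluation step, the following sketch plugs assumed values into the formula above and compares the result against a chosen Annex B threshold. The probabilities and the 0.05 threshold are invented inputs for demonstration, not values taken from the standard.

```python
def identification_risk(p_ident_given_threat: float, p_threat: float) -> float:
    """P(identification) = P(identification | threat) * P(threat)."""
    return p_ident_given_threat * p_threat

# Assumed inputs: the residual re-identification probability of the
# mitigated dataset, and the estimated probability that a deliberate
# attack is attempted given the context controls in place.
risk = identification_risk(p_ident_given_threat=0.2, p_threat=0.1)

threshold = 0.05  # arbitrary example within Annex B's 0.0005-0.1 range
print(f"P(identification) = {risk:.3f}")  # 0.020
print("acceptable" if risk <= threshold else "further mitigation needed")
```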

De-Identification Governance

The ISO framework suggests integrating data sharing policies and procedures into the organization’s wider information security practices. This systematizes the approach to disclosure, makes it repeatable and auditable, and enables the organization to respond effectively to privacy incidents and breaches.

Before Data are Disclosed

Identifying roles and responsibilities and training staff appropriately ensures that the required expertise exists to disclose data responsibly and in compliance with applicable privacy laws and regulations. A comprehensive record-keeping system that tracks all activities relating to the organization’s data handling leaves an auditable trail, instilling confidence in the policies and procedures and their implementation. Open and ongoing communication with relevant stakeholders can determine what information the organization should disclose regarding its data protection activities, without exposing information that would enable adversaries to re-identify the data more effectively.

After Data are Disclosed

After data have been disclosed, it is prudent to regularly reassess the disclosure environment, as technological capabilities keep advancing, more information becomes available, and data privacy laws are established or amended regularly.

In support of this, the framework advises keeping track of all the data the organization has disclosed in order to spot potential linkages between released datasets, newly available public data, previous data recipients, new technologies, and developments in the law; a sketch of such a disclosure registry follows below.

In the event of a privacy incident, breach containment, mitigation, and reporting are paramount. Immediately afterwards, lessons learned should be discussed and any gaps in the policies and procedures identified and closed. Creating an audit trail during the breach response phase is also important to demonstrate that proper measures were taken and legal obligations were fulfilled.
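As a loose illustration of that record-keeping advice, the sketch below models a minimal disclosure registry that flags when two released datasets share quasi-identifier fields and might therefore be linkable. The field names and the overlap heuristic are invented for demonstration; a real registry would track far more context.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Disclosure:
    dataset: str
    recipient: str
    released_on: date
    quasi_identifiers: set[str] = field(default_factory=set)

registry: list[Disclosure] = []

def register(d: Disclosure) -> list[str]:
    """Record a disclosure and report earlier releases that share
    quasi-identifiers, a crude signal of possible linkability."""
    overlaps = [prev.dataset for prev in registry
                if prev.quasi_identifiers & d.quasi_identifiers]
    registry.append(d)
    return overlaps

register(Disclosure("census-extract", "UniLab", date(2023, 1, 9), {"age", "zip"}))
print(register(Disclosure("health-survey", "StatsOrg", date(2023, 5, 2), {"zip", "sex"})))
# ['census-extract'] -- both releases expose 'zip', so linkage should be reviewed
```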

Conclusion

The Privacy Enhancing Data De-Identification Framework - ISO/IEC 27559:2022(E) gives actionable guidance to organizations that wish to, or are required to, disclose data in their custody. The strategies and considerations informing the context, data, and identifiability assessments and de-identification governance are generally applicable, yet can easily be adapted to any organization’s particular situation. The level of technicality is kept to a minimum, making the framework accessible to many relevant stakeholders.

Technical expertise is still needed, of course, to determine the precise need for de-identification and to implement it. Private AI is well equipped to facilitate the categorization and de-identification of personal data, even in unstructured data and across 49 languages. Using the latest advancements in Machine Learning, the time-consuming work of redacting your data can be minimized and compliance partially automated. To see the tech in action, try our web demo, or request an API key to try it yourself on your own data.
