Why The Right To Be Forgotten Is Even Harder To Comply With Than You Think (And What To Do About It)

In today’s data-driven world, businesses constantly collect information from their customers in order to provide a better product or service, to understand and alleviate pain points along the path to acquisition, to gain insights and create more efficient processes, and much more. Data is often considered critical for modern organizations, but the collection and use of data come with ethical and legal responsibilities.

Global data protection regulations, like the General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA), exist to protect individuals’ rights to privacy. These regulations provide a legal framework which businesses must abide by when it comes to handling and storing their customers’ personally identifiable information (PII). Failure to adhere to these regulations can result in hefty fines, not to mention a severe blow to customer trust and brand reputation.  

One major requirement of several of these data protection regulations is the ability to comply with an individual’s “right to be forgotten,” also known as the right to erasure. This means that if and when a customer requests it, organizations are legally obliged to delete all of that individual’s personal data from their systems. While this is a win for the privacy of individuals worldwide, the unfortunate reality for businesses is that properly sanitizing a single individual’s data is immensely difficult. In this article we discuss why erasing all of an individual’s personal data is a technically complex task and what proactive organizations can do to make their lives easier.

Why is deleting all of an individual’s personal data complicated?

Deleting all of a user’s personal data requires keeping track of every piece of personally identifiable information associated with them. That includes direct identifiers such as their name, address, and phone number, but also other information which, when combined, can increase the risk of re-identifying them, like their approximate location, religion, and medical conditions.

An individual’s data can live in numerous systems and locations, and in unexpected formats, document types, and data management technologies. Formats range from onboarding forms to customer service emails, customer service calls and transcripts, and data collected through the individual’s use of the organization’s product(s). Every time an employee shares customer data without recording that it was shared, another untracked copy is created, and the information becomes harder to account for. Some data management technologies, such as blockchains, are especially hard to deal with from a regulatory perspective and present novel challenges when it comes to complying with the right to be forgotten. When you don’t set up strict systems from the beginning, you end up with situations like Facebook’s, described here: Facebook Doesn’t Know What It Does With Your Data, Or Where It Goes: Leaked Document.
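One way to make that tracking concrete is a small inventory that records every place a person’s data lands, including each internal share. The sketch below is a minimal illustration in Python, not a description of any particular product. The names `PiiInventory`, `DataLocation`, and `subject_id` are hypothetical, and a real deployment would presumably back this with a database and feed it from the systems themselves.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLocation:
    """One place where a person's data is known to live (hypothetical schema)."""
    system: str      # e.g. "crm", "support-email"
    reference: str   # record ID, file path, ticket number, ...
    kind: str        # e.g. "form", "email", "call-transcript"


class PiiInventory:
    """Maps each data subject to every location holding their data."""

    def __init__(self) -> None:
        self._locations = defaultdict(set)

    def record(self, subject_id: str, location: DataLocation) -> None:
        # Sets ignore duplicates, so re-recording a known location is harmless.
        self._locations[subject_id].add(location)

    def locations_for(self, subject_id: str) -> list:
        # Sorted so the erasure checklist is stable between runs.
        found = self._locations.get(subject_id, set())
        return sorted(found, key=lambda loc: (loc.system, loc.reference))

    def forget(self, subject_id: str) -> list:
        """Return the erasure checklist and drop the subject from the inventory."""
        checklist = self.locations_for(subject_id)
        self._locations.pop(subject_id, None)
        return checklist
```

When an erasure request arrives, `forget("some-id")` returns the list of systems and records that still have to be cleaned. The inventory is only as complete as the discipline of calling `record` whenever data is stored or shared.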

In addition, unstructured data (free text, PDFs, DOCX files) can contain user information that is difficult to tie to a particular individual. These huge pools of mostly unknown data make up an estimated 80%, and growing, of an organization’s data.

Why is keeping track of all personal data just the beginning?

The right to be forgotten is about more than deleting the data that employees can see. Regulations such as the Health Insurance Portability and Accountability Act (HIPAA) can require data not only to be deleted but also to be sanitized.

The International Data Sanitization Consortium defines data sanitization as “the process of deliberately, permanently and irreversibly removing or destroying the data stored on a memory device to make it unrecoverable. A device that has been sanitized has no usable residual data, and even with the assistance of advanced forensic tools, the data will not ever be recovered.” There are three methods to achieve data sanitization: physical destruction, cryptographic erasure, and data erasure. The consortium also provides an excellent list of data protection regulations and their requirements for sanitization.

In addition to storing information for operational use, organizations often make copies of their storage systems, and those copies also need to be sanitized of personal data. Furthermore, every time data is saved it may be written to a new location. This means that the previous version may still reside in its deallocated location on the disk: although it can no longer be reached through normal retrieval, an attacker who scans deallocated space can still recover it. Unfortunately, data sanitization techniques tend not to be usable for selective sanitization; they are designed to erase entire disks when those are repurposed or decommissioned. On their own, they therefore cannot be used to erase the information of a single user.
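To make the point about deallocated storage concrete, here is a toy model in Python. `ToyDisk` is a hypothetical and deliberately simplified stand-in for a real file system: saving always lands in a fresh block, deleting only removes the index entry, and a raw scan still finds the old bytes.

```python
class ToyDisk:
    """A toy block store: NOT a real file system, just an illustration."""

    def __init__(self) -> None:
        self.blocks = []   # raw storage, append-only list of bytes
        self.index = {}    # file name -> block number

    def save(self, name: str, data: bytes) -> None:
        # Every save lands in a new block; the old block is merely unreferenced.
        self.blocks.append(data)
        self.index[name] = len(self.blocks) - 1

    def read(self, name: str) -> bytes:
        return self.blocks[self.index[name]]

    def delete(self, name: str) -> None:
        # "Deleting" only drops the pointer; the bytes stay on the disk.
        del self.index[name]

    def raw_scan(self, needle: bytes) -> list:
        """What someone with raw access sees: every block, referenced or not."""
        return [i for i, block in enumerate(self.blocks) if needle in block]


disk = ToyDisk()
disk.save("profile.txt", b"name=Alice Example")
disk.save("profile.txt", b"name=A. Example")   # an update leaves the old copy behind
disk.delete("profile.txt")

print("profile.txt" in disk.index)   # False: the file is gone for normal users
print(disk.raw_scan(b"Example"))     # [0, 1]: both versions are still recoverable
```

Real storage stacks differ in the details, but the same gap between “no longer reachable” and “no longer present” is what sanitization is meant to close.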

What can organizations do to make their lives easier?

When storing specific personal data is necessary, organizations should set processes in place as soon as possible to keep track of where that data is stored and with whom it is shared, and to ensure that any deallocated data is properly erased. One way to do this is to replace or erase an individual’s data on a storage device A, transfer all of the remaining data stored on that device to another storage device B, and then sanitize storage device A by applying data erasure.
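The device A to device B procedure can be sketched as follows, with two directories standing in for the two storage devices. This is a minimal illustration under loud assumptions: the function names are hypothetical, and the in-place overwrite is only a placeholder for real data erasure, which would normally be applied to the whole device with a dedicated erasure tool.

```python
import os
import shutil
from pathlib import Path


def overwrite_and_remove(path: Path, passes: int = 1) -> None:
    """Overwrite a file with random bytes, then unlink it.

    Illustration only: on SSDs and on journaling or copy-on-write file
    systems, an in-place overwrite does not guarantee the old blocks are gone.
    """
    size = path.stat().st_size
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            handle.write(os.urandom(size))
            handle.flush()
            os.fsync(handle.fileno())
    path.unlink()


def migrate_and_sanitize(device_a: Path, device_b: Path, subject_files: set) -> None:
    """Copy everything except the subject's files from A to B, then wipe A.

    subject_files holds POSIX-style paths relative to device_a.
    """
    device_b.mkdir(parents=True, exist_ok=True)
    # Materialize the listing first, because files are removed while looping.
    for path in sorted(p for p in device_a.rglob("*") if p.is_file()):
        relative = path.relative_to(device_a)
        if relative.as_posix() not in subject_files:
            target = device_b / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
        # Every file on A is overwritten, whether or not it was copied.
        overwrite_and_remove(path)
```

The important design point is the order of operations: the individual’s data is never copied to B, so only A has to be sanitized, and A can be sanitized as a whole.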

That said, organizations are often collecting personal data that they do not even need. Another major requirement from data protection regulations is data minimization, which requires all personal data collected to be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed.” 
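In code, data minimization can be as simple as an allow-list applied before anything is stored. The sketch below is a hypothetical illustration: the purposes and field names in `ALLOWED_FIELDS` are invented for the example, and deciding what is actually “necessary” for a purpose is a legal and product question, not something the code can answer.

```python
# Hypothetical mapping from processing purpose to the fields it needs.
ALLOWED_FIELDS = {
    "shipping": {"name", "street", "city", "postal_code"},
    "newsletter": {"email"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields that the stated purpose requires."""
    allowed = ALLOWED_FIELDS[purpose]   # an unknown purpose raises KeyError
    return {key: value for key, value in record.items() if key in allowed}


signup = {
    "name": "Alice Example",
    "email": "alice@example.com",
    "religion": "undisclosed",
    "city": "Toronto",
}
print(minimize(signup, "newsletter"))   # {'email': 'alice@example.com'}
```

Data that is never stored never has to be tracked, shared, or erased later.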

Limiting the amount of personal data you collect, through redaction or de-identification, is central to reducing the headache your organization will face when grappling with data protection regulation compliance.
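As a rough illustration of redaction, the Python sketch below replaces email addresses and phone numbers in free text with placeholders. It is a deliberately naive stand-in: the patterns and the `redact` function are assumptions made for this example, regular expressions only catch well-formatted identifiers, and names, addresses, or medical conditions mentioned in conversation generally need more capable detection.

```python
import re

# Deliberately simple patterns: they only catch well-formatted identifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each match with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Reach me at alice@example.com or +1 416 555 0100."))
# Reach me at [EMAIL] or [PHONE].
```

Redacting at the point of collection means the downstream copies, backups, and logs never contain the identifier in the first place.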
