What Is Sensitive Information?

Data privacy is becoming ever more important in a digitized world where powerful technologies can sift through datasets too large for the human mind to manage. Big Data analytics can not only detect general trends in the wealth of available data but also draw out valuable insights and Personally Identifiable Information (PII) about particular individuals. There is no hiding in the masses anymore. If the information is sensitive, even a single lost data point can expose an individual to significant harm. In this context, it is valuable for businesses to know what information is considered sensitive and hence warrants enhanced protection.

Sensitive Information

The U.S. Department of Homeland Security defines sensitive information as information which – if lost, compromised, or disclosed without authorization – could result in substantial harm, embarrassment, inconvenience, or unfairness to an individual. Some types of information fit this definition as a matter of course; for others, it depends on the context.

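Under the GDPR, the following personal data is considered ‘sensitive’ (the regulation itself refers to these as ‘special categories’ of personal data):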

– personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs;
– trade-union membership;
– genetic data, biometric data processed solely to identify a human being;
– health-related data;
– data concerning a person’s sex life or sexual orientation.

The types of data that are almost always categorized as sensitive are financial and health-related information, with the important exception that the GDPR does not treat financial data as sensitive. Along with other personal information revealing intimate details about a person’s lifestyle and personal choices, financial and health information lies at the “biographical core” of a person, as the Supreme Court of Canada put it. Protecting this information serves the dignity, integrity, and autonomy of the individuals to whom the personal data relates.

The UK Information Commissioner’s Office comments on the exclusion of financial data from the UK GDPR’s definition of sensitive data (which is the same as the one cited above): “[w]hilst other data may also be sensitive, such as an individual’s financial data, this does not raise the same fundamental issues and so does not constitute special category data for the purposes of the UK GDPR.”
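To make the distinction concrete, here is a minimal Python sketch of how a business might encode the GDPR list quoted above. The `Category` labels and the helper `is_gdpr_special_category` are hypothetical names chosen for illustration, not part of any regulation or library, and the two extra categories (`FINANCIAL`, `CONTACT`) are included only to show what falls outside the list.

```python
from enum import Enum


class Category(Enum):
    """Hypothetical labels for kinds of personal data (illustrative only)."""
    RACIAL_OR_ETHNIC_ORIGIN = "racial or ethnic origin"
    POLITICAL_OPINIONS = "political opinions"
    RELIGIOUS_OR_PHILOSOPHICAL_BELIEFS = "religious or philosophical beliefs"
    TRADE_UNION_MEMBERSHIP = "trade-union membership"
    GENETIC = "genetic data"
    BIOMETRIC_FOR_IDENTIFICATION = "biometric data processed to identify a person"
    HEALTH = "health-related data"
    SEX_LIFE_OR_ORIENTATION = "sex life or sexual orientation"
    FINANCIAL = "financial data"   # sensitive in many regimes, but not in the GDPR list
    CONTACT = "contact details"    # ordinary personal data in this sketch


# The categories from the GDPR list quoted in this article.
GDPR_SPECIAL_CATEGORIES = frozenset({
    Category.RACIAL_OR_ETHNIC_ORIGIN,
    Category.POLITICAL_OPINIONS,
    Category.RELIGIOUS_OR_PHILOSOPHICAL_BELIEFS,
    Category.TRADE_UNION_MEMBERSHIP,
    Category.GENETIC,
    Category.BIOMETRIC_FOR_IDENTIFICATION,
    Category.HEALTH,
    Category.SEX_LIFE_OR_ORIENTATION,
})


def is_gdpr_special_category(category: Category) -> bool:
    """Return True if the category appears in the GDPR list above."""
    return category in GDPR_SPECIAL_CATEGORIES


print(is_gdpr_special_category(Category.HEALTH))     # True
print(is_gdpr_special_category(Category.FINANCIAL))  # False
```

A result of `False` here only means the data is outside the GDPR’s special categories; as the ICO quote notes, it may still be sensitive and deserve enhanced protection.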

The Values and Dangers of Sharing Sensitive Information

We are in a curious predicament today. On the one hand, having a wealth of information at our fingertips can do a lot of good. For example, knowing on a large scale how diseases affect people and which medications provide the most relief with the fewest side effects allows for more effective healthcare, better treatment, and lower medication costs, as fewer resources are wasted on suboptimal treatments. For disease surveillance to work well, however, a lot of personal data is required. On the other hand, making individuals’ health information publicly available to all researchers poses a significant risk.

In the hands of malicious actors, health information can be used to commit identity theft to gain access to expensive medical treatments, to design convincing phishing campaigns by impersonating healthcare providers, or to blackmail patients using sensitive information about their medical conditions, such as sexual health, pregnancies, or abortions. In addition, the misuse of health data is much harder to detect than credit card fraud, and there are no mechanisms for containing it that work as effectively as blocking cards or freezing accounts does in the financial sector.

Financial information is sensitive not only because someone with your credit card information can purchase luxury items at your expense; thankfully, insurance protection against that exists in many cases. Much more troublesome is the impact that identity fraud can have on a person’s credit score. If a malicious actor runs up a large debt in your name and leaves you without the funds to repay it, it can not only ruin you financially but also impair, for some years, your ability to get a mortgage, a new job, or even lodging.

Why Context Matters for Sensitive Information

Aside from financial and health-related information, seemingly innocuous data can also reveal details about us that would cause harm, embarrassment, or inconvenience if disclosed without authorization. Canada’s Office of the Privacy Commissioner clearly explains why context matters when determining whether personal information is sensitive or not:

“Although some information (for example, medical records and income records) is almost always considered to be sensitive, any information can be sensitive, depending on the context. For example, the names and addresses of subscribers to a newsmagazine would generally not be considered sensitive information. However, the names and addresses of subscribers to some special-interest magazines might be considered sensitive.”

We can imagine that someone’s ‘special interest,’ if improperly disclosed, may well cause inconvenience, at the very least. This example shows that it is difficult to classify personal identifiers as either sensitive or non-sensitive without knowing the context. In the end, if nothing else, most information can be used for social engineering attacks such as phishing, which are in turn used to gain access to financial or other sensitive information that can be exploited for financial gain. For example, I would be more likely to fall for a phishing email from a Bell impersonator if I am a Bell customer. Hence, my affiliation with Bell, disclosed to the wrong person, could cause me harm.

However, the reality we live in today – namely, that privacy interests need to be balanced with valid business and socially beneficial interests in access to personal data – calls for a “reasonable, pragmatic approach,” the Office of the Privacy Commissioner emphasizes. While we may be able to come up with a scenario for any data point that would render it sensitive, it would not be in the spirit of privacy legislation to classify all information as sensitive for that reason.
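One way to picture this pragmatic, context-aware approach is the following Python sketch. It is only an illustration under stated assumptions: the `DataPoint` type, the `kind` and `context` labels, and both lookup tables are hypothetical, and a real business would populate them from its own assessment of its operating context.

```python
from dataclasses import dataclass

# Kinds of information treated as sensitive regardless of context
# (hypothetical labels, mirroring the quoted examples).
ALWAYS_SENSITIVE = {"medical_record", "income_record"}

# Hypothetical (kind, context) pairs that a business has judged sensitive.
SENSITIVE_IN_CONTEXT = {
    ("subscriber_name_and_address", "special_interest_magazine"),
}


@dataclass(frozen=True)
class DataPoint:
    kind: str     # what the information is, e.g. "subscriber_name_and_address"
    context: str  # where it comes from, e.g. "newsmagazine"


def is_sensitive(point: DataPoint) -> bool:
    """Sensitive if the kind always is, or if this kind is sensitive in this context."""
    if point.kind in ALWAYS_SENSITIVE:
        return True
    return (point.kind, point.context) in SENSITIVE_IN_CONTEXT


print(is_sensitive(DataPoint("subscriber_name_and_address", "newsmagazine")))               # False
print(is_sensitive(DataPoint("subscriber_name_and_address", "special_interest_magazine")))  # True
print(is_sensitive(DataPoint("medical_record", "newsmagazine")))                            # True
```

Anything not listed defaults to non-sensitive, which reflects the point above: not every data point should be classified as sensitive merely because a harmful scenario is conceivable.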

Conclusion

We have seen that it is advisable to be mindful of the concrete context your business is operating in when determining whether information is sensitive or not. Based on this, informed decisions must be made on where to focus enhanced protection efforts.

Private AI can help with the task of identifying what kind of personal information exists in your systems. With its ability to identify and classify more than 50 entities of Personally Identifiable Information (PII), Payment Card Industry (PCI) data, and Protected Health Information (PHI), Private AI is well equipped to help with the difficult task of achieving compliance with data privacy regulations. To see the tech in action, try our web demo, or request an API key to try it yourself on your own data.
