The Privacy Risk of Language Models

In today’s world, large models with billions of parameters trained on terabytes of datasets have become the norm as language models are the foundations of natural language processing (NLP) applications. Several of these language models used in commercial products are also being trained on private information. An example would be Gmail’s auto-complete model. Its model … Read more

Rappel

Testé sur un ensemble de données composé de données conversationnelles désordonnées contenant des informations de santé sensibles. Téléchargez notre livre blanc pour plus de détails, ainsi que nos performances en termes d’exactitude et de score F1, ou contactez-nous pour obtenir une copie du code d’évaluation.

99.5%+ Accuracy

Number quoted is the number of PII words missed as a fraction of total number of words. Computed on a 268 thousand word internal test dataset, comprising data from over 50 different sources, including web scrapes, emails and ASR transcripts.

Please contact us for a copy of the code used to compute these metrics, try it yourself here, or download our whitepaper.