Train Your ML Models Without Compromising Privacy

The Problem:

Using Production Data to Train ML Models Puts Customer Data at Risk

Machine learning is all about the data: any ML model is only as good as the data it’s trained on, hence the voracious need for production data.

Unfortunately, data protection regulators frown upon using production data to train chatbots and other ML projects, because doing so can expose users’ personally identifiable information (PII) to a broad audience, as a Korean “lovebot” chatbot started doing, or can even create murderous toasters.

Enter Private AI:

Preventing Downstream Accuracy Loss with Synthetic PII Generation

Private AI can generate synthetic PII that fits the context of the surrounding text. Taking production data and replacing all PII with contextually relevant synthetic data is an excellent way to get the data you need to train your models without compromising the privacy of the users whose data is in those datasets.
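Private AI’s generator is an ML model that fits replacements to the surrounding text, and its internals aren’t described here. As a much simpler illustration of why the shape of a replacement matters, the hypothetical `synthesize_like` function below is a rule-based stand-in (not Private AI’s method). It replaces each character of a detected PII value with a random character of the same class, so a phone number or ID still looks like one to a downstream model.

```python
import random
import string


def synthesize_like(value: str, rng: random.Random) -> str:
    """Return a synthetic string with the same shape as `value`.

    Hypothetical, rule-based illustration only: digits become random digits,
    upper/lower-case letters become random ASCII letters of the same case,
    and everything else (dashes, spaces, parentheses, '@', '.') is kept.
    """
    out: list[str] = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            # Separators carry the format, so they are preserved verbatim.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the output is reproducible
    print(synthesize_like("+1 (555) 867-5309", rng))
    print(synthesize_like("Jane.Doe@example.com", rng))
```

Random letters won’t yield a plausible name or street address, which is exactly the gap a context-aware generator is meant to close.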

And it’s highly secure 

In the event of a breach, it’s nearly impossible to distinguish synthetic PII from real PII, so even if some real PII is accidentally exposed, it is very hard to pick out. Additionally, the ML-powered PII generator never sees the original PII, which provides a simple privacy guarantee without a lot of math.
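The claim that the generator never sees the original PII suggests a two-stage design: detect and mask first, then generate replacements from the masked text alone. The sketch below assumes that design; `mask`, `fill`, the `[LABEL_n]` marker format, and the generator callback are all hypothetical, and the page does not confirm that Private AI works exactly this way. Detection itself is out of scope, so the PII spans are passed in by hand.

```python
import re
from typing import Callable

Span = tuple[int, int, str]  # (start, end, label) of a detected PII entity

# Matches markers such as [NAME_1] or [PHONE_NUMBER_2].
MARKER = re.compile(r"\[[A-Z_]+_\d+\]")


def mask(text: str, spans: list[Span]) -> str:
    """Stage 1: replace each PII span with a numbered type marker.

    Repeated mentions of the same value get the same marker, so the masked
    text keeps track of who is who without containing the real value.
    Assumes the spans do not overlap.
    """
    ids: dict[tuple[str, str], str] = {}
    counters: dict[str, int] = {}
    pieces: list[str] = []
    cursor = 0
    for start, end, label in sorted(spans):
        key = (label, text[start:end])
        if key not in ids:
            counters[label] = counters.get(label, 0) + 1
            ids[key] = f"[{label}_{counters[label]}]"
        pieces.append(text[cursor:start])
        pieces.append(ids[key])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def fill(masked: str, generate: Callable[[str, str], str]) -> str:
    """Stage 2: ask the generator for a replacement value for each marker.

    The generator receives only the masked text and the marker string,
    never the original PII, which was discarded in stage 1.
    """
    return MARKER.sub(lambda m: generate(masked, m.group(0)), masked)
```

Because `fill` is handed only the masked text, any generator plugged in there, ML-based or not, has no access to the values it replaces.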


Designed for Developers

Our system is packaged in a single Docker container and can be deployed in your environment with just a few lines of code, so you can quickly add privacy protection to your data pipeline. Read more about installation in our docs.

Private AI plugs seamlessly into your existing infrastructure.

Why Private AI

Unrivalled Accuracy

Private AI uses the latest advancements in machine learning to achieve remarkable accuracy out of the box. See how we stack up against our competitors in our technical whitepaper.

[Chart: accuracy of Private AI compared with Major Cloud Providers 1, 2, and 3 and Open Source Software 1 and 2, on an axis from 0.80 to 1]


Of all the PII redaction products we’ve seen out there (and believe me, we’ve seen all of them), Private AI is the best one by far in terms of accuracy, types of data that can be redacted, and flexibility of their models. After doing a side-by-side comparison, it quickly became clear to us that we couldn’t go back to using something like AWS Comprehend.

Sebastian Jimenez
Founder, Rilla Voice

99.5%+ Accuracy

The quoted figure is 1 minus the number of missed PII words as a fraction of the total number of words. It was computed on a 268,000-word internal test dataset comprising data from over 50 different sources, including web scrapes, emails, and ASR transcripts.
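Reading the definition as one minus the miss fraction (the only reading under which the figure can reach 99.5%), the metric and what it implies for the roughly 268,000-word test set can be written out as follows. The symbols are our own notation, not taken from the whitepaper: $N_{\mathrm{missed}}$ is the number of PII words the system failed to flag, and $N_{\mathrm{total}}$ is the number of all words in the test set, PII or not.

```latex
\mathrm{Accuracy} = 1 - \frac{N_{\mathrm{missed}}}{N_{\mathrm{total}}},
\qquad
\mathrm{Accuracy} \ge 0.995,\; N_{\mathrm{total}} \approx 268{,}000
\;\Longrightarrow\;
N_{\mathrm{missed}} \le 0.005 \times 268{,}000 = 1{,}340
```

Because the denominator counts every word rather than only PII words, this figure is not the same as word-level recall.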

Please contact us for a copy of the code used to compute these metrics, try it yourself on your own data, or download our whitepaper.


Tested on a dataset of messy conversational data containing sensitive health information. Download our whitepaper for further details, including how we perform on precision and F1-score, or contact us to get a copy of the evaluation code.
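The whitepaper’s exact evaluation protocol isn’t reproduced on this page, so the following is only a sketch, assuming binary word-level labels (each word either is or isn’t PII). `word_level_scores` is a hypothetical helper, not the evaluation code mentioned above. It computes precision, recall, and F1, plus the miss-based accuracy described earlier on this page.

```python
from typing import Sequence


def word_level_scores(gold: Sequence[bool], predicted: Sequence[bool]) -> dict[str, float]:
    """Word-level precision, recall, F1, and miss-based accuracy for PII detection.

    gold[i] is True if word i is PII; predicted[i] is True if the system
    flagged word i as PII. Hypothetical helper for illustration only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(g and p for g, p in zip(gold, predicted))      # PII correctly flagged
    fp = sum(p and not g for g, p in zip(gold, predicted))  # non-PII flagged
    fn = sum(g and not p for g, p in zip(gold, predicted))  # PII missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Missed PII words divided by ALL words, subtracted from 1.
    accuracy = 1 - fn / len(gold) if gold else 1.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    gold = [True, True, False, False, True, False, False, False, False, False]
    pred = [True, False, False, True, True, False, False, False, False, False]
    print(word_level_scores(gold, pred))
```

In this toy example one of three PII words is missed and one non-PII word is flagged, so precision, recall, and F1 all come out to 2/3, while the miss-based accuracy is 0.9 because the single miss is divided by all ten words.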

