Blog

Sometimes we take a break from building cutting-edge AI redaction models to stretch our academic muscles and write about privacy and machine learning. Check back here regularly for our musings.

Differential privacy is a hot topic given the many conflicting opinions on its effectiveness. For some background, we previously wrote a comprehensive post on the

Over the years, large pre-trained language models like BERT and RoBERTa have led to significant improvements in natural language understanding (NLU) tasks. However, these pre-trained

Automated Container Resource Checks: Does your container have the required resources?

At Private AI, we are building a privacy suite centered around personally identifiable information (PII) detection and remediation in unstructured data, such as text. Users interact with our

In the previous episode of Private AI’s ML Speaker Series, Patricia Thaine (CEO of Private AI) sat down with Dr. Aida Nematzadeh (Staff Research Scientist

Discussing Responsible AI & International Governance

In the previous episode of Private AI’s ML Speaker Series, Patricia Thaine (CEO of Private AI) sat down with Dr. Sarah Shoker (Research Scientist at

In today’s world, large models with billions of parameters trained on terabytes of data have become the norm as language models are the foundations of

There are several resources available on the internet on how to scale your Kubernetes pods based on CPU, but when it comes to Kubernetes pods

In the previous episode of Private AI’s ML Speaker Series, Patricia Thaine (CEO of Private AI) sat down with Arvid Frydenlund (PhD candidate at the University

Personally Identifiable Information (PII) is any data that can be used to identify an individual. This can be done using direct identifiers (name, social security

Previously on Private AI’s Speaker Series, CEO Patricia Thaine sat down with Franziska Boenisch to discuss her latest paper, ‘When the Curious Abandon Honesty: Federated Learning Is Not Private’. Franziska

In the latest episode of Private AI’s ML Speaker Series, Patricia Thaine (CEO of Private AI) sits down to chat about MLOps and Machine Learning Deployment

9 Companies to Help You Get Your Privacy $hit Together

With the ever-growing number of global regulations, laws, and amendments, it can be overwhelming to know where to start (or continue) your data privacy journey.

Transformer networks have taken the NLP world by storm, but the sheer size of these networks presents new challenges for deployment, such as how to provide acceptable latency and unit economics.

Parameter Prediction without Training and SGD

Previously on Private AI’s Speaker Series, our CEO Patricia Thaine sat down with data privacy law expert Carole Piovesan to talk about the legal ramifications

5 Facts You Probably Didn’t Know About Data Privacy

Data privacy, in simplest terms, is the right to control how your personal information is collected and used. Although this may seem obvious, it hasn’t

Data Protection Regulations to Watch Out for in 2022

Carole Piovesan discusses legal responsibilities, what companies are getting wrong with data governance, and more.

GDPR compliance, privacy and engineering team collaboration, and common mistakes companies make with their data.

Discussing developer responsibility, Bill C-11, positive consent, and the importance of Privacy by Design

“When is anonymization useful?” is a tricky question, because the answer is highly data-type- and task-dependent.

On the misleading ways journalists and industry use the term "anonymization."

Understanding key tech for data protection regulation compliance

There’s a saying, ‘the last 20% of the work takes 80% of the time’, and nowhere is that more true than in AI systems.

Regexes are highly effective in the perfect world of computer data, but unfortunately the real world is much more complicated.

There exists a vibrant ecosystem of specialized security tools. The sad truth is that it is almost impossible to reach 100% invulnerability. What can we do to get closer?

In the past three years there has been a massive wake-up in customer awareness about privacy. Many customers are now refactoring how they buy, taking their business elsewhere if they don’t trust a company’s data practices.

Privacy Enhancing Technologies Decision Tree: for developers, managers, and founders looking to integrate privacy into their software pipelines and products.

AI is rapidly being deployed around the world with few regulations to follow. Along with the complexity of creating the technology, there remain many unanswered legal questions.

The new TensorFlow Lite XNNPACK delegate enables best-in-class performance on x86 and ARM CPUs, over 10x faster than the default TensorFlow Lite backend in some cases.

Some techniques to improve DALI resource usage & create a completely CPU-based pipeline.

We introduce the four pillars required to achieve perfectly privacy-preserving AI and discuss various technologies that can help address each of the pillars.

We discuss a practical application of homomorphic encryption to privacy-preserving signal processing, particularly focusing on the Fourier transform.

Terms and Conditions of Use Effective March 10, 2020 These Terms and Conditions of Use (“Terms”) apply to and govern: your use of Private AI’s

We cover the basics of homomorphic encryption, followed by a brief overview of open source HE libraries and a tutorial on how to use one of those libraries (namely, PALISADE).

A number of people ask us why we should bother creating NLP tools that preserve privacy. Apparently not everyone spends hours thinking about data breaches and privacy infringements.

A very brief overview of privacy-preserving technologies follows for anyone who’s interested in starting out in this area. I cover symmetric encryption, asymmetric encryption, homomorphic encryption, differential privacy, and secure multi-party computation.

Recall

Tested on a dataset composed of messy conversational data containing sensitive health information. Download our whitepaper for more details, as well as our performance in terms of accuracy and F1 score, or contact us to obtain a copy of the evaluation code.

99.5%+ Accuracy

The number quoted is based on the number of PII words missed as a fraction of the total number of words, so 99.5%+ accuracy corresponds to fewer than 0.5% of all words being missed PII. Computed on a 268,000-word internal test dataset, comprising data from over 50 different sources, including web scrapes, emails, and ASR transcripts.
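As a rough illustration of how a word-level figure like this can be computed, here is a minimal Python sketch. It assumes the headline accuracy is one minus the share of all words that are missed PII words. The function name and the example counts are hypothetical, chosen only to make the arithmetic visible. They are not results from our evaluation and this is not the evaluation code itself.

```python
def pii_word_accuracy(missed_pii_words: int, total_words: int) -> float:
    """Return 1 - (missed PII words / total words).

    Assumption: accuracy is the complement of the word-level miss rate.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= missed_pii_words <= total_words:
        raise ValueError("missed_pii_words must be between 0 and total_words")
    return 1.0 - missed_pii_words / total_words


# Hypothetical counts for illustration only: 1,340 missed PII words
# in a 268,000-word test set is a 0.5% miss rate.
example_accuracy = pii_word_accuracy(missed_pii_words=1_340, total_words=268_000)
print(f"{example_accuracy:.1%}")
```

Note that a metric defined over all words is dominated by non-PII words, so it is worth reading alongside recall and F1 score.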

Please contact us for a copy of the code used to compute these metrics, try it yourself here, or download our whitepaper.