Generative AI, the family of technologies that generate content such as text, images, and videos, has seen a significant surge in use and development. In response, on December 7, 2023, the Office of the Privacy Commissioner of Canada (OPC) introduced key privacy principles tailored to generative AI technologies. This framework is crucial for organizations navigating the complex intersection of AI innovation and data privacy. Here, we explore the significance of these principles for generative AI development and use, and the pivotal role of Private AI’s technology in ensuring compliance with a broad subset of them.
1. Legal Authority and Consent
The Principle: Legal authority for collecting, using, or disclosing personal data in generative AI systems, whether for training, development, deployment, operation, or decommissioning, is as crucial as it is for any other use case. The OPC emphasizes that this principle also applies when data containing personal information is obtained from third parties or when an inference is drawn from personal information, as drawing inferences is itself considered collection for which legal authority is required. Often, the required legal authority will be consent. Valid consent may not be easy to obtain; the first hurdle is that training data scraped from the internet may contain an unmanageable amount of personal data.
Private AI’s Role: While Private AI’s technology does not directly handle consent, it can help organizations minimize the scope of personal data used, reducing the breadth of consent required. Private AI can detect, redact, or replace personal information with synthetic data. It does so in 52 languages, across a wide variety of file types, and with unparalleled accuracy.
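To make the detect-redact-or-synthesize distinction concrete, here is a minimal sketch. The regex patterns, function names, and synthetic values below are illustrative assumptions, a toy stand-in for Private AI's actual ML-based detection, which works across languages and file types:

```python
import re

# Toy stand-in for an ML-based PII detector; these patterns and the
# function names below are assumptions for illustration only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

# Synthetic surrogate values used in place of real identifiers.
SYNTHETIC = {"EMAIL": "jane.doe@example.com", "PHONE": "555-010-0000"}

def redact(text: str) -> str:
    """Replace each detected entity with a typed placeholder, e.g. [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def synthesize(text: str) -> str:
    """Replace each detected entity with a synthetic surrogate value."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(SYNTHETIC[label], text)
    return text
```

Redaction removes the information outright, while synthesis preserves the data's shape and utility for training without retaining real identifiers.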
2. Limiting Collection, Use, and Disclosure to what is Necessary
The Principle: Data collection, use and disclosure must be restricted to what is necessary for the AI model’s training and operation. Unnecessary data collection can lead to privacy risks and regulatory non-compliance. The OPC proposes the use of anonymized, synthetic, or de-identified data rather than personal information where the latter is not required to fulfill the identified appropriate purpose(s). The OPC further reminds developers of AI systems that personal information available on the internet is not outside of the purview of applicable privacy laws.
Private AI’s Advantage: Private AI’s technology helps ensure that only essential data is utilized for AI training and operations, rendering personal information anonymized or de-identified, or creating synthetic data in place of personal information. When using AI systems, organizations are well advised to adopt Private AI’s PrivateGPT, which ensures that user prompts are sanitized – i.e., personal information is filtered out of the prompts – before they are sent to the AI system. Depending on the use case, the personal information to be excluded from the prompt can be selected at a very granular level to preserve the prompt’s utility. Before the system’s answer is sent back to the user, the personal information is automatically re-inserted into the output, without ever being disclosed to the model.
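The sanitize-then-reinsert round trip described above can be sketched as follows. This is an illustrative assumption of how such a flow works in general, not PrivateGPT's actual API; the title-case name regex is a deliberately simplistic stand-in for real ML-based detection:

```python
import re

def sanitize(prompt: str):
    """Swap detected entities for numbered placeholders before the prompt
    leaves the organization; return the sanitized prompt plus the mapping."""
    mapping = {}

    def repl(match):
        key = f"[NAME_{len(mapping) + 1}]"
        mapping[key] = match.group(0)
        return key

    # Toy detector: consecutive title-cased words treated as a person name.
    sanitized = re.sub(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", repl, prompt)
    return sanitized, mapping

def reinsert(response: str, mapping: dict) -> str:
    """Restore the original values in the model's answer, locally,
    after the response comes back -- the model never sees them."""
    for key, value in mapping.items():
        response = response.replace(key, value)
    return response
```

The key point is that the placeholder-to-value mapping stays on the organization's side; only placeholders cross the boundary to the model.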
3. Openness
The Principle: The openness principle is very broad and asks for transparency by developers, providers, and organizations using AI systems regarding the collection, use, and disclosure of personal information, associated risks and their mitigation, the training data set(s), whether an AI system will be used as part of a decision-making process, and more.
Private AI’s Role: Private AI can help with one aspect of compliance with the openness principle: being open about the use of personal information first requires knowing what personal information is being used, and whether using it is in fact necessary for the use case. Given the enormous amount of data AI models are usually trained on, this is not an easy ask. Private AI’s algorithm can detect personal information at scale and make otherwise overwhelmingly large data sets reportable.
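The kind of at-scale summary an openness disclosure needs can be sketched like this. The regexes below are toy assumptions standing in for a real typed-entity detector; the point is the aggregation step that turns a large corpus into a reportable tally:

```python
import re
from collections import Counter

# Hypothetical detector: in practice an ML model returns typed entity
# spans; two regexes stand in for it here.
DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.\w+"),
    "SIN": re.compile(r"\b\d{3}-\d{3}-\d{3}\b"),  # Canadian SIN format
}

def pii_report(records):
    """Scan an iterable of text records and tally the entity types found,
    producing the kind of summary an openness disclosure calls for."""
    counts = Counter()
    for record in records:
        for label, pattern in DETECTORS.items():
            counts[label] += len(pattern.findall(record))
    return dict(counts)
```

Because the scan streams over records one at a time, the same approach extends to data sets far too large to review manually.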
4. Accountability
The Principle: Accountability is a big topic that includes, among other aspects, an internal governance structure and clearly defined roles and responsibilities. Importantly, it also requires explainability of the model’s output. The OPC advises that one way to work towards this is conducting Privacy Impact Assessments (PIA) and/or Algorithmic Impact Assessments (AIA) to identify and mitigate impacts on privacy and other fundamental rights.
Private AI’s Role: For PIAs and AIAs it is again crucial to know what the model has been trained or fine-tuned on, as these models are all about data, and lots of it. Private AI can help with this onerous task.
5. Access and Correction
The Principle: The principle of individual access necessitates that users can access their personal data used by generative AI systems. They also have a right to ask for their information to be corrected, especially if the information is included in the model’s output. Both of these requirements pose a particular challenge in the context of generative AI. These models do not store their training sources in any straightforward, retrievable way, as a file in a folder could be retrieved. Nevertheless, the encoded training data may resurface in production output. Removing incorrect information from AI systems can therefore mean that the model has to be retrained, which is expensive and time-consuming.
Private AI’s Role: Private AI’s technology can rapidly identify and categorize personal data within AI training data, whether that is pre-training data or stored user prompts, aiding organizations in efficiently responding to user access or correction requests. There are limits to this, though. The technology can only help identify what went into the model – how to get it out or corrected remains a challenge, which is another strong reason to ensure that training data contains as little personal information as possible.
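A first pass at locating a requester's data in stored records might look like the sketch below. This is an illustrative assumption, not Private AI's API; real matching would use the detector's typed entities rather than raw substring search:

```python
def access_request_matches(records, subject_identifiers):
    """Return (index, record) pairs that mention any of the requester's
    known identifiers -- a first pass for access/correction requests.
    Substring matching here is a simplification for illustration."""
    hits = []
    lowered = [ident.lower() for ident in subject_identifiers]
    for i, record in enumerate(records):
        text = record.lower()
        if any(ident in text for ident in lowered):
            hits.append((i, record))
    return hits
```

The returned indices let an organization point to the specific stored records, such as retained prompts, that a correction or deletion would need to touch.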
6. Data Safeguards in Generative AI Operations
The Principle: Implementing safeguards for personal information is essential in generative AI, particularly given the vast amount of data these systems can process and store and the pervasive proliferation and impact these tools are expected to have on society.
Private AI’s Contribution: Aside from the previously discussed help with anonymization and pseudonymization, Private AI’s tools can also aid in mitigating bias. When sensitive identifiers such as gender, race, origin, and religion are removed from the model’s training data, the model’s generated output is less likely to reproduce biased or discriminatory patterns tied to those attributes.
Conclusion: Private AI as an Ally in Generative AI Privacy Compliance
In the rapidly evolving field of generative AI, adhering to Canadian privacy principles is a complex but critical endeavor. Technologies like Private AI’s detection and redaction products play a crucial role in this landscape, offering tools for anonymization, pseudonymization, and synthetic data generation that can help protect privacy and reduce bias in model outputs. While challenges like retraining AI models and preventing data extraction persist, leveraging Private AI’s solutions is a substantial step towards responsible, trustworthy, and privacy-compliant AI development and usage in Canada. Try it on your own data using our web demo or sign up to get a free API key.