With OpenAI’s and Azure OpenAI’s API offerings, businesses can develop their own applications on top of the powerful large language models (LLMs) underlying ChatGPT, Whisper, and OpenAI’s other models. Close to a thousand commercial applications already build on OpenAI’s foundation models, including our own PrivateGPT, which acts as a privacy layer for ChatGPT by removing any personally identifiable information from prompts before they are shared with the third party.
This incredible boost to innovative capability comes with certain drawbacks. If OpenAI gets things wrong in developing its LLMs, applications built on the foundation models will likely inherit the same flaws. This is an issue for responsible AI development in general, but this blog post focuses on data privacy compliance.
OpenAI’s Data Protection in General
The DPA’s language is not particularly comforting if your business offers services built on ChatGPT outside of these jurisdictions, e.g., in Canada or the UK. OpenAI also states that the DPA cannot be customized on a case-by-case basis. It may therefore be advisable to obtain legal counsel to assess whether your or your customers’ data would be adequately protected under the DPA, or whether an individual agreement with OpenAI is necessary to ensure data protection in accordance with the data privacy laws applicable to your organization.
OpenAI’s SOC2 Type 2 certification and the fact that it has been audited against the 2017 Trust Services Criteria for Security by an independent auditor may provide some assurance as to the company’s data protection practices. However, prudent business practice would advise a review of these audit reports.
Location of Data
OpenAI represents that all customer data is processed and stored in the US. There are no data centres in the EU or elsewhere, and there is currently no self-hosting option.
Disclosure of Customer Data
At the time of writing, OpenAI’s subservice providers are Microsoft for cloud infrastructure; OpenAI affiliates for services and support; Snowflake for data warehousing; and TaskUs for user support and human annotation of data for service improvement. Microsoft and TaskUs are located ‘worldwide’, while the other two are in the US. Since OpenAI says all customer data is processed and stored in the US, the Microsoft servers hosting OpenAI customer data are presumably located in the US as well. It may be possible, however, to arrange storage in a jurisdiction more appropriate for an organization’s needs, as Microsoft Azure’s data centres span many different geographies.
For European businesses wishing to build an application on OpenAI’s models, it makes little difference in which US state the data is stored: no US state has currently received an adequacy decision from the EU Commission that would allow data to be transferred across borders without further due diligence. To still transfer personal data to the US, Art. 46 of the GDPR requires, among other alternatives, that standard contractual clauses be included in an agreement with OpenAI to ensure the protection of EU citizens’ data. Whether the DPA meets the GDPR’s standards may need to be assessed by legal counsel.
For businesses intending to use ChatGPT as the basis for their own application processing personal information of residents of Quebec, a Canadian province, the data location information provided by OpenAI is insufficient. Starting September 22, 2023, Quebec’s data protection law requires a privacy impact assessment to be conducted before any personal information is disclosed outside of Quebec, including a jurisdiction-specific assessment of adherence to recognized privacy principles. Hence, an inquiry into the location of the data warehouse where the information will be stored is necessary. Similarly, starting September 1, 2023, businesses subject to the new Swiss data protection law must disclose to individuals the country to which their personal information is transferred, as well as the guarantees in place to ensure adequate data protection.
According to OpenAI’s FAQ as well as the DPA, data in transmission via the API is encrypted. Transport Layer Security (TLS) ensures that the data cannot be altered or viewed by third parties during the transfer.
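As a concrete illustration, a client application can also enforce this guarantee on its own side by refusing to send prompts anywhere but an HTTPS endpoint. The `ensure_tls` helper below is a minimal, hypothetical guard of our own, not part of OpenAI’s SDK:

```python
from urllib.parse import urlparse

def ensure_tls(endpoint: str) -> str:
    """Reject any API endpoint that would transmit prompts without TLS.

    Hypothetical client-side guard: the OpenAI API is only served over
    HTTPS, but a check like this prevents accidental plaintext
    transmission if the endpoint URL is ever misconfigured.
    """
    scheme = urlparse(endpoint).scheme
    if scheme != "https":
        raise ValueError(
            f"Refusing non-TLS endpoint ({scheme}://): "
            "prompts may contain personal data"
        )
    return endpoint

# HTTPS endpoints pass through unchanged; anything else raises.
ensure_tls("https://api.openai.com/v1/chat/completions")
```

A guard this small costs nothing at runtime and turns a silent misconfiguration into an immediate, visible error.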
OpenAI retains data transmitted via the API for 30 days, after which it is deleted, or aggregated or stored in a manner that no longer identifies individuals or customers.
Data Deletion Requests
OpenAI is committed to responding to reasonable deletion requests from its customers and will delete customer data upon termination of the business relationship. If an individual makes a deletion request directly to OpenAI, OpenAI will contact the affected business promptly.
Personal Information Used for Product Improvement
Identifiable personal information will only be used for product improvement purposes if the business using OpenAI’s API services explicitly opts in. In aggregated or de-identified form, however, the data can be retained for longer than 30 days and used to improve OpenAI’s systems and services.
The US Health Insurance Portability and Accountability Act (HIPAA) requires health service providers to enter into so-called “business associate contracts” with entities that perform functions or activities on behalf of, or provide certain services to them that involve access to protected health information (PHI). This contract ensures that PHI is properly safeguarded, that the data is used for permissible purposes only, and that it is only disclosed further as required under the contract or the law.
OpenAI indicates that they are able to enter into business associate contracts if required. Hence, compliance with HIPAA seems to be achievable, allowing businesses to build applications supporting healthcare services based on ChatGPT.
OpenAI has taken important steps toward compliance with data privacy laws. Several of these measures were enhancements made in response to pressure from privacy regulators, for example the temporary ban of ChatGPT in Italy. Among them are the opt-in approach to using data for product improvement and limited data retention. Businesses building applications on top of models such as ChatGPT rely on OpenAI’s proper data handling processes, but blind trust here is misplaced. The necessary due diligence should not be skipped in the race to build the next hit ChatGPT application.
The most foolproof measure to protect your customers’ personal information is not to transmit it to OpenAI in the first place. Private AI’s PrivateGPT filters out more than 50 entity types, including PHI and Payment Card Industry (PCI) data, from your prompt before it is sent to ChatGPT. The generated response is then repopulated with the original personal information before it is displayed to you. This seamless process lets your business safely reap the benefits of OpenAI’s models for your application while maintaining the trust of customers who may otherwise be skeptical about the protection of their personal information. Try PrivateGPT today.
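The redact-and-restore flow described above can be sketched as follows. This is a toy illustration with two regex patterns of our own; PrivateGPT’s actual detection covers 50+ entity types using ML models, not simple patterns:

```python
import re

# Illustrative patterns only -- real PII detection needs far more than regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(prompt: str):
    """Replace detected entities with numbered placeholders and return
    the redacted prompt plus a mapping for later restoration."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(prompt)):
            placeholder = f"[{label}_{i}]"
            mapping[placeholder] = match
            prompt = prompt.replace(match, placeholder, 1)
    return prompt, mapping

def restore(response: str, mapping: dict) -> str:
    """Re-insert the original values into the model's response."""
    for placeholder, original in mapping.items():
        response = response.replace(placeholder, original)
    return response

redacted, mapping = redact("Email jane@example.com or call 555-123-4567.")
# redacted == "Email [EMAIL_0] or call [PHONE_0]."
```

Only the placeholder version ever leaves your infrastructure; the mapping stays local, so the third party never sees the personal data, yet the user-facing answer reads as if it did.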