Migrating Your Privacy Workflows from Amazon Comprehend to Private AI

At Private AI, we pride ourselves on making scalable solutions for large companies, small businesses, and government institutions that need to securely handle the potentially sensitive data and information that is flowing through their infrastructure. We understand the importance of maintaining data privacy and meeting regulatory compliance standards, and our goal is to ensure that migrating to our platform will be a smooth and straightforward process.

This guide will cover the following topics:

  • Purpose and Intended Audience
  • Prerequisites
  • Useful Links
  • Terms Used in This Guide
  • Why Migrate From Amazon AWS Comprehend to Private AI?
  • Examining a Real-World Scenario: Customer Service Chat Messages
  • Identifying Sensitive Data with AWS Comprehend
  • Identifying Sensitive Data with the Private AI Process Text Service
  • Locating Sensitive Data with AWS Comprehend
  • Locating Sensitive Data with the Private AI Process Text Service
  • Redacting Sensitive Data with AWS Comprehend
  • Redacting Sensitive Data with the Private AI Process Text Service
  • Conclusion

Purpose and Intended Audience

This guide is intended to help managers, CIOs, and CISOs understand the benefits of using Private AI instead of AWS Comprehend. Several features and capabilities set Private AI apart from other solutions, and they are the reason Gartner recognized Private AI as a Cool Vendor in its “Cool Vendors in Privacy, 2023” report.

This guide is also for application developers who maintain applications and services already deployed with AWS Comprehend. It shows all the steps necessary to migrate your existing workflows to Private AI for privacy, regulatory compliance, and smart redaction of potentially sensitive data, while maintaining the integrity and utility of your information.

Prerequisites

In order to fully complete all the steps provided in this guide, you need to have the following:

  • Knowledge of an already existing application or service that utilizes AWS Comprehend
  • A free account on the Private AI Portal
  • A free Private AI API key (log in to the portal to create one)
  • Access to the Private AI Cloud API, or an on-prem deployment set up by following the Installation Guide

Useful Links

In this guide we’re going to cover the steps necessary to migrate your workflows from AWS Comprehend to Private AI. Therefore, the following resource will be useful for anyone following the steps in this guide:

The Private AI Process Text API documentation

Terms Used in This Guide

Below is a list of terms that will be used in this guide.

Why Migrate from Amazon AWS Comprehend to Private AI?

For compliance with regulations like HIPAA, PCI DSS, and GDPR, having robust tools for data privacy and security is non-negotiable. Although AWS Comprehend offers basic capabilities for detecting and redacting PII, the list below summarizes why customers are migrating their privacy workflows to Private AI.

  • Private AI supports both on-prem and cloud deployments. Simply stated, Private AI gives you the option to keep your data entirely within your own network infrastructure. This greatly enhances your control over data security and privacy, ensuring that sensitive information remains within your secure environment. If you currently host your infrastructure with a cloud provider (such as Amazon AWS, Google, or others), you can still migrate your privacy workflows to Private AI with a few lines of code.

  • Private AI can accomplish in a single HTTP request what AWS Comprehend needs three separate requests to do. Using Private AI, you can label, locate, and redact sensitive information in a single HTTP request. Using AWS Comprehend, each of these operations is a separate HTTP request. This equates to more CPU time, network bandwidth, and expense spent on their services compared to Private AI’s platform.

  • Private AI doesn’t have rights to your data. Your data is your data. Private AI doesn’t store, access, or use your data for any purpose other than processing your sensitive information. This is the default with our on-prem solution, and it also applies to our cloud service.

  • Private AI can process your data in over 50 languages. The world is a big place, and PII can be hidden in languages other than English. Private AI’s solution supports over 50 languages and helps you meet regulatory compliance requirements in multiple global regions. See the full list of supported languages here.

  • Private AI supports over 50 entity types. Across its supported languages, Private AI can identify over 50 entity types of PII, including names, addresses, credit card data, and much more. See the full list of supported entity types here.

  • Private AI doesn’t try to lock you in to other services that you don’t need. Other services require you to pay for cloud storage in order to process your sensitive information. Private AI lets you deploy a Docker container within your own infrastructure, eliminating unnecessary monthly storage costs and payments for services that you don’t need.

  • Private AI can process data within text and binary file formats such as PDF, Word, and Excel. If you send Private AI a supported file format, our API can send you the same file type back. This greatly reduces the complexity of integrating our services into your existing workflows and saves you time in processing document formats. See the full list of supported document types here.

  • Private AI can process data within images. We can identify PII that appears as text in an image. This means that even visual data is not beyond the scope of privacy protection, extending PII detection and redaction to a broader range of media formats. See the full list of supported image types here.

  • Private AI can process data within audio files. If you have customer support calls that contain PII, Private AI can analyze and redact the sensitive information, ensuring that audio recordings comply with privacy regulations while maintaining the quality and utility of the data for customer service and analysis. See the full list of supported audio file types here.

Migrating From AWS Comprehend to Private AI

Now that we’ve seen a list of the major benefits of Private AI compared to other services, let’s use a real-world example of PII to show how to migrate the following three capabilities of AWS Comprehend to Private AI:

  • Identifying Sensitive Data
  • Locating Sensitive Data
  • Redacting Sensitive Data

Examining a Real-World Scenario: Customer Service Chat Messages

Listing 1 below contains an excerpt of a fictitious interaction between a customer and a bank representative over a text chat window. Paulo, the customer, is chatting with a customer service representative to inquire about his recent credit card statement. In this situation, the customer service agent typed the following response.

Hello Paulo Santos. The latest statement for your credit card account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109.

Listing 1. An Example of Sensitive Information Sent Over a Chat Window

Now let’s compare and contrast how the same input is processed using AWS Comprehend versus Private AI.

Identifying Sensitive Data with AWS Comprehend

Using AWS Comprehend with the example text in Listing 1 above, you will be provided with a JSON response as shown in Listing 2 below.

				
{
    "Labels": [
        {
            "Name": "NAME",
            "Score": 0.9149109721183777
        },
        {
            "Name": "CREDIT_DEBIT_NUMBER",
            "Score": 0.5698626637458801
        },
        {
            "Name": "ADDRESS",
            "Score": 0.9951046109199524
        }
    ]
}

				
			

Listing 2. The AWS Comprehend JSON Result of Identifying Sensitive Data From Listing 1 

With AWS Comprehend, you are provided with each PII entity type found in the text and a score that estimates how confident the service is in that label. Now let’s see how to process the same text in Listing 1 above using Private AI.

Identifying Sensitive Data with the Private AI Process Text Service

The Private AI Process Text Service provides 3 capabilities in a single call to the service: identifying, locating, and smart redaction. For more information about how to invoke the Process Text Service (including all the options and parameters available), please refer to the Process Text API documentation.

Listing 3 below is the cURL command necessary to invoke the Process Text Service.

				
curl -i -X POST \
--location 'https://api.private-ai.com/deid/v3/process/text' \
--header 'Content-Type: application/json' \
--header "x-api-key: $API_KEY" \
--data '{
    "text": [
        "Hello Paulo Santos. The latest statement for your credit card account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109."
    ]
}'

				
			

Listing 3. The cURL Command to Invoke the Process Text Service

As you can see, all you need to provide is a valid API key and the text that you want to be labeled. Optional parameters, such as link_batch or entity_detection, can also be specified in the request.
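For applications that call the service from code rather than from a shell, the same request can be assembled with Python’s standard library. This is a minimal sketch, not an official client: the function names build_process_text_request and process_text are hypothetical and introduced only for illustration, while the URL, the x-api-key header, and the text array are taken from Listing 3.

```python
import json
import urllib.request

# Endpoint and header name are taken from Listing 3.
PROCESS_TEXT_URL = "https://api.private-ai.com/deid/v3/process/text"


def build_process_text_request(texts, api_key):
    """Build (but do not send) a POST request for the Process Text Service.

    `texts` is a list of strings, matching the "text" array in Listing 3.
    """
    payload = json.dumps({"text": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        PROCESS_TEXT_URL,
        data=payload,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def process_text(texts, api_key):
    """Send the request and return the decoded JSON response body."""
    request = build_process_text_request(texts, api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling process_text(["Hello Paulo Santos."], api_key) with a valid key sends the request. Keeping request construction separate from sending makes the payload easy to inspect or unit test without network access.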

After executing the request to the service, the JSON response is shown in Listing 4 below.

				
[
    {
        "processed_text": "Hello [NAME_1]. The latest statement for your credit card account [CREDIT_CARD_1] was mailed to [LOCATION_ADDRESS_1].",
        "entities": [
            {
                "processed_text": "NAME_1",
                "text": "Paulo Santos",
                "location": {
                    "stt_idx": 6,
                    "end_idx": 18,
                    "stt_idx_processed": 6,
                    "end_idx_processed": 14
                },
                "best_label": "NAME",
                "labels": {
                    "NAME": 0.9226,
                    "NAME_GIVEN": 0.4508,
                    "NAME_FAMILY": 0.4557
                }
            },
            {
                "processed_text": "CREDIT_CARD_1",
                "text": "1111-0000-1111-0000",
                "location": {
                    "stt_idx": 70,
                    "end_idx": 89,
                    "stt_idx_processed": 66,
                    "end_idx_processed": 81
                },
                "best_label": "CREDIT_CARD",
                "labels": {
                    "CREDIT_CARD": 0.9176
                }
            },
            {
                "processed_text": "LOCATION_ADDRESS_1",
                "text": "123 Any Street, Seattle, WA 98109",
                "location": {
                    "stt_idx": 104,
                    "end_idx": 137,
                    "stt_idx_processed": 96,
                    "end_idx_processed": 116
                },
                "best_label": "LOCATION_ADDRESS",
                "labels": {
                    "LOCATION_ADDRESS": 0.9415,
                    "LOCATION_ADDRESS_STREET": 0.309,
                    "LOCATION": 0.9024,
                    "LOCATION_CITY": 0.1033,
                    "LOCATION_STATE": 0.1048,
                    "LOCATION_ZIP": 0.211
                }
            }
        ],
        "entities_present": true,
        "characters_processed": 138,
        "languages_detected": {
            "en": 0.8629347681999207
        }
    }
]

				
			

Listing 4. The Private AI Process Text JSON Result of the Sensitive Data From Listing 1

Let’s delve deeper and analyze Listing 4 above. The Private AI Process Text Service performs identifying, locating, and smart redaction in a single JSON response.

The entities array contains all PII entities found in the original text. Note that the entities[].best_label property contains the PII label, such as “NAME”, “LOCATION_ADDRESS”, or “CREDIT_CARD”.

Upon further analysis of the results, you can see that the entities[].labels property not only provides a confidence score for each label, it also provides more detail about the PII entity itself. Specifically, in this case:

				
                "text": "Paulo Santos",
                ...
                "best_label": "NAME",
                "labels": {
                    "NAME": 0.9226,
                    "NAME_GIVEN": 0.4508,
                    "NAME_FAMILY": 0.4557
                }

				
			

The Private AI Process Text Service labels the text “Paulo Santos” as a NAME and also scores it against the more specific NAME_GIVEN (first name) and NAME_FAMILY (last name) labels.
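If your existing code consumes the Labels array that AWS Comprehend returns in Listing 2, a small adapter can produce the same shape from the Private AI response. The sketch below is illustrative only: the function name to_comprehend_labels is hypothetical, the label mapping is an assumption inferred from comparing Listing 2 with Listing 4, and the sample data is a trimmed copy of Listing 4. It emits one entry per detected entity and does not merge repeated types.

```python
# Trimmed copy of the Listing 4 response, keeping only the fields used here.
response = [
    {
        "entities": [
            {"text": "Paulo Santos", "best_label": "NAME",
             "labels": {"NAME": 0.9226, "NAME_GIVEN": 0.4508,
                        "NAME_FAMILY": 0.4557}},
            {"text": "1111-0000-1111-0000", "best_label": "CREDIT_CARD",
             "labels": {"CREDIT_CARD": 0.9176}},
            {"text": "123 Any Street, Seattle, WA 98109",
             "best_label": "LOCATION_ADDRESS",
             "labels": {"LOCATION_ADDRESS": 0.9415, "LOCATION": 0.9024}},
        ],
        "entities_present": True,
    }
]

# Assumed mapping, inferred only from comparing Listing 2 with Listing 4.
# Extend it for the entity types your application handles.
PRIVATE_AI_TO_COMPREHEND = {
    "NAME": "NAME",
    "CREDIT_CARD": "CREDIT_DEBIT_NUMBER",
    "LOCATION_ADDRESS": "ADDRESS",
}


def to_comprehend_labels(result):
    """Return a Comprehend-style {"Labels": [...]} dict for one result item."""
    labels = []
    for entity in result["entities"]:
        best = entity["best_label"]
        labels.append({
            # Fall back to the Private AI label when no mapping is defined.
            "Name": PRIVATE_AI_TO_COMPREHEND.get(best, best),
            "Score": entity["labels"][best],
        })
    return {"Labels": labels}


labels = to_comprehend_labels(response[0])
for label in labels["Labels"]:
    print(label["Name"], label["Score"])
```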

Now let’s compare and contrast how to locate sensitive information with AWS Comprehend vs Private AI.

Locating Sensitive Data with AWS Comprehend

Let’s use the same example scenario of Paulo Santos interacting with a customer service agent, shown in Listing 1 above. Listing 5 below shows the JSON result when using AWS Comprehend to locate sensitive information.

				
					
{
    "Entities": [
        {
            "Score": 0.9999669790267944,
            "Type": "NAME",
            "BeginOffset": 6,
            "EndOffset": 18
        },
        {
            "Score": 0.8905550241470337,
            "Type": "CREDIT_DEBIT_NUMBER",
            "BeginOffset": 69,
            "EndOffset": 88
        },
        {
            "Score": 0.9999889731407166,
            "Type": "ADDRESS",
            "BeginOffset": 103,
            "EndOffset": 138
        }
    ]
}

				
			

Listing 5. The AWS Comprehend JSON Result of Locating Sensitive Data From Listing 1

Within the JSON response, each item in the Entities array has Type, BeginOffset, and EndOffset properties that tell you what kind of sensitive information was found and where it is located in your original text.

Locating Sensitive Data with the Private AI Process Text Service

As previously stated, the Private AI Process Text Service provides identifying, locating, and smart redaction as a single HTTP request. For more information about how to invoke the Process Text Service (including all the options and parameters available), please refer to the Process Text API documentation.

Refer to Listing 3 above for the cURL command necessary to invoke the Process Text Service, as well as Listing 4 for the full JSON response from the service.

Again, the entities array contains all PII entities found in the original text. Note that the entities[].location object contains the starting and ending indices of the PII in the stt_idx and end_idx properties, respectively.

				
        "entities": [
            {
                "processed_text": "NAME_1",
                "text": "Paulo Santos",
                "location": {
                    "stt_idx": 6,
                    "end_idx": 18,
                    "stt_idx_processed": 6,
                    "end_idx_processed": 14
                }
            ...

				
			

So, in our example, the name “Paulo Santos” starts at index 6 and ends at index 18 of the original text (the end index is exclusive, since the name is 12 characters long). As the snippet above shows, the location object also provides the starting and ending indices of the entity in the redacted text, in the stt_idx_processed and end_idx_processed properties.
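To make the index semantics concrete, the sketch below slices both the original and the redacted text with the location values from Listing 4, then reshapes the entities into the BeginOffset and EndOffset form shown in Listing 5. The helper names original_span, processed_span, and to_comprehend_entities are hypothetical, and the Type values are left as Private AI labels rather than translated to AWS Comprehend names.

```python
original = (
    "Hello Paulo Santos. The latest statement for your credit card account "
    "1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109."
)
processed = (
    "Hello [NAME_1]. The latest statement for your credit card account "
    "[CREDIT_CARD_1] was mailed to [LOCATION_ADDRESS_1]."
)

# Entities copied from Listing 4, trimmed to the fields used below.
entities = [
    {"text": "Paulo Santos", "best_label": "NAME", "labels": {"NAME": 0.9226},
     "location": {"stt_idx": 6, "end_idx": 18,
                  "stt_idx_processed": 6, "end_idx_processed": 14}},
    {"text": "1111-0000-1111-0000", "best_label": "CREDIT_CARD",
     "labels": {"CREDIT_CARD": 0.9176},
     "location": {"stt_idx": 70, "end_idx": 89,
                  "stt_idx_processed": 66, "end_idx_processed": 81}},
    {"text": "123 Any Street, Seattle, WA 98109",
     "best_label": "LOCATION_ADDRESS", "labels": {"LOCATION_ADDRESS": 0.9415},
     "location": {"stt_idx": 104, "end_idx": 137,
                  "stt_idx_processed": 96, "end_idx_processed": 116}},
]


def original_span(text, entity):
    """Slice the original text; end_idx is exclusive."""
    loc = entity["location"]
    return text[loc["stt_idx"]:loc["end_idx"]]


def processed_span(text, entity):
    """Slice the redacted text using the *_processed indices."""
    loc = entity["location"]
    return text[loc["stt_idx_processed"]:loc["end_idx_processed"]]


def to_comprehend_entities(items):
    """Reshape entities into the Listing 5 layout (labels left unmapped)."""
    return {"Entities": [
        {"Score": e["labels"][e["best_label"]],
         "Type": e["best_label"],
         "BeginOffset": e["location"]["stt_idx"],
         "EndOffset": e["location"]["end_idx"]}
        for e in items
    ]}


for entity in entities:
    print(original_span(original, entity), "->", processed_span(processed, entity))
```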

Now let’s wrap things up and compare and contrast how to redact sensitive information with AWS Comprehend vs Private AI.

Redacting Sensitive Data with AWS Comprehend

Again, we’re going to use the same example scenario of Paulo Santos interacting with a customer service agent, shown in Listing 1 above. Listing 6 below shows the result when using AWS Comprehend to redact sensitive information.

				
{
Hello ***** ******. The latest statement for your credit card account ******************* was mailed to *** *** ******* ******** ** *****
}

				
			

Listing 6. The AWS Comprehend JSON Result of Redacting Sensitive Data From Listing 1

Redacting Sensitive Data with the Private AI Process Text Service 

Let’s now turn our attention to redacting using the Private AI Process Text Service. Refer to Listing 3 above for the cURL command necessary to invoke the Process Text Service, as well as Listing 4 for the full JSON response from the service.

In this particular case, we’re only interested in the redacted text, which can be found in the processed_text property of the JSON result, as shown in the code snippet below.

				
{
    "processed_text": "Hello [NAME_1]. The latest statement for your credit card account [CREDIT_CARD_1] was mailed to [LOCATION_ADDRESS_1].",
    ...
}

				
			

Sanitizing Data, Without Sterilizing the Valuable Information

As you can see from the response above, the Private AI Process Text Service not only fully redacted the sensitive information, but also provided the types of the PII in the response. 

Our offering allows you to sanitize any data flowing through your infrastructure without completely sterilizing it, so the information remains useful for analysis at a later stage.
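As a small illustration of that retained value, the redacted text can still be analyzed by entity type without ever touching the original PII. The sketch below counts the markers in the processed_text from Listing 4. The marker pattern is an assumption based only on the [LABEL_N] form seen in that listing, and count_entity_types is a hypothetical helper name.

```python
import re
from collections import Counter

processed_text = (
    "Hello [NAME_1]. The latest statement for your credit card account "
    "[CREDIT_CARD_1] was mailed to [LOCATION_ADDRESS_1]."
)

# Assumed marker shape: an upper-case label, an underscore, and a number,
# all inside square brackets, as seen in Listing 4.
MARKER = re.compile(r"\[([A-Z_]+)_(\d+)\]")


def count_entity_types(text):
    """Count how many redaction markers of each label appear in the text."""
    return Counter(match.group(1) for match in MARKER.finditer(text))


counts = count_entity_types(processed_text)
for label, count in sorted(counts.items()):
    print(label, count)
```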

Redacting with Masking the Sensitive Information

We also give customers the option to redact sensitive information by completely masking any PII. To do so, add "type": "MASK" and the mask character that you prefer to the processed_text object of the request. In this example, we want “#” to replace any PII, so we also add "mask_character": "#" to the request. The updated cURL command for the Process Text Service is shown in Listing 7 below.

				
curl -i -X POST \
--location 'https://api.private-ai.com/deid/v3/process/text' \
--header 'Content-Type: application/json' \
--header "x-api-key: $API_KEY" \
--data '{
    "text": [
        "Hello Paulo Santos. The latest statement for your credit card account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109."
    ],
    "processed_text": {
      "type": "MASK",
      "mask_character": "#"
    }
}'


				
			

Listing 7. The Updated cURL Command to Invoke the Process Text Service to Specify a Mask Character

After invoking the Process Text Service, Listing 8 below shows all PII replaced with the desired mask character.

				
[
    {
        "processed_text": "Hello ############. The latest statement for your credit card account ################### was mailed to #################################.",
        "entities": [
            {
                "processed_text": "############",
                "text": "Paulo Santos",
                "location": {
                    "stt_idx": 6,
                    "end_idx": 18,
                    "stt_idx_processed": 6,
                    "end_idx_processed": 18
                },
                "best_label": "NAME",
                "labels": {
                    "NAME": 0.9226,
                    "NAME_GIVEN": 0.4508,
                    "NAME_FAMILY": 0.4557
                }
            },
            {
                "processed_text": "###################",
                "text": "1111-0000-1111-0000",
                "location": {
                    "stt_idx": 70,
                    "end_idx": 89,
                    "stt_idx_processed": 70,
                    "end_idx_processed": 89
                },
                "best_label": "CREDIT_CARD",
                "labels": {
                    "CREDIT_CARD": 0.9176
                }
            },
            {
                "processed_text": "#################################",
                "text": "123 Any Street, Seattle, WA 98109",
                "location": {
                    "stt_idx": 104,
                    "end_idx": 137,
                    "stt_idx_processed": 104,
                    "end_idx_processed": 137
                },
                "best_label": "LOCATION_ADDRESS",
                "labels": {
                    "LOCATION_ADDRESS": 0.9415,
                    "LOCATION_ADDRESS_STREET": 0.309,
                    "LOCATION": 0.9024,
                    "LOCATION_CITY": 0.1033,
                    "LOCATION_STATE": 0.1048,
                    "LOCATION_ZIP": 0.211
                }
            }
        ],
        "entities_present": true,
        "characters_processed": 138,
        "languages_detected": {
            "en": 0.8629347681999207
        }
    }
]

				
			

Listing 8. The Private AI Process Text JSON Result of the Sensitive Data From Listing 1 Using a Character Mask
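In Listing 8 the processed indices equal the original indices, because masking replaces each character one for one and so keeps the text length unchanged. The sketch below reproduces that masking locally from the stt_idx and end_idx values, which can be a convenient way to check a response in a test. It is illustrative only: mask_spans is a hypothetical helper and not part of the Private AI API.

```python
original = (
    "Hello Paulo Santos. The latest statement for your credit card account "
    "1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109."
)

# Entity locations copied from Listing 8 (end_idx is exclusive).
locations = [
    {"stt_idx": 6, "end_idx": 18},
    {"stt_idx": 70, "end_idx": 89},
    {"stt_idx": 104, "end_idx": 137},
]


def mask_spans(text, spans, mask_character="#"):
    """Replace every character inside each span with the mask character."""
    chars = list(text)
    for span in spans:
        for index in range(span["stt_idx"], span["end_idx"]):
            chars[index] = mask_character
    return "".join(chars)


masked = mask_spans(original, locations)
print(masked)
```

Because the length is preserved, any offsets stored against the original text remain valid for the masked text.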

Conclusion

The Private AI Process Text Service is a straightforward, easy-to-use, and versatile service that provides multiple privacy-protecting features in a single HTTP request (or API call). If you already have privacy workflows that use AWS Comprehend, this guide has shown you how to simplify your development effort when processing sensitive data, all while keeping the value of the information itself. With support for over 50 languages and on-prem deployments, we offer a robust and scalable solution that makes it easy for your organization to comply with global privacy regulations.
