
De-identifying Text

Once the Docker container is running, you can send it requests to de-identify text. Each request is a POST to the ‘deidentify_text’ route with a JSON body containing the fields described below:

POST /deidentify_text

Remove identifiers from a string or multiple strings.

Request Body

JSON object containing the following fields:

  • text: string or array(string) (mandatory)

    UTF-8 encoded message(s) to de-identify.

  • key: string (mandatory)

    License key provided to you by Private AI.

  • unique_pii_markers: bool (default: true)

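    Specifies whether PII markers in the text should uniquely identify PII. When true, each entity gets a numbered marker such as [NAME_1] or [NAME_2]; when false, entities of the same class share one marker such as [NAME].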

  • accuracy_mode: string (default: “standard”)

    Controls the speed/accuracy tradeoff. Possible values are “standard”, “standard_high”, “high” and “high_multilingual”.

  • enabled_classes: array(string) (defaults to all classes)

    Controls which types of PII are removed. See Supported Entity Types below for the list of possible entities.

  • marker_format: string or null (default: null)

    Specifies a custom redaction marker format. The format must contain ‘CLASS_NAME’, which is replaced by the entity label. For example: “<<CLASS_NAME>>” or “<<–CLASS_NAME–>>”.

  • allow_list: array(string) (default: [ ])

    Entities whose text matches a term in this list are discarded, so they are left in the text unredacted. Matching is case-insensitive. For example: [“Maxim”, “Kandeep”].

  • fake_entity_accuracy_mode (beta): string (default: “None”)

    Enable fake entity generation using the specified model. Currently this feature is in beta and only supports accuracy_mode “standard”.

  • preserve_relationships (beta): bool (default: true)

    Specifies whether multiple instances of the same entity receive the same generated fake entity. For example, with relationships preserved: “Hi John and Rosha, John nice to meet you” -> “Hi Harry and Alev, Harry nice to meet you”. Without preserved relationships: “Hi John and Rosha, John nice to meet you” -> “Hi Harry and Alev, Sulav nice to meet you”.
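The field constraints above can be checked on the client before a request is sent. The sketch below assumes a Python client; `build_deidentify_payload` and `ACCURACY_MODES` are hypothetical names, and the function only enforces rules stated in this list (the type of text, the documented accuracy_mode values, and ‘CLASS_NAME’ in marker_format). Optional fields left as None are omitted so that the server defaults apply.

```python
from typing import Optional, Sequence, Union

# The accuracy_mode values listed in the field description above.
ACCURACY_MODES = {"standard", "standard_high", "high", "high_multilingual"}


def build_deidentify_payload(
    text: Union[str, Sequence[str]],
    key: str,
    unique_pii_markers: bool = True,
    accuracy_mode: str = "standard",
    enabled_classes: Optional[Sequence[str]] = None,
    marker_format: Optional[str] = None,
    allow_list: Optional[Sequence[str]] = None,
) -> dict:
    """Build a /deidentify_text request body, checking documented constraints client-side."""
    if not isinstance(text, str):
        text = list(text)
        if not all(isinstance(t, str) for t in text):
            raise TypeError("text must be a string or a list of strings")
    if accuracy_mode not in ACCURACY_MODES:
        raise ValueError(f"unknown accuracy_mode: {accuracy_mode!r}")
    if marker_format is not None and "CLASS_NAME" not in marker_format:
        raise ValueError("marker_format must contain 'CLASS_NAME'")

    payload = {
        "text": text,
        "key": key,
        "unique_pii_markers": unique_pii_markers,  # serialized as a JSON boolean
        "accuracy_mode": accuracy_mode,
    }
    # Only send optional fields that were explicitly set.
    if enabled_classes is not None:
        payload["enabled_classes"] = list(enabled_classes)
    if marker_format is not None:
        payload["marker_format"] = marker_format
    if allow_list is not None:
        payload["allow_list"] = list(allow_list)
    return payload
```

For instance, `build_deidentify_payload(["Hello Paul, how are you?"], "<customer key>", enabled_classes=["NAME"])` produces a batched body restricted to names.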

Minimal example:

{
   "text": "Hello Paul, how are you?",
   "key": "<customer key>"
}

Minimal example with batched de-identification:

{
   "text": [
      "Hello Paul, how are you?",
      "My address is 123 Example Street."
   ],
   "key": "<customer key>"
}
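For clients written in Python, the minimal examples above can be sent using only the standard library. This is a sketch: `make_request` and `deidentify` are hypothetical helper names, and the URL assumes the container listens on localhost:8080, as in the sample commands further down.

```python
import json
import urllib.request

# Assumption: the container is reachable on localhost:8080, as in the sample commands.
DEFAULT_URL = "http://localhost:8080/deidentify_text"


def make_request(payload: dict, url: str = DEFAULT_URL) -> urllib.request.Request:
    """Wrap a JSON body in a POST request with the same content-type header as the curl samples."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"content-type": "application/json"},
        method="POST",
    )


def deidentify(payload: dict, url: str = DEFAULT_URL, timeout: float = 30.0):
    """Send the request and decode the JSON response (requires a running container)."""
    with urllib.request.urlopen(make_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a container running, `deidentify({"text": "Hello Paul, how are you?", "key": "<customer key>"})["result"]` should return the de-identified string. Building the request separately from sending it keeps the request easy to inspect in tests.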

Response Body

The API returns a JSON object containing the following fields:

  • result: string or array(string)

    The de-identified string(s).

  • result_fake (beta): string or array(string)

    The pseudonymized (fake) string(s) with each entity found replaced by a generated entity.

  • pii: array(object)

    A list of all entities found in the text. Each PII entry has the following fields:

    • marker: string

      The marker that replaces this entity in the de-identified text (the ‘result’ field)

    • text: string

      The entity text

    • best_label: string

      The entity label with the highest likelihood

    • stt_idx: int

      Start character index (inclusive) of the entity in the original text

    • end_idx: int

      End character index (exclusive) of the entity in the original text

    • labels: object

      A dictionary mapping each candidate label to its likelihood. These values are not strictly probabilities and need not sum to 1, because a word can belong to multiple classes.

    • fake_text (beta): string

      The fake entity that was generated to replace the original

    • fake_stt_idx (beta): int

      Start character index of the fake entity in the pseudonymized text (‘result_fake’ field)

    • fake_end_idx (beta): int

      End character index of the fake entity in the pseudonymized text (‘result_fake’ field)

  • api_calls_used: int

    The number of API calls used to process a request.

  • output_checks_passed: bool

    Reports whether the output validity checks passed or not. These checks test whether:

    1. replacing each entity marker with the corresponding entity text reproduces the input

    2. every entity marker is bounded by whitespace or punctuation
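Check 1 can be approximated on the client, for example when testing an integration. This sketch assumes the default marker style seen in the sample outputs, where the pii marker NAME_1 appears in result as [NAME_1], and treats end_idx as exclusive, as the samples indicate. `reidentify` and `output_check` are hypothetical helpers, not part of the API. Entities are processed in stt_idx order and each consumes the next occurrence of its marker, so the same logic works for non-unique markers.

```python
def reidentify(result: str, pii: list, marker_template: str = "[{}]") -> str:
    """Rebuild the input by putting each entity's text back in place of its marker.

    marker_template is an assumption: the samples wrap markers in square brackets.
    """
    out = []
    cursor = 0
    for entity in sorted(pii, key=lambda e: e["stt_idx"]):
        marker = marker_template.format(entity["marker"])
        pos = result.find(marker, cursor)  # next occurrence after the previous marker
        if pos == -1:
            raise ValueError(f"marker {marker!r} not found in result")
        out.append(result[cursor:pos])
        out.append(entity["text"])
        cursor = pos + len(marker)
    out.append(result[cursor:])
    return "".join(out)


def output_check(original: str, response: dict) -> bool:
    """Client-side approximation of check 1: spans match and markers round-trip."""
    pii = response["pii"]
    spans_ok = all(original[e["stt_idx"]:e["end_idx"]] == e["text"] for e in pii)
    return spans_ok and reidentify(response["result"], pii) == original
```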

Sample Commands

Below are some sample commands and their outputs, demonstrating the different options.

Unique PII markers (default)
$ curl -X POST localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "My name is John and my friend is Grace", "key": "<customer key>"}'
{
  "result": "My name is [NAME_1] and my friend is [NAME_2]",
  "pii": [
     {
        "marker": "NAME_1",
        "text": "John",
        "best_label": "NAME",
        "stt_idx": 11,
        "end_idx": 15,
        "labels": {"NAME": 0.923}
     },
     {
        "marker": "NAME_2",
        "text": "Grace",
        "best_label": "NAME",
        "stt_idx": 33,
        "end_idx": 38,
        "labels": {"NAME": 0.9135}
     }
  ],
  "api_calls_used": 1,
  "output_checks_passed": true
}
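Because each unique marker stands for one entity, a client can keep a lookup table from marker to original text, for example to restore selected values later. A minimal sketch, assuming the response shape shown above; `marker_table` is a hypothetical helper. It raises an error when one marker maps to different texts, which is what happens with non-unique markers.

```python
def marker_table(response: dict) -> dict:
    """Map each marker (for example "NAME_1") to the original entity text it replaced."""
    table = {}
    for entity in response["pii"]:
        known = table.setdefault(entity["marker"], entity["text"])
        if known != entity["text"]:
            # Happens with unique_pii_markers set to false, where every name shares "NAME".
            raise ValueError(f"marker {entity['marker']!r} is shared by different entities")
    return table
```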
Non-unique PII markers
$ curl -X POST localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "My name is John and my friend is Grace", "unique_pii_markers": false, "key": "<customer key>"}'
{
  "result": "My name is [NAME] and my friend is [NAME]",
  "pii": [
     {
        "marker": "NAME",
        "text": "John",
        "best_label": "NAME",
        "stt_idx": 11,
        "end_idx": 15,
        "labels": {"NAME": 0.923}
     },
     {
        "marker": "NAME",
        "text": "Grace",
        "best_label": "NAME",
        "stt_idx": 33,
        "end_idx": 38,
        "labels": {"NAME": 0.9135}
     }
  ],
  "api_calls_used": 1,
  "output_checks_passed": true
}
Enabled classes
$ curl -X POST localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "My name is John and my friend is Grace and we live in Barcelona", "key": "<customer key>", "enabled_classes": ["AGE", "LOCATION"]}'
{
  "result": "My name is John and my friend is Grace and we live in [LOCATION_1]",
  "pii": [
     {
        "marker": "LOCATION_1",
        "text": "Barcelona",
        "best_label": "LOCATION",
        "stt_idx": 54,
        "end_idx": 63,
        "labels": {"LOCATION": 0.9211}
     }
  ],
  "api_calls_used": 1,
  "output_checks_passed": true
}
Batching
$ curl -X POST http://localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": ["My password is: 4XDX63F8O1", "My password is: 33LMVLLDHNasdfsda"], "key": "<customer key>"}'
[
   {
      "result":"My password is: [PASSWORD_1]",
      "result_fake":null,
      "pii":[
         {
            "marker":"PASSWORD_1",
            "text":"4XDX63F8O1",
            "best_label":"PASSWORD",
            "stt_idx":16,
            "end_idx":26,
            "labels":{"PASSWORD":0.9346}
         }
      ],
      "api_calls_used":1,
      "output_checks_passed":true
   },
   {
      "result":"My password is: [PASSWORD_1]",
      "result_fake":null,
      "pii":[
         {
            "marker":"PASSWORD_1",
            "text":"33LMVLLDHNasdfsda",
            "best_label":"PASSWORD",
            "stt_idx":16,
            "end_idx":33,
            "labels":{"PASSWORD":0.9312}
         }
      ],
      "api_calls_used":1,
      "output_checks_passed":true
   }
]
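In this sample the batched request returns a JSON array with one object per input string, and marker numbering restarts in each object (both passwords become [PASSWORD_1]). The sketch below pairs inputs with outputs and totals api_calls_used. It assumes the response is either a single object or a list shaped like this sample; `per_text_results` and `total_api_calls` are hypothetical names.

```python
from typing import List, Sequence, Union

Response = Union[dict, List[dict]]


def _as_list(response: Response) -> List[dict]:
    """Treat a single response object like a batch of one."""
    return response if isinstance(response, list) else [response]


def per_text_results(texts: Union[str, Sequence[str]], response: Response) -> List[dict]:
    """Pair each input string with its de-identified result and detected labels."""
    inputs = [texts] if isinstance(texts, str) else list(texts)
    responses = _as_list(response)
    if len(inputs) != len(responses):
        raise ValueError("expected one response object per input string")
    return [
        {"input": t, "result": r["result"], "labels": [e["best_label"] for e in r["pii"]]}
        for t, r in zip(inputs, responses)
    ]


def total_api_calls(response: Response) -> int:
    """Sum api_calls_used across all objects in the response."""
    return sum(r["api_calls_used"] for r in _as_list(response))
```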
Private AI Demo Server
$ curl -X POST https://demoprivateai.com -H 'content-type: application/json' -d '{"text": "My name is John and my friend is Grace", "key": "<customer key>"}'
{
  "result": "My name is [NAME_1] and my friend is [NAME_2]",
  "pii": [
     {
        "marker": "NAME_1",
        "text": "John",
        "best_label": "NAME",
        "stt_idx": 11,
        "end_idx": 15,
        "labels": {"NAME": 0.923}
     },
     {
        "marker": "NAME_2",
        "text": "Grace",
        "best_label": "NAME",
        "stt_idx": 33,
        "end_idx": 38,
        "labels": {"NAME": 0.9135}
     }
  ],
  "api_calls_used": 1,
  "output_checks_passed": true
}
Fake entity generation
$ curl -X POST localhost:8080/deidentify_text -H 'content-type: application/json' -d '{"text": "My name is John and my friend is Grace and we live in Barcelona", "key": "<customer key>", "fake_entity_accuracy_mode": "standard"}'
{
  "result": "My name is [NAME_1] and my friend is [NAME_2] and we live in [LOCATION_1]",
  "result_fake": "My name is Sarah and my friend is Sarah and we live in California",
  "pii": [
     {
        "marker": "NAME_1",
        "text": "John",
        "best_label": "NAME",
        "stt_idx": 11,
        "end_idx": 15,
        "labels": {"NAME":0.9061},
        "fake_text": ["Sarah"],
        "fake_stt_idx": 11,
        "fake_end_idx": 16
     },
     {
        "marker": "NAME_2",
        "text": "Grace",
        "best_label": "NAME",
        "stt_idx": 33,
        "end_idx": 38,
        "labels": {"NAME": 0.9032},
        "fake_text": ["Sarah"],
        "fake_stt_idx": 34,
        "fake_end_idx": 39
     },
     {
        "marker": "LOCATION_1",
        "text": "Barcelona",
        "best_label": "LOCATION",
        "stt_idx": 54,
        "end_idx": 63,
        "labels": {"LOCATION": 0.8985},
        "fake_text": ["California"],
        "fake_stt_idx": 55,
        "fake_end_idx": 65
     }
  ],
  "api_calls_used": 1,
  "output_checks_passed": true
}