The CIO’s and CISO’s Guide for Proactive Reporting and DLP with Private AI and Elastic

Private AI
Jun 5, 2024

Managing the data and information within a company's infrastructure is critical, both for identifying when sensitive information is being mismanaged and for reporting an "all clear" when company policies are being followed as intended.

As you may already be aware, Private AI provides PII detection and redaction services that enable companies to protect their sensitive information. When combined with the ELK Stack (a suite of tools for searching, managing, analyzing, and visualizing data within the corporate infrastructure), it gives CIOs and CISOs complete situational awareness of their company's infrastructure with regard to policy compliance and security practices. Here is a list of topics covered in this guide:

  • Purposes and Intended Audience
  • Useful Links
  • Terms Used in This Guide
  • Quick Refresher: What is Elastic and the “ELK Stack”?
  • Adequate DLP Does Not Need to Be Overly Complicated
  • Example Scenarios Where You Need PII Detection/Redaction with Powerful Indexing
    • Scenario #1 - Company A: Full Credit Card Data is Stored Within a Spreadsheet
    • Scenario #2 - Company B: Sensitive Information Being Shared in Customer Support Chat Windows is Not Being Redacted
  • Conclusion

Purposes and Intended Audience

This guide is intended to help managers, CIOs, and CISOs understand the benefits of using Private AI with Elastic integrations. Used together, the two can help you visualize your datasets and index sensitive information across your organization so you can answer three critical questions:

  • What is being stored?
  • Is it being stored properly according to company policies and compliance?
  • Where is it located?

Useful Links

This guide explains how Private AI integrates with Elastic for various proactive reporting use cases. For details on the integration process and a list of the configurable parameters, please refer to our ELK Reporting Integration Guide on Elastic.

Terms Used in This Guide

Below is a list of terms used in this guide.

  • DLP – Data Loss Prevention
  • ELK – Elasticsearch, Logstash, Kibana
  • Entity – Any specific piece of information within a document or dataset that can be classified as sensitive data, such as PII and PCI
  • On-prem – Application software or services that are run within a network on infrastructure that is controlled by an organization
  • PCI – Payment Card Industry
  • PII – Personally Identifiable Information

Quick Refresher: What is Elastic and the “ELK Stack”?

The "ELK Stack" refers to a powerful combination of three products from Elastic: Elasticsearch, Logstash, and Kibana. Together, they provide an integrated solution for searching, analyzing, and visualizing data.

Elasticsearch is the core component. It is a highly scalable search and analytics engine. Although it excels at indexing and storing data, it cannot (on its own) ingest the data to be indexed.

Logstash is integral for the data ingestion and processing into Elastic. Logstash acts as a dynamic data processing pipeline, which can transform and transport data from various sources before it reaches Elasticsearch.

Kibana, the final piece of the stack, allows managers, CIOs, and CISOs to visualize the data indexed by Elastic. It features interactive dashboards that allow users to intuitively explore and interpret complex datasets, while facilitating a clear understanding of data distribution and status within an organization’s infrastructure.

Combined with Private AI, the ELK Stack offers extensive capabilities for real-time security monitoring and can be leveraged to help streamline compliance by centralizing logging and auditing functions. Together, we can assist in threat detection and incident response, ensuring that all aspects of organizational technology infrastructure are both secure and efficient.

Adequate DLP Does Not Need to Be Overly Complicated

As a CIO or CISO, you need a DLP (data loss prevention) plan for your organization. If an ounce of prevention is worth a pound of cure, then you understand the value of a DLP plan. To provide PII (Personally Identifiable Information) detection that's flexible for various corporate infrastructures, we give you the option to deploy our solution as a Docker container or use it in the cloud – whichever you prefer.

We believe that an adequate DLP plan does not need to be overly complicated. Therefore, your locally deployed Private AI container can be configured to send reporting metrics to a Logstash server simply by adding a few environment variables to your container runtime, as shown below.


docker run --rm -p 8080:8080 \
  --mount type=bind,src=$PWD/tests/fixtures/licenses/license.json,dst=/app/license/license.json \
  -e PAI_ENABLE_REPORTING=true \
  -e LOGSTASH_HOST=http://hostname.org \
  -e LOGSTASH_PORT=50000 \
  -e PAI_REPORT_ENTITY_COUNTS=true \
  -it deid:image-name



This makes it simple for your DevOps and server teams to write the scripts needed to configure your instance of Private AI to work seamlessly with the ELK Stack. For more information, see our ELK Reporting Integration Guide on Elastic.
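On the receiving end, a minimal Logstash pipeline could accept the container's reports over HTTP on the port configured above and forward them to Elasticsearch. This is a sketch only: the assumption that reports arrive as JSON over HTTP, and the index name, are illustrative rather than taken from the integration guide.

```
input {
  http {
    port => 50000
    codec => json
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "private-ai-reports"
  }
}
```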

Example Scenarios Where You Need PII Detection/Redaction with Powerful Indexing

Let's delve into some real-world examples of how IT and security managers can use Private AI integrated with the ELK Stack for common DLP-related situations.

Scenario #1 - Company A: Full Credit Card Data is Stored Within a Spreadsheet

Company A is a tech startup specializing in software development for the insurance industry. As a part of doing business, the sales team travels frequently to their customers’ offices and events to demonstrate the new capabilities of their software tools. In order to make expense reports easy to manage, each sales team member is provided with a company credit card in the company’s name.

Company A has a DLP plan to conduct monthly internal security audits. The IT manager uses a custom script to scan computers on the corporate infrastructure to ensure that all employees are using best practices for cybersecurity. The script simply takes a batch of files and submits them to their locally installed Private AI instance running in a Docker container. Figure 1 below shows how the integration is accomplished.

Figure 1. Private AI Data Discovery with ELK
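Such a scanning script can be quite small. The sketch below, in Python, walks a directory and submits each file to the local container; the endpoint path (`/process/text`), the request field name, and the directory are assumptions for illustration, not the documented API, so check the integration guide for the real routes and payloads.

```python
import json
import pathlib
import urllib.request

# Hypothetical endpoint of the locally running Private AI container;
# the real route and response schema are defined in the product documentation.
PAI_URL = "http://localhost:8080/process/text"

def build_payload(text: str) -> bytes:
    """Wrap raw file text in a JSON body for the detection endpoint
    (the 'text' field name is an assumption for illustration)."""
    return json.dumps({"text": [text]}).encode("utf-8")

def scan_file(path: pathlib.Path) -> dict:
    """Submit one file to the local container and return the parsed response."""
    req = urllib.request.Request(
        PAI_URL,
        data=build_payload(path.read_text(errors="ignore")),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Walk a shared drive and submit each file. With PAI_ENABLE_REPORTING=true,
    # the container itself forwards entity counts to Logstash for indexing.
    for path in pathlib.Path("/mnt/shared").rglob("*.txt"):
        scan_file(path)
```

Because the reporting happens inside the container, the script itself stays trivial: it only needs to feed files in, and the ELK side of the pipeline receives the metrics automatically.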

The IT manager and the CIO are now able to log in to Elastic and view a Kibana dashboard that shows all the PII found by the on-prem Private AI container, as shown in Figure 2 below.

Figure 2. An Elastic Stack Kibana Dashboard After Company Data Has Been Indexed by Elasticsearch

As the cards in the Kibana dashboard above show, the on-prem Private AI container scanned nearly 12k files and found over 25M occurrences of identifiable PII. Because the Private AI container is installed on-prem, company files can be scanned safely and securely within the company's corporate infrastructure, which provides the following benefits:

  • Company files with intellectual property are not scanned outside the corporate network infrastructure, which eliminates the possibility of a data breach due to the scanning process itself
  • Scanning time and CPU usage are greatly reduced by using the on-prem Private AI container compared to cloud-based solutions
  • External bandwidth costs are eliminated because all scanning is performed within the corporate infrastructure

Root Cause Analysis

In the example above, among the nearly 12k files scanned, the IT manager found that sales team members were storing full credit card details in unencrypted spreadsheets to make it easy to fill in card information when booking travel online.

Potential Next Steps

The IT manager, CIO, and CISO have quantifiable information about the extent of PII stored within the computers on the corporate infrastructure. Potential next steps include:

  • Providing the proper training and tools to company employees to allow for secure/encrypted data storage. Many password managers allow for the secure storage of credit card information with copy/paste features for online sites
  • Providing training to company employees on the company policies for secure data storage
  • Performing subsequent monthly audits to verify that credit card numbers (and other PII) are no longer stored in an insecure manner

Scenario #2 - Company B: Sensitive Information Being Shared in Customer Support Chat Windows is Not Being Redacted

Company B is in the financial services industry, and they have recently implemented an AI-enabled chat bot on their consumer-facing website as a cost savings measure to provide Level-1 tech support for their customers.

Banking customers commonly feel safe sharing their name, financial information, and/or Social Security number with their financial institution, because the company has already established a level of trust with the customer around this type of information.

Similar to Company A, Company B has a DLP plan to conduct monthly internal security audits. In this case, the CISO uses a custom script to send all log files from public-facing websites to the Private AI container in their corporate infrastructure. The integration of the Private AI container and the ELK Stack is the same as in Figure 1 above. Additionally, any record of PII being stored within server logs can be visualized in a Kibana dashboard, very similar to the one shown in Figure 2 above.

Root Cause Analysis

The CISO has found that customers are entering both PII and PCI (Payment Card Industry data) in the chatbot window. This information is stored in plain-text format in the log files for the chatbot and is readable by anyone with access (authorized or unauthorized) to the files.

Potential Next Steps

Armed with this information, the CISO has taken the proactive step of enhancing the custom service to use Private AI for Smart Redaction, as shown in Figure 3 below.

Figure 3. Using Private AI for Smart Redaction to Process Sensitive Documents

The custom service now has two major functions:

  • All log files from public websites are fed into Logstash for analysis of any sensitive information (same as before)
  • Before any chat messages with a customer are stored to disk, they are processed by the local Private AI container to redact any sensitive information. The redacted chat messages are then sent to Elastic so that the conversation can be searched by the tech support team.
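The second function above boils down to a redact-then-index step. The Python sketch below illustrates it under stated assumptions: the container endpoint (`/process/text`), its response field (`processed_text`), the Elasticsearch index name, and the document fields are all hypothetical placeholders, not the documented APIs.

```python
import json
import urllib.request

# Hypothetical endpoints: the local Private AI container on port 8080 and an
# Elasticsearch node on port 9200. Routes and field names are illustrative.
PAI_URL = "http://localhost:8080/process/text"
ES_URL = "http://localhost:9200/support-chats/_doc"

def post_json(url: str, body: dict) -> dict:
    """POST a JSON body and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_redacted(response: list) -> str:
    """Pull the redacted text out of the container's (assumed) response shape."""
    return response[0]["processed_text"]

def store_redacted(message: str, session_id: str) -> None:
    """Redact a chat message with the local container, then index only the
    redacted text in Elasticsearch so the support team can search it safely."""
    redacted = extract_redacted(post_json(PAI_URL, {"text": [message]}))
    post_json(ES_URL, {"session": session_id, "message": redacted})
```

The key design point is ordering: the raw message never touches disk or the search index; only the redacted version leaves the redaction step.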

Conclusion

By combining ELK Stack's robust tools for data management, analytics, and visualization with Private AI's expertise in privacy-preserving techniques, organizations can unlock actionable insights from their data while safeguarding sensitive information and ensuring compliance with regulatory requirements. This guide provided a few practical case examples to show how easily PII and PCI can be found anywhere in your corporate infrastructure.

Using the ELK Stack with Private AI, CIOs and CISOs have the tools to create a robust DLP plan and perform root-cause analysis to strengthen the integrity of the corporate infrastructure.
