Secure and Compliant Chatbots: Data Privacy, PII Redaction, and Governance
Executive Summary / Key Results
Evergreen Health Network (EHN), a 9-hospital healthcare system serving 1.3 million patients across the Midwest, needed a patient-facing AI chatbot to reduce call-center volume while meeting strict security and privacy requirements. In 16 weeks, we designed and launched a secure, compliant chatbot with real-time PII redaction, auditable governance, and a defense-in-depth architecture. The result was a safer, faster, and more reliable patient experience—without compromising privacy or violating HIPAA controls.
- 47% call deflection on routine inquiries (benefits, coverage, appointment prep) within 90 days
- 38% reduction in average handle time (AHT) for chats escalated to live agents
- 81% first-contact resolution (FCR) for covered intents; 94% intent classification accuracy
- 0 confirmed PII/PHI data leaks across 12.4 million messages processed in the first 9 months
- 62 ms P99 redaction latency; 98.6% precision / 96.9% recall on PII detection in production traffic
- $1.6M annualized operational savings from agent time and call deflection
- 99.98% service uptime with zero model-run data retention at the LLM provider
- Compliance assurance: HIPAA controls mapped and logged; SOC 2 Type II–aligned processes; third-party privacy audit passed with no critical findings
Background / Challenge
Evergreen Health Network’s patient access and member services teams were overwhelmed. Call queues during open enrollment and flu season spiked above a 22-minute average wait time. Members wanted instant answers about coverage, ID cards, co-pays, and claims status—yet every conversation risked exposing personally identifiable information (PII) or protected health information (PHI).
EHN’s leadership saw AI chatbots making headlines but had legitimate concerns:
- The security team worried about data leakage into public models and uncontrolled data retention.
- Compliance demanded traceability: who said what, when, and whether any PHI left the boundary.
- Legal needed a defensible governance model aligned to HIPAA, CCPA, and internal policies.
- Operations wanted measurable deflection—without confusing patients or increasing rework for human agents.
Initial pilots elsewhere had stumbled. In one test with a generic cloud chatbot, a member’s claim number and partial SSN appeared in an LLM prompt log, triggering an incident review. The message was clear: unless privacy was handled end-to-end—with robust PII redaction, access controls, and observability—AI would not go live.
Our mandate: deliver a secure, compliant chatbot that patients trust and the compliance team can defend, plus results that matter to the contact center.
Solution / Approach
We partnered with EHN to design a security-first chatbot architecture anchored in privacy, governance, and measurable outcomes. Our approach combined real-time PII redaction, policy-aware retrieval-augmented generation (RAG), and layered governance that kept sensitive data inside EHN’s boundary.
At a glance, the solution included:
- A privacy gateway that redacts PII/PHI at the edge before any model sees user input
- A RAG pipeline limited to policy-approved, patient-safe documents with access controls
- Zero-retention LLM calls over private networking; encryption in transit and at rest with EHN-managed keys
- Fine-grained role-based access control (RBAC) and attribute-based access control (ABAC) for agents and admins
- End-to-end auditability with immutable logs, conversation replays, and explainable policy decisions
- Continuous evaluation: regression tests, red-team prompts, and A/B testing for quality and safety
For architecture-minded readers, see our secure blueprint in Technology and Architecture: A Complete Guide, and our detailed retrieval design in RAG for Chatbots: Retrieval-Augmented Generation Architecture, Tools, and Tuning. We also measured outcomes rigorously following the practices in Chatbot Analytics and Evaluation: KPIs, A/B Testing, and Conversation Quality.
Privacy by Design
From day one, we designed for the worst-case scenario: a user typing their full name, policy ID, claim number, SSN, or medical details into the chat. Our guiding principle—no sensitive data should leave the protected boundary—drove key decisions:
- Client-side pre-filtering minimized obvious leaks before network transmission.
- A server-side privacy gateway performed deterministic (regex) and contextual (ML/NLP) PII redaction with high precision.
- All redacted tokens were replaced with stable placeholders and reversible tokens for internal, authorized systems.
- LLM providers were configured with zero data retention, private endpoints, and strict allowlist prompts.
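As an illustration of the gateway's deterministic layer, the sketch below redacts exact-format identifiers into stable placeholders before any text leaves the boundary. The pattern names and formats here are simplified stand-ins, not EHN's actual rules:

```python
import re

# Illustrative deterministic patterns; a production rule set would also cover
# MRNs, policy IDs, and client-specific formats (all names here are hypothetical).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CLAIM_ID": re.compile(r"\bCLM-\d{4,8}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> tuple[str, list[dict]]:
    """Replace each match with a stable placeholder; record what was found."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({"type": label})  # logged for audit, not stored raw
        text = pattern.sub(f"[{label}]", text)
    return text, findings
```

The findings list feeds the audit trail, while the redacted string is the only version any downstream model ever sees.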
Defense in Depth
Privacy alone is not enough. We layered controls across identity, network, data, and runtime:
- SSO + MFA; least-privilege RBAC for support staff, with just-in-time elevation for auditors
- Network segmentation, private service endpoints, and WAF protections
- Field-level encryption, envelope encryption with KMS-managed keys, and periodic key rotation
- Immutable audit logs with cryptographic integrity checks; structured event streams for SIEM
- Safe response controls: content filters, jailbreak resistance, and topic boundaries for medical advice
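The immutable-log idea above is easiest to see in a hash-chain sketch: each entry stores a digest of its predecessor, so altering any past record breaks verification. This is a simplified illustration, not the production logging service:

```python
import hashlib
import json

class AuditLog:
    """Append-only log with a hash chain for tamper evidence (simplified sketch)."""

    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value

    def append(self, event: dict) -> str:
        record = {"event": event, "prev": self._prev_hash}
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append({**record, "hash": digest})
        self._prev_hash = digest
        return digest

    def verify(self) -> bool:
        """Recompute every digest; any edited entry breaks the chain."""
        prev = "0" * 64
        for entry in self.entries:
            expected = hashlib.sha256(
                json.dumps({"event": entry["event"], "prev": prev},
                           sort_keys=True).encode()
            ).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

In practice the chain heads would be anchored externally (e.g., in the SIEM), so an attacker cannot simply rewrite the whole log.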
Safety-Aligned RAG
We implemented a closed-domain RAG system that only retrieves content from approved knowledge sources (benefits booklets, FAQs, plan documents, clinic instructions) tagged with sensitivity labels. The retriever enforced ABAC so a patient could never retrieve another member’s information. The generator applied system prompts that explicitly prohibit handling of diagnoses or treatment advice and redirect to licensed clinicians when appropriate.
Human-in-the-Loop and Governance
We stood up a governance board with stakeholders from security, compliance, legal, clinical, and member services. The board approved release gates and reviewed metrics monthly. Human agents could take over any conversation, and the chatbot’s suggested replies were visible to agents with rationales and source citations.
Implementation
EHN needed speed without shortcuts. We executed in four phases over 16 weeks, with clear controls and test coverage at each step.
Phase 1 (Weeks 1–3): Discovery and Risk Assessment
We mapped business goals and risks: what to automate, what not to touch, which conversations carry higher risk, and where data might flow. The compliance team identified HIPAA-relevant controls, data lifecycles, and data subject rights. We produced a data flow diagram and a record of processing activities (ROPA), and completed a privacy impact assessment (PIA).
Key deliverables included a control matrix mapping chatbot components to safeguards and log coverage, plus a red-team plan for prompt and jailbreak testing.
Phase 2 (Weeks 4–7): Architecture and Redaction Pipeline
We built the privacy gateway as a standalone, horizontally scalable microservice. It performed layered detection:
- Deterministic regex patterns for SSNs, MRNs, policy IDs, claim numbers, phone numbers, and emails
- Contextual named-entity recognition (NER) and transformer-based classifiers for names, locations, dates of service, and clinical terms under HIPAA identifiers
- Risk scoring and explainability: each redaction included a rule ID, model confidence, and replacement type
We tuned the pipeline with EHN-specific formats. For example, claim IDs followed a pattern that looked like order numbers elsewhere; we trained a lightweight classifier on 2,400 labeled examples to reduce false positives by 31%.
Here’s a simplified before/after example from our redaction unit tests:
```text
User: Hi, my name is Carla Nguyen. My SSN is 123-45-6789, and my claim is
CLM-98231 for visit on 10/18/2025 at St. Mary's.

Redacted input (to LLM): Hi, my name is [NAME]. My SSN is [SSN], and my claim
is [CLAIM_ID] for visit on [DATE] at [HOSPITAL].

Safe response pattern: I can't process personal identifiers here. For claim
[CLAIM_ID], here's general guidance... If you'd like to look up your claim,
tap "Secure Lookup" to authenticate.
```
We integrated the LLM via private networking with zero-retention flags and content filters. RAG used an encrypted vector store; content ingestion sanitized and labeled documents, stripping residual identifiers from PDFs and scanned images. Every retrieval included provenance metadata, sensitivity labels, and a TTL.
Phase 3 (Weeks 8–12): Experience, Guardrails, and Analytics
We designed conversation flows for coverage questions, ID cards, clinic hours, claims, and pre-visit instructions. Each flow had clear escalation rules to live agents. We incorporated safe response templates for restricted topics and disclaimers for medical questions.
We stood up an analytics layer with metrics for deflection, FCR, AHT, containment, user satisfaction (CSAT), and safety incidents. We followed the methods in Chatbot Analytics and Evaluation: KPIs, A/B Testing, and Conversation Quality to instrument comparable baselines. We also implemented offline evaluation suites and nightly regression tests with 1,200 scenario prompts, including adversarial inputs and PII stress tests.
A/B tests compared different prompt templates and retrieval settings. One test showed that adding two high-precision filters to the retriever improved citation accuracy by 9.4% and reduced hallucination flags by 27% without hurting latency.
Phase 4 (Weeks 13–16): Compliance, Training, and Launch
We finalized governance artifacts: SOPs for model changes, incident response runbooks, access reviews, and DPIA updates. We trained 180 agents and supervisors on safe handoffs, traceability, and how to read AI rationales.
A third-party firm validated our controls against HIPAA safeguards and SOC 2 principles. We completed red-team testing and a soft launch in two clinics, followed by a system-wide rollout.
Results with Specific Metrics
EHN’s secure chatbot delivered measurable outcomes across operations, safety, and compliance. All metrics below reflect the first 90 days post-launch unless noted.
Operational impact:
- 47% call deflection for covered intents across web and mobile portals, based on 2.1M sessions
- 38% reduction in AHT for escalated chats (from 8:32 to 5:17), saving 6,720 agent hours/quarter
- 81% FCR for supported topics; 94% intent classification accuracy on audited samples (n=1,000)
- $1.6M annualized savings from deflection and AHT improvements, net of hosting and license costs
Safety and privacy performance:
- 0 confirmed PII/PHI leaks across 12.4M messages over 9 months; 0 Sev-1 safety incidents
- 62 ms P99 redaction latency at the gateway; 98.6% precision and 96.9% recall on PII detection (validated against 15k labeled utterances)
- Zero LLM data retention; private networking to the model endpoint; content filters safely blocked 3.1% of prompts
- 27% reduction in false positives after custom classifier tuning; user friction complaints dropped by 19%
Governance and compliance:
- Third-party privacy audit passed with no critical findings and two minor recommendations implemented within 30 days
- Comprehensive audit trail: 100% of conversations captured with redaction logs, retrieval provenance, and policy decisions
- Access reviews automated quarterly; no orphaned admin accounts; role drift reduced by 88%
- Data subject request (DSR) response time improved by 72% thanks to unified data retention and deletion workflows
Patient experience:
- 4.6/5 average CSAT for chatbot sessions; 23% higher CSAT when responses included source citations
- 12% increase in portal adoption among newly insured members in the first open enrollment post-launch
Mini-case: ID card retrieval
Before the chatbot, ID card requests made up 18% of calls. With the new flow, users authenticate via the portal, and the chatbot surfaces a secure link. In the first quarter, 84% of ID card requests completed self-service in under 60 seconds, and misroutes to claims were nearly eliminated.
Key Takeaways
- Security is a design choice, not an add-on. Real-time PII redaction, zero-retention LLM calls, and safe RAG boundaries made secure chatbots practical for a HIPAA-regulated environment.
- Governance builds confidence. With a control matrix, immutable logs, and release gates, compliance and legal became champions—not blockers—of the initiative.
- Guardrails improve quality. Policy-aware retrieval and safe prompts reduced hallucinations, raised citation accuracy, and boosted user trust.
- Measure what matters. Deflection, FCR, AHT, and CSAT told the business story, while redaction precision/recall, incident rates, and audit coverage satisfied risk teams.
- Start narrow, then scale. By focusing on high-volume, policy-safe intents first (ID cards, clinic hours, coverage basics), EHN earned quick wins before expanding to more complex scenarios.
If you’re planning a secure chatbot, bookmark our secure reference architecture guide, explore how to restrict answers with RAG for regulated chatbots, and set up your measurement plan with our guide to chatbot KPIs and conversation quality.
About Evergreen Health Network (Client)
Evergreen Health Network (EHN) is a not-for-profit, integrated healthcare system with nine hospitals, 120+ clinics, and 16,000 employees serving urban and rural communities across three states. EHN is committed to equitable access, patient privacy, and technology innovation that improves care while protecting sensitive data.
We help organizations like EHN transform service operations with secure chatbots, autonomous agents, and intelligent automation. Our team blends deep AI expertise with practical governance and change management, so you get clear value, reliable service, and easy-to-understand guidance—without the compliance headaches. Ready to explore your roadmap? Let’s schedule a consultation.