Secure and Compliant Chatbots: Data Privacy, PII Redaction, and Governance
Executive Summary / Key Results
Evergreen Health Network (EHN), a 9-hospital healthcare system serving 1.3 million patients across the Midwest, needed a patient-facing AI chatbot to reduce call-center volume while meeting strict security and privacy requirements. In 16 weeks, we designed and launched a secure, compliant chatbot with real-time PII redaction, auditable governance, and a defense-in-depth architecture. The result was a safer, faster, and more reliable patient experience—without compromising privacy or violating HIPAA controls.
- 47% call deflection on routine inquiries (benefits, coverage, appointment prep) within 90 days
- 38% reduction in average handle time (AHT) for chats escalated to live agents
- 81% first-contact resolution (FCR) for covered intents; 94% intent classification accuracy
- 0 confirmed PII/PHI data leaks across 12.4 million messages processed in the first 9 months
- 62 ms P99 redaction latency; 98.6% precision / 96.9% recall on PII detection in production traffic
- $1.6M annualized operational savings from agent time and call deflection
- 99.98% service uptime with zero model-run data retention at the LLM provider
- Compliance assurance: HIPAA controls mapped and logged; SOC 2 Type II–aligned processes; third-party privacy audit passed with no critical findings
Background / Challenge
Evergreen Health Network’s patient access and member services teams were overwhelmed. Call queues during open enrollment and flu season spiked above a 22-minute average wait time. Members wanted instant answers about coverage, ID cards, co-pays, and claims status—yet every conversation risked exposing personally identifiable information (PII) or protected health information (PHI).
EHN’s leadership saw AI chatbots making headlines but had legitimate concerns:
- The security team worried about data leakage into public models and uncontrolled data retention.
- Compliance demanded traceability: who said what, when, and whether any PHI left the boundary.
- Legal needed a defensible governance model aligned to HIPAA, CCPA, and internal policies.
- Operations wanted measurable deflection—without confusing patients or increasing rework for human agents.
Initial pilots elsewhere had stumbled. In one test with a generic cloud chatbot, a member’s claim number and partial SSN appeared in an LLM prompt log, triggering an incident review. The message was clear: unless privacy was handled end-to-end—with robust PII redaction, access controls, and observability—AI would not go live.
Our mandate: deliver a secure, compliant chatbot that patients trust and the compliance team can defend, plus results that matter to the contact center.
Solution / Approach
We partnered with EHN to design a security-first chatbot architecture anchored in privacy, governance, and measurable outcomes. Our approach combined real-time PII redaction, policy-aware retrieval-augmented generation (RAG), and layered governance that kept sensitive data inside EHN’s boundary.
At a glance, the solution included:
- A privacy gateway that redacts PII/PHI at the edge before any model sees user input
- A RAG pipeline limited to policy-approved, patient-safe documents with access controls
- Zero-retention LLM calls over private networking; encryption in transit and at rest with EHN-managed keys
- Fine-grained role-based access control (RBAC) and attribute-based access control (ABAC) for agents and admins
- End-to-end auditability with immutable logs, conversation replays, and explainable policy decisions
- Continuous evaluation: regression tests, red-team prompts, and A/B testing for quality and safety
For architecture-minded readers, see our secure blueprint in Technology and Architecture: A Complete Guide, and our detailed retrieval design in RAG for Chatbots: Retrieval-Augmented Generation Architecture, Tools, and Tuning. We also measured outcomes rigorously following the practices in Chatbot Analytics and Evaluation: KPIs, A/B Testing, and Conversation Quality.
Privacy by Design
From day one, we designed for the worst-case scenario: a user typing their full name, policy ID, claim number, SSN, or medical details into the chat. Our guiding principle—no sensitive data should leave the protected boundary—drove key decisions:
- Client-side pre-filtering minimized obvious leaks before network transmission.
- A server-side privacy gateway performed deterministic (regex) and contextual (ML/NLP) PII redaction with high precision.
- All redacted tokens were replaced with stable placeholders and reversible tokens for internal, authorized systems.
- LLM providers were configured with zero data retention, private endpoints, and strict allowlist prompts.
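As an illustration of the gateway's deterministic layer, the sketch below redacts exact-format identifiers into stable placeholders before any text leaves the boundary. The pattern names and formats here are simplified stand-ins, not EHN's actual rules:

```python
import re

# Illustrative deterministic patterns; a production rule set would also cover
# MRNs, policy IDs, and client-specific formats (all names here are hypothetical).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CLAIM_ID": re.compile(r"\bCLM-\d{4,8}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> tuple[str, list[dict]]:
    """Replace each match with a stable placeholder; record what was found."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({"type": label})  # logged for audit, not stored raw
        text = pattern.sub(f"[{label}]", text)
    return text, findings
```

The findings list feeds the audit trail, while the redacted string is the only version any downstream model ever sees.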
Defense in Depth
Privacy alone is not enough. We layered controls across identity, network, data, and runtime:
- SSO + MFA; least-privilege RBAC for support staff, with just-in-time elevation for auditors
- Network segmentation, private service endpoints, and WAF protections
- Field-level encryption, envelope encryption with KMS-managed keys, and periodic key rotation
- Immutable audit logs with cryptographic integrity checks; structured event streams for SIEM
- Safe response controls: content filters, jailbreak resistance, and topic boundaries for medical advice
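The immutable-log idea above is easiest to see in a hash-chain sketch: each entry stores a digest of its predecessor, so altering any past record breaks verification. This is a simplified illustration, not the production logging service:

```python
import hashlib
import json

class AuditLog:
    """Append-only log with a hash chain for tamper evidence (simplified sketch)."""

    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value

    def append(self, event: dict) -> str:
        record = {"event": event, "prev": self._prev_hash}
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append({**record, "hash": digest})
        self._prev_hash = digest
        return digest

    def verify(self) -> bool:
        """Recompute every digest; any edited entry breaks the chain."""
        prev = "0" * 64
        for entry in self.entries:
            expected = hashlib.sha256(
                json.dumps({"event": entry["event"], "prev": prev},
                           sort_keys=True).encode()
            ).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

In practice the chain heads would be anchored externally (e.g., in the SIEM), so an attacker cannot simply rewrite the whole log.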
Safety-Aligned RAG
We implemented a closed-domain RAG system that only retrieves content from approved knowledge sources (benefits booklets, FAQs, plan documents, clinic instructions) tagged with sensitivity labels. The retriever enforced ABAC so a patient could never retrieve another member’s information. The generator applied system prompts that explicitly prohibit handling of diagnoses or treatment advice and redirect to licensed clinicians when appropriate.
Human-in-the-Loop and Governance
We stood up a governance board with stakeholders from security, compliance, legal, clinical, and member services. The board approved release gates and reviewed metrics monthly. Human agents could take over any conversation, and the chatbot’s suggested replies were visible to agents with rationales and source citations.
Implementation
EHN needed speed without shortcuts. We executed in four phases over 16 weeks, with clear controls and test coverage at each step.
Phase 1 (Weeks 1–3): Discovery and Risk Assessment
We mapped business goals and risks: what to automate, what not to touch, which conversations carry higher risk, and where data might flow. The compliance team identified HIPAA-relevant controls, data lifecycles, and data subject rights. We produced a data flow diagram and a record of processing activities (ROPA), and completed a privacy impact assessment (PIA).
Key deliverables included a control matrix mapping chatbot components to safeguards and log coverage, plus a red-team plan for prompt and jailbreak testing.
Phase 2 (Weeks 4–7): Architecture and Redaction Pipeline
We built the privacy gateway as a standalone, horizontally scalable microservice. It performed layered detection:
- Deterministic regex patterns for SSNs, MRNs, policy IDs, claim numbers, phone numbers, and emails
- Contextual named-entity recognition (NER) and transformer-based classifiers for names, locations, dates of service, and clinical terms under HIPAA identifiers
- Risk scoring and explainability: each redaction included a rule ID, model confidence, and replacement type
We tuned the pipeline with EHN-specific formats. For example, claim IDs followed a pattern that looked like order numbers elsewhere; we trained a lightweight classifier on 2,400 labeled examples to reduce false positives by 31%.
Here’s a simplified before/after example from our redaction unit tests:
```text
User: Hi, my name is Carla Nguyen. My SSN is 123-45-6789, and my claim is
CLM-98231 for visit on 10/18/2025 at St. Mary's.

Redacted input (to LLM): Hi, my name is [NAME]. My SSN is [SSN], and my claim
is [CLAIM_ID] for visit on [DATE] at [HOSPITAL].

Safe response pattern: I can't process personal identifiers here. For claim
[CLAIM_ID], here's general guidance... If you'd like to look up your claim,
tap "Secure Lookup" to authenticate.
```
We integrated the LLM via private networking with zero-retention flags and content filters. RAG used an encrypted vector store; content ingestion sanitized and labeled documents, stripping residual identifiers from PDFs and scanned images. Every retrieval included provenance metadata, sensitivity labels, and a TTL.
Phase 3 (Weeks 8–12): Experience, Guardrails, and Analytics
We designed conversation flows for coverage questions, ID cards, clinic hours, claims, and pre-visit instructions. Each flow had clear escalation rules to live agents. We incorporated safe response templates for restricted topics and disclaimers for medical questions.
We stood up an analytics layer with metrics for deflection, FCR, AHT, containment, user satisfaction (CSAT), and safety incidents. We followed the methods in Chatbot Analytics and Evaluation: KPIs, A/B Testing, and Conversation Quality to instrument comparable baselines. We also implemented offline evaluation suites and nightly regression tests with 1,200 scenario prompts, including adversarial inputs and PII stress tests.
A/B tests compared different prompt templates and retrieval settings. One test showed that adding two high-precision filters to the retriever improved citation accuracy by 9.4% and reduced hallucination flags by 27% without hurting latency.
Phase 4 (Weeks 13–16): Compliance, Training, and Launch
We finalized governance artifacts: SOPs for model changes, incident response runbooks, access reviews, and DPIA updates. We trained 180 agents and supervisors on safe handoffs, traceability, and how to read AI rationales.
A third-party firm validated our controls against HIPAA safeguards and SOC 2 principles. We completed red-team testing and a soft launch in two clinics, followed by a system-wide rollout.
Results with Specific Metrics
EHN’s secure chatbot delivered measurable outcomes across operations, safety, and compliance. All metrics below reflect the first 90 days post-launch unless noted.
Operational impact:
- 47% call deflection for covered intents across web and mobile portals, based on 2.1M sessions
- 38% reduction in AHT for escalated chats (from 8:32 to 5:17), saving 6,720 agent hours/quarter
- 81% FCR for supported topics; 94% intent classification accuracy on audited samples (n=1,000)
- $1.6M annualized savings from deflection and AHT improvements, net of hosting and license costs
Safety and privacy performance:
- 0 confirmed PII/PHI leaks across 12.4M messages over 9 months; 0 Sev-1 safety incidents
- 62 ms P99 redaction latency at the gateway; 98.6% precision and 96.9% recall on PII detection (validated against 15k labeled utterances)
- Zero LLM data retention; private networking to the model endpoint; content filters safely blocked 3.1% of prompts
- 27% reduction in false positives after custom classifier tuning; user friction complaints dropped by 19%
Governance and compliance:
- Third-party privacy audit passed with no critical findings and two minor recommendations implemented within 30 days
- Comprehensive audit trail: 100% of conversations captured with redaction logs, retrieval provenance, and policy decisions
- Access reviews automated quarterly; no orphaned admin accounts; role drift reduced by 88%
- Data subject request (DSR) response time improved by 72% thanks to unified data retention and deletion workflows
Patient experience:
- 4.6/5 average CSAT for chatbot sessions; 23% higher CSAT when responses included source citations
- 12% increase in portal adoption among newly insured members in the first open enrollment post-launch
Mini-case: ID card retrieval
Before the chatbot, ID card requests made up 18% of calls. With the new flow, users authenticate via the portal, and the chatbot surfaces a secure link. In the first quarter, 84% of ID card requests completed self-service in under 60 seconds, and misroutes to claims were nearly eliminated.
Key Takeaways
- Security is a design choice, not an add-on. Real-time PII redaction, zero-retention LLM calls, and safe RAG boundaries made secure chatbots practical for a HIPAA-regulated environment.
- Governance builds confidence. With a control matrix, immutable logs, and release gates, compliance and legal became champions—not blockers—of the initiative.
- Guardrails improve quality. Policy-aware retrieval and safe prompts reduced hallucinations, raised citation accuracy, and boosted user trust.
- Measure what matters. Deflection, FCR, AHT, and CSAT told the business story, while redaction precision/recall, incident rates, and audit coverage satisfied risk teams.
- Start narrow, then scale. By focusing on high-volume, policy-safe intents first (ID cards, clinic hours, coverage basics), EHN earned quick wins before expanding to more complex scenarios.
If you’re planning a secure chatbot, bookmark our secure reference architecture guide, explore how to restrict answers with RAG for regulated chatbots, and set up your measurement plan with our guide to chatbot KPIs and conversation quality.
About Evergreen Health Network (Client)
Evergreen Health Network (EHN) is a not-for-profit, integrated healthcare system with nine hospitals, 120+ clinics, and 16,000 employees serving urban and rural communities across three states. EHN is committed to equitable access, patient privacy, and technology innovation that improves care while protecting sensitive data.
We help organizations like EHN transform service operations with secure chatbots, autonomous agents, and intelligent automation. Our team blends deep AI expertise with practical governance and change management, so you get clear value, reliable service, and easy-to-understand guidance—without the compliance headaches. Ready to explore your roadmap? Let’s schedule a consultation.