April 28, 2026  ·  8 min read

OpenAI Privacy Filter for healthcare — HIPAA compliance guide (2026)

Healthcare developers building AI-powered tools face a specific challenge: clinical notes, patient records, and support tickets contain Protected Health Information (PHI) that must be redacted before it flows into LLM prompts, fine-tuning datasets, or analytics pipelines. OpenAI Privacy Filter is a strong general-purpose PII detector — but can it be used in a HIPAA-regulated environment?

Short answer: not directly, without additional architecture. Here's the full picture.

HIPAA basics: what "HIPAA compliant" actually means for a PII tool

HIPAA's Safe Harbor de-identification method requires you to remove 18 specific identifiers from health records before they're considered de-identified. Using a third-party service to perform that removal makes the vendor a Business Associate — meaning you need a Business Associate Agreement (BAA) in place before sending any PHI to them.

⚠ Current status (April 2026): PrivacyFilter.run does not yet offer a BAA. Do not send identifiable patient data to the PrivacyFilter.run API endpoint if you are a HIPAA Covered Entity or Business Associate. See the compliant workaround below.

PHI coverage: which of the 18 identifiers does it detect?

| HIPAA Safe Harbor identifier | Detected by Privacy Filter? |
| --- | --- |
| Names | Yes — PERSON |
| Geographic data (smaller than state) | Partial — ADDRESS |
| Dates (except year) | Partial — DATE_OF_BIRTH; generic dates sometimes missed |
| Phone numbers | Yes — PHONE |
| Fax numbers | Partial — flagged under PHONE |
| Email addresses | Yes — EMAIL |
| Social security numbers | Yes — SSN |
| Medical record numbers | No — not a native entity type |
| Health plan beneficiary numbers | No |
| Account numbers | No |
| Certificate / license numbers | No |
| Vehicle identifiers / serial numbers | No |
| Device identifiers | No |
| Web URLs | Yes — URL |
| IP addresses | Yes — IP_ADDRESS |
| Biometric identifiers | No |
| Full-face photographs | N/A — text-only tool |
| Any other unique identifying number | Partial — falls into OTHER |

Coverage: 6 of 18 identifiers well covered, 4 partially covered, 7 not covered, and 1 (full-face photographs) out of scope for a text-only tool. For healthcare workloads, you need to supplement with custom patterns for medical record numbers, account numbers, and other institution-specific identifiers.
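A thin supplemental layer can close those gaps before text reaches the detector. The sketch below uses hypothetical identifier formats — real MRN, beneficiary, and account number formats vary by institution, so adapt the patterns to your own numbering schemes:

```python
import re

# Supplemental patterns for Safe Harbor identifiers Privacy Filter does
# not cover natively. All formats here are hypothetical examples.
SUPPLEMENTAL_PATTERNS = {
    "MRN":         r'\bMRN[:\s#]*\d{6,10}\b',   # medical record number
    "HEALTH_PLAN": r'\b[A-Z]{3}\d{9}\b',        # beneficiary number
    "ACCOUNT":     r'\bACCT[:\s#]*\d{8,12}\b',  # billing account number
}

def strip_supplemental(text: str) -> str:
    """Replace each institution-specific identifier with a [LABEL] token."""
    for label, pattern in SUPPLEMENTAL_PATTERNS.items():
        text = re.sub(pattern, f"[{label}]", text)
    return text
```

For example, `strip_supplemental("MRN: 12345678, plan ABC123456789")` yields `"[MRN], plan [HEALTH_PLAN]"`.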

Compliant deployment pattern

If you are a HIPAA-covered entity, here is how to use LLM-based PII detection without violating HIPAA:

Option 1: Self-hosted Microsoft Presidio

Presidio runs entirely on your infrastructure. No data leaves your environment, so no BAA is needed. Deploy it on an internal server or AWS/GCP VPC, add custom recognizers for medical record numbers and health plan IDs, and route all clinical text through it before sending to any external LLM. See our Presidio comparison for deployment notes.

Option 2: AWS Comprehend PII with HIPAA-eligible configuration

AWS Comprehend PII is included in AWS's HIPAA-eligible services list. If you already have a BAA with AWS and process data in a HIPAA-eligible region, this is the path of least resistance. See our AWS Comprehend comparison.
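Comprehend's `detect_pii_entities` returns entity spans as character offsets rather than redacted text, so the redaction step is yours to apply. A sketch — the boto3 call assumes valid AWS credentials in a HIPAA-eligible region, while `redact_by_offsets` itself is plain Python:

```python
def redact_by_offsets(text: str, entities: list) -> str:
    """Replace each detected span with its [TYPE] label, working backwards
    through the string so earlier offsets remain valid."""
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[:ent["BeginOffset"]] + f"[{ent['Type']}]" + text[ent["EndOffset"]:]
    return text

def redact_with_comprehend(text: str) -> str:
    import boto3  # deferred import: redact_by_offsets stays dependency-free

    client = boto3.client("comprehend", region_name="us-east-1")
    resp = client.detect_pii_entities(Text=text, LanguageCode="en")
    return redact_by_offsets(text, resp["Entities"])
```

Applying spans in reverse offset order avoids recomputing positions after each substitution.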

Option 3: Pre-filter with regex, then use PrivacyFilter for contextual PII

For non-covered-entity teams (e.g., health tech startups that don't directly handle PHI from covered entities), a practical approach is to strip structured PHI (record numbers, SSNs, account numbers) with deterministic regex before sending the document to PrivacyFilter.run. The LLM layer then catches the contextual PII that regex misses (names embedded in narrative text, implied location references).

import re, httpx

# Deterministic pre-strip for structured PHI
PHI_PATTERNS = {
    "MRN":     r'\bMRN[:\s#]*\d{6,10}\b',
    "SSN":     r'\b\d{3}-\d{2}-\d{4}\b',
    "DOB_FMT": r'\b(0[1-9]|1[0-2])[\/\-](0[1-9]|[12]\d|3[01])[\/\-]\d{2,4}\b',
}

def pre_strip_phi(text: str) -> str:
    for label, pattern in PHI_PATTERNS.items():
        text = re.sub(pattern, f"[{label}_REDACTED]", text)
    return text

def redact_clinical_note(note: str, license_key: str) -> str:
    clean = pre_strip_phi(note)          # deterministic PHI first
    resp = httpx.post(                   # contextual PII second
        "https://privacyfilter.run/api/redact",
        json={"text": clean, "license_key": license_key},
        timeout=30.0,
    )
    resp.raise_for_status()              # fail loudly rather than leak raw text
    return resp.json()["redacted_text"]

Use cases that are safe today

Safe without BAA: Processing synthetic training data, de-identified datasets (already compliant per Safe Harbor), internal developer tools where no real patient data is used, product demos with synthetic records.

FAQ

Is OpenAI Privacy Filter HIPAA compliant?

Not directly. PrivacyFilter.run does not currently offer a BAA. For HIPAA-covered entities, use a self-hosted alternative (Presidio) or a BAA-covered cloud service (AWS Comprehend PII with a HIPAA-eligible BAA).

Which of the 18 HIPAA Safe Harbor identifiers does it detect?

6 well covered natively (names, phone numbers, email addresses, SSNs, IP addresses, URLs), 4 partially covered (sub-state geographic data, dates, fax numbers, other unique identifiers), and 7 not covered (MRNs, health plan and account numbers, device IDs, biometrics, and more). A supplemental regex layer is required for full Safe Harbor coverage.

Can it process clinical notes in GDPR contexts?

For EU healthcare data under GDPR, see the GDPR text anonymization guide. The BAA requirement is US-specific; GDPR uses a different framework (DPA + controller/processor agreement).

For more use cases, see our 7 real use cases for OpenAI Privacy Filter.

Working with synthetic or non-PHI healthcare text? Try PrivacyFilter.run free — no BAA needed for non-covered data.
