Which PHI identifiers does OpenAI Privacy Filter detect?

It detects PERSON (patient names), PHONE, EMAIL, ADDRESS, DATE_OF_BIRTH, SSN, URL, and IP_ADDRESS, which fully covers 6 of the 18 HIPAA Safe Harbor identifiers and partially covers 4 more. It does not natively detect medical record numbers, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device identifiers, or biometric identifiers.

April 28, 2026 · 8 min read

OpenAI Privacy Filter for healthcare — HIPAA compliance guide (2026)

Q: Is OpenAI Privacy Filter HIPAA compliant?

Not directly. To use OpenAI Privacy Filter in a HIPAA-regulated environment, you need a Business Associate Agreement (BAA) with both OpenAI (for the underlying model) and PrivacyFilter.run. As of April 2026, PrivacyFilter.run does not offer a BAA. For HIPAA-covered entities, the recommended approach is a self-hosted alternative such as Microsoft Presidio, or a cloud service covered by a BAA, such as AWS Comprehend PII.

Healthcare developers building AI-powered tools face a specific challenge: clinical notes, patient records, and support tickets contain Protected Health Information (PHI) that must be redacted before it flows into LLM prompts, fine-tuning datasets, or analytics pipelines. OpenAI Privacy Filter is a strong general-purpose PII detector — but can it be used in a HIPAA-regulated environment?

Short answer: not directly, without additional architecture. Here's the full picture.

HIPAA basics: what "HIPAA compliant" actually means for a PII tool

HIPAA's Safe Harbor de-identification method requires you to remove 18 specific identifiers from health records before they are considered de-identified. If you send PHI to a third-party service to do that removal, the vendor becomes a Business Associate, which means you need a Business Associate Agreement (BAA) with that vendor before sending it any PHI.

⚠ Current status (April 2026): PrivacyFilter.run does not yet offer a BAA. Do not send identifiable patient data to the PrivacyFilter.run API endpoint if you are a HIPAA Covered Entity or Business Associate. See the compliant workaround below.

PHI coverage: which of the 18 identifiers does it detect?

HIPAA Safe Harbor identifier	Detected by Privacy Filter?
Names	Yes — PERSON
Geographic data (smaller than state)	Partial — ADDRESS
Dates (except year)	Partial — DATE_OF_BIRTH; generic dates sometimes missed
Phone numbers	Yes — PHONE
Fax numbers	Partial — flagged under PHONE
Email addresses	Yes — EMAIL
Social security numbers	Yes — SSN
Medical record numbers	No — not a native entity type
Health plan beneficiary numbers	No
Account numbers	No
Certificate / license numbers	No
Vehicle identifiers / serial numbers	No
Device identifiers	No
Web URLs	Yes — URL
IP addresses	Yes — IP_ADDRESS
Biometric identifiers	No
Full-face photographs	Text only — N/A
Any other unique identifying number	Partial — falls into OTHER

Coverage: 6 of 18 identifiers well covered, 4 partially covered, 7 not covered, and 1 (full-face photographs) outside the scope of a text-only tool. For healthcare workloads, you need to supplement with custom patterns for medical record numbers, account numbers, and other institution-specific identifiers.
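One way to keep track of the supplementing work is to list the categories the table marks "No" and check which of them already have a custom pattern. The sketch below is illustrative only: the `ACCT` and `HPBN` prefixes and digit lengths are assumed formats, not real standards, and `remaining_gaps` is a hypothetical helper name.

```python
import re
from typing import List

# Safe Harbor categories the coverage table marks "No".
NOT_COVERED: List[str] = [
    "Medical record numbers",
    "Health plan beneficiary numbers",
    "Account numbers",
    "Certificate / license numbers",
    "Vehicle identifiers / serial numbers",
    "Device identifiers",
    "Biometric identifiers",
]

# Assumed formats for illustration; real ones are institution-specific.
CUSTOM_PATTERNS = {
    "Medical record numbers": re.compile(r"\bMRN[:\s#]*\d{6,10}\b"),
    "Account numbers": re.compile(r"\bACCT[:\s#]*\d{8,12}\b"),
    "Health plan beneficiary numbers": re.compile(r"\bHPBN[:\s#]*[A-Z0-9]{9,11}\b"),
}

def remaining_gaps() -> List[str]:
    """Return the 'No' categories that still have no custom pattern."""
    return [c for c in NOT_COVERED if c not in CUSTOM_PATTERNS]

gaps = remaining_gaps()
print(gaps)
```

With the three patterns above, four categories remain open. Biometric identifiers in particular are unlikely to be something a text regex can close.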

Compliant deployment pattern

If you are a HIPAA-covered entity, Options 1 and 2 let you run automated PII detection without violating HIPAA. Option 3 applies only to teams that fall outside HIPAA's scope.

Option 1: Self-hosted Microsoft Presidio

Presidio runs entirely on your infrastructure. No data leaves your environment, so no BAA with a redaction vendor is needed (if you host it on AWS or GCP, you still need a BAA with that cloud provider). Deploy it on an internal server or in an AWS/GCP VPC, add custom recognizers for medical record numbers and health plan IDs, and route all clinical text through it before sending anything to an external LLM. See our Presidio comparison for deployment notes.
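The "route all clinical text through it" step can be enforced in code rather than by convention. The sketch below wraps an outbound LLM call so that text can only leave through a redaction function, and it fails closed if redaction raises. It does not call Presidio itself: `make_gateway`, `RedactionError`, `fake_redact`, and `fake_llm` are hypothetical names, and in a real deployment the `redact` argument would wrap your self-hosted redactor.

```python
from typing import Callable, List

class RedactionError(RuntimeError):
    """Raised when local redaction fails; nothing is sent in that case."""

def make_gateway(
    redact: Callable[[str], str],
    send_to_llm: Callable[[str], str],
) -> Callable[[str], str]:
    """Wrap an outbound call so raw text can only leave via `redact`."""
    def gateway(raw_text: str) -> str:
        try:
            safe_text = redact(raw_text)
        except Exception as exc:
            # Fail closed: never fall back to sending the raw text.
            raise RedactionError("redaction failed; request not sent") from exc
        return send_to_llm(safe_text)
    return gateway

# Stand-ins for illustration only (synthetic text, no real patient data).
sent: List[str] = []

def fake_redact(text: str) -> str:
    return text.replace("Jane Roe", "<PERSON>")

def fake_llm(prompt: str) -> str:
    sent.append(prompt)  # record what would have left the environment
    return "ok"

ask = make_gateway(fake_redact, fake_llm)
ask("Jane Roe reports chest pain.")
print(sent)  # ['<PERSON> reports chest pain.']
```

Handing application code only the wrapped `ask` function, and never the raw client, keeps the unredacted path out of reach.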

Option 2: AWS Comprehend PII with HIPAA-eligible configuration

AWS Comprehend PII is included in AWS's HIPAA-eligible services list. If you already have a BAA with AWS and process data in a HIPAA-eligible region, this is the path of least resistance. See our AWS Comprehend comparison.

Option 3: Pre-filter with regex, then use PrivacyFilter for contextual PII

For non-covered-entity teams (e.g., health tech startups that don't directly handle PHI from covered entities), a practical approach is to strip structured PHI (record numbers, SSNs, account numbers) with deterministic regex before sending the document to PrivacyFilter.run. The LLM layer then catches the contextual PII that regex misses (names embedded in narrative text, implied location references).

import re

import httpx

# Deterministic pre-strip for structured PHI.
# Patterns are illustrative; adjust them to your institution's formats.
PHI_PATTERNS = {
    "MRN":     re.compile(r'\bMRN[:\s#]*\d{6,10}\b'),
    "SSN":     re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    "DOB_FMT": re.compile(r'\b(0[1-9]|1[0-2])[\/\-](0[1-9]|[12]\d|3[01])[\/\-]\d{2,4}\b'),
}

def pre_strip_phi(text: str) -> str:
    """Replace each structured-PHI match with a labeled placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

def redact_clinical_note(note: str, license_key: str) -> str:
    clean = pre_strip_phi(note)          # deterministic PHI first
    resp = httpx.post(                   # contextual PII second
        "https://privacyfilter.run/api/redact",
        json={"text": clean, "license_key": license_key},
        timeout=30.0,
    )
    resp.raise_for_status()              # surface HTTP errors before parsing
    return resp.json()["redacted_text"]
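To see what the deterministic layer does and does not catch, it helps to run it on a synthetic note before wiring in the network call. The sketch below repeats the three patterns so it runs on its own and adds a hypothetical `residual_phi` check that could be called just before the POST. The sample note is invented.

```python
import re
from typing import List

# Same patterns as the pre-strip step above, repeated so this runs standalone.
PHI_PATTERNS = {
    "MRN": re.compile(r'\bMRN[:\s#]*\d{6,10}\b'),
    "SSN": re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    "DOB_FMT": re.compile(r'\b(0[1-9]|1[0-2])[\/\-](0[1-9]|[12]\d|3[01])[\/\-]\d{2,4}\b'),
}

def pre_strip_phi(text: str) -> str:
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

def residual_phi(text: str) -> List[str]:
    """Labels of patterns that still match; an empty list means the text passed."""
    return [label for label, p in PHI_PATTERNS.items() if p.search(text)]

# Synthetic note, no real patient data.
note = "Pt Jane Roe, MRN: 00123456, SSN 123-45-6789, DOB 04/17/1985, seen for follow-up."
stripped = pre_strip_phi(note)
print(stripped)
# Pt Jane Roe, [MRN_REDACTED], SSN [SSN_REDACTED], DOB [DOB_FMT_REDACTED], seen for follow-up.
print(residual_phi(stripped))  # []
```

The name "Jane Roe" survives the regex pass, which is the contextual PII the second layer is meant to catch.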

Use cases that are safe today

Safe without BAA: Processing synthetic training data, de-identified datasets (already compliant per Safe Harbor), internal developer tools where no real patient data is used, product demos with synthetic records.
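For demos and developer tooling, synthetic records can be generated locally with values that cannot belong to a real person. The sketch below is one possible approach using only the standard library; the name lists, note template, and `synthetic_note` helper are made up for illustration.

```python
import random
from typing import List

FIRST = ["Alex", "Sam", "Jordan", "Taylor"]
LAST = ["Example", "Sample", "Testcase"]

def synthetic_note(rng: random.Random) -> str:
    """Build one fake clinical note with obviously non-real identifiers."""
    name = f"{rng.choice(FIRST)} {rng.choice(LAST)}"
    mrn = f"{rng.randrange(10**8):08d}"
    # SSNs with area number 000 are never issued.
    ssn = f"000-{rng.randrange(1, 100):02d}-{rng.randrange(1, 10000):04d}"
    # 555-0100 through 555-0199 are reserved for fictional use in North America.
    phone = f"555-01{rng.randrange(100):02d}"
    return (
        f"Pt {name}, MRN: {mrn}, SSN {ssn}, phone {phone}, "
        "presents with intermittent headache."
    )

rng = random.Random(42)  # fixed seed so demo output is reproducible
notes: List[str] = [synthetic_note(rng) for _ in range(3)]
for n in notes:
    print(n)
```

Notes like these are also convenient fixtures for testing a redaction pipeline end to end.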

FAQ

Is OpenAI Privacy Filter HIPAA compliant?

Not directly. PrivacyFilter.run does not currently offer a BAA. For HIPAA-covered entities, use a self-hosted alternative (Presidio) or a BAA-covered cloud service (AWS Comprehend PII under an AWS BAA).

Which of the 18 HIPAA Safe Harbor identifiers does it detect?

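6 well covered (names, phone, email, SSN, URL, IP address), 4 partially covered (addresses, dates, fax numbers, other unique identifying numbers), and 7 not covered natively (MRNs, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle identifiers, device IDs, biometrics). Full-face photographs are out of scope for a text-only tool. A supplemental regex layer is required to approach full Safe Harbor coverage.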

Can it process clinical notes in GDPR contexts?

For EU healthcare data under GDPR, see the GDPR text anonymization guide. The BAA requirement is US-specific; GDPR uses a different framework, a Data Processing Agreement (DPA) between controller and processor.

For more use cases, see our 7 real use cases for OpenAI Privacy Filter.

Working with synthetic or non-PHI healthcare text? Try PrivacyFilter.run free — no BAA needed for non-covered data.
