OpenAI Privacy Filter for healthcare — HIPAA compliance guide (2026)
Healthcare developers building AI-powered tools face a specific challenge: clinical notes, patient records, and support tickets contain Protected Health Information (PHI) that must be redacted before it flows into LLM prompts, fine-tuning datasets, or analytics pipelines. OpenAI Privacy Filter is a strong general-purpose PII detector — but can it be used in a HIPAA-regulated environment?
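Short answer: not on its own. PrivacyFilter.run does not currently offer a BAA, and it misses several HIPAA identifiers, so it needs additional architecture around it. Here's the full picture.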
HIPAA basics: what "HIPAA compliant" actually means for a PII tool
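HIPAA's Safe Harbor de-identification method requires you to remove 18 specific identifiers (of the patient and of the patient's relatives, employers, and household members) before health records are considered de-identified, and you must also have no actual knowledge that the remaining data could identify the individual. If you send PHI to a third-party service to do that removal, the vendor becomes a Business Associate, which means you need a Business Associate Agreement (BAA) with that vendor before sending it any PHI.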
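To make the BAA rule concrete, here is a minimal sketch of an egress guard a pipeline could call before any outbound request. `Destination` and `check_phi_egress` are hypothetical names, and deciding whether a given text `contains_phi` is left to the caller. The sketch also does not model the BAA you still need with a cloud provider that hosts your "self-hosted" services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    """A service that text may be sent to (hypothetical model for illustration)."""
    name: str
    third_party: bool  # False for services you run yourself
    baa_signed: bool   # True if a Business Associate Agreement is in place


def check_phi_egress(dest: Destination, contains_phi: bool) -> None:
    """Raise before any PHI reaches a third party that has no BAA with you."""
    if contains_phi and dest.third_party and not dest.baa_signed:
        raise PermissionError(f"{dest.name}: no BAA on file, refusing to send PHI")


# Self-hosted detector: allowed. External API without a BAA: blocked for PHI.
check_phi_egress(Destination("internal-presidio", third_party=False, baa_signed=False), contains_phi=True)
```

Failing with an exception (rather than logging and continuing) keeps the default behavior safe when someone wires in a new destination.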
PHI coverage: which of the 18 identifiers does it detect?
| HIPAA Safe Harbor identifier | Detected by Privacy Filter? |
|---|---|
| Names | Yes — PERSON |
| Geographic data (smaller than state) | Partial — ADDRESS |
| Dates (except year) | Partial — DATE_OF_BIRTH; generic dates sometimes missed |
| Phone numbers | Yes — PHONE |
| Fax numbers | Partial — flagged under PHONE |
| Email addresses | Yes — EMAIL |
| Social security numbers | Yes — SSN |
| Medical record numbers | No — not a native entity type |
| Health plan beneficiary numbers | No |
| Account numbers | No |
| Certificate / license numbers | No |
| Vehicle identifiers / serial numbers | No |
| Device identifiers | No |
| Web URLs | Yes — URL |
| IP addresses | Yes — IP_ADDRESS |
| Biometric identifiers | No |
| Full-face photographs | Text only — N/A |
| Any other unique identifying number | Partial — falls into OTHER |
Coverage: 6 of the 18 identifiers are well covered, 4 are partially covered, 7 are not covered, and 1 (full-face photographs) does not apply to a text-only tool. For healthcare workloads, you need to supplement it with custom patterns for medical record numbers, health plan beneficiary numbers, account numbers, and other institution-specific identifiers.
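The tally is easy to get wrong by hand, so here is a small sketch that encodes the table's statuses and counts them. The status strings ("yes", "partial", "no", "n/a") are a labeling choice made for this example; the list of identifiers that still need custom patterns falls out of the same data.

```python
from collections import Counter

# Detection status per Safe Harbor identifier, transcribed from the table above.
COVERAGE = {
    "Names": "yes",
    "Geographic data (smaller than state)": "partial",
    "Dates (except year)": "partial",
    "Phone numbers": "yes",
    "Fax numbers": "partial",
    "Email addresses": "yes",
    "Social security numbers": "yes",
    "Medical record numbers": "no",
    "Health plan beneficiary numbers": "no",
    "Account numbers": "no",
    "Certificate / license numbers": "no",
    "Vehicle identifiers / serial numbers": "no",
    "Device identifiers": "no",
    "Web URLs": "yes",
    "IP addresses": "yes",
    "Biometric identifiers": "no",
    "Full-face photographs": "n/a",  # text-only tool
    "Any other unique identifying number": "partial",
}

counts = Counter(COVERAGE.values())
# Everything not fully covered (and not out of scope) needs a supplemental layer.
needs_custom_patterns = [k for k, v in COVERAGE.items() if v in ("partial", "no")]

print(dict(counts))  # → {'yes': 6, 'partial': 4, 'no': 7, 'n/a': 1}
```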
Compliant deployment patterns
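If you are a HIPAA-covered entity or business associate, here are three ways to add automated PII detection without sending PHI to a vendor that lacks a BAA. Options 1 and 2 fit covered entities; Option 3 is only for teams that don't handle PHI from covered entities: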
Option 1: Self-hosted Microsoft Presidio
Presidio is open source and runs entirely on your infrastructure, so PHI never reaches a detection vendor and no BAA is needed for the detection step. Deploy it on an internal server or in an AWS/GCP VPC (hosting PHI in a cloud VPC still requires your BAA with that cloud provider), add custom recognizers for medical record numbers and health plan IDs, and route all clinical text through it before sending anything to an external LLM. See our Presidio comparison for deployment notes.
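Presidio's own classes (`AnalyzerEngine`, `PatternRecognizer`, `AnonymizerEngine`) do the detection; the sketch below only shows the routing around them, assuming you wrap your Presidio setup in a plain `redact(text) -> str` function. `send_deidentified` and `LEAK_CHECKS` are hypothetical names, and the two leak patterns are illustrative. The point is to fail closed: if an obvious identifier survives local redaction, the external LLM is never called.

```python
import re
from typing import Callable

# Backstop patterns that must never appear in text leaving your network.
# Illustrative only; mirror whatever your local redactor is configured to catch.
LEAK_CHECKS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # SSN
    re.compile(r"\bMRN[:\s#]*\d{6,10}\b"),   # medical record number
]


def send_deidentified(
    text: str,
    redact: Callable[[str], str],
    call_llm: Callable[[str], str],
) -> str:
    """Redact locally, verify nothing obvious survived, then call the external LLM."""
    redacted = redact(text)
    for check in LEAK_CHECKS:
        if check.search(redacted):
            # Fail closed: a dropped request is better than a leak.
            raise ValueError(f"redaction missed a pattern: {check.pattern}")
    return call_llm(redacted)
```

In use, `redact` would call your self-hosted Presidio instance and `call_llm` your external model client; neither ever sees the other's inputs directly.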
Option 2: AWS Comprehend PII with HIPAA-eligible configuration
AWS Comprehend PII is included in AWS's HIPAA-eligible services list. If you already have a BAA with AWS and process data in a HIPAA-eligible region, this is the path of least resistance. See our AWS Comprehend comparison.
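Comprehend's `DetectPiiEntities` API returns character offsets rather than redacted text, so you apply the masking yourself. A minimal sketch, assuming boto3 and AWS credentials are configured; `mask_entities`, `detect_and_mask`, the `0.5` score threshold, and the default region are choices made for this example, not AWS defaults.

```python
from typing import Any


def mask_entities(text: str, entities: list[dict[str, Any]], min_score: float = 0.5) -> str:
    """Replace each detected span with [TYPE], working right to left so earlier offsets stay valid."""
    kept = [e for e in entities if e["Score"] >= min_score]
    out = text
    next_start = len(text)  # left edge of the most recently replaced span
    for ent in sorted(kept, key=lambda e: e["BeginOffset"], reverse=True):
        begin, end = ent["BeginOffset"], ent["EndOffset"]
        if end > next_start:  # overlaps a span already replaced; skip it
            continue
        out = out[:begin] + f"[{ent['Type']}]" + out[end:]
        next_start = begin
    return out


def detect_and_mask(text: str, region: str = "us-east-1") -> str:
    """Call Amazon Comprehend DetectPiiEntities and mask the results (needs boto3 + credentials)."""
    import boto3  # imported lazily so mask_entities works without AWS installed

    client = boto3.client("comprehend", region_name=region)
    resp = client.detect_pii_entities(Text=text, LanguageCode="en")
    return mask_entities(text, resp["Entities"])
```

Keeping the offset logic in a pure function makes it testable against recorded responses without calling AWS.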
Option 3: Pre-filter with regex, then use PrivacyFilter for contextual PII
For teams that are neither covered entities nor business associates (e.g., health tech startups that don't handle PHI received from covered entities), a practical approach is to strip structured identifiers (record numbers, SSNs, account numbers) with deterministic regex before sending the document to PrivacyFilter.run. The LLM layer then catches the contextual PII that regex misses, such as names embedded in narrative text and implied location references.
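```python
import re

import httpx

# Deterministic pre-strip for structured PHI. Add patterns for your
# institution's account-number and health-plan-ID formats as well.
PHI_PATTERNS = {
    "MRN": r"\bMRN[:\s#]*\d{6,10}\b",
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    # MM/DD/YY or MM/DD/YYYY (slash or dash): catches any such date, not only DOBs
    "DOB_FMT": r"\b(0[1-9]|1[0-2])[/\-](0[1-9]|[12]\d|3[01])[/\-]\d{2,4}\b",
}


def pre_strip_phi(text: str) -> str:
    """Replace every structured-PHI match with a typed placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        text = re.sub(pattern, f"[{label}_REDACTED]", text)
    return text


def redact_clinical_note(note: str, license_key: str) -> str:
    clean = pre_strip_phi(note)  # deterministic PHI first
    resp = httpx.post(  # contextual PII second
        "https://privacyfilter.run/api/redact",
        json={"text": clean, "license_key": license_key},
        timeout=30.0,
    )
    resp.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return resp.json()["redacted_text"]
```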
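Option 3 also calls for stripping account numbers, which `PHI_PATTERNS` does not include, and health plan IDs are another gap from the coverage table. Real formats vary by institution and payer, so the two patterns below are hypothetical placeholders showing the shape of the extension; `EXTRA_PHI_PATTERNS` and `strip_extra_phi` are illustrative names.

```python
import re

# Hypothetical formats for illustration; replace with your institution's real ones.
EXTRA_PHI_PATTERNS = {
    # Matches e.g. "Acct 00123456", "Account No. 00123456", "Account #: 00123456"
    "ACCOUNT": re.compile(
        r"\b(?:Acct|Account)\.?\s*(?:#|No\.?)?\s*:?\s*\d{6,12}\b", re.IGNORECASE
    ),
    # Matches e.g. "Member ID: ABC123456789"
    "HEALTH_PLAN_ID": re.compile(
        r"\bMember\s+ID\s*:?\s*[A-Z]{3}\d{6,9}\b", re.IGNORECASE
    ),
}


def strip_extra_phi(text: str) -> str:
    """Replace institution-specific identifiers (label included) with typed placeholders."""
    for label, pattern in EXTRA_PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text


print(strip_extra_phi("Account #: 00123456, Member ID: ABC123456789"))
# → [ACCOUNT_REDACTED], [HEALTH_PLAN_ID_REDACTED]
```

Anchoring each pattern on its label word ("Acct", "Member ID") keeps false positives down compared with matching bare digit runs, at the cost of missing identifiers written without a label.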
Use cases that are safe today
FAQ
Is OpenAI Privacy Filter HIPAA compliant?
Not directly. PrivacyFilter.run does not currently offer a BAA. For HIPAA-covered entities, use a self-hosted alternative (Presidio) or a HIPAA-eligible cloud service covered by your BAA (such as AWS Comprehend PII under a BAA with AWS).
Which of the 18 HIPAA Safe Harbor identifiers does it detect?
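6 are well covered (names, phone numbers, email addresses, SSNs, IP addresses, URLs) and 4 are partially covered (addresses, dates, fax numbers, other unique numbers). 7 are not covered natively (MRNs, health plan beneficiary numbers, account numbers, certificate/license numbers, vehicle and device identifiers, biometrics), and full-face photographs don't apply to a text-only tool. A supplemental layer of custom patterns is required to close those gaps.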
Can it process clinical notes in GDPR contexts?
For EU healthcare data under GDPR, see the GDPR text anonymization guide. The BAA requirement is US-specific; GDPR uses a different framework, in which a controller that uses a processor must have a data processing agreement (DPA) with it under Article 28.
For more use cases, see our 7 real use cases for OpenAI Privacy Filter.
Working with synthetic or non-PHI healthcare text? Try PrivacyFilter.run free — no BAA needed for non-covered data.