What is OpenAI Privacy Filter? Complete developer guide
OpenAI Privacy Filter is a hosted PII detection model that takes raw text and returns the personally identifiable entities it contains — names, emails, phone numbers, addresses, SSNs, credit cards, IP addresses, and more — with character-level offsets you can use to redact, mask, or tag the original document.
It launched on 2026-04-23 and immediately surfaced in developer communities as the first context-aware PII detector wrapped in a clean, callable API. Unlike regex-based libraries, it catches contextual mentions such as "John from accounting" as well as pattern-shaped values such as john@accounting.com.
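To make the regex limitation concrete, here is a minimal sketch using Python's standard re module. The email pattern is a deliberately simple illustration, not one taken from any particular library.

```python
import re

# A deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

samples = [
    "Reach me at john@accounting.com",
    "Ask John from accounting about the invoice",
]

# Map each sample sentence to the email-shaped strings found in it.
matches = {s: EMAIL_RE.findall(s) for s in samples}
for s, found in matches.items():
    print(f"{s!r} -> {found}")
```

The first sample yields the email address; the second yields an empty list, even though it names a person and their department. Nothing in the character pattern marks "John" as PII, which is the gap a context-aware detector is meant to close.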
Why developers care
Three concrete pains it solves:
- Pre-LLM scrubbing: remove PII from prompts before sending them to ChatGPT, Claude, or Gemini.
- Log sanitization: clean support transcripts and server logs before storing them or fine-tuning a model.
- Compliance reviews: automate the first pass of GDPR/CCPA review for documents being shared externally.
What does it return?
For each input text, the API returns a JSON object whose entities list holds one entry per detected item:
{
  "entities": [
    { "type": "PERSON", "original": "Alex Tan", "start": 12, "end": 20 },
    { "type": "EMAIL", "original": "alex@acme.com", "start": 36, "end": 49 },
    { "type": "PHONE", "original": "+1 555-0123", "start": 65, "end": 76 }
  ]
}
Supported entity types: PERSON, EMAIL, PHONE, ADDRESS, SSN, DATE_OF_BIRTH, CREDIT_CARD, IP_ADDRESS, URL, OTHER.
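The offsets are what make redaction mechanical. Below is a minimal sketch of applying them, assuming that end is exclusive (in the sample response, end minus start equals the length of original) and that spans do not overlap. The redact helper and the Entity type are hypothetical names for illustration, and the [TYPE_n] placeholder format mirrors the redacted output in the getting-started example.

```python
from typing import TypedDict


class Entity(TypedDict):
    type: str
    original: str
    start: int
    end: int  # assumed exclusive, as the sample response suggests


def redact(text: str, entities: list[Entity]) -> str:
    """Replace each entity span with a numbered placeholder such as [PERSON_1]."""
    # Number placeholders in reading order.
    ordered = sorted(entities, key=lambda e: e["start"])
    numbered = list(enumerate(ordered, start=1))
    out = text
    # Apply replacements right to left so earlier offsets stay valid.
    for i, e in reversed(numbered):
        if text[e["start"]:e["end"]] != e["original"]:
            raise ValueError(f"offset mismatch for {e['original']!r}")
        out = out[:e["start"]] + f"[{e['type']}_{i}]" + out[e["end"]:]
    return out


text = "Hi, I'm Alex Tan. Email me at alex@acme.com."
entities = [
    {"type": "PERSON", "original": "Alex Tan", "start": 8, "end": 16},
    {"type": "EMAIL", "original": "alex@acme.com", "start": 30, "end": 43},
]
print(redact(text, entities))
# → "Hi, I'm [PERSON_1]. Email me at [EMAIL_2]."
```

Working from the last span backwards avoids recomputing offsets after each substitution. The sanity check against original catches an offset convention that differs from the one assumed here.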
How does it compare to Microsoft Presidio?
Presidio is open source: you self-host, configure recognizers, and maintain the pipeline. OpenAI Privacy Filter is hosted: you POST text and get entities back. Presidio gives you control; the hosted API gives you speed-to-ship. The hosted option tends to pay off when accuracy problems come from contextual PII (names mentioned in passing) that pattern-based recognizers miss.
For a deeper comparison see our side-by-side review.
Getting started in 5 minutes
The fastest way: paste your text into PrivacyFilter.run and copy the redacted output. No signup. 3 free redactions per day, up to 2,000 characters each.
If you'd rather call it from code:
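import httpx

# POST the text to the redaction endpoint; license_key is omitted on the free tier.
resp = httpx.post(
    "https://privacyfilter.run/api/redact",
    json={
        "text": "Hi, I'm Alex Tan. Email me at alex@acme.com.",
        "license_key": "your-uuid-here",
    },
    timeout=30.0,
)
resp.raise_for_status()  # fail on HTTP errors instead of parsing an error body
data = resp.json()
print(data["redacted_text"])
# → "Hi, I'm [PERSON_1]. Email me at [EMAIL_2]."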
The free tier accepts the same request without the license_key field and returns the same response shape. It is throttled to 3 calls per day per IP, with a 2,000-character limit per call.
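Because free-tier calls are scarce, it can help to reject oversized input before spending one. The sketch below builds the request body client-side. The build_payload helper is a hypothetical name, and it assumes the service counts characters the same way Python's len does, which the source does not confirm.

```python
from typing import Optional

FREE_TIER_MAX_CHARS = 2_000  # limit stated for the free tier


def build_payload(text: str, license_key: Optional[str] = None,
                  max_chars: int = FREE_TIER_MAX_CHARS) -> dict:
    """Build the JSON body for a redact call, failing fast on oversized input."""
    if len(text) > max_chars:
        raise ValueError(f"text is {len(text)} chars; limit is {max_chars}")
    payload = {"text": text}
    if license_key is not None:
        payload["license_key"] = license_key  # left out on the free tier
    return payload


print(build_payload("Hi, I'm Alex Tan."))
```

The returned dict can be passed as the json argument of httpx.post. On a paid plan, pass the license key and raise max_chars to match that plan's limit.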
Privacy and data retention
PrivacyFilter does not store your input or redacted output. Only metadata (character count, entity count, hashed IP, timestamp) is logged for rate limiting and analytics. OpenAI is listed as a sub-processor; their API data usage policy applies to the inference call.
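As an illustration of what metadata-only logging can look like, here is a sketch of a record with the four fields listed above. This is not PrivacyFilter's actual implementation: the metadata_record helper, the salted SHA-256 hash, and the field names are all assumptions.

```python
import hashlib
import time


def metadata_record(text: str, entity_count: int, ip: str, salt: bytes) -> dict:
    """Build a log record that carries no input text, only derived metadata."""
    # Salted hash (an assumption) so the raw IP is never written to the log.
    hashed_ip = hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()
    return {
        "char_count": len(text),
        "entity_count": entity_count,
        "hashed_ip": hashed_ip,
        "timestamp": int(time.time()),
    }


record = metadata_record("Hi, I'm Alex Tan.", 1, "203.0.113.7", b"demo-salt")
print(record)
```

The same hashed IP recurs for repeat callers, which is enough for per-IP rate limiting without keeping the address itself.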
Pricing
Free for casual use; $9 one-time for 50 redactions of up to 10,000 characters; $19/month for unlimited redactions. See the full pricing page for details.
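A quick way to compare the two paid options is to compute what a month's volume would cost in one-time packs. The sketch below uses only the prices quoted above. It ignores the free tier and assumes every redaction fits within the pack's character limit.

```python
import math

PACK_PRICE, PACK_SIZE = 9, 50  # one-time pack: $9 for 50 redactions
MONTHLY_PRICE = 19             # unlimited monthly plan


def pack_cost(redactions: int) -> int:
    """Cost in dollars of covering `redactions` with one-time packs."""
    return math.ceil(redactions / PACK_SIZE) * PACK_PRICE


for n in (50, 100, 101):
    cheaper = "packs" if pack_cost(n) < MONTHLY_PRICE else "monthly"
    print(n, pack_cost(n), cheaper)
```

Two packs cover 100 redactions for $18, so packs stay cheaper up to that point. From the 101st redaction in a month, the $19 plan costs less.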
Try PrivacyFilter free — paste any text and see detected PII in seconds.