OpenAI Privacy Filter — supported entity types (2026)
OpenAI Privacy Filter (accessed via PrivacyFilter.run) detects 10 entity types out of the box. This reference page documents each type with real-world examples, edge cases, and notes on multilingual detection.
PERSON
Full names, first names, last names, nicknames — in any context: "my colleague John", "sincerely, Maria Rossi", "patient: James Doe".
EMAIL
Any RFC 5321-valid email address, including subaddressed forms (+tag) and unusual TLDs. Also catches obfuscated forms such as "alex at acme dot com" when the surrounding text makes the intent clear.
PHONE
Domestic and international phone numbers in any common format: E.164, NANP, European, with or without separators. Catches extensions (ext. 104).
ADDRESS
Street addresses, house numbers, zip/postal codes, cities, and full address blocks. Detected contextually — "lives at 42 Baker Street, London SW1A 1AA" is one ADDRESS entity spanning the full location string.
SSN
US Social Security Numbers in XXX-XX-XXXX and unformatted variants. Also catches other national ID numbers (Italian codice fiscale, German Personalausweis number, UK National Insurance number) when the surrounding context makes the type clear.
DATE_OF_BIRTH
Dates that appear in a birth-context: "born on", "DOB:", "date of birth", age combined with birthdate. Standalone dates (meeting times, publication dates) are not flagged.
CREDIT_CARD
16-digit card numbers (Visa, Mastercard), 15-digit Amex, with or without spaces/hyphens. Catches partial redactions ("ending in 4242") as contextual PII.
IP_ADDRESS
IPv4 and IPv6 addresses. Internal ranges (192.168.x.x, 10.x.x.x) are still flagged — they can identify internal users in log files.
URL
Full URLs including query strings and fragments. Catches URLs that may encode PII (e.g., OAuth redirect URIs with email= parameters, Calendly links with names).
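To see why a URL can carry PII, it helps to look at its decoded query string. The snippet below is a minimal sketch using only the Python standard library; it is not part of Privacy Filter, and the SUSPECT_PARAMS list and the pii_like_params helper are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of parameter names that often carry personal data.
SUSPECT_PARAMS = {"email", "name", "phone", "user"}

def pii_like_params(url: str) -> dict[str, str]:
    """Return decoded query parameters whose names suggest personal data."""
    query = urlsplit(url).query
    return {k: v for k, v in parse_qsl(query) if k.lower() in SUSPECT_PARAMS}

# An OAuth-style redirect URI with a percent-encoded email address.
example = "https://auth.example.com/callback?email=alex%40acme.com&state=xyz#done"
print(pii_like_params(example))  # {'email': 'alex@acme.com'}
```

A name-based check like this only covers parameters you anticipate, which is one reason a context-aware URL entity is useful alongside it.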
OTHER
Catch-all for identifiable information that doesn't fit standard types: medical record numbers, employee IDs, passport numbers, IBAN/account numbers, vehicle plate numbers, or any string the model judges as uniquely identifying an individual.
What the model does NOT flag
- Generic dates (meeting times, deadlines, publication dates); only birth-context dates are flagged
- Common first names used generically ("call the john doe function")
- Company names, product names, or organizational titles
- Numeric strings without identifiable context (order numbers, ticket IDs)
Edge cases and known limitations
- Overlapping entities: "maria@example.com" could match both EMAIL and part of PERSON. The API resolves overlaps, keeping the highest-confidence entity. Inspect the start/end offsets to check.
- Very short names: Two-letter names ("Li", "Al") in isolation may not be flagged without surrounding context.
- Fictional characters: Literary or movie character names in clearly fictional contexts may be missed — the model considers context, not just patterns.
- Non-Latin scripts: Arabic, Chinese, Japanese, and Cyrillic names and addresses are detected, but accuracy is lower than for Latin-script languages.
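The overlap rule described in the first bullet above (keep the highest-confidence entity) can be illustrated with a small greedy routine. This is a sketch of the general technique, not the service's actual implementation. The Entity class and its field names (type, start, end, score) are assumptions made for the example and may not match the real response schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    # Hypothetical shape; field names are assumptions, not a documented schema.
    type: str
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    score: float  # model confidence

def resolve_overlaps(entities):
    """Greedy resolution: visit entities from highest to lowest score and
    keep each one only if it does not overlap an entity already kept."""
    kept = []
    for ent in sorted(entities, key=lambda e: e.score, reverse=True):
        if all(ent.end <= k.start or ent.start >= k.end for k in kept):
            kept.append(ent)
    return sorted(kept, key=lambda e: e.start)

text = "Contact maria@example.com today"
candidates = [
    Entity("EMAIL", 8, 25, 0.99),   # "maria@example.com"
    Entity("PERSON", 8, 13, 0.61),  # "maria"
]
resolved = resolve_overlaps(candidates)
for ent in resolved:
    print(ent.type, repr(text[ent.start:ent.end]))  # EMAIL 'maria@example.com'
```

Slicing the original text with the returned offsets, as in the final loop, is a quick way to confirm which span survived.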
HIPAA special categories
HIPAA's 18 PHI identifiers overlap heavily with the types above (names, phone, email, address, DOB, SSN, IP address, URLs). The main gaps are: geographic subdivisions smaller than state, account/certificate numbers, health plan numbers, and medical device identifiers. These will typically fall into OTHER if context is clear, or may be missed. For HIPAA-critical workflows, add a secondary pass with pattern matching for these remaining identifiers.
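A secondary pattern-matching pass of the kind suggested above might look like the following sketch. The regular expressions and labels (MEDICAL_RECORD_NUMBER, HEALTH_PLAN_ID, DEVICE_SERIAL) are illustrative assumptions: they are not Privacy Filter entity types, and real identifier formats differ between organizations, so treat them as placeholders to replace with patterns that fit your own data.

```python
import re

# Illustrative patterns only; real identifier formats vary by organization.
SECONDARY_PATTERNS = {
    "MEDICAL_RECORD_NUMBER": re.compile(
        r"\bMRN[:#]?\s*(\d{6,10})\b", re.IGNORECASE),
    "HEALTH_PLAN_ID": re.compile(
        r"\bmember\s+ID[:#]?\s*([A-Z0-9]{8,12})\b", re.IGNORECASE),
    "DEVICE_SERIAL": re.compile(
        r"\bserial\s+(?:no\.?|number)[:#]?\s*([A-Z0-9-]{6,20})\b", re.IGNORECASE),
}

def secondary_pass(text):
    """Return (label, start, end, value) tuples for every pattern match,
    ordered by position in the text."""
    hits = []
    for label, pattern in SECONDARY_PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((label, m.start(1), m.end(1), m.group(1)))
    return sorted(hits, key=lambda h: h[1])

note = "Patient MRN: 00482913, member ID: HX99120034, pump serial no. PMP-20417-A."
hits = secondary_pass(note)
for label, start, end, value in hits:
    print(label, start, end, value)
```

Because the hits use the same start/end offset convention as detected entities, they can be merged with the model's output before redaction.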
See entity detection live — paste any text at PrivacyFilter.run and see color-coded entities in seconds.