April 28, 2026 · 6 min read

OpenAI Privacy Filter — supported entity types (2026)

OpenAI Privacy Filter (accessed via PrivacyFilter.run) detects 10 entity types out of the box. This reference page documents each type with real-world examples, edge cases, and notes on multilingual detection.

PERSON

Full names, first names, last names, nicknames — in any context: "my colleague John", "sincerely, Maria Rossi", "patient: James Doe".

Examples: "Alex Tan" · "Dr. Sarah Williams" · "my boss Marco" · "署名：田中太郎"

EMAIL

Any email address that is valid under RFC 5321, including subaddressed forms (+tag) and unusual TLDs. Also catches obfuscated forms such as "alex at acme dot com" when the surrounding text makes the intent clear.

Examples: "alex@acme.com" · "user+newsletter@company.io" · "support@wpsani.store"

PHONE

Domestic and international phone numbers in any common format (E.164, NANP, European), with or without separators. Extensions such as "ext. 104" are included in the match.

Examples: "+1 555-867-5309" · "06 1234 5678" · "(0039) 02 12345678" · "0044 7911 123456"

ADDRESS

Street addresses, house numbers, zip/postal codes, cities, and full address blocks. Detected contextually — "lives at 42 Baker Street, London SW1A 1AA" is one ADDRESS entity spanning the full location string.

Examples: "742 Evergreen Terrace, Springfield, IL 62701" · "Via Roma 15, 10122 Torino"

SSN

US Social Security Numbers in XXX-XX-XXXX and unformatted variants. Also catches other national ID numbers (Italian codice fiscale, German Personalausweis number, UK National Insurance number) when the surrounding context makes the type clear.

Examples: "123-45-6789" · "SSN: 078051120" · "CF: RSSMRA85M01F205D"
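For callers who want a cheap sanity check on spans tagged SSN, the sketch below applies the structural rules published by the Social Security Administration: the area number is never 000, 666, or 900-999, the group is never 00, and the serial is never 0000. It is a standalone regex pass, not a description of how the model works, and the name find_ssn_candidates is chosen here purely for illustration.

```python
import re

# Nine digits, optionally split 3-2-4 by hyphens, not embedded in a longer number.
SSN_RE = re.compile(r"\b(\d{3})-?(\d{2})-?(\d{4})\b")

def find_ssn_candidates(text):
    """Return (start, end, match) for strings that are structurally valid SSNs."""
    hits = []
    for m in SSN_RE.finditer(text):
        area, group, serial = m.groups()
        # Area numbers 000, 666 and 900-999 are never assigned.
        if area in ("000", "666") or area.startswith("9"):
            continue
        # Group 00 and serial 0000 are never assigned either.
        if group == "00" or serial == "0000":
            continue
        hits.append((m.start(), m.end(), m.group()))
    return hits
```

A structurally valid string is not necessarily an issued SSN, so a check like this can only rule candidates out, never confirm them.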

DATE_OF_BIRTH

Dates that appear in a birth-context: "born on", "DOB:", "date of birth", age combined with birthdate. Standalone dates (meeting times, publication dates) are not flagged.

Examples: "born March 12, 1985" · "DOB: 1990-07-04" · "date of birth: 15/08/1978"
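The birth-context rule can be approximated with a keyword-anchored regex. This is a rough sketch only: the model relies on context rather than fixed patterns, and the cue list, the three date formats, and the name find_birth_dates are all assumptions made for the example.

```python
import re

# Three date shapes taken from the examples above; real text has many more.
DATE = r"(?:\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4}|[A-Z][a-z]+ \d{1,2}, \d{4})"

# A birth cue, an optional colon, then a date. Only the cue is case-insensitive.
BIRTH_DATE = re.compile(
    r"(?i:\b(?:born(?:\s+on)?|dob|date\s+of\s+birth)\b)\s*:?\s*(" + DATE + r")"
)

def find_birth_dates(text):
    """Return the dates that directly follow a birth cue."""
    return [m.group(1) for m in BIRTH_DATE.finditer(text)]
```

A date with no birth cue in front of it, such as "Published 2026-04-28", produces no match, which mirrors the behavior described for standalone dates.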

CREDIT_CARD

16-digit card numbers (Visa, Mastercard) and 15-digit Amex numbers, with or without spaces or hyphens. Partial card references such as "ending in 4242" are also flagged as contextual PII.

Examples: "4111 1111 1111 1111" · "3782 822463 10005" · "card ending 4242"
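The article does not say whether the model validates checksums. As a hedged sketch, a caller could run the Luhn algorithm over spans tagged CREDIT_CARD to separate plausible card numbers from arbitrary digit runs. The function name luhn_valid is illustrative and not part of any Privacy Filter API.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    # Payment card numbers are between 13 and 19 digits long.
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

Both full-length examples above pass this check. Partial references such as "card ending 4242" fail the length test, so they would need to be kept on context alone.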

IP_ADDRESS

IPv4 and IPv6 addresses. Internal ranges (192.168.x.x, 10.x.x.x) are still flagged — they can identify internal users in log files.

Examples: "192.168.1.42" · "2001:0db8:85a3::8a2e:0370:7334" · "logged from 203.0.113.5"
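Because internal addresses are flagged alongside public ones, downstream code may want to tell them apart, for example to redact public addresses but pseudonymize internal ones. The sketch below uses Python's standard ipaddress module. The function name classify_ip and the returned labels are assumptions for illustration, and an explicit RFC 1918 list is used rather than is_private, which also covers other reserved ranges.

```python
import ipaddress

# The three RFC 1918 private IPv4 blocks.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def classify_ip(value):
    """Return 'ipv4-internal', 'ipv4', 'ipv6', or None if not an IP address."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return None
    if addr.version == 4 and any(addr in net for net in RFC1918):
        return "ipv4-internal"
    return f"ipv{addr.version}"
```

Passing the bare address from each example above ("192.168.1.42", the IPv6 string, and "203.0.113.5") gives one label per address family, while a malformed string such as "999.1.1.1" returns None.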

URL

Full URLs including query strings and fragments. Catches URLs that may encode PII (e.g., OAuth redirect URIs with email= parameters, Calendly links with names).

Examples: "https://meet.example.com/maria-rossi" · "visit my profile at linkedin.com/in/alextan"
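When a URL is flagged, it can be useful to know which part carries the PII so the rest of the link can be preserved. A minimal sketch using the standard urllib.parse module follows. The key list and the name pii_query_params are hypothetical, and this handles query parameters only, not names embedded in the path.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of parameter names that often carry personal data.
SENSITIVE_KEYS = {"email", "name", "phone", "user"}

def pii_query_params(url):
    """Return (key, decoded_value) pairs whose key looks sensitive."""
    query = urlsplit(url).query
    return [
        (key, value)
        for key, value in parse_qsl(query)
        if key.lower() in SENSITIVE_KEYS
    ]
```

For a redirect such as "https://auth.example.com/callback?email=maria%40example.com&state=abc", only the email parameter is reported, with its value percent-decoded.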

OTHER

Catch-all for identifiable information that doesn't fit standard types: medical record numbers, employee IDs, passport numbers, IBAN/account numbers, vehicle plate numbers, or any string the model judges as uniquely identifying an individual.

Examples: "IBAN IT60X0542811101000000123456" · "plate VR 123 AB" · "MRN: 8472910"
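Some identifiers that land in OTHER carry their own checksum, which gives callers a way to confirm a span independently of the model. IBANs, for instance, use the mod-97 check defined in ISO 13616. The sketch below implements that check. The function name iban_checksum_ok is illustrative, and the code does not verify country-specific lengths.

```python
def iban_checksum_ok(iban: str) -> bool:
    """Return True if `iban` passes the ISO 13616 mod-97 check."""
    s = "".join(iban.split()).upper()
    # The shortest IBANs in use are 15 characters; the maximum is 34.
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = s[4:] + s[:4]
    # Map digits to themselves and letters to 10..35 (A=10, ..., Z=35).
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

The standard Italian sample IBAN, written in groups as "IT60 X054 2811 1010 0000 0123 456", passes this check. Dropping or changing a single character makes it fail.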

What the model does NOT flag

Generic dates (meeting times, deadlines, publication dates) — only birth-context dates
Common or placeholder names used generically ("call the john doe function")
Company names, product names, or organizational titles
Numeric strings without identifiable context (order numbers, ticket IDs)

Edge cases and known limitations

Overlapping entities: "maria@example.com" could match EMAIL as a whole and PERSON for the "maria" part. The API resolves overlaps by keeping the highest-confidence entity; inspect the start/end offsets to see which span was kept.
Very short names: Two-letter names ("Li", "Al") in isolation may not be flagged without surrounding context.
Fictional characters: Literary or movie character names in clearly fictional contexts may be missed — the model considers context, not just patterns.
Non-Latin scripts: Arabic, Chinese, Japanese, and Cyrillic names and addresses are detected, but accuracy is lower than for Latin-script languages.
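The overlap rule described above (keep the highest-confidence entity) can be sketched as a greedy pass. The Entity fields used here (type, start, end, score) are hypothetical, since the response schema is not documented in this article, and the service's actual tie-breaking may differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    # Hypothetical shape of one detected entity; end is exclusive.
    type: str
    start: int
    end: int
    score: float

def resolve_overlaps(entities):
    """Keep the highest-scoring entity from each group of overlapping spans."""
    kept = []
    # Visit entities from most to least confident.
    for ent in sorted(entities, key=lambda e: e.score, reverse=True):
        # Keep it only if it does not overlap anything already kept.
        if all(ent.end <= k.start or ent.start >= k.end for k in kept):
            kept.append(ent)
    return sorted(kept, key=lambda e: e.start)
```

Given a PERSON span over characters 0-5 and a higher-scoring EMAIL span over 0-17 of "maria@example.com", only the EMAIL entity survives.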

HIPAA special categories

HIPAA's 18 PHI identifiers overlap heavily with the types above (names, phone, email, address, DOB, SSN, IP address, URLs). The main gaps are geographic subdivisions smaller than a state, account and certificate numbers, health plan numbers, and medical device identifiers. These typically fall into OTHER when the context is clear, and may otherwise be missed. Non-birth dates tied to a patient (admission, discharge, death) are also HIPAA identifiers, and as noted above they are not flagged. For HIPAA-critical workflows, add a secondary pass with pattern matching for these remaining identifiers.
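One way to build such a secondary pass is a label-anchored regex that looks for an identifier following a known field name. The sketch below is a starting point under stated assumptions: the label list, the minimum length of five characters, the requirement of at least one digit, and the name secondary_pass are all illustrative. Real health plan and device identifier formats vary by issuer and need their own patterns.

```python
import re

# A field label, optional punctuation, then an ID containing at least one digit.
LABELED_ID = re.compile(
    r"\b(?P<label>member\s+id|policy\s+(?:no|number)|account\s+(?:no|number)|"
    r"certificate\s+(?:no|number)|device\s+(?:id|serial))\b"
    r"\.?\s*[:#]?\s*"
    r"(?P<value>(?=[A-Z0-9-]*\d)[A-Z0-9][A-Z0-9-]{4,})",
    re.IGNORECASE,
)

def secondary_pass(text):
    """Return (normalized_label, value) pairs for labeled identifiers."""
    return [
        (" ".join(m.group("label").split()).lower(), m.group("value"))
        for m in LABELED_ID.finditer(text)
    ]
```

Requiring a digit in the value keeps phrases like "account number unknown" from matching. Spans found this way can be merged with the filter's own entities before redaction.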

See entity detection live — paste any text at PrivacyFilter.run and see color-coded entities in seconds.
