OpenAI Privacy Filter — languages supported (2026)
OpenAI Privacy Filter launched with a strong English focus, but many teams process multilingual content: support tickets from European markets, documents from Latin American customers, or logs that mix two languages in the same line. This page covers exactly what language support looks like today, where accuracy degrades, and what you can do about it.
Language support tiers
| Language | Support level | Notes |
|---|---|---|
| English | Strong | Primary training language; highest accuracy across all entity types |
| Spanish | Experimental | Good for PERSON and EMAIL; DNI/NIE format detection is inconsistent |
| French | Experimental | Works for informal PII; numéro de sécurité sociale (INSEE) sometimes missed |
| German | Experimental | PERSON detection solid; Steuernummer format support is partial |
| Italian | Experimental | Codice Fiscale detection works in most cases |
| Portuguese | Experimental | Brazilian CPF/CNPJ partially detected; European PT support weaker |
| Japanese / Chinese / Korean | Limited | Name and address detection inconsistent; not recommended for production use |
| Arabic / Russian / Hindi | Limited | Low accuracy; use a self-hosted alternative for these locales |
The underlying model is LLM-based, which means it understands language context rather than matching fixed patterns. This gives it an advantage over regex-based detectors for European languages where personal names and local ID formats appear in natural prose. But it also means accuracy is higher for languages well-represented in LLM training data (primarily English and major European languages).
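One way to act on the tiers in the table is a small lookup that routes each document by language. The sketch below is illustrative only: the strategy names (`filter_only`, `filter_plus_regex`, `translate_or_self_host`) are hypothetical labels for the options discussed on this page, and treating unlisted languages as "limited" is an assumption, not documented behaviour.

```python
# Support tiers copied from the table above, keyed by ISO 639-1 code.
SUPPORT_TIER = {
    "en": "strong",
    "es": "experimental", "fr": "experimental", "de": "experimental",
    "it": "experimental", "pt": "experimental",
    "ja": "limited", "zh": "limited", "ko": "limited",
    "ar": "limited", "ru": "limited", "hi": "limited",
}


def choose_strategy(lang_code: str) -> str:
    """Return a hypothetical processing strategy for a language code."""
    # Strip any region suffix ("pt-BR" -> "pt"); unknown languages are
    # treated as "limited" (an assumption made for this sketch).
    base = lang_code.lower().split("-")[0]
    tier = SUPPORT_TIER.get(base, "limited")
    if tier == "strong":
        return "filter_only"
    if tier == "experimental":
        return "filter_plus_regex"
    return "translate_or_self_host"


print(choose_strategy("en"))     # filter_only
print(choose_strategy("pt-BR"))  # filter_plus_regex
print(choose_strategy("ja"))     # translate_or_self_host
```

Keeping the tiers in one dictionary means a change in language coverage only requires editing a single entry.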
How multi-language input is handled
You don't need to specify a language parameter — the model auto-detects the input language. If your text mixes languages (code-switching), the model handles it better than regex-based tools because it reads the full sentence context:
import httpx

# Mixed Italian/English: correctly flags both names
resp = httpx.post(
    "https://privacyfilter.run/api/redact",
    json={
        "text": "Ciao, sono Marco Rossi. My colleague is Sarah Johnson at sarah@acme.com",
        "license_key": "your-key",
    },
)
print(resp.json()["redacted_text"])
# → "Ciao, sono [PERSON_1]. My colleague is [PERSON_2] at [EMAIL_3]"
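When evaluating a multilingual sample, it can help to tally what was detected per entity type. The following is a minimal sketch that assumes placeholders always follow the `[TYPE_N]` shape seen in the sample output above; the helper name `count_entities` is hypothetical and not part of the API.

```python
import re
from collections import Counter

# Assumes placeholders look like [PERSON_1] or [EMAIL_3], as in the sample output.
PLACEHOLDER = re.compile(r"\[([A-Z]+(?:_[A-Z]+)*)_(\d+)\]")


def count_entities(redacted_text: str) -> Counter:
    """Count redaction placeholders in a redacted string, grouped by entity type."""
    return Counter(m.group(1) for m in PLACEHOLDER.finditer(redacted_text))


sample = "Ciao, sono [PERSON_1]. My colleague is [PERSON_2] at [EMAIL_3]"
print(count_entities(sample))
# Counter({'PERSON': 2, 'EMAIL': 1})
```

Comparing these counts against a hand-labelled sample in each language gives a rough picture of where detection falls short.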
Structured PII formats by country
The gap between "experimental" and "strong" support is most visible for government-issued ID numbers, which have country-specific formats:
- US SSN (`XXX-XX-XXXX`): strong detection
- UK NIN (`AA NNNNNN A`): moderate; often flagged as OTHER not SSN
- Italian Codice Fiscale (16-char alphanumeric): mostly detected
- German Steuernummer (10–11 digit): partial
- Spanish DNI (8 digits + letter): inconsistent
- Brazilian CPF (`XXX.XXX.XXX-XX`): partial
- French INSEE (15 digit): sometimes missed in running text
For production processing of non-English ID documents, supplement with a dedicated national-format regex layer. The full entity types reference explains how each type is classified.
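As an illustration of such a regex layer, the sketch below covers two of the formats listed above, Spanish DNI and Brazilian CPF, and validates the check letter or check digits to reduce false positives. The labels `ES_DNI` and `BR_CPF` and all function names are hypothetical; this is not part of the Privacy Filter API, and a production layer would likely need more formats and looser separators.

```python
import re

DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"  # check letter = number mod 23
DNI_RE = re.compile(r"\b(\d{8})-?([A-Za-z])\b")
CPF_RE = re.compile(r"\b(\d{3})\.(\d{3})\.(\d{3})-(\d{2})\b")


def valid_dni(number: str, letter: str) -> bool:
    """Check the DNI control letter against the 8-digit number."""
    return DNI_LETTERS[int(number) % 23] == letter.upper()


def valid_cpf(digits: str) -> bool:
    """Check both CPF verification digits (11 digits, no punctuation)."""
    if len(digits) != 11 or len(set(digits)) == 1:
        return False  # wrong length, or a repeated-digit number such as 111.111.111-11
    for n in (9, 10):
        # Weights run from n + 1 down to 2 over the first n digits.
        total = sum(int(d) * (n + 1 - i) for i, d in enumerate(digits[:n]))
        if (total * 10) % 11 % 10 != int(digits[n]):
            return False
    return True


def find_national_ids(text: str) -> list[tuple[str, int, int]]:
    """Return (label, start, end) spans for checksum-valid IDs, in text order."""
    spans = []
    for m in DNI_RE.finditer(text):
        if valid_dni(m.group(1), m.group(2)):
            spans.append(("ES_DNI", m.start(), m.end()))
    for m in CPF_RE.finditer(text):
        if valid_cpf("".join(m.groups())):
            spans.append(("BR_CPF", m.start(), m.end()))
    return sorted(spans, key=lambda s: s[1])


def mask(text: str, spans: list[tuple[str, int, int]]) -> str:
    """Replace each span with its label, working right to left to keep offsets valid."""
    for label, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


text = "Mi DNI es 12345678Z y mi CPF es 529.982.247-25."
print(mask(text, find_national_ids(text)))
# Mi DNI es [ES_DNI] y mi CPF es [BR_CPF].
```

Running this pass before or after the API call is a design choice; running it first means the structured IDs never leave your infrastructure unmasked.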
Workaround for unsupported languages
If you need reliable PII detection in Japanese, Arabic, Russian, or Hindi, the recommended pattern is to use an LLM-based translation step before redaction — or switch to a self-hosted tool with proper multilingual support like Microsoft Presidio with a matching spaCy language model.
import httpx

def translate_to_english(text: str) -> str:
    """Placeholder: plug in your own LLM translation call (Claude, GPT-4o, etc.)."""
    raise NotImplementedError

def redact_non_english(text: str, license_key: str) -> dict:
    # Step 1: translate to English via an LLM
    en_text = translate_to_english(text)
    # Step 2: redact the English version
    resp = httpx.post(
        "https://privacyfilter.run/api/redact",
        json={"text": en_text, "license_key": license_key},
    )
    resp.raise_for_status()  # surface HTTP errors instead of returning an error body
    # Step 3 (optional, not shown): map entity offsets back to the original text
    return resp.json()
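For the self-hosted route, a Presidio setup might look roughly like the sketch below. It assumes `presidio-analyzer` and the named spaCy pipelines (`ja_core_news_sm`, `ru_core_news_sm`) are installed separately; the helper names are hypothetical, and you should confirm that a spaCy pipeline actually exists for each locale you need before relying on this approach.

```python
def build_nlp_config(models: dict[str, str]) -> dict:
    """Build a Presidio NLP-engine configuration from {lang_code: spaCy model name}."""
    return {
        "nlp_engine_name": "spacy",
        "models": [
            {"lang_code": lang, "model_name": name} for lang, name in models.items()
        ],
    }


def build_analyzer(models: dict[str, str]):
    """Create an AnalyzerEngine for the given languages (requires presidio-analyzer)."""
    # Imported lazily so the config helper above works without Presidio installed.
    from presidio_analyzer import AnalyzerEngine
    from presidio_analyzer.nlp_engine import NlpEngineProvider

    provider = NlpEngineProvider(nlp_configuration=build_nlp_config(models))
    return AnalyzerEngine(
        nlp_engine=provider.create_engine(),
        supported_languages=list(models),
    )


config = build_nlp_config({"ja": "ja_core_news_sm", "ru": "ru_core_news_sm"})
print(config["models"])

# With Presidio and the models installed, usage would look like:
# analyzer = build_analyzer({"ja": "ja_core_news_sm"})
# results = analyzer.analyze(text="私の名前は山田太郎です", language="ja")
```

Each result returned by `analyze` carries an entity type, character offsets, and a score, so it can feed the same masking step you use for the hosted filter.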
FAQ
Does OpenAI Privacy Filter support languages other than English?
Yes, with caveats. It has strong English support and experimental multilingual capability for major European languages. Accuracy degrades for structured PII formats (phone numbers, national IDs) that differ from US conventions, such as the Spanish DNI or French INSEE number.
Can I use it for mixed-language documents?
Yes. Because the model is LLM-based, it handles code-switching (text that mixes two languages) better than regex-based tools. Accuracy on non-English portions depends on the language.
Will more languages be added?
OpenAI Privacy Filter was released in April 2026 and language coverage is expected to expand as the model evolves. Check the guide for updates.