April 28, 2026 · 6 min read

OpenAI Privacy Filter — languages supported (2026)

Q: Does OpenAI Privacy Filter support languages other than English?

Yes, with caveats. OpenAI Privacy Filter has strong English support and experimental multilingual capability for major European languages (Spanish, French, German, Italian, Portuguese). Accuracy degrades for structured PII (phone formats, ID numbers) that differ from English conventions.

OpenAI Privacy Filter launched with a strong English focus, but many teams process multilingual content: support tickets from European markets, documents from Latin American customers, or logs that mix two languages in the same line. This page covers exactly what language support looks like today, where accuracy degrades, and what you can do about it.

Language support tiers

Language	Support level	Notes
English	Strong	Primary training language; highest accuracy across all entity types
Spanish	Experimental	Good for PERSON and EMAIL; DNI/NIE format detection is inconsistent
French	Experimental	Works for informal PII; numéro de sécurité sociale (INSEE) sometimes missed
German	Experimental	PERSON detection solid; Steuernummer format support is partial
Italian	Experimental	Codice Fiscale detection works in most cases
Portuguese	Experimental	Brazilian CPF/CNPJ partially detected; European PT support weaker
Japanese / Chinese / Korean	Limited	Name and address detection inconsistent; not recommended for production use
Arabic / Russian / Hindi	Limited	Low accuracy; use a self-hosted alternative for these locales

The underlying model is LLM-based, which means it understands language context rather than matching fixed patterns. This gives it an advantage over regex-based detectors for European languages where personal names and local ID formats appear in natural prose. But it also means accuracy is higher for languages well-represented in LLM training data (primarily English and major European languages).
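If you already know a message's language (for example from a customer locale field), the tiers in the table can be turned into a routing decision before anything is sent for redaction. This is a minimal sketch under stated assumptions. It takes ISO 639-1 codes as input. `SUPPORT_TIERS`, `choose_pipeline`, and the returned pipeline names are hypothetical labels for your own code paths. The tier values are transcribed from the table, and unlisted languages are conservatively treated as unsupported.

```python
# Support tiers transcribed from the table above, keyed by ISO 639-1 code.
# Names and pipeline labels are hypothetical, for illustration only.
SUPPORT_TIERS = {
    "en": "strong",
    "es": "experimental", "fr": "experimental", "de": "experimental",
    "it": "experimental", "pt": "experimental",
    "ja": "limited", "zh": "limited", "ko": "limited",
    "ar": "limited", "ru": "limited", "hi": "limited",
}

def choose_pipeline(lang_code: str) -> str:
    """Pick a processing path for text whose language is already known."""
    # Normalise "pt-BR" / "pt_BR" to the base language "pt".
    base = lang_code.lower().replace("_", "-").split("-")[0]
    tier = SUPPORT_TIERS.get(base, "unknown")
    if tier == "strong":
        return "privacy_filter"
    if tier == "experimental":
        # Model output plus a national-format regex layer for ID numbers.
        return "privacy_filter+regex"
    # Limited or unlisted: translate first or use a self-hosted tool.
    return "fallback"

print(choose_pipeline("pt-BR"))  # → privacy_filter+regex
```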

How multi-language input is handled

You don't need to specify a language parameter — the model auto-detects the input language. If your text mixes languages (code-switching), the model handles it better than regex-based tools because it reads the full sentence context:

import httpx

# Mixed Italian/English input: both names (and the email) should be flagged
resp = httpx.post(
    "https://privacyfilter.run/api/redact",
    json={"text": "Ciao, sono Marco Rossi. My colleague is Sarah Johnson at sarah@acme.com",
          "license_key": "your-key"},
    timeout=30.0,
)
resp.raise_for_status()  # surface HTTP errors instead of failing inside .json()
print(resp.json()["redacted_text"])
# → "Ciao, sono [PERSON_1]. My colleague is [PERSON_2] at [EMAIL_3]"
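When you compare detection across languages on your own samples, it helps to tally the placeholders in `redacted_text`. The following is a minimal sketch. It assumes placeholders always follow the `[TYPE_N]` shape seen in the example output. That shape is inferred from a single example, not from an API specification, and `count_entities` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed placeholder shape, inferred from the example output: [TYPE_N]
PLACEHOLDER = re.compile(r"\[([A-Z_]+)_(\d+)\]")

def count_entities(redacted_text: str) -> Counter:
    """Tally redacted entities per type in a redacted_text string."""
    return Counter(m.group(1) for m in PLACEHOLDER.finditer(redacted_text))

sample = "Ciao, sono [PERSON_1]. My colleague is [PERSON_2] at [EMAIL_3]"
print(count_entities(sample))  # → Counter({'PERSON': 2, 'EMAIL': 1})
```

Run the same check on an English sample and a translated sample. If the counts differ, the redaction quality gap for that language is now something you can measure.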

Structured PII formats by country

The gap between "experimental" and "strong" support is most visible for government-issued ID numbers, which have country-specific formats:

US SSN (XXX-XX-XXXX): strong detection
UK NIN (AA NNNNNN A): moderate; often flagged as OTHER rather than SSN
Italian Codice Fiscale (16-char alphanumeric): mostly detected
German Steuernummer (10–11 digits): partial
Spanish DNI (8 digits + letter): inconsistent
Brazilian CPF (XXX.XXX.XXX-XX): partial
French INSEE (15 digits): sometimes missed in running text

For production processing of non-English ID documents, supplement with a dedicated national-format regex layer. The full entity types reference explains how each type is classified.
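A national-format layer can be small: a list of regexes plus the published check-digit rules. The sketch below is hypothetical and covers only three of the formats listed:

- Spanish DNI: the control letter is derived from the number modulo 23.
- Brazilian CPF: two mod-11 check digits.
- Italian Codice Fiscale: matched on pattern only. Its check character and omocodia variants are not validated.

Labels such as `ES_DNI` are illustrative names, not entity types returned by the API.

```python
import re
from typing import NamedTuple

class IdHit(NamedTuple):
    label: str  # hypothetical label, not an API entity type
    value: str
    start: int
    end: int

DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"

def valid_dni(value: str) -> bool:
    # Spanish DNI: 8 digits, control letter = DNI_LETTERS[number % 23]
    return DNI_LETTERS[int(value[:8]) % 23] == value[-1]

def valid_cpf(value: str) -> bool:
    # Brazilian CPF: 9 digits followed by two mod-11 check digits
    nums = [int(c) for c in value if c.isdigit()]
    if len(nums) != 11 or len(set(nums)) == 1:
        return False  # repeated-digit CPFs pass the math but are not valid
    for n in (9, 10):
        # Weights run from n + 1 down to 2 over the first n digits
        total = sum(d * w for d, w in zip(nums[:n], range(n + 1, 1, -1)))
        if total * 10 % 11 % 10 != nums[n]:
            return False
    return True

# (label, pattern, validator); Codice Fiscale is matched on shape only
RULES = [
    ("ES_DNI", re.compile(r"\b\d{8}-?[A-Z]\b"), valid_dni),
    ("BR_CPF", re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"), valid_cpf),
    ("IT_CODICE_FISCALE",
     re.compile(r"\b[A-Z]{6}\d{2}[A-Z]\d{2}[A-Z]\d{3}[A-Z]\b"),
     lambda _: True),
]

def find_national_ids(text: str) -> list[IdHit]:
    """Return validated national-ID matches, ordered by position."""
    hits = [
        IdHit(label, m.group(), m.start(), m.end())
        for label, pattern, is_valid in RULES
        for m in pattern.finditer(text)
        if is_valid(m.group())
    ]
    return sorted(hits, key=lambda h: h.start)

text = "DNI 12345678Z, CPF 529.982.247-25, CF RSSMRA85T10A562S"
for hit in find_national_ids(text):
    print(hit.label, hit.value)
```

Merging these spans with the API's entities lets the regex layer catch IDs the model misses. The checksums cut down false positives from arbitrary digit runs.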

Workaround for unsupported languages

If you need reliable PII detection in Japanese, Arabic, Russian, or Hindi, the recommended pattern is to translate the text to English with an LLM before redaction, or to switch to a self-hosted tool with proper multilingual support, such as Microsoft Presidio with a matching spaCy language model.

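import httpx
from typing import Callable

def redact_non_english(text: str, license_key: str,
                       translate_to_english: Callable[[str], str]) -> dict:
    # Step 1: translate to English via an LLM (Claude, GPT-4o, etc.);
    # translate_to_english is your own function, passed in by the caller
    en_text = translate_to_english(text)
    # Step 2: redact the English version
    resp = httpx.post(
        "https://privacyfilter.run/api/redact",
        json={"text": en_text, "license_key": license_key},
        timeout=30.0,
    )
    resp.raise_for_status()
    # Step 3 (optional, not implemented here): map entity offsets back to the
    # original text. The returned redacted_text is in English, not the source language.
    return resp.json()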
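You first have to decide which inputs should take the workaround path. One heuristic that needs only the standard library is to measure how much of the text uses scripts the table rates as Limited. The sketch below makes that measurement with `unicodedata.name`.

Several parts are assumptions you should tune:

- the list of script-name prefixes
- the 0.2 threshold
- the helper names

A script is also not the same as a language. Cyrillic, for example, covers Ukrainian and Bulgarian as well, and the table does not rate either.

```python
import unicodedata

# Unicode character-name prefixes for scripts in the "Limited" tier.
# Heuristic only: this inspects characters, not languages.
LIMITED_SCRIPT_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA", "HANGUL",
    "ARABIC", "CYRILLIC", "DEVANAGARI",
)

def limited_script_ratio(text: str) -> float:
    """Share of alphabetic characters that belong to a limited-tier script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith(LIMITED_SCRIPT_PREFIXES)
    )
    return hits / len(letters)

def needs_fallback(text: str, threshold: float = 0.2) -> bool:
    """True if enough of the text is in a limited-tier script to reroute it."""
    return limited_script_ratio(text) >= threshold

print(needs_fallback("Привет, меня зовут Иван"))  # → True
```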

FAQ

Does OpenAI Privacy Filter support languages other than English?

Yes, with caveats. It has strong English support and experimental multilingual capability for major European languages. Accuracy degrades for structured PII formats (phone numbers, national IDs) that differ from English conventions.

Can I use it for mixed-language documents?

Yes. Because the model is LLM-based, it handles code-switching (text that mixes two languages) better than regex-based tools. Accuracy on non-English portions depends on the language.

Will more languages be added?

OpenAI Privacy Filter was released in April 2026 and language coverage is expected to expand as the model evolves. Check the guide for updates.

Test language support for your use case: paste a sample of your text and see which entities are detected.
