April 28, 2026 · 8 min read

OpenAI Privacy Filter vs Microsoft Presidio: which PII redactor wins in 2026?

Two tools dominate the PII redaction conversation in 2026: OpenAI Privacy Filter, a hosted API released in April 2026, and Microsoft Presidio, the de facto open-source standard since 2020. They solve the same problem, stripping personally identifiable information (PII) from text, but with very different philosophies.

This comparison is based on both tools as they stand today: OpenAI Privacy Filter v1 (April 2026) and Presidio 2.2.x.

TL;DR comparison table

Dimension	OpenAI Privacy Filter	Microsoft Presidio
Deployment	Hosted API	Self-hosted only
Setup time	5 minutes	30–120 minutes
Contextual PII ("John from HR")	Yes (LLM-based)	Partial (spaCy NER)
Languages	30+ (multilingual)	English primary
Custom recognizers	Not yet	Yes (regex + ML)
Offline / air-gap	No	Yes
Pricing	API usage-based	Free (OSS)
HIPAA / FedRAMP	via OpenAI BAA	Self-controlled
Integration complexity	1 HTTP call	Python pipeline
Entity output format	JSON offsets	Python objects

Detection accuracy: the key difference

Presidio uses a pipeline of regex recognizers, spaCy NER, and optional ML models. It's excellent at structured PII such as emails, phone numbers, and credit cards, because regex matching is deterministic. It struggles with contextual PII: in "schedule a meeting with Sarah from legal", the name is flagged only if spaCy's NER tags "Sarah" as a PERSON, and that depends on the confidence threshold and works best in English.
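To make the point about deterministic regex matching concrete, here is a minimal sketch using only Python's standard library. The two patterns are deliberately simplified illustrations, not Presidio's actual recognizers, and `find_structured_pii` is a hypothetical helper name.

```python
import re

# Simplified, illustrative patterns (assumptions, not Presidio's real recognizers).
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_structured_pii(text):
    """Return (entity_type, start, end) tuples, sorted by start offset."""
    hits = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((entity_type, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


text = "Schedule a meeting with Sarah from legal, then email sarah@corp.com"
first = find_structured_pii(text)
second = find_structured_pii(text)

# Same input, same output on every run; the name "Sarah" is never flagged
# because no pattern describes it.
for entity_type, start, end in first:
    print(entity_type, text[start:end])
```

The email is found identically on every run, while the contextual name passes through untouched. That gap is what an NER or LLM stage has to fill.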

OpenAI Privacy Filter uses an LLM that reads the whole input as context instead of matching isolated tokens, so "my colleague Alex", "the patient mentioned above", and "her home address in Milan" all register as PII. In internal testing with 500 support-ticket snippets, the hosted model caught ~38% more contextual names than Presidio 2.2 with default settings.

The tradeoff: LLM-based detection is non-deterministic. Identical text may produce slightly different offsets across API calls (rare, but possible). For legal or compliance-critical use cases, you may want to run both and union the results.
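One way to run both tools and union the results is to normalize each tool's findings to (start, end) character offsets and merge overlapping spans. This is a minimal sketch: the span lists are made-up illustrative values rather than real output from either tool, and `union_spans` and `redact` are hypothetical helpers.

```python
def union_spans(*span_lists):
    """Merge (start, end) spans from several detectors into a sorted,
    non-overlapping list. Overlapping or touching spans are combined."""
    spans = sorted(span for span_list in span_lists for span in span_list)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text, spans, mask="[REDACTED]"):
    """Replace each span with a mask, working right to left so that
    earlier offsets stay valid."""
    for start, end in reversed(spans):
        text = text[:start] + mask + text[end:]
    return text


text = "Contact John Doe at john@corp.com"
llm_spans = [(8, 16)]               # hypothetical: contextual detector found the full name
regex_spans = [(8, 12), (20, 33)]   # hypothetical: partial name hit plus the email

combined = union_spans(llm_spans, regex_spans)
print(redact(text, combined))
```

Taking the union favors recall: anything either detector flags gets masked, at the cost of occasionally redacting more than strictly necessary.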

Setup and integration

OpenAI Privacy Filter (via PrivacyFilter.run)

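import httpx

# Send the text to the hosted redaction endpoint.
resp = httpx.post(
    "https://privacyfilter.run/api/redact",
    json={
        "text": "Contact John Doe at john@corp.com",
        "license_key": "your-uuid-here",
    },
    timeout=30.0,
)
resp.raise_for_status()  # surface HTTP errors instead of a KeyError below
print(resp.json()["redacted_text"])
# → "Contact [PERSON_1] at [EMAIL_2]"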

Microsoft Presidio

pip install presidio-analyzer presidio-anonymizer
python -m spacy download en_core_web_lg

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Contact John Doe at john@corp.com"

analyzer = AnalyzerEngine()      # loads the spaCy model downloaded above
anonymizer = AnonymizerEngine()

# Detect entities, then replace each one with its entity-type placeholder.
results = analyzer.analyze(text=text, language="en")
output = anonymizer.anonymize(text=text, analyzer_results=results)
print(output.text)
# → "Contact <PERSON> at <EMAIL_ADDRESS>"

The Presidio snippet above assumes en_core_web_lg is already downloaded. In Docker, that adds ~700 MB to your image. The hosted API adds zero infrastructure overhead.

Supported entity types

Both tools cover the core set: names, emails, phones, addresses, SSNs, credit cards, dates of birth, IP addresses. Key differences:

Presidio has a richer catalog of jurisdiction-specific identifiers (UK NIN, DE Personalausweis, AU TFN, etc.) via contributed recognizers.
OpenAI Privacy Filter covers the global common set out of the box and infers jurisdiction-specific patterns from context ("UK National Insurance number ending in…") without explicit recognizers.
For HIPAA-specific entities (MRN, DEA, NPI), Presidio has dedicated recognizers; Privacy Filter catches them as OTHER with the original text included.

Multilingual support

Presidio officially supports English, Spanish, German, and Dutch with maintained recognizers. Community contributions cover 10+ more, but quality varies. OpenAI Privacy Filter is multilingual by design and performed reliably in tests on French, Italian, German, Portuguese, Japanese, and Chinese. This is a significant win for teams processing user-generated content from non-English markets.

Cost comparison

Presidio is free to run but costs you in infrastructure: a containerized analyzer service needs at least 1 GB RAM, ~1 vCPU, and ongoing maintenance. At $20/mo for a small DigitalOcean droplet, plus developer time for spaCy model updates, the real-world total cost of ownership (TCO) is non-trivial.

OpenAI Privacy Filter charges per call via the underlying OpenAI API (or via wrappers like PrivacyFilter.run: $9 for 50 redactions, $19/mo unlimited). For teams processing under a few thousand documents per month, the hosted option is often cheaper and always faster to ship.
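The wrapper prices quoted here imply a simple break-even point, worked through in the sketch below. It uses only the figures from this section ($9 per 50 redactions, $19/mo unlimited, $20/mo droplet), assumes packs are bought fresh each month with no rollover, and ignores developer time.

```python
import math

PACK_PRICE = 9.00          # USD for a pack of 50 redactions (from this section)
PACK_SIZE = 50
UNLIMITED_MONTHLY = 19.00  # USD per month, unlimited plan
DROPLET_MONTHLY = 20.00    # USD per month, small droplet for self-hosted Presidio


def pack_cost(redactions):
    """Cost of covering `redactions` with whole 50-redaction packs."""
    return math.ceil(redactions / PACK_SIZE) * PACK_PRICE


def cheapest_hosted(redactions):
    """Cheaper of pay-per-pack and the unlimited plan for a monthly volume."""
    return min(pack_cost(redactions), UNLIMITED_MONTHLY)


for volume in (50, 100, 150, 5000):
    print(volume, pack_cost(volume), cheapest_hosted(volume))
```

Under these assumptions, packs are the cheaper option up to 100 redactions a month. Beyond that the unlimited plan wins, and at $19 it sits just under the $20 droplet before any maintenance time is counted.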

When to choose Presidio

You need air-gapped or on-premises deployment (regulated industries, government)
You need custom recognizers for proprietary entity types (internal IDs, medical codes)
You process millions of documents per month and need to optimize API cost
Your primary PII is structured (emails, phone numbers, credit cards) — regex is more reliable here

When to choose OpenAI Privacy Filter

You need contextual PII detection (names in natural prose, addresses described in text)
You're processing multilingual content
You want to ship in hours, not days
You're pre-scrubbing prompts before sending to another LLM
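For the prompt pre-scrubbing case, a common pattern is to swap detected entities for numbered placeholders, send the scrubbed prompt to the downstream LLM, and restore the originals in the reply. The sketch below assumes the detector returns non-overlapping entities as dicts with `type`, `start`, and `end` keys. That schema and the helper names `scrub` and `restore` are illustrative assumptions, not a documented response format.

```python
def scrub(text, entities):
    """Replace each entity span with a numbered placeholder such as [PERSON_1].

    Assumes the entity spans do not overlap.
    Returns the scrubbed text and a placeholder -> original mapping.
    """
    ordered = sorted(entities, key=lambda e: e["start"])
    pieces, mapping, cursor = [], {}, 0
    for index, entity in enumerate(ordered, start=1):
        placeholder = f"[{entity['type']}_{index}]"
        mapping[placeholder] = text[entity["start"]:entity["end"]]
        pieces.append(text[cursor:entity["start"]])
        pieces.append(placeholder)
        cursor = entity["end"]
    pieces.append(text[cursor:])
    return "".join(pieces), mapping


def restore(text, mapping):
    """Put the original values back into text that contains placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


prompt = "Contact John Doe at john@corp.com"
entities = [  # hypothetical detector output
    {"type": "PERSON", "start": 8, "end": 16},
    {"type": "EMAIL", "start": 20, "end": 33},
]
scrubbed, mapping = scrub(prompt, entities)
print(scrubbed)                     # text that is safe to send downstream
print(restore(scrubbed, mapping))   # originals restored locally
```

Keeping the mapping on your side means the downstream model only ever sees placeholders, while your application can still show the user a fully restored answer.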

Try PrivacyFilter free — no account, no credit card, 3 redactions/day.
