OpenAI Privacy Filter vs Microsoft Presidio: which PII redactor wins in 2026?
Two tools dominate the PII redaction conversation in 2026: OpenAI Privacy Filter, a hosted API released in April 2026, and Microsoft Presidio, the de facto open-source standard since 2020. Both solve the same problem, stripping personally identifiable information (PII) from text, but with very different philosophies.
This comparison is based on both tools as they stand today: OpenAI Privacy Filter v1 (April 2026) and Presidio 2.2.x.
TL;DR comparison table
| Dimension | OpenAI Privacy Filter | Microsoft Presidio |
|---|---|---|
| Deployment | Hosted API | Self-hosted only |
| Setup time | 5 minutes | 30–120 minutes |
| Contextual PII ("John from HR") | Yes (LLM-based) | Partial (spaCy NER) |
| Languages | 30+ (multilingual) | English primary |
| Custom recognizers | Not yet | Yes (regex + ML) |
| Offline / air-gap | No | Yes |
| Pricing | API usage-based | Free (OSS) |
| HIPAA / FedRAMP | via OpenAI BAA | Self-controlled |
| Integration complexity | 1 HTTP call | Python pipeline |
| Entity output format | JSON offsets | Python objects |
Detection accuracy: the key difference
Presidio uses a pipeline of regex recognizers, spaCy NER, and optional ML models. It's excellent at structured PII (emails, phone numbers, credit cards) because regex matching is deterministic. It struggles with contextual PII: in "schedule a meeting with Sarah from legal", "Sarah" is flagged as PERSON only if spaCy's NER tags it with enough confidence, and the default NER model is English-centric.
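To make the "deterministic" point concrete, here is a minimal standard-library sketch of structured detection: a regex finds candidates and a Luhn checksum filters card numbers. The patterns and the names `find_structured_pii` and `luhn_ok` are simplified illustrations, not Presidio's actual recognizers, which are more thorough.

```python
import re

# Simplified illustrative patterns (assumptions, not Presidio's own regexes).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit, counting from the right.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_structured_pii(text: str) -> list[tuple[str, int, int]]:
    """Return (entity_type, start, end) tuples, sorted by start offset."""
    hits = [("EMAIL_ADDRESS", m.start(), m.end()) for m in EMAIL_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        # Only keep digit runs that are plausible card numbers.
        if luhn_ok(m.group()):
            hits.append(("CREDIT_CARD", m.start(), m.end()))
    return sorted(hits, key=lambda h: h[1])
```

The same input always yields the same spans, which is what makes this layer easy to audit. It also shows the limit: nothing here could ever flag "Sarah from legal".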
OpenAI Privacy Filter uses the full LLM context window. It reads your sentence the way a human does, so "my colleague Alex", "the patient mentioned above", and "her home address in Milan" all register correctly as PII. In internal testing with 500 support-ticket snippets, the hosted model caught ~38% more contextual names than Presidio 2.2 with default settings.
The tradeoff: LLM-based detection is non-deterministic. Identical text may produce slightly different offsets across API calls (rare, but possible). For legal or compliance-critical use cases, you may want to run both and union the results.
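Running both tools and taking the union could look roughly like the sketch below. It assumes each detector's output has already been converted to `(start, end, entity_type)` character spans; the `Span` type and both function names are hypothetical, and neither tool returns this exact structure.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int        # inclusive character offset
    end: int          # exclusive character offset
    entity_type: str


def union_spans(*detector_outputs: list[Span]) -> list[Span]:
    """Merge spans from several detectors, fusing any that overlap."""
    all_spans = sorted(s for output in detector_outputs for s in output)
    merged: list[Span] = []
    for span in all_spans:
        if merged and span.start < merged[-1].end:
            last = merged[-1]
            # Keep the label of the span that sorted first; extend its end if needed.
            merged[-1] = Span(last.start, max(last.end, span.end), last.entity_type)
        else:
            merged.append(span)
    return merged


def redact(text: str, spans: list[Span]) -> str:
    """Replace each span with <ENTITY_TYPE>, right to left so offsets stay valid."""
    for span in sorted(spans, reverse=True):
        text = text[:span.start] + f"<{span.entity_type}>" + text[span.end:]
    return text
```

Taking the union favors recall: anything either detector flags is redacted, at the cost of a few more false positives.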
Setup and integration
OpenAI Privacy Filter (via PrivacyFilter.run)
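import httpx

# POST the raw text; the service returns JSON that includes the redacted text.
resp = httpx.post(
    "https://privacyfilter.run/api/redact",
    json={
        "text": "Contact John Doe at john@corp.com",
        "license_key": "your-uuid-here",
    },
)
resp.raise_for_status()  # fail loudly on a 4xx/5xx instead of a KeyError below
print(resp.json()["redacted_text"])
# → "Contact [PERSON_1] at [EMAIL_2]"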
Microsoft Presidio
# Shell: install Presidio and the spaCy model its default configuration loads
pip install presidio-analyzer presidio-anonymizer
python -m spacy download en_core_web_lg

# Python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Contact John Doe at john@corp.com"

analyzer = AnalyzerEngine()      # loads spaCy plus the built-in recognizers
anonymizer = AnonymizerEngine()

# Detect entities, then replace each one with its default <ENTITY_TYPE> placeholder.
results = analyzer.analyze(text=text, language="en")
output = anonymizer.anonymize(text=text, analyzer_results=results)
print(output.text)
# → "Contact <PERSON> at <EMAIL_ADDRESS>"
The Presidio snippet above assumes en_core_web_lg is already downloaded. In Docker, that adds ~700 MB to your image. The hosted API adds zero infrastructure overhead.
Supported entity types
Both tools cover the core set: names, emails, phones, addresses, SSNs, credit cards, dates of birth, IP addresses. Key differences:
- Presidio has a richer catalog of jurisdiction-specific identifiers (UK NIN, DE Personalausweis, AU TFN, etc.) via contributed recognizers.
- OpenAI Privacy Filter covers the global common set out of the box and infers jurisdiction-specific patterns from context ("UK National Insurance number ending in…") without explicit recognizers.
- For HIPAA-specific entities (MRN, DEA, NPI), Presidio has dedicated recognizers; Privacy Filter catches them as OTHER, with the original text included.
Multilingual support
Presidio officially supports English, Spanish, German, and Dutch with maintained recognizers. Community contributions cover 10+ more, but quality varies. OpenAI Privacy Filter is multilingual by design — tested reliably on French, Italian, German, Portuguese, Japanese, and Chinese. This is a significant win for teams processing user-generated content from non-English markets.
Cost comparison
Presidio is free to run but costs you in infrastructure: a containerized analyzer service needs at least 1 GB RAM, ~1 vCPU, and ongoing maintenance. At $20/mo for a small DigitalOcean droplet, plus developer time for spaCy model updates, real-world TCO is non-trivial.
OpenAI Privacy Filter charges per call via the underlying OpenAI API (or via wrappers like PrivacyFilter.run: $9 for 50 redactions, $19/mo unlimited). For teams processing under a few thousand documents per month, the hosted option is often cheaper and always faster to ship.
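The wrapper prices quoted above can be turned into a quick break-even check. This is a rough sketch with several assumptions: packs are bought whole and do not roll over, the $19 plan is truly unlimited, and Presidio is charged only the $20 droplet, ignoring developer time. It does not model the underlying per-call OpenAI API pricing, which is not quoted here. The function names are illustrative.

```python
import math

PACK_PRICE_USD = 9.0      # quoted above: $9 per pack
PACK_SIZE = 50            # 50 redactions per pack
UNLIMITED_USD = 19.0      # $19/mo unlimited plan
DROPLET_USD = 20.0        # $20/mo droplet for self-hosted Presidio (hosting only)


def monthly_costs(redactions: int) -> dict[str, float]:
    """Monthly cost in USD of each option for a given redaction volume."""
    packs = math.ceil(redactions / PACK_SIZE)  # assumption: whole packs only
    return {
        "packs": packs * PACK_PRICE_USD,
        "unlimited": UNLIMITED_USD,
        "presidio_hosting": DROPLET_USD,
    }


def cheapest(redactions: int) -> str:
    """Name of the lowest-cost option at this volume."""
    costs = monthly_costs(redactions)
    return min(costs, key=costs.get)
```

Under these assumptions, packs are the cheapest option up to 100 redactions a month ($18), and the $19 plan wins from 101 onward, where a third pack brings the total to $27.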
When to choose Presidio
- You need air-gapped or on-premises deployment (regulated industries, government)
- You need custom recognizers for proprietary entity types (internal IDs, medical codes)
- You process millions of documents per month and need to optimize API cost
- Your primary PII is structured (emails, phone numbers, credit cards) — regex is more reliable here
When to choose OpenAI Privacy Filter
- You need contextual PII detection (names in natural prose, addresses described in text)
- You're processing multilingual content
- You want to ship in hours, not days
- You're pre-scrubbing prompts before sending to another LLM
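For the prompt pre-scrubbing case, a common pattern is to swap PII for numbered placeholders before the LLM call and put the originals back into the reply. The sketch below assumes you already have non-overlapping `(start, end, entity_type)` spans from whichever detector you use; `pseudonymize` and `restore` are hypothetical helpers, and the placeholder format only mirrors the `[PERSON_1]` style shown in the example output earlier.

```python
def pseudonymize(
    text: str, spans: list[tuple[int, int, str]]
) -> tuple[str, dict[str, str]]:
    """Replace (start, end, entity_type) spans with numbered placeholders.

    Returns the scrubbed text and a placeholder -> original value mapping.
    Assumes the spans do not overlap.
    """
    mapping: dict[str, str] = {}
    pieces: list[str] = []
    cursor = 0
    for i, (start, end, entity_type) in enumerate(sorted(spans), start=1):
        placeholder = f"[{entity_type}_{i}]"
        mapping[placeholder] = text[start:end]
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])           # untouched tail
    return "".join(pieces), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into text that still contains placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

The mapping stays on your side, so the downstream model only ever sees placeholders. Restoration works only if the model echoes the placeholders verbatim, which is worth asking for explicitly in the prompt.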