April 28, 2026  ·  8 min read

OpenAI Privacy Filter vs Microsoft Presidio: which PII redactor wins in 2026?

Two tools dominate the PII redaction conversation in 2026: OpenAI Privacy Filter, a hosted API released in April 2026, and Microsoft Presidio, the de facto open-source standard since 2020. They solve the same problem, stripping personally identifiable information from text, but take very different approaches: a hosted LLM-based API versus a self-hosted pipeline of regex and NER recognizers.

This comparison is based on both tools as they stand today: OpenAI Privacy Filter v1 (April 2026) and Presidio 2.2.x.

TL;DR comparison table

| Dimension | OpenAI Privacy Filter | Microsoft Presidio |
|---|---|---|
| Deployment | Hosted API | Self-hosted only |
| Setup time | 5 minutes | 30–120 minutes |
| Contextual PII ("John from HR") | Yes (LLM-based) | Partial (spaCy NER) |
| Languages | 30+ (multilingual) | English primary |
| Custom recognizers | Not yet | Yes (regex + ML) |
| Offline / air-gap | No | Yes |
| Pricing | API usage-based | Free (OSS) |
| HIPAA / FedRAMP | via OpenAI BAA | Self-controlled |
| Integration complexity | 1 HTTP call | Python pipeline |
| Entity output format | JSON offsets | Python objects |

Detection accuracy: the key difference

Presidio uses a pipeline of regex recognizers, spaCy NER, and optional ML models. It is excellent at structured PII (emails, phone numbers, credit cards) because regex matching is deterministic. It struggles with contextual PII: in "schedule a meeting with Sarah from legal", the name is flagged as PERSON only if spaCy's NER tags it, and that depends on a confidence threshold and on a model that is English-centric.
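To make the "regex is deterministic" point concrete, here is a minimal sketch of pattern-based detection in plain Python. It is not Presidio's code: the two patterns and the `find_structured_pii` helper are simplified, hypothetical stand-ins for the kind of recognizer a regex pipeline runs.

```python
import re

# Simplified, illustrative patterns (assumptions, not Presidio's own).
# Production recognizers add validation such as checksums and context words.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_structured_pii(text):
    """Return sorted (start, end, entity_type) tuples for every pattern match."""
    spans = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), entity_type))
    return sorted(spans)


sample = "Schedule a meeting with Sarah from legal, cc sarah@corp.com"
for start, end, entity_type in find_structured_pii(sample):
    print(entity_type, repr(sample[start:end]))
```

The same input always yields the same spans, and the email is found every time. The name "Sarah" is never found, because no pattern can express "a first name in this context"; that gap is what the NER or LLM layer has to fill.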

OpenAI Privacy Filter uses the full LLM context window, so it can resolve references from the surrounding text rather than matching tokens in isolation: "my colleague Alex", "the patient mentioned above", and "her home address in Milan" all register as PII. In internal testing with 500 support-ticket snippets, the hosted model caught ~38% more contextual names than Presidio 2.2 with default settings.

The tradeoff: LLM-based detection is non-deterministic. Identical text may produce slightly different offsets across API calls (rare, but possible). For legal or compliance-critical use cases, you may want to run both and union the results.
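One way to "run both and union the results" is to collect character offsets from each detector and merge them before redacting. The sketch below assumes both tools have already been reduced to plain `(start, end)` tuples; the function names and the sample spans are hypothetical, and entity types are ignored for brevity.

```python
def union_spans(spans_a, spans_b):
    """Merge two lists of (start, end) offsets, combining overlapping or touching spans."""
    merged = []
    for start, end in sorted(spans_a + spans_b):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text, spans, placeholder="[REDACTED]"):
    """Replace each span with a placeholder, right to left so earlier offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + placeholder + text[end:]
    return text


text = "Contact John Doe at john@corp.com"
llm_spans = [(8, 16)]              # hypothetical: hosted model flagged "John Doe"
rule_spans = [(13, 16), (20, 33)]  # hypothetical: rules flagged "Doe" and the email

print(redact(text, union_spans(llm_spans, rule_spans)))
# → "Contact [REDACTED] at [REDACTED]"
```

Taking the union favors recall: anything either detector flags is removed, at the cost of occasional over-redaction.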

Setup and integration

OpenAI Privacy Filter (via PrivacyFilter.run)

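import httpx

# Send the text to the hosted redaction endpoint; the license key is a placeholder.
resp = httpx.post(
    "https://privacyfilter.run/api/redact",
    json={"text": "Contact John Doe at john@corp.com",
          "license_key": "your-uuid-here"},
    timeout=30.0,
)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
print(resp.json()["redacted_text"])
# → "Contact [PERSON_1] at [EMAIL_2]"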

Microsoft Presidio

# Shell: install Presidio and the spaCy model it loads by default
pip install presidio-analyzer presidio-anonymizer
python -m spacy download en_core_web_lg

# Python: detect entities, then replace each one with its type label
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Contact John Doe at john@corp.com"

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

results = analyzer.analyze(text=text, language="en")
output = anonymizer.anonymize(text=text, analyzer_results=results)
print(output.text)
# → "Contact <PERSON> at <EMAIL_ADDRESS>"

The Presidio snippet above assumes en_core_web_lg is already downloaded. In Docker, that adds ~700 MB to your image. The hosted API adds zero infrastructure overhead.

Supported entity types

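Both tools cover the core set: names, emails, phones, addresses, SSNs, credit cards, dates of birth, IP addresses. The key difference is extensibility: Presidio supports custom recognizers (regex and ML), while OpenAI Privacy Filter does not yet.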

Multilingual support

Presidio officially supports English, Spanish, German, and Dutch with maintained recognizers. Community contributions cover 10+ more, but quality varies. OpenAI Privacy Filter is multilingual by design — tested reliably on French, Italian, German, Portuguese, Japanese, and Chinese. This is a significant win for teams processing user-generated content from non-English markets.

Cost comparison

Presidio is free to run but costs you in infrastructure: a containerized analyzer service needs at least 1 GB RAM, ~1 vCPU, and ongoing maintenance. At $20/mo for a small DigitalOcean droplet, plus developer time for spaCy model updates, the real-world total cost of ownership is well above zero.

OpenAI Privacy Filter charges per call via the underlying OpenAI API (or via wrappers like PrivacyFilter.run: $9 for 50 redactions, $19/mo unlimited). For teams processing under a few thousand documents per month, the hosted option is often cheaper and always faster to ship.
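The figures above can be turned into a rough monthly comparison. This sketch uses only the prices quoted in this post ($9 per 50 redactions, $19/mo unlimited, $20/mo droplet). It assumes packs are bought fresh each month, and the maintenance hours and hourly rate are hypothetical inputs you would replace with your own. Direct OpenAI API pricing is not modeled because no per-call figure is given here.

```python
import math

PACK_PRICE = 9.0        # USD per pack (wrapper pricing quoted above)
PACK_SIZE = 50          # redactions per pack
UNLIMITED_PRICE = 19.0  # USD per month, unlimited plan
DROPLET_PRICE = 20.0    # USD per month, small droplet


def hosted_monthly_cost(redactions):
    """Cheapest hosted option: whole packs of 50, capped by the unlimited plan."""
    packs = math.ceil(redactions / PACK_SIZE)
    return min(packs * PACK_PRICE, UNLIMITED_PRICE)


def self_hosted_monthly_cost(maintenance_hours=0.0, hourly_rate=0.0):
    """Droplet price plus developer time (both time inputs are assumptions)."""
    return DROPLET_PRICE + maintenance_hours * hourly_rate


for volume in (50, 100, 3000):
    print(volume, hosted_monthly_cost(volume), self_hosted_monthly_cost(2, 75))
```

Under these assumptions the unlimited plan caps the hosted cost at $19, so the comparison is driven mostly by how much maintenance time you attribute to the self-hosted setup.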

When to choose Presidio

When to choose OpenAI Privacy Filter

