7 real use cases for OpenAI Privacy Filter
Since OpenAI Privacy Filter launched in April 2026, developers have been adapting it to a wide range of workflows that share one underlying need: removing personally identifiable information before text moves somewhere it shouldn't go. Here are seven concrete patterns with code, risk context, and notes on where each one works best.
Pre-LLM prompt scrubbing
Problem: Your app sends user-supplied text to ChatGPT, Claude, or Gemini. If a user pastes their CV, a support transcript, or a client email, that text contains names, phone numbers, and email addresses. Sending it to a third-party LLM API constitutes data processing under the GDPR and CCPA, and it may violate your privacy policy or your customers' DPAs.
Pattern: Intercept user input on the backend before forwarding to the LLM. Redact PII. Forward the clean text. Log a mapping of placeholder → original if you need to reconstruct the output.
import httpx

async def send_to_llm(user_text: str, license_key: str) -> str:
    # 1. Redact PII before the text leaves your infrastructure
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://privacyfilter.run/api/redact",
            json={"text": user_text, "license_key": license_key},
        )
        resp.raise_for_status()
    clean = resp.json()["redacted_text"]
    # 2. Forward only the redacted text to the LLM
    return await call_llm(clean)
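If you need to restore the originals in the LLM's reply, keep the placeholder mapping from step 1. A minimal sketch, assuming the API's entities list exposes each match's placeholder and original text (the placeholder and text field names below are assumptions, not confirmed response fields):

def build_mapping(entities: list[dict]) -> dict[str, str]:
    # e.g. {"[PERSON_1]": "Jane Doe", "[EMAIL_2]": "jane@example.com"}
    return {e["placeholder"]: e["text"] for e in entities}

def restore(llm_output: str, mapping: dict[str, str]) -> str:
    # Re-insert the originals so the final answer reads naturally to the user
    for placeholder, original in mapping.items():
        llm_output = llm_output.replace(placeholder, original)
    return llm_output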
Customer support log anonymization before analytics
Problem: Your support team processes hundreds of tickets per day. You want to feed them into a topic-modeling pipeline, a BI dashboard, or a third-party analytics tool — but the raw logs contain customer names, emails, and account numbers.
Pattern: Run a nightly batch job that redacts all tickets created in the past 24 hours before writing them to the analytics store. Keep the raw tickets in a restricted-access data warehouse; expose only the redacted copy to BI tools.
from datetime import date, timedelta
import httpx, sqlite3

def nightly_anonymize(db_path: str, license_key: str):
    con = sqlite3.connect(db_path)
    yesterday = (date.today() - timedelta(days=1)).isoformat()
    rows = con.execute(
        "SELECT id, body FROM tickets WHERE created_date = ?", (yesterday,)
    ).fetchall()
    with httpx.Client() as client:  # reuse one connection for the whole batch
        for ticket_id, body in rows:
            r = client.post("https://privacyfilter.run/api/redact",
                            json={"text": body, "license_key": license_key})
            r.raise_for_status()
            anon = r.json()["redacted_text"]
            con.execute("UPDATE tickets SET body_anon = ? WHERE id = ?",
                        (anon, ticket_id))
    con.commit()
    con.close()
See also: full guide to redacting customer support logs before LLM fine-tuning.
Fine-tuning dataset preparation
Problem: You're preparing a custom training dataset from real conversations, emails, or documents. If PII is present in training data, the model can memorize and later reproduce it — a well-documented attack vector for extracting personal data from fine-tuned models.
Pattern: Run every training example through the Privacy Filter before adding it to the dataset. Replace entities with consistent synthetic placeholders ([PERSON_1], [EMAIL_2]) so the model still learns from the surrounding sentence structure.
import json, httpx

def prepare_dataset(raw_examples: list[str], license_key: str) -> list[str]:
    clean = []
    with httpx.Client() as client:  # one connection for the whole batch
        for text in raw_examples:
            r = client.post("https://privacyfilter.run/api/redact",
                            json={"text": text, "license_key": license_key})
            r.raise_for_status()
            clean.append(r.json()["redacted_text"])
    return clean

# raw_data: your list of raw training examples
with open("train.jsonl", "w") as f:
    for text in prepare_dataset(raw_data, license_key):
        f.write(json.dumps({"text": text}) + "\n")
GDPR Article 17 data erasure — finding personal data in text fields
Problem: When a user submits a GDPR right-to-erasure request, you need to find and delete all copies of their data. Structured fields (user tables, purchase history) are straightforward. Free-text fields (support notes, comments, form submissions) are harder — you can't SQL-query for a name buried in a paragraph.
Pattern: Run the Privacy Filter across all free-text fields and build an index of which records contain identifiable entities. When an erasure request comes in, use the index to find candidate records and do a secondary review before deletion.
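A minimal sketch of the indexing step, assuming the response's entities list carries each match's text alongside its type (a type field appears elsewhere in the API's responses; the text field here is an assumption), against a hypothetical support_notes table:

import httpx, sqlite3

def index_free_text(db_path: str, license_key: str):
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS pii_index
                   (record_id INTEGER, entity_type TEXT, entity_text TEXT)""")
    rows = con.execute("SELECT id, note FROM support_notes").fetchall()
    with httpx.Client() as client:
        for record_id, note in rows:
            r = client.post("https://privacyfilter.run/api/redact",
                            json={"text": note, "license_key": license_key})
            r.raise_for_status()
            for e in r.json().get("entities", []):
                con.execute("INSERT INTO pii_index VALUES (?, ?, ?)",
                            (record_id, e["type"], e.get("text", "")))
    con.commit()
    con.close()

When an erasure request arrives, query pii_index for the requester's name and email to get candidate record IDs, then do the secondary review before deleting.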
Protecting PII in CI/CD logs and error traces
Problem: Your application logs contain stack traces. Stack traces contain user inputs. User inputs contain email addresses, names, and sometimes credit card numbers that were submitted in forms. These logs end up in Datadog, Splunk, or S3 — and log access is often much less restricted than production databases.
Pattern: Add a log handler that passes the log line through Privacy Filter before writing it to the aggregator. For high-volume logging, use a sampling strategy or a lighter regex pre-filter to reduce API calls.
import logging, httpx

class PiiRedactingHandler(logging.Handler):
    def __init__(self, downstream: logging.Handler, license_key: str):
        super().__init__()
        self.downstream = downstream
        self.key = license_key

    def emit(self, record: logging.LogRecord):
        msg = record.getMessage()  # applies %-style args once
        try:
            r = httpx.post("https://privacyfilter.run/api/redact",
                           json={"text": msg, "license_key": self.key},
                           timeout=2)
            record.msg = r.json().get("redacted_text", msg)
        except Exception:
            record.msg = msg  # fail open: log the original on API error
        record.args = None  # args already applied; prevent re-formatting downstream
        self.downstream.emit(record)
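For the high-volume case, a cheap regex gate in front of the API call keeps request counts down. A rough pre-filter sketch (the patterns are illustrative, not exhaustive; anything the regex misses is never sent for redaction, so tune it toward false positives):

import re

# Matches email-like strings and 13-16 digit card-like number runs
MAYBE_PII = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+|\b(?:\d[ -]?){13,16}\b")

def needs_redaction(line: str) -> bool:
    return bool(MAYBE_PII.search(line))

In the handler's emit, call the API only when needs_redaction(msg) is true and pass everything else straight through.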
Internal knowledge base sanitization
Problem: Your team uses a Notion or Confluence workspace to document internal processes, client interactions, and project notes. Over time, client names, contract values, and personal contacts accumulate in page text. When you want to use this knowledge base to build a RAG (Retrieval-Augmented Generation) chatbot, you can't let it retrieve and serve up client PII to any employee who asks.
Pattern: Export the knowledge base periodically. Run each page through Privacy Filter to create a sanitized copy. Index only the sanitized copy in your RAG vector store. Keep the original accessible only to authorized team members.
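A minimal sketch of the sanitize step, assuming pages arrive as a page-ID-to-text mapping from your Notion or Confluence export (the export itself and the vector-store indexing are yours to fill in):

import httpx

def sanitize_pages(pages: dict[str, str], license_key: str) -> dict[str, str]:
    # Returns sanitized copies; index only these in the RAG vector store
    sanitized = {}
    with httpx.Client() as client:
        for page_id, text in pages.items():
            r = client.post("https://privacyfilter.run/api/redact",
                            json={"text": text, "license_key": license_key})
            r.raise_for_status()
            sanitized[page_id] = r.json()["redacted_text"]
    return sanitized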
Real-time chat moderation and redaction
Problem: A community platform or customer chat tool wants to warn users when they're about to share personal information in a public channel, or automatically redact it in a semi-public support forum.
Pattern: Intercept outgoing messages client-side (or server-side before broadcast), call the Privacy Filter API, and either warn the user ("This message contains an email address — are you sure you want to share it?") or auto-redact before storing and broadcasting.
// Client-side guard; in production, proxy this call through your backend
// so the license key never ships to the browser.
async function beforeSend(message: string): Promise<string> {
  const resp = await fetch("https://privacyfilter.run/api/redact", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: message })
  });
  const data = await resp.json();
  if (data.entities && data.entities.length > 0) {
    const types = data.entities.map((e: any) => e.type).join(", ");
    const confirmed = confirm(`This message contains: ${types}. Share anyway?`);
    return confirmed ? message : data.redacted_text;
  }
  return message;
}
For a fuller picture of what the tool can and can't detect, see the complete entity types reference and the accuracy benchmark.
Ready to protect your pipeline? Try PrivacyFilter.run free — 3 redactions/day, no account required.