How to anonymize text before sending it to ChatGPT
Every time you paste a support ticket, a user email, or a document into ChatGPT, you are potentially sending names, phone numbers, email addresses, and other PII to OpenAI's servers. For personal use, that might be fine. For any business workflow, it almost certainly isn't.
This guide covers the practical workflow for stripping PII from text before it touches any LLM API — with code examples you can drop into a Python pipeline today.
Why this matters legally
Under GDPR (EU) and CCPA (California), sending personal data to a third-party processor requires a legal basis and, in many cases, a Data Processing Agreement (DPA). OpenAI offers a DPA for API customers, but it only covers data sent through the API — not data pasted into ChatGPT's web UI by employees using personal accounts.
The common violation pattern: An employee copies a customer complaint (containing name, email, and order details) into ChatGPT to get a draft reply. No DPA covers this transfer. If that customer is an EU resident, you have a potential GDPR breach.
The cleanest fix: strip PII before the text ever leaves your infrastructure.
The anonymization workflow
The pattern is three steps:
- Detect PII entities and their character offsets
- Replace each entity with a placeholder ([PERSON_1], [EMAIL_2], etc.)
- Send the scrubbed text to the LLM; optionally re-insert names in the LLM's output
Step 3 is optional but powerful: if you keep a mapping of [PERSON_1] → "Alice", you can replace placeholders back into the LLM response, giving the user a personalized answer without ever exposing PII to the model.
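The placeholder-and-mapping idea behind steps 2 and 3 can be sketched locally with a toy regex scrubber. This is a minimal illustration only — it catches email addresses and nothing else, and the function name and regex are illustrative, not part of any API:

```python
import re

def scrub_emails(text: str) -> tuple[str, dict[str, str]]:
    """Replace each email address with a numbered placeholder."""
    mapping: dict[str, str] = {}

    def repl(m: re.Match) -> str:
        placeholder = f"[EMAIL_{len(mapping) + 1}]"
        mapping[placeholder] = m.group(0)
        return placeholder

    clean = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", repl, text)
    return clean, mapping

clean, mapping = scrub_emails("Contact maria@example.com today.")
# clean   -> "Contact [EMAIL_1] today."
# mapping -> {"[EMAIL_1]": "maria@example.com"}

# Step 3: swap the originals back into whatever the LLM returned
restored = clean
for placeholder, original in mapping.items():
    restored = restored.replace(placeholder, original)
```

A real detector (like the API below) also covers names, phone numbers, and other entity types that no single regex can.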
Implementation with PrivacyFilter API
```python
import httpx
import openai

LICENSE_KEY = "your-uuid-here"  # get at privacyfilter.run

def scrub_and_prompt(raw_text: str, user_question: str) -> str:
    # 1. Redact PII
    resp = httpx.post(
        "https://privacyfilter.run/api/redact",
        json={"text": raw_text, "license_key": LICENSE_KEY, "mode": "replace"},
        timeout=15,
    )
    resp.raise_for_status()
    r = resp.json()
    clean_text = r["redacted_text"]
    entity_map = {e["replacement"]: e["original"] for e in r["entities"]}

    # 2. Send to ChatGPT with the scrubbed text
    client = openai.OpenAI()
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer the user's question based on the provided text."},
            {"role": "user", "content": f"Text: {clean_text}\n\nQuestion: {user_question}"},
        ],
    )
    answer = completion.choices[0].message.content

    # 3. Re-insert original values (optional)
    for placeholder, original in entity_map.items():
        answer = answer.replace(placeholder, original)
    return answer

# Example
ticket = "Hi, I'm Maria Rossi (maria@example.com). My order #4521 hasn't arrived."
reply = scrub_and_prompt(ticket, "Write a polite reply acknowledging the delay.")
print(reply)
```
The LLM sees: "Hi, I'm [PERSON_1] ([EMAIL_2]). My order #4521 hasn't arrived." — no PII exposed. The final reply re-inserts "Maria Rossi" and "maria@example.com" automatically.
Free-tier shortcut (no API key)
If you just need a quick scrub, paste your text into privacyfilter.run — no account needed, 3 free redactions per day. Copy the redacted output, paste into ChatGPT, done.
Handling PII in fine-tuning datasets
Fine-tuning a model on customer conversations? The same pattern applies at batch scale. Use the /api/redact/batch endpoint (paid plans) to process up to 20 documents per call:
```python
import httpx

documents = [
    {"id": "t1", "text": "Customer Alice Walker called about invoice 9912..."},
    {"id": "t2", "text": "Support case from bob@widgets.io regarding..."},
]

resp = httpx.post(
    "https://privacyfilter.run/api/redact/batch",
    json={"documents": documents, "license_key": LICENSE_KEY, "mode": "replace"},
    timeout=60,
)
resp.raise_for_status()
r = resp.json()

for item in r["results"]:
    print(item["id"], "→", item["redacted_text"][:80])
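From there, the redacted results can be written out as a fine-tuning JSONL file. A sketch, with assumptions labeled: the `results` list below stands in for the `r["results"]` from the batch call, and the file name and chat-message shape are illustrative (adapt them to your fine-tuning format):

```python
import json

# Stand-in for r["results"] from the batch redaction call
results = [
    {"id": "t1", "redacted_text": "Customer [PERSON_1] called about invoice 9912..."},
    {"id": "t2", "redacted_text": "Support case from [EMAIL_1] regarding..."},
]

# One training example per line, in OpenAI-style chat format
with open("train.jsonl", "w") as f:
    for item in results:
        example = {
            "messages": [
                {"role": "user", "content": item["redacted_text"]},
                {"role": "assistant", "content": "..."},  # your target reply here
            ]
        }
        f.write(json.dumps(example) + "\n")
```

The key property: every example that reaches the fine-tuning job contains placeholders, never raw names or emails.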
What about images and PDFs?
PrivacyFilter operates on plain text. For PDFs, extract the text with pdfplumber or pypdf first, redact it, then re-generate the document. For images containing text (screenshots, scanned forms), add an OCR step (Tesseract, AWS Textract) before calling the redaction API.
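A sketch of the PDF extraction step, assuming pypdf is installed (`pip install pypdf`); the function name is illustrative:

```python
def extract_text_for_redaction(path: str) -> str:
    """Extract plain text from a PDF so it can be sent to /api/redact.

    For scanned/image-only PDFs, swap this for an OCR step
    (Tesseract, AWS Textract) instead.
    """
    from pypdf import PdfReader  # assumption: pypdf is installed

    reader = PdfReader(path)
    # extract_text() can return None for image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Pass the returned string through the same `/api/redact` call shown earlier before it reaches any LLM.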
Checklist before deploying to production
- Add the redaction step as middleware in your LLM wrapper — don't rely on developers remembering it
- Log entity counts (not the text itself) for compliance auditing
- Use mode=replace (not mask) if you need to re-insert values downstream
- Review your OpenAI DPA — it covers API usage but not web UI usage by employees
- Add PrivacyFilter as a sub-processor in your privacy policy if you use the hosted API
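The first two checklist items can be combined into one thin middleware function. A sketch under stated assumptions: `redact_fn` stands in for whatever calls the redaction API and is assumed to return `(clean_text, entities)` with a "type" key per entity, mirroring the response shape used above:

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pii-middleware")

def redact_middleware(text: str, redact_fn) -> str:
    """Run every outgoing prompt through redact_fn; log counts, never the text."""
    clean_text, entities = redact_fn(text)
    counts = Counter(e["type"] for e in entities)
    log.info("redacted %d entities: %s", len(entities), dict(counts))
    return clean_text

# Usage with a stub redactor (stand-in for the real API call):
stub = lambda t: (t.replace("Maria", "[PERSON_1]"), [{"type": "PERSON"}])
clean = redact_middleware("Hi, I'm Maria.", stub)
# clean -> "Hi, I'm [PERSON_1]."
```

Wiring this into your LLM client wrapper means no individual developer has to remember the redaction step.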
Try PrivacyFilter free — paste any text and get a clean, PII-free version in under 2 seconds.