April 28, 2026  ·  9 min read

GDPR text anonymization — compliance checklist for AI tooling

Anonymization is one of the most powerful tools GDPR offers. Truly anonymous data falls outside the regulation entirely (Recital 26). As far as GDPR is concerned, you can process it freely, store it indefinitely, and share it without restriction. But the bar is high, because anonymization must be irreversible. Pseudonymization (replacing names with tokens you can reverse) is explicitly not anonymization: Recital 26 says pseudonymized data is still personal data.

This matters enormously for AI teams. If you redact PII before sending text to an LLM API, you need to know whether that redaction counts as anonymization or pseudonymization, because the answer determines whether GDPR still applies to the text you send.

Anonymization vs pseudonymization: the legal distinction

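The Article 29 Working Party (whose role the EDPB took over in 2018) defines anonymization in Opinion 05/2014 as a process that "irreversibly prevents identification." Concretely:

  1. Anonymized data: nobody, including you, can re-identify the person using means reasonably likely to be used (Recital 26). It is no longer personal data.
  2. Pseudonymized data: the data can't be attributed to a person without additional information, such as a token-to-name map, that is kept separately (Article 4(5)). It is still personal data.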

Most PII redaction workflows used with LLMs are pseudonymization by this definition, because teams keep the original text or a token map so they can re-insert the real values into the model's response. That's fine: GDPR itself names pseudonymization as a safeguard (Articles 25 and 32). Just don't describe the result as "anonymized data" in your privacy policy.
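
A minimal sketch of the redact-then-reinsert pattern, assuming a single email regex stands in for a real PII detector. The pattern and the `pseudonymize` / `reidentify` helpers are hypothetical illustrations, not PrivacyFilter's API. The point is the returned map: as long as anyone holds it, the masked text can be reversed, which is what makes this pseudonymization.

```python
import re

# Hypothetical stand-in for a real PII detector: matches simple email addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a numbered token.

    Returns the masked text plus a token -> original map. Keeping that map
    is exactly what makes this pseudonymization rather than anonymization.
    """
    mapping: dict[str, str] = {}  # token -> original value
    seen: dict[str, str] = {}     # original value -> token (reused for repeats)

    def _replace(match: re.Match) -> str:
        original = match.group(0)
        if original not in seen:
            token = f"[EMAIL_{len(mapping) + 1}]"
            seen[original] = token
            mapping[token] = original
        return seen[original]

    return EMAIL_RE.sub(_replace, text), mapping


def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Re-insert original values, e.g. into an LLM response that echoes the tokens."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


masked, key = pseudonymize("Contact ana@example.com or bo@example.org.")
print(masked)                   # Contact [EMAIL_1] or [EMAIL_2].
print(reidentify(masked, key))  # Contact ana@example.com or bo@example.org.
```

Only `masked` goes to the LLM; `key` stays on your side, and it is personal data in its own right. Deleting it later does not automatically make the masked text anonymous if other details in the text still identify someone.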

The three anonymization tests (WP29 Opinion 05/2014)

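Opinion 05/2014 judges anonymization against three risks. A technique that defeats all three is considered robust; if it fails any of them, you need a case-by-case assessment of the re-identification risk: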

  1. Singling out: Can you isolate an individual in the dataset? (e.g., "the person with SSN 123-45-6789")
  2. Linkability: Can you link two records across datasets to identify the same person?
  3. Inference: Can you deduce, with significant probability, an attribute of an individual (such as a health condition) from the remaining data?
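
One common way to approximate the singling-out test on structured fields is a k-anonymity count: group records by their quasi-identifiers (attributes that aren't names but can identify someone in combination) and flag any group smaller than k. The sketch below assumes you have already extracted fields like ZIP code and age band from the redacted text; the field names, data, and `k=2` are illustrative. Opinion 05/2014 notes that k-anonymity alone does not prevent inference, so a pass here is necessary, not sufficient.

```python
from collections import Counter


def singled_out(records: list[dict], quasi_identifiers: list[str], k: int = 2) -> list[int]:
    """Return indices of records whose quasi-identifier combination occurs fewer than k times."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] < k]


# Illustrative records, e.g. fields left over after names and emails were redacted.
records = [
    {"zip": "10115", "age_band": "30-39", "topic": "billing"},
    {"zip": "10115", "age_band": "30-39", "topic": "login"},
    {"zip": "80331", "age_band": "60-69", "topic": "billing"},
]
print(singled_out(records, ["zip", "age_band"]))  # [2]
```

The third record is the only one in its ZIP/age group, so anyone who knows those two facts about a person can isolate their record.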

PII redaction of names and contact details often passes tests 1 and 2, as long as no combination of the remaining details still points to one person, but it may fail test 3 if sensitive attributes (health condition, salary, location history) remain in the text.
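
To see how test 3 can fail even when no record is unique, here is a sketch of a homogeneity check in the spirit of l-diversity: within each quasi-identifier group, if every record shares the same sensitive value, anyone who knows a person is in that group learns the value. The function name, fields, and threshold are illustrative assumptions.

```python
from collections import defaultdict


def inferable_groups(records: list[dict], quasi_identifiers: list[str],
                     sensitive: str, min_distinct: int = 2) -> dict[tuple, set]:
    """Return quasi-identifier groups with fewer than `min_distinct` sensitive values.

    If every record in a group shares one value, anyone who knows a person
    belongs to that group can infer the value, even though no single record
    is unique.
    """
    groups: dict[tuple, set] = defaultdict(set)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return {key: vals for key, vals in groups.items() if len(vals) < min_distinct}


records = [
    {"zip": "10115", "age_band": "30-39", "condition": "HIV"},
    {"zip": "10115", "age_band": "30-39", "condition": "HIV"},
    {"zip": "80331", "age_band": "60-69", "condition": "asthma"},
    {"zip": "80331", "age_band": "60-69", "condition": "diabetes"},
]
print(inferable_groups(records, ["zip", "age_band"], "condition"))
# {('10115', '30-39'): {'HIV'}}
```

Every group here has two records, so this data would pass a k=2 singling-out check, yet the first group still reveals the diagnosis.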

Practical checklist for AI tooling

Before processing

During redaction

After processing

Special categories (Article 9)

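Article 9's special categories are racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data used to identify someone, health data, and data about a person's sex life or sexual orientation. Processing them is prohibited unless one of the Article 9(2) conditions applies, such as explicit consent. If your text may contain these categories (e.g., medical records, HR communications), your redaction must be more thorough: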

Redacting a name but leaving "patient's HIV diagnosis" in a support ticket does not anonymize the ticket: the diagnosis is still there, and surrounding details may still tie it to one person. Inference risk must be assessed holistically, not just by counting the PII fields you removed.

OpenAI Privacy Filter (and PrivacyFilter's API) detects names, contacts, IDs, and financial data — but does not automatically classify inferred sensitive attributes. For Article 9 content, consider adding domain-specific keyword suppression on top of entity detection.
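
A minimal sketch of such a keyword-suppression pass, assuming entity detection has already replaced names with tokens like `[NAME_1]`. The term list, placeholder, and `suppress_special_categories` function are hypothetical, and keyword lists miss paraphrases ("tested positive for the virus"), so treat this as a backstop rather than a substitute for an inference-risk assessment.

```python
import re

# Hypothetical, deliberately tiny term list. A real deployment needs a curated,
# domain-specific vocabulary (diagnoses, medications, religious terms, ...).
ARTICLE9_TERMS = ["HIV", "diabetes", "chemotherapy", "trade union"]

# Longest terms first so multi-word phrases win over shorter overlapping terms;
# \b keeps "HIV" from matching inside words like "HIVE".
_TERM_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in sorted(ARTICLE9_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def suppress_special_categories(redacted_text: str, placeholder: str = "[SENSITIVE]") -> str:
    """Second pass after entity detection: mask listed Article 9 terms (case-insensitive)."""
    return _TERM_RE.sub(placeholder, redacted_text)


ticket = "[NAME_1] asked about refilling HIV medication before chemotherapy."
print(suppress_special_categories(ticket))
# [NAME_1] asked about refilling [SENSITIVE] medication before [SENSITIVE].
```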

Retention and the right to erasure

If you use a hosted API that does not retain text (like PrivacyFilter — see our privacy policy), the input text effectively disappears after the API call. You still need to manage:

CCPA parallel

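California's CCPA (as amended by the CPRA) uses the term "deidentified" rather than "anonymized." Deidentified information is information that "cannot reasonably be used to infer information about, or otherwise be linked to, a particular consumer," provided the business takes reasonable measures to prevent it from being re-associated with a consumer, publicly commits not to reidentify it, and contractually binds recipients to the same rules. The technical threshold is comparable to GDPR's anonymization bar, so data that meets GDPR's standard will generally meet CCPA's technical test, but you still need the public commitment and contractual terms before you can call it deidentified.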

Start redacting PII today — paste any text into PrivacyFilter and see detected entities in seconds.

No account · No credit card · 3 free redactions/day
