How to remove personal information from text
The fastest way to remove personal information from text: paste it into PrivacyFilter.run, click Redact, copy the clean output. Free, no account, under 2 seconds. For automated pipelines, call POST https://privacyfilter.run/api/redact with your license key. Detects names, emails, phone numbers, addresses, SSNs, credit cards, IP addresses, and more.
Personal information, or PII (personally identifiable information), is any data that can be used to identify an individual: their name, email address, phone number, home address, Social Security number, date of birth, and more. Removing it from text before sharing, storing, or processing that text is a common way to meet data-minimization and de-identification obligations under privacy regulations such as GDPR, CCPA, and HIPAA.
This guide covers the three main approaches: the free online tool (paste and go), the REST API called from Python (automated pipelines), and manual regex (when you need full control).
What counts as personal information in text?
Note: generic role mentions ("the doctor", "the manager") and company names are not personal information, and public figures mentioned in their public capacity usually do not need to be redacted.
Method 1: Free online tool (fastest, no code)
If you have a single document, email, support ticket, or any text you want to clean up, the free online tool at PrivacyFilter.run is the fastest option.
1. Open PrivacyFilter.run. No account or signup is required, and it works in any browser.
2. Paste your text into the input area. The free tier accepts up to 2,000 characters; paid plans accept up to 10,000.
3. Choose a mode. Replace substitutes each entity with a labeled placeholder like [PERSON_1] or [EMAIL_2] and is best for most use cases. Mask replaces each entity with ████ blocks. Tag wraps each entity in <PII> tags for post-processing.
4. Click Redact. The AI (powered by OpenAI Privacy Filter) scans the text in under 2 seconds and highlights every piece of personal information it finds.
5. Click Copy and paste the redacted output wherever you need it. The entity table shows exactly what was found and replaced.
Example
Input:
Hi, I'm Sarah Johnson (sarah.johnson@acme.com, +1 555-842-1234).
My address is 14 Elm Street, Boston MA 02118 and my SSN is 543-21-6789.
Output (Replace mode):
Hi, I'm [PERSON_1] ([EMAIL_2], [PHONE_3]).
My address is [ADDRESS_4] and my SSN is [SSN_5].
All five pieces of personal information detected and replaced in one click. The original text was never stored on any server.
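To make the difference between the three modes concrete, here is a minimal local sketch of how already-detected spans could be rewritten in each mode. It is not the service's implementation: the `Entity` shape, the `apply_mode` and `span` helpers, and the fixed-width mask are assumptions made for illustration, and the detection step itself is not shown.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A detected span. This shape is hypothetical, not the tool's schema."""
    label: str  # e.g. "PERSON" or "EMAIL"
    start: int  # index of the first character of the span
    end: int    # index one past the last character of the span


def apply_mode(text: str, entities: list[Entity], mode: str = "replace") -> str:
    """Rewrite `text` so each span is replaced, masked, or tagged."""
    out = []
    cursor = 0
    ordered = sorted(entities, key=lambda e: e.start)
    for number, ent in enumerate(ordered, start=1):
        out.append(text[cursor:ent.start])  # untouched text before the span
        original = text[ent.start:ent.end]
        if mode == "replace":
            out.append(f"[{ent.label}_{number}]")
        elif mode == "mask":
            out.append("████")  # fixed width, so the mask does not leak length
        elif mode == "tag":
            out.append(f"<PII>{original}</PII>")
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        cursor = ent.end
    out.append(text[cursor:])  # trailing text after the last span
    return "".join(out)


def span(text: str, label: str, value: str) -> Entity:
    """Demo helper: locate `value` in `text` and attach a label to it."""
    start = text.index(value)
    return Entity(label, start, start + len(value))


sample = "Hi, I'm Sarah Johnson (sarah.johnson@acme.com)."
found = [
    span(sample, "PERSON", "Sarah Johnson"),
    span(sample, "EMAIL", "sarah.johnson@acme.com"),
]
print(apply_mode(sample, found, "replace"))
print(apply_mode(sample, found, "mask"))
print(apply_mode(sample, found, "tag"))
```

Numbering the placeholders with one running counter mirrors the [PERSON_1], [EMAIL_2] pattern in the example output above.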
Method 2: Python API (automated pipelines)
For removing personal information at scale — thousands of support tickets, a log export, a CSV of customer data — the PrivacyFilter REST API handles the same detection logic with a single HTTP call:
import httpx

LICENSE_KEY = "your-license-key"  # get at privacyfilter.run

def remove_pii(text: str) -> str:
    response = httpx.post(
        "https://privacyfilter.run/api/redact",
        json={
            "text": text,
            "license_key": LICENSE_KEY,
            "mode": "replace",  # or "mask" or "tag"
        },
        timeout=15,
    )
    response.raise_for_status()
    return response.json()["redacted_text"]

# Remove PII from a single string
clean_text = remove_pii("Call Jane Doe at 555-0192 to confirm order #A1042.")
print(clean_text)
# → "Call [PERSON_1] at [PHONE_2] to confirm order #A1042."
Batch processing (multiple documents)
import httpx

LICENSE_KEY = "your-license-key"  # get at privacyfilter.run

documents = [
    {"id": "doc_1", "text": "Customer Alice Walker emailed alice@shop.io..."},
    {"id": "doc_2", "text": "Invoice for Robert Chen, 22 Baker St, London..."},
    # up to 20 documents per batch call
]

response = httpx.post(
    "https://privacyfilter.run/api/redact/batch",
    json={"documents": documents, "license_key": LICENSE_KEY, "mode": "replace"},
    timeout=60,
)
# Check the status separately: chaining .json() off raise_for_status() only
# works on httpx releases where that method returns the response.
response.raise_for_status()

for result in response.json()["results"]:
    print(result["id"], "→", result["redacted_text"][:60])
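When there are more documents than the 20-per-call limit noted in the batch example, the list has to be split before sending. A minimal sketch of that splitting step follows; the `chunked` helper and `MAX_BATCH_SIZE` name are hypothetical, and the HTTP call is left out so the snippet runs offline.

```python
from typing import Iterator

MAX_BATCH_SIZE = 20  # taken from the "up to 20 documents per batch call" note


def chunked(items: list[dict], size: int = MAX_BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 45 placeholder documents are split into batches of 20, 20, and 5.
documents = [{"id": f"doc_{i}", "text": f"placeholder text {i}"} for i in range(1, 46)]
batches = list(chunked(documents))
print([len(batch) for batch in batches])
```

Each batch can then be posted to the batch endpoint in turn, with the results collected by document id.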
Method 3: Manual regex (when you control the format)
Regex is the right choice only when your text has a predictable, well-known format — such as structured CSV exports where emails always appear in column 3. It fails completely for unstructured text like emails or support tickets, where "Mary from accounts" is a name that no regex can reliably catch.
import re

EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}\b')
PHONE_RE = re.compile(r'\+?1?\s?[\(]?\d{3}[\)]?[\s\-\.]?\d{3}[\s\-\.]?\d{4}')
SSN_RE = re.compile(r'\b\d{3}[-\s]?\d{2}[-\s]?\d{4}\b')

def regex_redact(text: str) -> str:
    text = EMAIL_RE.sub('[EMAIL]', text)
    text = PHONE_RE.sub('[PHONE]', text)
    text = SSN_RE.sub('[SSN]', text)
    return text
Regex limitation: The code above will catch alice@example.com but will completely miss "Alice from the sales team" in free-form text. For anything beyond perfectly structured data, use the AI-powered approach instead.
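For the structured case mentioned earlier, where emails always sit in the third column of a CSV export, the regex can be applied to that column alone. The sketch below assumes that layout; the `redact_email_column` helper, the `EMAIL_COLUMN` constant, and the sample rows are made up for illustration.

```python
import csv
import io
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}")
EMAIL_COLUMN = 2  # zero-based index of the third column (an assumed layout)


def redact_email_column(csv_text: str) -> str:
    """Return `csv_text` with emails in the third column replaced by [EMAIL]."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Short rows are written through unchanged instead of raising.
        if len(row) > EMAIL_COLUMN:
            row[EMAIL_COLUMN] = EMAIL_RE.sub("[EMAIL]", row[EMAIL_COLUMN])
        writer.writerow(row)
    return out.getvalue()


export = "id,plan,email\n1,pro,alice@example.com\n2,free,bob@shop.io\n"
print(redact_email_column(export))
```

Restricting the substitution to one known column avoids false positives in the other fields, which is the main advantage regex has over a general detector on data this regular.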
When you need to remove PII permanently vs. pseudonymize
There are two different techniques depending on your use case:
- Anonymization: personal data is removed or replaced with random values (e.g., ████). Irreversible. Required for GDPR "no-longer-personal-data" status.
- Pseudonymization: personal data is replaced with consistent placeholders (e.g., [PERSON_1]). Reversible if you keep the mapping. Still considered personal data under GDPR but reduces risk. Useful when you need to re-insert names later (e.g., personalizing an LLM response).
PrivacyFilter supports both: mode=mask for anonymization, mode=replace for pseudonymization (the entity table gives you the mapping to reverse it).
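Reversing pseudonymization comes down to substituting each placeholder with its stored original. The sketch below assumes the mapping has already been collected into a plain dictionary keyed by placeholder; that dictionary format and the `restore` helper are hypothetical, not the entity table's documented schema.

```python
import re

# Matches whole placeholders such as [PERSON_1] or [EMAIL_12].
PLACEHOLDER_RE = re.compile(r"\[[A-Z_]+_\d+\]")


def restore(text: str, mapping: dict[str, str]) -> str:
    """Swap each known placeholder in `text` back to its original value."""
    # Placeholders missing from the mapping are left in place, not dropped.
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


# Hypothetical mapping, as might be built from the entity table.
mapping = {"[PERSON_1]": "Jane Doe", "[PHONE_2]": "555-0192"}
llm_reply = "Hello [PERSON_1], we will call you at [PHONE_2] about [ORDER_9]."
print(restore(llm_reply, mapping))
```

Matching whole placeholders with a regex, rather than calling `str.replace` once per key, avoids one placeholder being rewritten inside another that shares its prefix.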