
PII Masking

Overview

The PII Masking stage detects personally identifiable information (PII) in both requests and responses, then masks, redacts, or passes it based on your configuration. This is critical for GDPR, HIPAA, CCPA, and SOC 2 compliance.

Supported Entity Types

Financial

  • Credit Card Numbers: Visa, Mastercard, AmEx, Discover patterns
  • Bank Account Numbers: US, EU, UK formats
  • Routing Numbers: US bank routing codes
  • IBAN: International Bank Account Numbers
  • SWIFT Codes: Bank identifiers
  • Cryptocurrency Addresses: Bitcoin, Ethereum, etc.

Identity

  • Social Security Numbers (SSN): US format (XXX-XX-XXXX)
  • Tax ID: Various countries
  • Passport Numbers: Multiple formats
  • Driver License: US, EU, Canada
  • National ID: Country-specific formats
  • Birth Dates: Full or partial

Contact

  • Email Addresses: All formats
  • Phone Numbers: International formats
  • Home Addresses: Street addresses, postal codes
  • IP Addresses: IPv4, IPv6
  • URLs: Web links, domain names

Health (HIPAA)

  • Medical Record Numbers: Hospital identifiers
  • Health Insurance IDs: Health plan member IDs
  • Biometric Data: DNA profiles, fingerprints
  • Mental Health Records: Identified mental health data
  • Medication Names: Linked to patient context

Corporate

  • Employee IDs: Internal identifiers
  • API Keys: Secret keys, tokens
  • Database Credentials: Usernames, connection strings
  • Private Keys: SSH keys, certificates
  • OAuth Tokens: Bearer tokens, refresh tokens

Configuration

Via YAML

firewall:
  stages:
    - name: "input-pii-scanner"
      enabled: true
      config:
        # Which entity types to detect
        entity_types:
          - "email"
          - "ssn"
          - "credit_card"
          - "phone"
          - "api_key"
          - "passport"
        # What to do with detected PII
        action: "mask" # "mask", "redact", "pass"
        # Masking character (only for action: "mask")
        mask_char: "X"
        mask_percentage: 75 # Show last 25% unmasked
        # Confidence threshold (0-1)
        # Only flag PII above this confidence
        confidence_threshold: 0.85
        # Skip certain entity types for specific users
        user_exemptions:
          - user_id: "support_agent@company.com"
            skip_entity_types: ["email", "phone"]
    - name: "output-pii-scanner"
      enabled: true
      config:
        entity_types:
          - "ssn"
          - "credit_card"
          - "api_key"
          - "database_password"
        action: "mask"
        mask_char: "*"
        confidence_threshold: 0.90
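
As a rough illustration of how `user_exemptions` could narrow detection for a single user, the sketch below removes that user's `skip_entity_types` from the stage's `entity_types`. The helper name `effective_entity_types` and the in-memory dict are hypothetical; the actual stage may resolve exemptions differently.

```python
# Hypothetical helper, not part of the product's API.
def effective_entity_types(stage_config: dict, user_id: str) -> list:
    """Return the entity types that are still scanned for this user."""
    skipped = set()
    for exemption in stage_config.get("user_exemptions", []):
        if exemption["user_id"] == user_id:
            skipped.update(exemption["skip_entity_types"])
    # Keep the configured order of entity types.
    return [t for t in stage_config["entity_types"] if t not in skipped]


# Mirrors the input-pii-scanner settings from the YAML above.
input_stage = {
    "entity_types": ["email", "ssn", "credit_card", "phone", "api_key", "passport"],
    "user_exemptions": [
        {
            "user_id": "support_agent@company.com",
            "skip_entity_types": ["email", "phone"],
        },
    ],
}

print(effective_entity_types(input_stage, "support_agent@company.com"))
print(effective_entity_types(input_stage, "someone_else@company.com"))
```

For the exempted user, `email` and `phone` drop out of the list; every other user keeps the full set.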

Via UI

  1. Go to Governance > Firewall > PII Masking
  2. Select stages to enable:
    • Input PII Scanner (check requests before AI)
    • Output PII Scanner (check responses after AI)
  3. Choose entity types to detect
  4. Select action (mask, redact, or pass)
  5. Configure masking rules
  6. Add exemptions and allowlists
  7. Click Save & Deploy

Actions Explained

Mask

Replaces characters with the mask character while keeping the structure recognizable.

Examples:

Original: "jane.smith@company.com"
Masked: "XXXX.XXXXX@XXXXXXX.XXX"
Original: "555-123-4567"
Masked: "XXX-XXX-XXXX"
Original: "4532-1234-5678-9010"
Masked: "XXXX-XXXX-XXXX-9010" # Last 4 digits visible

Config:

action: "mask"
mask_char: "X"
mask_percentage: 75 # Mask 75%, show last 25%
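
One way to picture how `mask_char` and `mask_percentage` might interact is the sketch below. It assumes the percentage applies only to letters and digits, counted from the left, and that separators such as `-`, `.`, and `@` are never masked. The function `mask_value` is hypothetical, and the rounding rule (round up) is an assumption.

```python
# Hypothetical sketch of structure-preserving masking.
def mask_value(value: str, mask_char: str = "X", mask_percentage: int = 75) -> str:
    """Mask the leading share of letters and digits, keeping separators."""
    maskable = sum(ch.isalnum() for ch in value)
    # Integer ceiling: when in doubt, mask one character more (assumption).
    to_mask = (maskable * mask_percentage + 99) // 100
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)  # separators and the unmasked tail pass through
    return "".join(out)


print(mask_value("4532-1234-5678-9010"))                          # 75% of 16 digits
print(mask_value("555-123-4567", mask_percentage=100))            # fully masked
print(mask_value("jane.smith@company.com", mask_percentage=100))  # fully masked
```

In this sketch, 75% reproduces the credit card example above (last 4 digits visible), while the fully masked email and phone examples correspond to a setting of 100.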

Redact

Removes the PII entirely and replaces it with a placeholder token.

Examples:

Original: "User jane.smith@company.com called with complaint"
Redacted: "User [EMAIL] called with complaint"
Original: "Credit card 4532-1234-5678-9010 on file"
Redacted: "Credit card [CREDIT_CARD] on file"

Config:

action: "redact"
redaction_tokens:
  email: "[EMAIL]"
  credit_card: "[CREDIT_CARD]"
  ssn: "[SSN]"
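
The token substitution can be sketched in a few lines. The regular expressions here are simplified stand-ins for the real detector, whose internals this page does not describe, and the names `PATTERNS`, `TOKENS`, and `redact` are hypothetical. The tokens follow the examples above.

```python
import re

# Simplified stand-in patterns (assumption); real detection is more thorough.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "credit_card": re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"),
}

# Placeholder tokens, one per entity type.
TOKENS = {
    "email": "[EMAIL]",
    "credit_card": "[CREDIT_CARD]",
}


def redact(text: str) -> str:
    """Replace every match of each pattern with its placeholder token."""
    for entity_type, pattern in PATTERNS.items():
        text = pattern.sub(TOKENS[entity_type], text)
    return text


print(redact("User jane.smith@company.com called with complaint"))
print(redact("Credit card 4532-1234-5678-9010 on file"))
```

Unlike masking, the placeholder carries no information about the length or shape of the original value.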

Pass

Detects PII but allows it through with a warning flag.

Config:

action: "pass"
# Still logs for audit trail, but doesn't modify content

Real-World Examples

Example 1: Input PII Detection (HIPAA)

Request (from patient intake form):

"Patient Name: John Smith
DOB: 1975-03-22
Medical Record: MRN-12345678
Insurance ID: INS-987654
Medication: Sertraline (for depression)"

Detected PII:

- birth_date: 1975-03-22 (confidence: 0.98)
- medical_record_number: MRN-12345678 (confidence: 0.95)
- health_insurance_id: INS-987654 (confidence: 0.92)

Action (mask):

"Patient Name: John Smith
DOB: XXXX-XX-XX
Medical Record: XXX-XXXXXXXX
Insurance ID: XXX-XXXXXX
Medication: Sertraline (for depression)"

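Sent to AI: The AI receives only the masked version above. The patient name and medication do not appear in the detected list, so they pass through unchanged.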

Example 2: Output PII Detection (API Key Leak)

AI Response (hallucinating a secret):

"To authenticate with our API, use:
curl -H 'Authorization: Bearer sk_live_1a2b3c4d5e6f7g8h9i0j1k2l3m4n5o6p'
https://api.example.com/endpoint"

Detected PII:

- api_key: sk_live_... (confidence: 0.99)

Action (redact):

"To authenticate with our API, use:
curl -H 'Authorization: Bearer [API_KEY]'
https://api.example.com/endpoint"

Sent to user: Redacted version, preventing secret exposure.

Example 3: Exemption (Support Agent)

Request (from support team):

Email: support@company.com
Customer Email: john.doe@gmail.com
Issue: Account access problem

Normal config: Would mask the customer email.

Exemption for support role: Support staff can see customer emails.

Action:

  • Email support@company.com is NOT masked (exempted)
  • Email john.doe@gmail.com is still masked unless support has approval

Confidence Scores

The PII detector returns a confidence score (0-1) for each detection:

  • 0.0-0.6: Low confidence (likely not PII, probably legitimate data)
  • 0.6-0.8: Medium confidence (might be PII, recommend review)
  • 0.8-0.95: High confidence (very likely PII)
  • 0.95-1.0: Very high confidence (definitely PII)

Set confidence_threshold to control sensitivity:

# High sensitivity: catch everything
confidence_threshold: 0.6 # More false positives
# Medium sensitivity: balanced
confidence_threshold: 0.8 # Good default
# Low sensitivity: very sure
confidence_threshold: 0.95 # Might miss actual PII
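
To see the effect of the threshold, the sketch below filters the three detections from Example 1 at two settings. It assumes a detection is flagged when its score is greater than or equal to `confidence_threshold`; whether the boundary is inclusive is an assumption, and `flag_detections` is a hypothetical name.

```python
# Hypothetical filter: keep detections at or above the threshold (assumption).
def flag_detections(detections: list, confidence_threshold: float) -> list:
    return [d for d in detections if d["confidence"] >= confidence_threshold]


# Scores taken from Example 1 on this page.
detections = [
    {"type": "birth_date", "confidence": 0.98},
    {"type": "medical_record_number", "confidence": 0.95},
    {"type": "health_insurance_id", "confidence": 0.92},
]

for threshold in (0.8, 0.95):
    flagged = [d["type"] for d in flag_detections(detections, threshold)]
    print(threshold, flagged)
```

At 0.8 all three entities are flagged; at 0.95 the insurance ID (0.92) is no longer flagged and would reach the AI unmasked.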

Allowlists & Context

Some data looks like PII but isn’t in context:

  • Email addresses in security documentation
  • Fake SSNs in testing (“999-99-9999”)
  • Example credit cards (“4111-1111-1111-1111”)
  • IPs in logs (system IPs, not user data)

Add context-based allowlists:

firewall:
  pii-masking:
    allowlist:
      - pattern: "^999-99-9999$"
        reason: "Test SSN fixture"
      - pattern: "^4111-1111-1111-1111$"
        reason: "Example credit card"
      # Anchored, with dots escaped, so only the exact value matches.
      # Single quotes keep the backslashes literal in YAML.
      - pattern: '^example@example\.com$'
        reason: "Documentation example"
      - pattern: '^127\.0\.0\.1$'
        reason: "Localhost IP"
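
The sketch below shows how such an allowlist could be checked, assuming each `pattern` is a regular expression matched against the detected value. The patterns are written anchored and with dots escaped, because an unescaped `.` matches any character. The names `ALLOWLIST` and `allowlist_reason` are hypothetical.

```python
import re
from typing import Optional

# Assumption: patterns are regular expressions applied to the detected value.
ALLOWLIST = [
    (re.compile(r"^999-99-9999$"), "Test SSN fixture"),
    (re.compile(r"^4111-1111-1111-1111$"), "Example credit card"),
    (re.compile(r"^example@example\.com$"), "Documentation example"),
    (re.compile(r"^127\.0\.0\.1$"), "Localhost IP"),
]


def allowlist_reason(value: str) -> Optional[str]:
    """Return the allowlist reason for a value, or None if it is not allowlisted."""
    for pattern, reason in ALLOWLIST:
        if pattern.search(value):
            return reason
    return None


print(allowlist_reason("999-99-9999"))  # allowlisted test fixture
print(allowlist_reason("123-45-6789"))  # not allowlisted
print(allowlist_reason("127a0b0c1"))    # rejected because the dots are escaped
```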

Passthrough Rules

Allow specific users or roles to see unmasked PII:

firewall:
  pii-masking:
    passthrough:
      - role: "security_audit"
        entity_types: ["ssn", "credit_card", "api_key"]
        expires_at: "2025-06-30"
        reason: "Q2 compliance audit"
      - user_id: "admin@company.com"
        entity_types: ["all"] # Admin sees everything
        expires_at: null # No expiration
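
A minimal sketch of how these rules might be evaluated follows. It assumes a rule applies when the role or user matches, the entity type is listed (or the list contains `"all"`), and the current date is on or before `expires_at`. Treating the expiry date as inclusive is an assumption, and `can_see_unmasked` is a hypothetical name.

```python
from datetime import date
from typing import Optional

# Mirrors the passthrough rules above; None stands for YAML null.
PASSTHROUGH = [
    {
        "role": "security_audit",
        "entity_types": ["ssn", "credit_card", "api_key"],
        "expires_at": "2025-06-30",
    },
    {
        "user_id": "admin@company.com",
        "entity_types": ["all"],
        "expires_at": None,
    },
]


def can_see_unmasked(
    entity_type: str,
    today: date,
    user_id: Optional[str] = None,
    role: Optional[str] = None,
) -> bool:
    """Return True if any unexpired passthrough rule covers this request."""
    for rule in PASSTHROUGH:
        matches_subject = (
            ("role" in rule and rule["role"] == role)
            or ("user_id" in rule and rule["user_id"] == user_id)
        )
        if not matches_subject:
            continue
        expires_at = rule["expires_at"]
        # Assumption: the rule is still valid on the expiry date itself.
        if expires_at is not None and today > date.fromisoformat(expires_at):
            continue
        if "all" in rule["entity_types"] or entity_type in rule["entity_types"]:
            return True
    return False


print(can_see_unmasked("ssn", date(2025, 6, 30), role="security_audit"))
print(can_see_unmasked("ssn", date(2025, 7, 1), role="security_audit"))
print(can_see_unmasked("email", date(2030, 1, 1), user_id="admin@company.com"))
```

The audit role loses access the day after `expires_at`, while the admin rule with no expiration keeps applying.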

Compliance Mapping

| Regulation | Requirement | TruthVouch Solution |
|---|---|---|
| GDPR | Prevent PII from reaching third parties | Input PII Scanner masks before AI provider |
| HIPAA | Protect medical records | Input/Output PII Scanner detects MRN, health insurance ID |
| CCPA | Prevent sale of personal data | PII redaction prevents data leakage |
| SOC 2 | Monitor data access | Audit trail logs all PII detections |
| PCI DSS | Protect credit card data | Credit card masking + confidence thresholds |

Monitoring PII Detections

Go to Governance > Reports > PII Masking:

  • Detection Volume: How much PII detected over time
  • Entity Type Distribution: Which types are most common
  • False Positive Rate: % of exempted/allowlisted detections
  • Trend Analysis: Is PII slipping through?

Testing PII Detection

Via Web UI

  1. Go to Governance > Test Firewall > PII tab
  2. Paste text
  3. Click Scan
  4. See detected entities and confidence scores

Via API

curl -X POST http://localhost:5000/api/v1/governance/pii/detect \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "My email is jane.smith@company.com and my SSN is 123-45-6789",
    "entity_types": ["email", "ssn"]
  }'

Response:

{
  "entities": [
    {
      "type": "email",
      "value": "jane.smith@company.com",
      "start": 12,
      "end": 34,
      "confidence": 0.98
    },
    {
      "type": "ssn",
      "value": "123-45-6789",
      "start": 49,
      "end": 60,
      "confidence": 0.96
    }
  ],
  "masked_text": "My email is [EMAIL] and my SSN is [SSN]"
}
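
The same call can be prepared from Python with the standard library, and the returned offsets can be used to rebuild the redacted text locally. This is a sketch under assumptions: `start` and `end` are treated as 0-based, end-exclusive character offsets into the request text, and the helper names `build_detect_request` and `redact_spans` are hypothetical.

```python
import json
import urllib.request


def build_detect_request(text, entity_types, token, base_url="http://localhost:5000"):
    """Build (but do not send) the POST request for the detect endpoint."""
    body = json.dumps({"text": text, "entity_types": entity_types}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/governance/pii/detect",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def redact_spans(text, entities):
    """Replace each detected span with a token such as [EMAIL]."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        token = f"[{entity['type'].upper()}]"
        text = text[: entity["start"]] + token + text[entity["end"]:]
    return text


text = "My email is jane.smith@company.com and my SSN is 123-45-6789"
# Offsets assumed to be 0-based and end-exclusive.
entities = [
    {"type": "email", "start": 12, "end": 34},
    {"type": "ssn", "start": 49, "end": 60},
]

request = build_detect_request(text, ["email", "ssn"], token="YOUR_TOKEN")
print(request.get_method(), request.full_url)
print(redact_spans(text, entities))
```

Passing `request` to `urllib.request.urlopen` would send it. That call is left out here so the sketch runs without a server.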

Troubleshooting

False Positives (Legitimate Data Flagged)

  1. Check the confidence scores in the audit logs
  2. If the score is above 0.8, add the value to the allowlist
  3. If the score is below 0.8, raise confidence_threshold

False Negatives (Real PII Not Detected)

  1. Check that the entity type is enabled
  2. Make sure confidence_threshold is not set too high
  3. Test detection on the specific PII pattern

Performance Impact

  • PII detection adds 10-30ms per request
  • To optimize: disable unused entity types, increase batch size

Best Practices

  1. Enable for both input and output: PII can leak in both directions
  2. Start with high confidence threshold: Lower it gradually as you understand patterns
  3. Monitor exemptions: Review who has passthrough rights quarterly
  4. Test before deployment: Use test mode to see impact without blocking
  5. Audit regularly: Check reports weekly for anomalies