PII Masking
Overview
The PII Masking stage detects personally identifiable information (PII) in both requests and responses, then masks it, redacts it, or passes it through according to your configuration. It supports compliance with GDPR, HIPAA, CCPA, SOC 2, and PCI DSS.
Supported Entity Types
Financial
- Credit Card Numbers: Visa, Mastercard, AmEx, Discover patterns
- Bank Account Numbers: US, EU, UK formats
- Routing Numbers: US bank routing codes
- IBAN: International Bank Account Numbers
- Swift Codes: Bank identifiers
- Cryptocurrency Addresses: Bitcoin, Ethereum, etc.
Identity
- Social Security Numbers (SSN): US format (XXX-XX-XXXX)
- Tax ID: Various countries
- Passport Numbers: Multiple formats
- Driver License: US, EU, Canada
- National ID: Country-specific formats
- Birth Dates: Full or partial
Contact
- Email Addresses: All formats
- Phone Numbers: International formats
- Home Addresses: Street addresses, postal codes
- IP Addresses: IPv4, IPv6
- URLs: Web links, domain names
Health (HIPAA)
- Medical Record Numbers: Hospital identifiers
- Health Insurance IDs: Health plan member IDs
- Biometric Data: DNA profiles, fingerprints
- Mental Health Records: Identified mental health data
- Medication Names: Linked to patient context
Corporate
- Employee IDs: Internal identifiers
- API Keys: Secret keys, tokens
- Database Credentials: Usernames, connection strings
- Private Keys: SSH keys, certificates
- OAuth Tokens: Bearer tokens, refresh tokens
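To give a feel for how pattern-based detection of two of the entity types above might work, here is a minimal Python sketch. It is not the detector this stage uses: the regular expressions, the `detect` and `luhn_valid` names, and the use of a Luhn checksum to filter card-shaped numbers are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for two of the entity types listed above.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")          # US format XXX-XX-XXXX
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")      # 13-19 digits, optional separators


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def detect(text: str) -> list[dict]:
    """Find SSN-shaped and card-shaped substrings together with their offsets."""
    found = []
    for m in SSN_RE.finditer(text):
        found.append({"type": "ssn", "value": m.group(),
                      "start": m.start(), "end": m.end()})
    for m in CARD_RE.finditer(text):
        # The checksum filters out digit runs that only look like card numbers.
        if luhn_valid(m.group()):
            found.append({"type": "credit_card", "value": m.group(),
                          "start": m.start(), "end": m.end()})
    return found
```

A production detector would typically combine patterns like these with context and a confidence score; this sketch only reports matches.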
Configuration
Via YAML
    firewall:
      stages:
        - name: "input-pii-scanner"
          enabled: true
          config:
            # Which entity types to detect
            entity_types:
              - "email"
              - "ssn"
              - "credit_card"
              - "phone"
              - "api_key"
              - "passport"

            # What to do with detected PII
            action: "mask"  # "mask", "redact", "pass"

            # Masking character (only for action: "mask")
            mask_char: "X"
            mask_percentage: 75  # Show last 25% unmasked

            # Confidence threshold (0-1)
            # Only flag PII above this confidence
            confidence_threshold: 0.85

            # Skip certain entity types for specific users
            user_exemptions:
              - user_id: "support_agent@company.com"
                skip_entity_types: ["email", "phone"]

        - name: "output-pii-scanner"
          enabled: true
          config:
            entity_types:
              - "ssn"
              - "credit_card"
              - "api_key"
              - "database_password"

            action: "mask"
            mask_char: "*"
            confidence_threshold: 0.90
Via UI
- Go to Governance → Firewall → PII Masking
- Select stages to enable:
- Input PII Scanner (check requests before AI)
- Output PII Scanner (check responses after AI)
- Choose entity types to detect
- Select action (mask, redact, or pass)
- Configure masking rules
- Add exemptions and allowlists
- Click Save & Deploy
Actions Explained
Mask
Replaces characters with the mask character while keeping the value's structure recognizable.
Examples:
    Original: "jane.smith@company.com"
    Masked:   "XXXX.XXXXX@XXXXXXX.XXX"

    Original: "555-123-4567"
    Masked:   "XXX-XXX-XXXX"

    Original: "4532-1234-5678-9010"
    Masked:   "XXXX-XXXX-XXXX-9010"  # Last 4 digits visible
Config:
    action: "mask"
    mask_char: "X"
    mask_percentage: 75  # Mask 75%, show last 25%
Redact
Removes the PII entirely and replaces it with a placeholder token.
Examples:
    Original: "User jane.smith@company.com called with complaint"
    Redacted: "User [EMAIL] called with complaint"

    Original: "Credit card 4532-1234-5678-9010 on file"
    Redacted: "Credit card [CREDIT_CARD] on file"
Config:
    action: "redact"
    redaction_tokens:
      email: "[EMAIL]"
      credit_card: "[CREDIT_CARD]"
      ssn: "[SSN]"
Pass
Detects PII but allows it through with a warning flag.
Config:
    action: "pass"
    # Still logs for audit trail, but doesn't modify content
Real-World Examples
Example 1: Input PII Detection (HIPAA)
Request (from patient intake form):
    "Patient Name: John Smith
    DOB: 1975-03-22
    Medical Record: MRN-12345678
    Insurance ID: INS-987654
    Medication: Sertraline (for depression)"
Detected PII:
- birth_date: 1975-03-22 (confidence: 0.98)
- medical_record_number: MRN-12345678 (confidence: 0.95)
- health_insurance_id: INS-987654 (confidence: 0.92)
Action (mask):
    "Patient Name: John Smith
    DOB: XXXX-XX-XX
    Medical Record: XXX-XXXXXXXX
    Insurance ID: XXX-XXXXXX
    Medication: Sertraline (for depression)"
Sent to AI: Masked version above is what the AI sees.
Example 2: Output PII Detection (API Key Leak)
AI Response (hallucinating a secret):
    "To authenticate with our API, use:
    curl -H 'Authorization: Bearer sk_live_1a2b3c4d5e6f7g8h9i0j1k2l3m4n5o6p' https://api.example.com/endpoint"
Detected PII:
- api_key: sk_live_... (confidence: 0.99)
Action (redact):
    "To authenticate with our API, use:
    curl -H 'Authorization: Bearer [API_KEY]' https://api.example.com/endpoint"
Sent to user: Redacted version, preventing secret exposure.
Example 3: Exemption (Support Agent)
Request (from support team):
    Email: support@company.com
    Customer Email: john.doe@gmail.com
    Issue: Account access problem
Normal config: Would mask customer email.
Exemption for support role: Support staff can see customer emails.
Action:
- Email support@company.com is NOT masked (exempted)
- Email john.doe@gmail.com is still masked unless support has approval
Confidence Scores
The PII detector returns a confidence score (0-1) for each detection:
- 0.0-0.6: Low confidence (likely not PII, probably legitimate data)
- 0.6-0.8: Medium confidence (might be PII, recommend review)
- 0.8-0.95: High confidence (very likely PII)
- 0.95-1.0: Very high confidence (definitely PII)
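The bands above and the effect of a threshold can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, each band's upper bound is treated as exclusive (the list does not say which band a boundary value belongs to), and a detection is assumed to be kept when its score is greater than or equal to the threshold.

```python
def confidence_band(score: float) -> str:
    """Map a score to the bands listed above (upper bounds treated as exclusive)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < 0.6:
        return "low"
    if score < 0.8:
        return "medium"
    if score < 0.95:
        return "high"
    return "very_high"


def above_threshold(detections: list[dict], confidence_threshold: float) -> list[dict]:
    """Keep detections whose confidence meets the threshold (assumed inclusive)."""
    return [d for d in detections if d["confidence"] >= confidence_threshold]


# Scores taken from Example 1.
detections = [
    {"type": "birth_date", "confidence": 0.98},
    {"type": "medical_record_number", "confidence": 0.95},
    {"type": "health_insurance_id", "confidence": 0.92},
]
kept = above_threshold(detections, 0.95)
print([d["type"] for d in kept])
```

With a threshold of 0.95 the insurance ID from Example 1 (0.92) would no longer be flagged, which is the trade-off a stricter threshold implies.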
Set confidence_threshold to control sensitivity:
    # High sensitivity: catch everything
    confidence_threshold: 0.6   # More false positives

    # Medium sensitivity: balanced
    confidence_threshold: 0.8   # Good default

    # Low sensitivity: very sure
    confidence_threshold: 0.95  # Might miss actual PII
Allowlists & Context
Some data looks like PII but, in context, is not:
- Email addresses in security documentation
- Fake SSNs in testing (“999-99-9999”)
- Example credit cards (“4111-1111-1111-1111”)
- IPs in logs (system IPs, not user data)
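Values like these can be screened with a pattern check before a detection is acted on. The Python sketch below shows the idea under stated assumptions: patterns are treated as regular expressions matched against the detected value, and the `ALLOWLIST` structure and `allowlist_reason` helper are hypothetical names, not part of the product.

```python
import re

# Hypothetical allowlist entries for the kinds of values listed above.
ALLOWLIST = [
    {"pattern": r"^999-99-9999$", "reason": "Test SSN fixture"},
    {"pattern": r"^4111-1111-1111-1111$", "reason": "Example credit card"},
    {"pattern": r"^127\.0\.0\.1$", "reason": "Localhost IP"},
]


def allowlist_reason(value: str, allowlist=ALLOWLIST):
    """Return the reason a detected value is allowlisted, or None if it is not."""
    for entry in allowlist:
        if re.search(entry["pattern"], value):
            return entry["reason"]
    return None


print(allowlist_reason("999-99-9999"))
print(allowlist_reason("123-45-6789"))
```

Anchoring with `^` and `$` and escaping the dots matters here: an unanchored, unescaped `127.0.0.1` would also match unrelated strings such as `127.0.0.10`.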
Add context-based allowlists:
    firewall:
      pii-masking:
        allowlist:
          - pattern: "^999-99-9999$"
            reason: "Test SSN fixture"

          - pattern: "^4111-1111-1111-1111$"
            reason: "Example credit card"

          # Single quotes keep the backslashes literal in YAML;
          # dots are escaped and the pattern anchored so only the exact value matches
          - pattern: '^example@example\.com$'
            reason: "Documentation example"

          - pattern: '^127\.0\.0\.1$'
            reason: "Localhost IP"
Passthrough Rules
Allow specific users or roles to see unmasked PII:
    firewall:
      pii-masking:
        passthrough:
          - role: "security_audit"
            entity_types: ["ssn", "credit_card", "api_key"]
            expires_at: "2025-06-30"
            reason: "Q2 compliance audit"

          - user_id: "admin@company.com"
            entity_types: ["all"]  # Admin sees everything
            expires_at: null       # No expiration
Compliance Mapping
| Regulation | Requirement | TruthVouch Solution |
|---|---|---|
| GDPR | Prevent PII from reaching third parties | Input PII Scanner masks before AI provider |
| HIPAA | Protect medical records | Input/Output PII Scanner detects MRN, health insurance ID |
| CCPA | Prevent sale of personal data | PII redaction prevents data leakage |
| SOC 2 | Monitor data access | Audit trail logs all PII detections |
| PCI DSS | Protect credit card data | Credit card masking + confidence thresholds |
Monitoring PII Detections
Go to Governance → Reports → PII Masking:
- Detection Volume: How much PII was detected over time
- Entity Type Distribution: Which types are most common
- False Positive Rate: Percentage of detections that were exempted or allowlisted
- Trend Analysis: Is PII slipping through?
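If you export detection records for your own analysis, the first three metrics above can be approximated with a short script. The sketch below is a rough illustration: the record fields (`type`, `exempted`, `allowlisted`) and the `summarize` function are hypothetical and do not describe the product's actual log format.

```python
from collections import Counter


def summarize(detections: list[dict]) -> dict:
    """Compute per-type counts and the share of exempted or allowlisted detections."""
    by_type = Counter(d["type"] for d in detections)
    # Assumption: a detection counts as a false positive if it was exempted or allowlisted.
    suppressed = sum(1 for d in detections if d.get("exempted") or d.get("allowlisted"))
    rate = 100.0 * suppressed / len(detections) if detections else 0.0
    return {
        "total": len(detections),
        "by_type": dict(by_type),
        "false_positive_rate_pct": rate,
    }


records = [
    {"type": "email"},
    {"type": "email", "exempted": True},
    {"type": "ssn"},
    {"type": "credit_card"},
]
print(summarize(records))
```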
Testing PII Detection
Via Web UI
- Go to Governance → Test Firewall → PII Tab
- Paste text
- Click Scan
- See detected entities and confidence scores
Via API
    curl -X POST http://localhost:5000/api/v1/governance/pii/detect \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "text": "My email is jane.smith@company.com and my SSN is 123-45-6789",
        "entity_types": ["email", "ssn"]
      }'
Response:
    {
      "entities": [
        {
          "type": "email",
          "value": "jane.smith@company.com",
          "start": 12,
          "end": 34,
          "confidence": 0.98
        },
        {
          "type": "ssn",
          "value": "123-45-6789",
          "start": 49,
          "end": 60,
          "confidence": 0.96
        }
      ],
      "masked_text": "My email is [EMAIL] and my SSN is [SSN]"
    }
Troubleshooting
False Positives (Legitimate Data Flagged)
- Check the confidence scores in the audit logs
- If the score is above 0.8, add the value to the allowlist
- If the score is below 0.8, raise confidence_threshold so the detection falls below it
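The rule of thumb above can be applied to a batch of confirmed false positives at once. The following Python sketch is a hedged illustration: the function name and record fields are hypothetical, and a score of exactly 0.8 is grouped with the lower scores because the steps above do not say how to treat it.

```python
def triage_false_positives(false_positives: list[dict], cutoff: float = 0.8) -> dict:
    """Split confirmed false positives into allowlist and threshold candidates."""
    allowlist = [fp["value"] for fp in false_positives if fp["confidence"] > cutoff]
    low_scores = [fp["confidence"] for fp in false_positives if fp["confidence"] <= cutoff]
    # Raising the threshold only suppresses a detection if the new value exceeds its score.
    must_exceed = max(low_scores) if low_scores else None
    return {"allowlist_candidates": allowlist, "threshold_must_exceed": must_exceed}


report = triage_false_positives([
    {"value": "999-99-9999", "confidence": 0.97},
    {"value": "10.0.0.5", "confidence": 0.65},
    {"value": "build-1234", "confidence": 0.70},
])
print(report)
```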
False Negatives (Real PII Not Detected)
- Check that the entity type is enabled
- Ensure confidence_threshold is not too high
- Test detection on the specific PII pattern
Performance Impact
- PII detection adds 10-30 ms per request
- To optimize: disable unused entity types, increase batch size
Best Practices
- Enable for both input and output: PII can leak in both directions
- Start with a high confidence threshold: Lower it gradually as you understand patterns
- Monitor exemptions: Review who has passthrough rights quarterly
- Test before deployment: Use test mode to see the impact without blocking
- Audit regularly: Check reports weekly for anomalies
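For the quarterly review of passthrough rights, a small script can flag rules that have expired or that never expire. The Python sketch below assumes the rules have been loaded into dictionaries with the same keys as the passthrough configuration (`role` or `user_id`, `expires_at`). The `review_passthrough` function is hypothetical, and it assumes a rule stays valid through its `expires_at` date.

```python
from datetime import date


def review_passthrough(rules: list[dict], today: date) -> dict:
    """Sort passthrough rules into expired, never-expiring, and active groups."""
    report = {"expired": [], "no_expiration": [], "active": []}
    for rule in rules:
        who = rule.get("role") or rule.get("user_id")
        expires_at = rule.get("expires_at")
        if expires_at is None:
            report["no_expiration"].append(who)
        elif date.fromisoformat(expires_at) < today:
            # Assumption: the rule is still valid on the expiry date itself.
            report["expired"].append(who)
        else:
            report["active"].append(who)
    return report


rules = [
    {"role": "security_audit", "expires_at": "2025-06-30"},
    {"user_id": "admin@company.com", "expires_at": None},
]
print(review_passthrough(rules, date(2025, 7, 1)))
```

Rules without an expiration are listed separately because they are the ones most likely to outlive their original justification.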