PII Masking

Overview

The PII Masking stage detects personally identifiable information (PII) in both requests and responses, then masks it, redacts it, or passes it through, depending on your configuration. It supports compliance work for GDPR, HIPAA, CCPA, SOC 2, and PCI DSS.

Supported Entity Types

Financial

Credit Card Numbers: Visa, Mastercard, AmEx, Discover patterns
Bank Account Numbers: US, EU, UK formats
Routing Numbers: US bank routing codes
IBAN: International Bank Account Numbers
SWIFT Codes: Bank identifiers
Cryptocurrency Addresses: Bitcoin, Ethereum, etc.
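
Digit-group patterns alone tend to flag order numbers and other long numeric strings as credit cards. A common way to reduce those false positives is a Luhn checksum pass after the pattern match. The sketch below is illustrative only: whether this product's detector applies a Luhn check is an assumption, `luhn_valid` is a hypothetical helper name, and the 12-digit minimum is a choice made for this sketch.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) < 12:  # assumed minimum length for this sketch
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111-1111-1111-1111"))
print(luhn_valid("4111-1111-1111-1112"))
```

Note that well-known test numbers such as 4111-1111-1111-1111 pass the checksum, so a Luhn pass does not replace an allowlist for fixtures.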

Identity

Social Security Numbers (SSN): US format (XXX-XX-XXXX)
Tax ID: Various countries
Passport Numbers: Multiple formats
Driver License: US, EU, Canada
National ID: Country-specific formats
Birth Dates: Full or partial

Contact

Email Addresses: All formats
Phone Numbers: International formats
Home Addresses: Street addresses, postal codes
IP Addresses: IPv4, IPv6
URLs: Web links, domain names

Health (HIPAA)

Medical Record Numbers: Hospital identifiers
Health Insurance IDs: Health plan member IDs
Biometric Data: DNA profiles, fingerprints
Mental Health Records: Identified mental health data
Medication Names: Linked to patient context

Corporate

Employee IDs: Internal identifiers
API Keys: Secret keys, tokens
Database Credentials: Usernames, connection strings
Private Keys: SSH keys, certificates
OAuth Tokens: Bearer tokens, refresh tokens

Configuration

Via YAML

firewall:
  stages:
    - name: "input-pii-scanner"
      enabled: true
      config:
        # Which entity types to detect
        entity_types:
          - "email"
          - "ssn"
          - "credit_card"
          - "phone"
          - "api_key"
          - "passport"

        # What to do with detected PII
        action: "mask"  # "mask", "redact", "pass"

        # Masking character (only for action: "mask")
        mask_char: "X"
        mask_percentage: 75  # Show last 25% unmasked

        # Confidence threshold (0-1)
        # Only flag PII above this confidence
        confidence_threshold: 0.85

        # Skip certain entity types for specific users
        user_exemptions:
          - user_id: "support_agent@company.com"
            skip_entity_types: ["email", "phone"]

    - name: "output-pii-scanner"
      enabled: true
      config:
        entity_types:
          - "ssn"
          - "credit_card"
          - "api_key"
          - "database_password"

        action: "mask"
        mask_char: "*"
        confidence_threshold: 0.90
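
As a rough illustration of how user_exemptions narrows the scan for one user, the sketch below mirrors the input-pii-scanner settings above as a Python dict. The function name `effective_entity_types` is hypothetical, and the exact way the product evaluates exemptions is an assumption.

```python
from typing import Dict, List


def effective_entity_types(stage_config: Dict, user_id: str) -> List[str]:
    """Entity types still scanned for `user_id` after applying user_exemptions."""
    skipped = set()
    for exemption in stage_config.get("user_exemptions", []):
        if exemption.get("user_id") == user_id:
            skipped.update(exemption.get("skip_entity_types", []))
    # Preserve the configured order of entity types.
    return [t for t in stage_config.get("entity_types", []) if t not in skipped]


# Mirrors the input-pii-scanner stage from the YAML above.
input_stage = {
    "entity_types": ["email", "ssn", "credit_card", "phone", "api_key", "passport"],
    "user_exemptions": [
        {"user_id": "support_agent@company.com", "skip_entity_types": ["email", "phone"]},
    ],
}

print(effective_entity_types(input_stage, "support_agent@company.com"))
```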

Via UI

Go to Governance → Firewall → PII Masking
Select stages to enable:
- Input PII Scanner (check requests before AI)
- Output PII Scanner (check responses after AI)
Choose entity types to detect
Select action (mask, redact, or pass)
Configure masking rules
Add exemptions and allowlists
Click Save & Deploy

Actions Explained

Mask

Replaces characters with the mask character while keeping the structure recognizable.

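Examples (the first two are fully masked; the third keeps the last 25% visible, matching mask_percentage: 75):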

Original: "jane.smith@company.com"
Masked:   "XXXX.XXXXX@XXXXXXX.XXX"

Original: "555-123-4567"
Masked:   "XXX-XXX-XXXX"

Original: "4532-1234-5678-9010"
Masked:   "XXXX-XXXX-XXXX-9010"  # Last 4 digits visible

Config:

action: "mask"
mask_char: "X"
mask_percentage: 75  # Mask 75%, show last 25%
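
The interaction between mask_char and mask_percentage can be sketched as follows. This is a minimal illustration, not the product's implementation: it assumes that only letters and digits count toward the percentage, that separators are left in place, and that the masked count is rounded up. `mask_value` is a hypothetical name.

```python
import math


def mask_value(value: str, mask_char: str = "X", mask_percentage: int = 75) -> str:
    """Mask the leading share of letters and digits, keeping separators in place."""
    maskable = sum(ch.isalnum() for ch in value)
    to_mask = math.ceil(maskable * mask_percentage / 100)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)  # separators and the unmasked tail pass through
    return "".join(out)


print(mask_value("4532-1234-5678-9010"))
print(mask_value("555-123-4567", mask_percentage=100))
```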

Redact

Removes the PII entirely and replaces it with a placeholder token.

Examples:

Original: "User jane.smith@company.com called with complaint"
Redacted: "User [EMAIL] called with complaint"

Original: "Credit card 4532-1234-5678-9010 on file"
Redacted: "Credit card [CREDIT_CARD] on file"

Config:

action: "redact"
redaction_tokens:
  email: "[EMAIL]"
  credit_card: "[CREDIT_CARD]"
  ssn: "[SSN]"
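
A minimal sketch of token-based redaction, assuming a `redaction_tokens` mapping like the one above. The two regular expressions are toy patterns for illustration only, and the fallback to an upper-cased `[TYPE]` token when no token is configured is an assumption.

```python
import re

# Toy patterns for illustration only; a production detector is far more thorough.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "credit_card": re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"),
}


def redact(text: str, redaction_tokens: dict) -> str:
    """Replace each pattern match with its token, falling back to [TYPE]."""
    for entity_type, pattern in PATTERNS.items():
        token = redaction_tokens.get(entity_type, f"[{entity_type.upper()}]")
        # A callable replacement avoids treating the token as a regex template.
        text = pattern.sub(lambda _match, t=token: t, text)
    return text


print(redact("User jane.smith@company.com called with complaint", {"email": "[EMAIL]"}))
```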

Pass

Detects PII but allows it through with a warning flag.

Config:

action: "pass"
# Still logs for audit trail, but doesn't modify content
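
The pass action can be pictured as "log, but do not modify". The sketch below is an assumption about the shape of such a step, with hypothetical names (`Detection`, `apply_pass`) and a hypothetical audit record layout; it deliberately leaves the raw value out of the record so the audit trail does not become a second copy of the PII.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    entity_type: str
    start: int
    end: int
    confidence: float


def apply_pass(text: str, detections: List[Detection]) -> Tuple[str, List[dict]]:
    """Return the text unmodified plus one audit record per detection."""
    audit = [
        {
            "entity_type": d.entity_type,
            "span": (d.start, d.end),
            "confidence": d.confidence,
            "action": "pass",
        }
        for d in detections
    ]
    return text, audit


original = "Contact: jane.smith@company.com"
result, records = apply_pass(original, [Detection("email", 9, 31, 0.98)])
print(result == original, len(records))
```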

Real-World Examples

Example 1: Input PII Detection (HIPAA)

Request (from patient intake form):

"Patient Name: John Smith
DOB: 1975-03-22
Medical Record: MRN-12345678
Insurance ID: INS-987654
Medication: Sertraline (for depression)"

Detected PII:

- birth_date: 1975-03-22 (confidence: 0.98)
- medical_record_number: MRN-12345678 (confidence: 0.95)
- health_insurance_id: INS-987654 (confidence: 0.92)

Action (mask):

"Patient Name: John Smith
DOB: XXXX-XX-XX
Medical Record: XXX-XXXXXXXX
Insurance ID: XXX-XXXXXX
Medication: Sertraline (for depression)"

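Sent to AI: The masked version above is what the AI sees. The patient name and medication lines pass through unchanged because neither appears in the detected list for this example.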

Example 2: Output PII Detection (API Key Leak)

AI Response (hallucinating a secret):

"To authenticate with our API, use:
curl -H 'Authorization: Bearer sk_live_1a2b3c4d5e6f7g8h9i0j1k2l3m4n5o6p'
https://api.example.com/endpoint"

Detected PII:

- api_key: sk_live_... (confidence: 0.99)

Action (redact):

"To authenticate with our API, use:
curl -H 'Authorization: Bearer [API_KEY]'
https://api.example.com/endpoint"

Sent to user: The redacted version, which keeps the secret from being exposed.

Example 3: Exemption (Support Agent)

Request (from support team):

Email: support@company.com
Customer Email: john.doe@gmail.com
Issue: Account access problem

Normal config: The customer email would be masked.
Exemption for support role: Support staff can see customer emails.

Action:

The requester's address, support@company.com, is NOT masked (exempted)
The customer's address, john.doe@gmail.com, stays masked unless the support user has an approved exemption for emails

Confidence Scores

The PII detector returns a confidence score (0-1) for each detection:

0.0-0.6: Low confidence (likely not PII, probably legitimate data)
0.6-0.8: Medium confidence (might be PII, recommend review)
0.8-0.95: High confidence (very likely PII)
0.95-1.0: Very high confidence (definitely PII)

Set confidence_threshold to control sensitivity:

# High sensitivity: catch everything
confidence_threshold: 0.6  # More false positives

# Medium sensitivity: balanced
confidence_threshold: 0.8  # Good default

# Low sensitivity: flag only near-certain matches
confidence_threshold: 0.95  # Might miss actual PII
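
To make the trade-off concrete, the sketch below applies different thresholds to the three detections from Example 1. It assumes the comparison is inclusive (a detection at exactly the threshold is flagged); the function name is hypothetical.

```python
def filter_by_confidence(detections, confidence_threshold):
    """Keep detections whose confidence meets or exceeds the threshold."""
    return [d for d in detections if d["confidence"] >= confidence_threshold]


# Confidence values taken from Example 1 above.
detections = [
    {"type": "birth_date", "confidence": 0.98},
    {"type": "medical_record_number", "confidence": 0.95},
    {"type": "health_insurance_id", "confidence": 0.92},
]

for threshold in (0.6, 0.8, 0.95):
    kept = [d["type"] for d in filter_by_confidence(detections, threshold)]
    print(threshold, kept)
```

At 0.95 the insurance ID (0.92) is no longer flagged, which is the "might miss actual PII" case.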

Allowlists & Context

Some data looks like PII but isn’t in context:

Email addresses in security documentation
Fake SSNs in testing (“999-99-9999”)
Example credit cards (“4111-1111-1111-1111”)
IPs in logs (system IPs, not user data)

Add context-based allowlists:

firewall:
  pii-masking:
    allowlist:
      - pattern: "^999-99-9999$"
        reason: "Test SSN fixture"

      - pattern: "^4111-1111-1111-1111$"
        reason: "Example credit card"

      - pattern: '^example@example\.com$'
        reason: "Documentation example"

      - pattern: '^127\.0\.0\.1$'
        reason: "Localhost IP"
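
A minimal sketch of allowlist matching, assuming each pattern is treated as a regular expression and tested against the detected value. Under that assumption, an unescaped `.` matches any character, so `127.0.0.1` would also match `127a0b0c1`; the sketch uses escaped, anchored patterns for that reason. `allowlist_reason` is a hypothetical name.

```python
import re

ALLOWLIST = [
    (re.compile(r"^999-99-9999$"), "Test SSN fixture"),
    (re.compile(r"^4111-1111-1111-1111$"), "Example credit card"),
    (re.compile(r"^example@example\.com$"), "Documentation example"),
    (re.compile(r"^127\.0\.0\.1$"), "Localhost IP"),
]


def allowlist_reason(value: str):
    """Return the reason for the first matching allowlist entry, or None."""
    for pattern, reason in ALLOWLIST:
        if pattern.search(value):
            return reason
    return None


print(allowlist_reason("999-99-9999"))
print(allowlist_reason("123-45-6789"))
# An unescaped dot is a wildcard, so this loose pattern matches unintended text.
print(bool(re.search("127.0.0.1", "127a0b0c1")))
```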

Passthrough Rules

Allow specific users or roles to see unmasked PII:

firewall:
  pii-masking:
    passthrough:
      - role: "security_audit"
        entity_types: ["ssn", "credit_card", "api_key"]
        expires_at: "2025-06-30"
        reason: "Q2 compliance audit"

      - user_id: "admin@company.com"
        entity_types: ["all"]  # Admin sees everything
        expires_at: null  # No expiration
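
The sketch below shows one way a passthrough decision could be evaluated from rules shaped like the ones above. It is an illustration under stated assumptions: a rule applies through the end of its expires_at date, `"all"` matches every entity type, and a rule matches on either user_id or role. `may_see_unmasked` is a hypothetical name.

```python
from datetime import date
from typing import Optional

# Mirrors the passthrough rules from the YAML above.
PASSTHROUGH = [
    {"role": "security_audit", "entity_types": ["ssn", "credit_card", "api_key"],
     "expires_at": "2025-06-30"},
    {"user_id": "admin@company.com", "entity_types": ["all"], "expires_at": None},
]


def may_see_unmasked(entity_type: str, today: date,
                     user_id: Optional[str] = None,
                     role: Optional[str] = None) -> bool:
    """Return True if any unexpired rule grants this subject the entity type."""
    for rule in PASSTHROUGH:
        matches_subject = (
            ("user_id" in rule and rule["user_id"] == user_id)
            or ("role" in rule and rule["role"] == role)
        )
        if not matches_subject:
            continue
        expires = rule["expires_at"]
        if expires is not None and today > date.fromisoformat(expires):
            continue  # rule has expired
        if "all" in rule["entity_types"] or entity_type in rule["entity_types"]:
            return True
    return False


print(may_see_unmasked("ssn", date(2025, 6, 30), role="security_audit"))
print(may_see_unmasked("ssn", date(2025, 7, 1), role="security_audit"))
```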

Compliance Mapping

Regulation	Requirement	TruthVouch Solution
GDPR	Prevent PII from reaching third parties	Input PII Scanner masks before AI provider
HIPAA	Protect medical records	Input/Output PII Scanner detects MRN, health insurance ID
CCPA	Prevent sale of personal data	PII redaction prevents data leakage
SOC 2	Monitor data access	Audit trail logs all PII detections
PCI DSS	Protect credit card data	Credit card masking + confidence thresholds

Monitoring PII Detections

Go to Governance → Reports → PII Masking:

Detection Volume: How much PII is detected over time
Entity Type Distribution: Which types are most common
False Positive Rate: % of exempted/allowlisted detections
Trend Analysis: Is PII slipping through?

Testing PII Detection

Via Web UI

Go to Governance → Test Firewall → PII Tab
Paste text
Click Scan
See detected entities and confidence scores

Via API

curl -X POST http://localhost:5000/api/v1/governance/pii/detect \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "My email is jane.smith@company.com and my SSN is 123-45-6789",
    "entity_types": ["email", "ssn"]
  }'

Response:

{
  "entities": [
    {
      "type": "email",
      "value": "jane.smith@company.com",
      "start": 12,
      "end": 34,
      "confidence": 0.98
    },
    {
      "type": "ssn",
      "value": "123-45-6789",
      "start": 49,
      "end": 60,
      "confidence": 0.96
    }
  ],
  "masked_text": "My email is [EMAIL] and my SSN is [SSN]"
}
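
When consuming a response of this shape, the start and end offsets can be used to rewrite the text on the client side. The sketch below assumes the offsets are 0-based and end-exclusive (so that `text[start:end]` equals `value`); it computes them from the text rather than hard-coding them. `locate` and `redact_spans` are hypothetical helper names.

```python
def locate(text: str, entity_type: str, value: str) -> dict:
    """Build an entity record with offsets computed from the text itself."""
    start = text.index(value)
    return {"type": entity_type, "value": value,
            "start": start, "end": start + len(value)}


def redact_spans(text: str, entities: list) -> str:
    """Replace each entity span with [TYPE], working right to left."""
    # Right-to-left order keeps earlier offsets valid as the text changes length.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        token = f"[{entity['type'].upper()}]"
        text = text[: entity["start"]] + token + text[entity["end"]:]
    return text


text = "My email is jane.smith@company.com and my SSN is 123-45-6789"
entities = [
    locate(text, "email", "jane.smith@company.com"),
    locate(text, "ssn", "123-45-6789"),
]
print(redact_spans(text, entities))
```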

Troubleshooting

False Positives (Legitimate Data Flagged)

Check the confidence scores in the audit logs
If the score is above 0.8, add the value to the allowlist
If the score is below 0.8, raise confidence_threshold above that score

False Negatives (Real PII Not Detected)

Check that the entity type is enabled
Ensure confidence_threshold is not set too high
Test detection on the specific PII pattern

Performance Impact

PII detection adds 10-30ms per request
To optimize: disable unused entity types, increase batch size

Best Practices

Enable for both input and output: PII can leak in both directions
Start with high confidence threshold: Lower it gradually as you understand patterns
Monitor exemptions: Review who has passthrough rights quarterly
Test before deployment: Use test mode to see impact without blocking
Audit regularly: Check reports weekly for anomalies
