
DLP Scanning

Overview

DLP (Data Loss Prevention) scanning prevents employees from sending confidential data to unapproved AI services. It detects sensitive patterns such as Social Security numbers (SSNs), passwords, API keys, and credit card numbers, and blocks them before they reach public LLMs.

Patterns Detected

Financial Data

  • Credit cards (Visa, Mastercard, AmEx)
  • Bank account numbers
  • Routing numbers
  • IBAN codes
  • Bitcoin addresses
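This page does not describe how card matches are validated. A widely used technique for cutting false positives is to pair a loose digit regex with the Luhn checksum, which valid Visa, Mastercard, and AmEx numbers all satisfy. A minimal Python sketch; `find_card_numbers` and `luhn_valid` are hypothetical names, not product APIs:

```python
import re

# Loose candidate: 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return regex candidates that also pass the Luhn check."""
    hits = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(m.group())
    return hits
```

The `4111-1111-1111-1111` test card used in the exceptions later on this page is Luhn-valid, which is why it needs an explicit exemption rather than being filtered out by the checksum.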

Identity Data

  • Social Security Numbers (SSN)
  • Passport numbers
  • Driver’s licenses
  • Tax IDs
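For SSNs, a regex alone also matches many numbers that were never issued. One common refinement (an assumption here, not documented product behavior) rejects the ranges the U.S. Social Security Administration does not assign: area `000`, `666`, or `900` to `999`, group `00`, and serial `0000`. A Python sketch with hypothetical helper names:

```python
import re

# Dashed SSN format: AAA-GG-SSSS (area, group, serial).
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def is_plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Reject number ranges the SSA never issues."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def find_ssns(text: str) -> list[str]:
    """Return dashed SSN-shaped strings that fall in issuable ranges."""
    return [m.group() for m in SSN_PATTERN.finditer(text)
            if is_plausible_ssn(*m.groups())]
```

A check like this would already ignore the `999-99-9999` test fixture listed under Exemptions.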

Secrets

  • API keys and tokens
  • Database passwords
  • SSH private keys
  • OAuth secrets
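Secrets often have recognizable structure. The Python sketch below, with hypothetical pattern names, checks for a PEM private-key header (the format used by OpenSSL and OpenSSH key files) and for credential-style assignments; provider-specific API key formats would need their own patterns, which are not listed here.

```python
import re

SECRET_PATTERNS = {
    # PEM header of an unencrypted or encrypted private key file.
    "private_key": re.compile(
        r"-----BEGIN (?:RSA |DSA |EC |OPENSSH |ENCRYPTED )?PRIVATE KEY-----"
    ),
    # Credential-like names followed by "=" or ":" and a value (case-insensitive).
    "credential_assignment": re.compile(
        r"(?i)\b(?:password|passwd|api_key|client_secret|secret)\s*[:=]\s*\S+"
    ),
}


def find_secrets(text: str) -> list[str]:
    """Return the names of secret patterns found in the text."""
    return [name for name, rx in SECRET_PATTERNS.items() if rx.search(text)]
```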

Healthcare

  • Medical record numbers
  • Health insurance IDs
  • Patient names (with context)

Corporate

  • Confidential documents (marked)
  • Trade secrets
  • Customer lists
  • Source code (partial)

Configuration

Via UI

  1. Go to Governance → Sentinel → DLP Policies
  2. Click Create DLP Policy
  3. Name: “Company DLP”
  4. Scope: Choose which users or departments the policy applies to
  5. Patterns to Block:
    • ✓ Credit cards
    • ✓ SSN
    • ✓ API keys
    • ✓ Passwords
  6. Custom Patterns (regex):
    • Add: (confidential|proprietary|internal use only)
  7. Action: Block or warn
  8. Click Deploy

Via Configuration

dlp:
  enabled: true
  patterns:
    # Built-in patterns
    credit_card: true
    ssn: true
    api_key: true
    password: true
    passport: true
    health_id: true
    # Custom regex patterns
    custom:
      - name: "confidential_marker"
        pattern: '(?i)(confidential|proprietary|internal\s+use\s+only)'
        severity: "high"
        action: "block"
      - name: "customer_data"
        pattern: '^(john|jane|michael|sarah)@company\.com'
        severity: "medium"
        action: "block"
  # Exceptions
  exceptions:
    - pattern: "example@example.com"  # Dummy data
    - pattern: "4111-1111-1111-1111"  # Test card
    - user: "ai_researcher@company.com"  # Researcher exempt
  # Actions
  block_action: "block"  # or "warn"
  block_message: "This content contains sensitive data and cannot be sent to external AI services"
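How the agent applies this configuration is not spelled out on this page. The Python sketch below shows one plausible evaluation order under stated assumptions: exempt users skip scanning, exempted literal values are removed before matching, and the first matching custom rule decides the action (falling back to `block_action`). `CONFIG`, `Verdict`, and `evaluate` are hypothetical; the rule uses the case-insensitive form of the custom regex from the UI steps.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, trimmed in-memory version of the YAML config.
CONFIG = {
    "custom": [
        {"name": "confidential_marker",
         "pattern": r"(?i)(confidential|proprietary|internal use only)",
         "severity": "high", "action": "block"},
    ],
    "exceptions": [
        {"pattern": "example@example.com"},
        {"pattern": "4111-1111-1111-1111"},
        {"user": "ai_researcher@company.com"},
    ],
    "block_action": "block",
}


@dataclass
class Verdict:
    action: str                 # "allow", "warn", or "block"
    rule: Optional[str] = None  # name of the custom rule that matched


def evaluate(text: str, user: str, config: dict = CONFIG) -> Verdict:
    exceptions = config.get("exceptions", [])
    # Assumption: a user-only exception exempts that user entirely.
    if any(e.get("user") == user and "pattern" not in e for e in exceptions):
        return Verdict("allow")
    # Assumption: exception patterns are literal values, removed before matching.
    for e in exceptions:
        if "pattern" in e and "user" not in e:
            text = text.replace(e["pattern"], "")
    # First matching custom rule wins.
    for rule in config.get("custom", []):
        if re.search(rule["pattern"], text):
            return Verdict(rule.get("action", config["block_action"]), rule["name"])
    return Verdict("allow")
```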

Actions

Block

Prevents sensitive data from being pasted/sent:

User copies password into ChatGPT input
DLP scans clipboard
Detects password pattern (confidence: 0.98)
Blocks paste action
Shows message: "Sensitive data detected. This action is blocked."

Warn

Allows the action, but logs it and alerts an administrator:

User tries to paste SSN
DLP detects SSN
Shows warning: "This content contains sensitive data. Continue?"
User can click OK to proceed
Action logged, admin alerted
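The two flows differ only in what happens after a match. A Python sketch of that decision (the `Outcome` type and `handle_submission` function are hypothetical): blocking always stops the content and records the violation, while a warning lets the user continue at the cost of a log entry and an admin alert. The page does not say whether a cancelled warning is logged; the sketch assumes it is not.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    allowed: bool               # does the content reach the AI service?
    logged: bool = False        # recorded as a DLP violation
    admin_alerted: bool = False


def handle_submission(findings: list[str], action: str,
                      user_continues: bool = False) -> Outcome:
    """Apply the Block or Warn flow to the patterns detected in one paste/send."""
    if not findings:
        return Outcome(allowed=True)
    if action == "block":
        # Block: the paste/send is stopped; the violation still appears in reports.
        return Outcome(allowed=False, logged=True)
    if action == "warn":
        if user_continues:
            # Warn + Continue: allowed, but logged and an admin is alerted.
            return Outcome(allowed=True, logged=True, admin_alerted=True)
        # Warn + Remove Data: nothing is sent (assumed not logged).
        return Outcome(allowed=False)
    raise ValueError(f"unknown action: {action!r}")
```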

Exemptions

Exempt specific values or users from DLP scanning:

dlp:
  exceptions:
    # Dummy data (testing)
    - pattern: "999-99-9999"
      reason: "Test SSN fixture"
    - pattern: "4111-1111-1111-1111"
      reason: "Test credit card"
    # Context-specific
    - pattern: "example@example.com"
      reason: "Documentation example"
    # User exceptions
    - user: "finance_auditor@company.com"
      pattern: "*"  # Allow all data for this user
      expires: "2025-12-31"
      reason: "Q4 financial audit"
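The rules for combining `user`, `pattern`, and `expires` are not documented here. A Python sketch under these assumptions: an entry without `user` applies to everyone, an entry without `pattern` (or with `"*"`) covers all data, patterns are compared as literal values, and `expires` is inclusive (the exemption still applies on that date). `is_exempt` and `EXCEPTIONS` are hypothetical.

```python
from datetime import date


def is_exempt(value: str, user: str, exceptions: list[dict], today: date) -> bool:
    """Return True if `value` sent by `user` matches an active exception."""
    for exc in exceptions:
        expires = exc.get("expires")
        if expires and today > date.fromisoformat(expires):
            continue  # expired: no longer applies
        if "user" in exc and exc["user"] != user:
            continue  # scoped to a different user
        pattern = exc.get("pattern", "*")
        if pattern == "*" or pattern == value:
            return True
    return False


# Two entries mirroring the YAML example.
EXCEPTIONS = [
    {"pattern": "999-99-9999", "reason": "Test SSN fixture"},
    {"user": "finance_auditor@company.com", "pattern": "*",
     "expires": "2025-12-31", "reason": "Q4 financial audit"},
]
```

With this list, `is_exempt("123-45-6789", "finance_auditor@company.com", EXCEPTIONS, date(2025, 12, 31))` is `True`, and the same call with `date(2026, 1, 1)` is `False`.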

Custom Patterns

Write regular expressions to detect data that is sensitive to your organization:

Example 1: Internal Emails

Pattern: @company\.com$
Matches: john@company.com, jane@company.com

Example 2: Document Markers

Pattern: (CONFIDENTIAL|SECRET|PROPRIETARY)
Matches: "This is CONFIDENTIAL information"

Example 3: Employee IDs

Pattern: EMP-\d{8}
Matches: EMP-12345678

Example 4: Source Code Variables

Pattern: (password\s*=|api_key\s*=|secret\s*=)
Matches: password="secret123", api_key="sk_live_..."
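Verifying regex before deployment can be as simple as compiling each pattern and checking it against known samples. A small Python harness for the four examples above (`check_examples` is a hypothetical helper; it uses `re.search`, which matches anywhere in the text and may differ from how the product applies patterns):

```python
import re

# The example patterns from this section, with the sample strings they should match.
EXAMPLES = {
    r"@company\.com$": ["john@company.com", "jane@company.com"],
    r"(CONFIDENTIAL|SECRET|PROPRIETARY)": ["This is CONFIDENTIAL information"],
    r"EMP-\d{8}": ["EMP-12345678"],
    r"(password\s*=|api_key\s*=|secret\s*=)": ['password="secret123"', 'api_key="sk_live_..."'],
}


def check_examples(examples: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (pattern, sample) pairs where the pattern fails to match."""
    failures = []
    for pattern, samples in examples.items():
        rx = re.compile(pattern)  # raises re.error on invalid syntax
        for sample in samples:
            if not rx.search(sample):
                failures.append((pattern, sample))
    return failures
```

Anchors matter: `@company\.com$` matches only when the address ends the text, so `re.search(r"@company\.com$", "Contact john@company.com today")` returns `None`. Using `\b` in place of `$` would also catch addresses in the middle of a sentence.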

Reporting

View DLP violations:

  1. Go to Governance → Sentinel → DLP Violations
  2. See:
    • Pattern matched
    • User who attempted
    • Timestamp
    • Action taken (blocked/warned)
    • Tool targeted (ChatGPT, Claude, etc.)

Example Report:

Today (Mar 15):
- 12 credit card patterns blocked
- 3 SSN patterns blocked
- 5 password patterns blocked
- 0 user bypasses approved
Top Users:
1. john@company.com (5 blocks)
2. jane@company.com (3 blocks)
3. bob@company.com (2 blocks)
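A report like this is a simple aggregation over violation records. A Python sketch, assuming each record is a dict with the fields listed above (`pattern`, `user`, `timestamp`, `action`, `tool`); the record shape, `SAMPLE_EVENTS`, and `daily_summary` are hypothetical, not the product's export format:

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical violation records for illustration only.
SAMPLE_EVENTS = [
    {"pattern": "credit_card", "user": "john@company.com", "action": "blocked",
     "timestamp": datetime(2025, 3, 15, 9, 5), "tool": "ChatGPT"},
    {"pattern": "credit_card", "user": "john@company.com", "action": "blocked",
     "timestamp": datetime(2025, 3, 15, 11, 0), "tool": "Claude"},
    {"pattern": "ssn", "user": "jane@company.com", "action": "blocked",
     "timestamp": datetime(2025, 3, 15, 12, 0), "tool": "ChatGPT"},
    {"pattern": "ssn", "user": "bob@company.com", "action": "warned",
     "timestamp": datetime(2025, 3, 15, 13, 0), "tool": "ChatGPT"},
    {"pattern": "password", "user": "bob@company.com", "action": "blocked",
     "timestamp": datetime(2025, 3, 14, 16, 0), "tool": "ChatGPT"},
]


def daily_summary(events: list[dict], day: date, top_n: int = 3) -> dict:
    """Count blocked events per pattern and rank users by blocks for one day."""
    blocked = [e for e in events
               if e["action"] == "blocked" and e["timestamp"].date() == day]
    return {
        "blocks_by_pattern": Counter(e["pattern"] for e in blocked),
        "top_users": Counter(e["user"] for e in blocked).most_common(top_n),
    }
```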

User Experience

When Data is Blocked

User types/pastes sensitive data into ChatGPT
System detects: "Password detected"
Message shown:
"Sensitive data detected
This content cannot be sent to external AI services.
Pattern: password
Please remove sensitive data and try again."
User cannot proceed until the sensitive data is removed

When Warned

User pastes SSN
System detects the SSN and shows a warning
Message shown:
"Warning: This content may contain sensitive data.
Do you want to continue?"
User chooses:
[Continue] [Remove Data]
If Continue: Action logged, admin alerted

Monitoring & Alerts

Set up alerts for suspicious activity:

alerts:
  - name: "high_dlp_violations"
    condition: "dlp_blocks > 10 per hour"
    action: "email_security_team"
  - name: "attempted_bypass"
    condition: "user approves after warning"
    action: "send_to_slack:#security"
  - name: "pattern_evasion"
    condition: "repeated same pattern from same user"
    action: "escalate_to_admin"
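The condition `dlp_blocks > 10 per hour` implies counting blocks over a time window. A minimal Python sketch of a sliding one-hour window (`RateAlert` is hypothetical; whether the product uses a rolling or a fixed clock-hour window is not stated):

```python
from collections import deque
from datetime import datetime, timedelta


class RateAlert:
    """Fire when more than `threshold` events occur within a rolling `window`."""

    def __init__(self, threshold: int = 10, window: timedelta = timedelta(hours=1)):
        self.threshold = threshold
        self.window = window
        self.events = deque()  # timestamps of recent blocks, oldest first

    def record(self, ts: datetime) -> bool:
        """Record one block at `ts` (non-decreasing order); return True if the alert fires."""
        self.events.append(ts)
        # Drop blocks that are a full window or more older than the newest one.
        while self.events and ts - self.events[0] >= self.window:
            self.events.popleft()
        return len(self.events) > self.threshold
```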

Performance

DLP scanning adds minimal overhead:

  • Per-keystroke pattern matching: <5 ms
  • Clipboard monitoring: negligible overhead
  • Total agent memory: ~200 MB (typical)

Troubleshooting

DLP Not Blocking

  1. Check if enabled: Settings → DLP → Enabled ✓
  2. Verify the policy is deployed: check the last sync time
  3. Test the pattern: submit text that exactly matches a pattern in the config
  4. Check exemptions: the content or the user may be exempted

Too Many False Positives

  1. Adjust severity threshold
  2. Add exemptions for common data
  3. Refine regex patterns to be more specific

Performance Issues

  1. Reduce number of custom patterns
  2. Simplify regex patterns
  3. Disable pattern types not needed
  4. Reduce check frequency
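One way to keep many custom patterns manageable is to compile them once into a single alternation with named groups, so each check is one `finditer` call instead of one search per pattern. A Python sketch with hypothetical `build_scanner` and `scan` helpers; pattern names must be valid identifiers, and since Python 3.11 a global flag such as `(?i)` is only allowed at the very start of an expression, so combined patterns need the scoped form `(?i:...)`.

```python
import re


def build_scanner(patterns: dict[str, str]) -> re.Pattern:
    """Compile all patterns once into a single alternation with named groups."""
    parts = [f"(?P<{name}>{rx})" for name, rx in patterns.items()]
    return re.compile("|".join(parts))


def scan(scanner: re.Pattern, text: str) -> set[str]:
    """Return the names of patterns that matched anywhere in the text."""
    # lastgroup is the outer named group, even when a pattern has inner groups.
    return {m.lastgroup for m in scanner.finditer(text)}
```

`finditer` returns non-overlapping matches and reports only the first alternative that matches at each position, so overlapping hits from different patterns can be missed. Whether one combined pass is faster than separate searches depends on the patterns, so measure before and after.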

Best Practices

  1. Start Strict: Block first, loosen if needed
  2. Regular Review: Check violation logs weekly
  3. Tune Exemptions: Add as you find false positives
  4. Test Patterns: Verify regex before deployment
  5. Communicate: Tell users what’s blocked and why