DLP Scanning

Overview

DLP (Data Loss Prevention) scanning stops employees from sending confidential data to unapproved AI services. It detects patterns such as SSNs, passwords, API keys, and credit card numbers, and either blocks the content or warns the user before it reaches a public LLM.

Patterns Detected

Financial Data

Credit cards (Visa, Mastercard, AmEx)
Bank account numbers
Routing numbers
IBAN codes
Bitcoin addresses

Identity Data

Social Security Numbers (SSN)
Passport numbers
Driver’s licenses
Tax IDs

Secrets

API keys and tokens
Database passwords
SSH private keys
OAuth secrets

Healthcare

Medical record numbers
Health insurance IDs
Patient names (with context)

Corporate

Confidential documents (marked)
Trade secrets
Customer lists
Source code (partial)

Configuration

Via UI

Go to Governance → Sentinel → DLP Policies
Click Create DLP Policy
Name: “Company DLP”
Scope: Which users/departments
Patterns to Block:
- ✓ Credit cards
- ✓ SSN
- ✓ API keys
- ✓ Passwords
Custom Patterns (regex):
- Add: (confidential|proprietary|internal use only)
Action: Block or warn
Click Deploy

Via Configuration

dlp:
  enabled: true

  patterns:
    # Built-in patterns
    credit_card: true
    ssn: true
    api_key: true
    password: true
    passport: true
    health_id: true

    # Custom regex patterns
    custom:
      - name: "confidential_marker"
        pattern: '(?i)(confidential|proprietary|internal\s+use\s+only)'
        severity: "high"
        action: "block"

      - name: "customer_data"
        pattern: '\b(john|jane|michael|sarah)@company\.com\b'
        severity: "medium"
        action: "block"

  # Exceptions
  exceptions:
    - pattern: "example@example.com"  # Dummy data
    - pattern: "4111-1111-1111-1111"  # Test card
    - user: "ai_researcher@company.com"  # Researcher exempt

  # Actions
  block_action: "block"  # or "warn"
  block_message: "This content contains sensitive data and cannot be sent to external AI services"
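
The configuration above can be read as "scan the text with each enabled pattern, then drop any match that is a listed exception". The following is a minimal Python sketch of that reading, not the product's implementation. The `Pattern` class, the `scan` function, and the simplified `credit_card` regex are all assumptions made for illustration; the built-in detectors are not documented here.

```python
import re
from dataclasses import dataclass

@dataclass
class Pattern:
    name: str
    regex: "re.Pattern[str]"
    action: str  # "block" or "warn"

# Hypothetical stand-ins, loosely modelled on the config entries.
PATTERNS = [
    # Simplified 16-digit card shape; a real detector would be stricter.
    Pattern("credit_card", re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"), "block"),
    Pattern("confidential_marker",
            re.compile(r"(?i)(confidential|proprietary|internal\s+use\s+only)"),
            "block"),
]

# Literal values listed under `exceptions` in the config.
LITERAL_EXCEPTIONS = {"example@example.com", "4111-1111-1111-1111"}

def scan(text):
    """Return (pattern name, matched text, action) for each non-exempt match."""
    findings = []
    for p in PATTERNS:
        for m in p.regex.finditer(text):
            if m.group(0) in LITERAL_EXCEPTIONS:
                continue  # known test fixture or dummy value
            findings.append((p.name, m.group(0), p.action))
    return findings

if __name__ == "__main__":
    print(scan("Test card 4111-1111-1111-1111 is fine"))
    print(scan("Real-looking card 4242-4242-4242-4242"))
```

In this sketch an exception suppresses only the exact matched value, so a different card number in the same message is still reported.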

Actions

Block

Prevents sensitive data from being pasted/sent:

User copies a password and tries to paste it into the ChatGPT input
  ↓
DLP scans clipboard
  ↓
Detects password pattern (confidence: 0.98)
  ↓
Blocks paste action
  ↓
Shows message: "Sensitive data detected. This action is blocked."

Warn

Allows the action once the user confirms, then logs it and alerts an admin:

User tries to paste SSN
  ↓
DLP detects SSN
  ↓
Shows warning: "This content contains sensitive data. Continue?"
  ↓
User can click OK to proceed
  ↓
Action logged, admin alerted
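
The two flows above differ only in what happens after a detection. As a rough sketch of that branching in Python: the `Outcome` type, the `apply_action` function, and the event names are hypothetical, and the choice to record nothing when a user backs out of a warning is an assumption, since the flows above do not say.

```python
from dataclasses import dataclass, field

@dataclass
class Outcome:
    allowed: bool   # may the content be sent?
    message: str    # text shown to the user
    events: list = field(default_factory=list)  # audit events (hypothetical names)

def apply_action(action, pattern, user_confirmed=False):
    """Decide what happens after `pattern` was detected under `action`."""
    if action == "block":
        return Outcome(False,
                       "Sensitive data detected. This action is blocked.",
                       [f"blocked:{pattern}"])
    if action == "warn":
        message = "This content contains sensitive data. Continue?"
        if user_confirmed:
            # User clicked OK: allow, log, and alert an admin.
            return Outcome(True, message, [f"warned:{pattern}", "admin_alerted"])
        # Assumption: backing out sends nothing and records nothing.
        return Outcome(False, message, [])
    raise ValueError(f"unknown action: {action!r}")

if __name__ == "__main__":
    print(apply_action("block", "password"))
    print(apply_action("warn", "ssn", user_confirmed=True))
```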

Exemptions

Exempt known-safe values (such as test fixtures) or specific users from scanning:

dlp:
  exceptions:
    # Dummy data (testing)
    - pattern: "999-99-9999"
      reason: "Test SSN fixture"

    - pattern: "4111-1111-1111-1111"
      reason: "Test credit card"

    # Context-specific
    - pattern: "example@example.com"
      reason: "Documentation example"

    # User exceptions
    - user: "finance_auditor@company.com"
      pattern: "*"  # Wildcard: exempts all data for this user
      expires: "2025-12-31"
      reason: "Q4 financial audit"
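
One way to reason about how these entries combine is to treat each as a rule with optional `user`, `pattern`, and `expires` fields. The sketch below is an assumption about the semantics, not documented behavior: a missing `user` applies to everyone, `"*"` matches any value, other patterns are compared literally, and an exemption remains valid through its `expires` date. The `Exemption` class and `is_exempt` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Exemption:
    pattern: Optional[str] = None   # literal value, or "*" for everything
    user: Optional[str] = None      # None means the rule applies to any user
    expires: Optional[date] = None  # None means it never expires
    reason: str = ""

EXEMPTIONS = [
    Exemption(pattern="999-99-9999", reason="Test SSN fixture"),
    Exemption(pattern="4111-1111-1111-1111", reason="Test credit card"),
    Exemption(pattern="example@example.com", reason="Documentation example"),
    Exemption(user="finance_auditor@company.com", pattern="*",
              expires=date(2025, 12, 31), reason="Q4 financial audit"),
]

def is_exempt(value, user, today, exemptions=EXEMPTIONS):
    """Return True if any unexpired exemption covers this value and user."""
    for ex in exemptions:
        if ex.expires is not None and today > ex.expires:
            continue  # expired (assumed inclusive of the expiry date)
        if ex.user is not None and ex.user != user:
            continue  # rule is scoped to a different user
        if ex.pattern in (None, "*") or ex.pattern == value:
            return True
    return False

if __name__ == "__main__":
    print(is_exempt("999-99-9999", "dev@company.com", date(2025, 6, 1)))
    print(is_exempt("123-45-6789", "finance_auditor@company.com", date(2026, 1, 1)))
```

Under these assumptions, the wildcard entry stops applying the day after its expiry date, so the auditor's data is scanned again without a config change.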

Custom Patterns

Write regular expressions to detect data that is specific to your organization:

Example 1: Internal Emails

Pattern: @company\.com\b
Matches: john@company.com, jane@company.com (anywhere in the text, not only at the end)

Example 2: Document Markers

Pattern: (CONFIDENTIAL|SECRET|PROPRIETARY)
Matches: "This is CONFIDENTIAL information"

Example 3: Employee IDs

Pattern: EMP-\d{8}
Matches: EMP-12345678

Example 4: Source Code Variables

Pattern: (password\s*=|api_key\s*=|secret\s*=)
Matches: password="secret123", api_key="sk_live_..."
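
Before deploying a custom pattern, it helps to run it against strings that should and should not match. The harness below is a small sketch using Python's `re` module; the product's regex engine may differ in details, so treat the results as indicative. The `check_pattern` helper and the negative test strings are illustrative additions. The email pattern in this sketch uses a word boundary (`\b`) rather than an end-of-text anchor so that it also matches addresses in the middle of a sentence.

```python
import re

def check_pattern(pattern, should_match, should_not_match):
    """Return a list of failure descriptions; an empty list means all cases pass."""
    rx = re.compile(pattern)
    failures = []
    for s in should_match:
        if not rx.search(s):
            failures.append(f"missed: {s!r}")
    for s in should_not_match:
        if rx.search(s):
            failures.append(f"false positive: {s!r}")
    return failures

# (name, pattern, strings that must match, strings that must not match)
CASES = [
    ("internal_email", r"@company\.com\b",
     ["mail john@company.com today", "jane@company.com"],
     ["john@company.community", "john@companyXcom"]),
    ("document_marker", r"(CONFIDENTIAL|SECRET|PROPRIETARY)",
     ["This is CONFIDENTIAL information"],
     ["this is confidential"]),  # pattern is case-sensitive as written
    ("employee_id", r"EMP-\d{8}",
     ["EMP-12345678"],
     ["EMP-1234"]),
    ("source_secret", r"(password\s*=|api_key\s*=|secret\s*=)",
     ['password="secret123"', 'api_key = "sk_live_..."'],
     ["password reset instructions"]),
]

if __name__ == "__main__":
    for name, pattern, yes, no in CASES:
        failures = check_pattern(pattern, yes, no)
        print(name, "OK" if not failures else failures)
```

Negative cases matter as much as positive ones: the document-marker pattern, for example, does not match lowercase "confidential" unless a case-insensitive flag such as `(?i)` is added.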

Reporting

View DLP violations:

Go to Governance → Sentinel → DLP Violations
See:
- Pattern matched
- User who attempted
- Timestamp
- Action taken (blocked/warned)
- Tool targeted (ChatGPT, Claude, etc.)

Example Report:

Today (Mar 15):
- 12 credit card patterns blocked
- 3 SSN patterns blocked
- 5 password patterns blocked
- 0 user bypasses approved

Top Users:
1. john@company.com (5 blocks)
2. jane@company.com (3 blocks)
3. bob@company.com (2 blocks)
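
A report like this is an aggregation over violation records. As a sketch of how such a summary could be computed from exported data in Python: the `Violation` record and `summarize` function are hypothetical, the fields follow the list above (timestamp omitted for brevity), and no export format is implied.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Violation:
    user: str
    pattern: str
    action: str  # "blocked" or "warned"
    tool: str

def summarize(violations, top_n=3):
    """Return (blocks per pattern, top users by block count)."""
    blocked = [v for v in violations if v.action == "blocked"]
    by_pattern = Counter(v.pattern for v in blocked)
    top_users = Counter(v.user for v in blocked).most_common(top_n)
    return by_pattern, top_users

if __name__ == "__main__":
    sample = [
        Violation("john@company.com", "credit_card", "blocked", "ChatGPT"),
        Violation("john@company.com", "credit_card", "blocked", "Claude"),
        Violation("jane@company.com", "ssn", "blocked", "ChatGPT"),
        Violation("bob@company.com", "password", "warned", "ChatGPT"),
    ]
    by_pattern, top_users = summarize(sample)
    print(dict(by_pattern))
    print(top_users)
```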

User Experience

When Data is Blocked

User types/pastes sensitive data into ChatGPT

System detects: "API key detected"

Message shown:
  "Sensitive data detected
   This content cannot be sent to external AI services.
   Pattern: api_key

   Please remove sensitive data and try again."

User cannot proceed until sensitive data removed

When Warned

User pastes SSN

System detects but warns

Message shown:
  "Warning: This content may contain sensitive data.
   Do you want to continue?"

User chooses:
  [Continue] [Remove Data]

If Continue: Action logged, admin alerted

Monitoring & Alerts

Set up alerts for suspicious activity:

alerts:
  - name: "high_dlp_violations"
    condition: "dlp_blocks > 10 per hour"
    action: "email_security_team"

  - name: "attempted_bypass"
    condition: "user approves after warning"
    action: "send_to_slack:#security"

  - name: "pattern_evasion"
    condition: "repeated same pattern from same user"
    action: "escalate_to_admin"
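
A condition such as "dlp_blocks > 10 per hour" implies counting events in a sliding time window. The sketch below shows one way to evaluate it in Python; it is an illustration of the idea, not how the alert engine works. The `BlockRateAlert` class is hypothetical, and it assumes block events arrive in chronological order.

```python
from collections import deque
from datetime import datetime, timedelta

class BlockRateAlert:
    """Fires when more than `threshold` blocks fall within `window`."""

    def __init__(self, threshold=10, window=timedelta(hours=1)):
        self.threshold = threshold
        self.window = window
        self._events = deque()  # timestamps of recent blocks, oldest first

    def record_block(self, when):
        """Record one block; return True if the alert condition now holds."""
        self._events.append(when)
        cutoff = when - self.window
        # Drop events that have aged out of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.threshold

if __name__ == "__main__":
    alert = BlockRateAlert()
    start = datetime(2025, 3, 15, 9, 0)
    for i in range(11):
        fired = alert.record_block(start + timedelta(minutes=i))
    print(fired)  # 11 blocks in 11 minutes exceeds the threshold of 10
```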

Performance

DLP scanning adds minimal overhead:

Per-keystroke pattern matching: < 5 ms
Clipboard monitoring: Negligible overhead
Total agent memory: ~200 MB (typical)

Troubleshooting

DLP Not Blocking

Check if enabled: Settings → DLP → Enabled ✓
Verify policy deployed: Check last sync time
Test pattern: Paste a value that exactly matches a pattern in your config
Check exemptions: Pattern may be exempted

Too Many False Positives

Adjust severity threshold
Add exemptions for common data
Refine regex patterns to be more specific
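
For card numbers specifically, a regex alone matches any digit run of the right length, including order IDs and tracking numbers. A common refinement is to follow the regex with a Luhn checksum, which real payment card numbers satisfy. Whether the built-in credit card detector already does this is not stated here, so the Python sketch below is a general technique rather than a description of the product; `luhn_valid` is a hypothetical helper.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep ASCII digits only, so separators such as "-" or " " are ignored.
    digits = [int(c) for c in number if "0" <= c <= "9"]
    if not 13 <= len(digits) <= 19:
        return False  # outside the usual card number length range
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9      # same as summing the two digits of the product
        total += d
    return total % 10 == 0

if __name__ == "__main__":
    print(luhn_valid("4111-1111-1111-1111"))  # well-known test card number
    print(luhn_valid("1234-5678-9012-3456"))  # card-shaped, fails the checksum
```

Applying the checksum after the regex match discards most random 16-digit strings while still catching genuine card numbers.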

Performance Issues

Reduce number of custom patterns
Simplify regex patterns
Disable pattern types not needed
Reduce check frequency

Best Practices

Start Strict: Block first, loosen if needed
Regular Review: Check violation logs weekly
Tune Exemptions: Add as you find false positives
Test Patterns: Verify regex before deployment
Communicate: Tell users what’s blocked and why