DLP Scanning
Overview
DLP (Data Loss Prevention) scanning stops employees from sending confidential data to unapproved AI services. It detects sensitive patterns such as SSNs, passwords, API keys, and credit card numbers, and blocks them from reaching public LLMs.
Patterns Detected
Financial Data
- Credit cards (Visa, Mastercard, AmEx)
- Bank account numbers
- Routing numbers
- IBAN codes
- Bitcoin addresses
Identity Data
- Social Security Numbers (SSN)
- Passport numbers
- Driver’s licenses
- Tax IDs
Secrets
- API keys and tokens
- Database passwords
- SSH private keys
- OAuth secrets
Healthcare
- Medical record numbers
- Health insurance IDs
- Patient names (with context)
Corporate
- Confidential documents (marked)
- Trade secrets
- Customer lists
- Source code (partial)
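To illustrate how a built-in pattern such as credit cards might pair a regex with a validity check to reduce false positives, here is a minimal Python sketch. The regex, the function names, and the use of the Luhn checksum are assumptions made for illustration; the source does not describe how the product's detectors actually work.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. Not the product's actual rule.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return candidate card numbers in `text` that pass the Luhn check."""
    return [
        m.group(0)
        for m in CARD_CANDIDATE.finditer(text)
        if luhn_valid(m.group(0))
    ]
```

Filtering regex candidates through a checksum means a random 16-digit order number is much less likely to be flagged than a real card number.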
Configuration
Via UI
- Go to Governance → Sentinel → DLP Policies
- Click Create DLP Policy
- Name: “Company DLP”
- Scope: Which users/departments
- Patterns to Block:
- ✓ Credit cards
- ✓ SSN
- ✓ API keys
- ✓ Passwords
- Custom Patterns (regex):
- Add: (confidential|proprietary|internal use only)
- Add:
- Action: Block or warn
- Click Deploy
Via Configuration
dlp:
  enabled: true
  patterns:
    # Built-in patterns
    credit_card: true
    ssn: true
    api_key: true
    password: true
    passport: true
    health_id: true
    # Custom regex patterns
    custom:
      - name: "confidential_marker"
        pattern: '(?i)(confidential|proprietary|internal.*only)'
        severity: "high"
        action: "block"
      - name: "customer_data"
        pattern: '^(john|jane|michael|sarah)@company\.com'
        severity: "medium"
        action: "block"
  # Exceptions
  exceptions:
    - pattern: "example@example.com"  # Dummy data
    - pattern: "4111-1111-1111-1111"  # Test card
    - user: "ai_researcher@company.com"  # Researcher exempt
  # Actions
  block_action: "block"  # or "warn"
  block_message: "This content contains sensitive data and cannot be sent to external AI services"
Actions
Block
Prevents sensitive data from being pasted/sent:
User copies password into ChatGPT input
↓
DLP scans clipboard
↓
Detects password pattern (confidence: 0.98)
↓
Blocks paste action
↓
Shows message: "Sensitive data detected. This action is blocked."
Warn
Allows but logs and alerts:
User tries to paste SSN
↓
DLP detects SSN
↓
Shows warning: "This content contains sensitive data. Continue?"
↓
User can click OK to proceed
↓
Action logged, admin alerted
Exemptions
Allow certain data patterns:
dlp:
  exceptions:
    # Dummy data (testing)
    - pattern: "999-99-9999"
      reason: "Test SSN fixture"
    - pattern: "4111-1111-1111-1111"
      reason: "Test credit card"
    # Context-specific
    - pattern: "example@example.com"
      reason: "Documentation example"
    # User exceptions
    - user: "finance_auditor@company.com"
      pattern: "*"  # Allow all data for this user
      expires: "2025-12-31"
      reason: "Q4 financial audit"
Custom Patterns
Write regex to detect your sensitive data:
Example 1: Internal Emails
Pattern: @company\.com$
Matches: john@company.com, jane@company.com
Example 2: Document Markers
Pattern: (CONFIDENTIAL|SECRET|PROPRIETARY)
Matches: "This is CONFIDENTIAL information"
Example 3: Employee IDs
Pattern: EMP-\d{8}
Matches: EMP-12345678
Example 4: Source Code Variables
Pattern: (password\s*=|api_key\s*=|secret\s*=)
Matches: password="secret123", api_key="sk_live_..."
Reporting
View DLP violations:
- Go to Governance → Sentinel → DLP Violations
- See:
- Pattern matched
- User who attempted
- Timestamp
- Action taken (blocked/warned)
- Tool targeted (ChatGPT, Claude, etc.)
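If these fields are exported for further analysis, they could be modeled and summarized along the following lines. This is only a sketch: the `Violation` type, its field names, and the `summarize` helper are hypothetical and do not reflect a documented export schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Violation:
    """One DLP event; field names are illustrative, not a documented schema."""
    pattern: str
    user: str
    timestamp: datetime
    action: str  # "blocked" or "warned"
    tool: str


def summarize(violations: list[Violation]) -> tuple[Counter, Counter]:
    """Count blocked events per pattern and per user."""
    blocked = [v for v in violations if v.action == "blocked"]
    by_pattern = Counter(v.pattern for v in blocked)
    by_user = Counter(v.user for v in blocked)
    return by_pattern, by_user
```

Calling `by_user.most_common(3)` on the second counter gives a "top users" ranking for the period covered by the input.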
Example Report:
Today (Mar 15):
- 12 credit card patterns blocked
- 3 SSN patterns blocked
- 5 password patterns blocked
- 0 user bypasses approved
Top Users:
1. john@company.com (5 blocks)
2. jane@company.com (3 blocks)
3. bob@company.com (2 blocks)
User Experience
When Data is Blocked
User types/pastes sensitive data into ChatGPT
System detects: "Password detected"
Message shown:
"Sensitive data detected
This content cannot be sent to external AI services.
Pattern: password
Please remove sensitive data and try again."
User cannot proceed until the sensitive data is removed
When Warned
User pastes SSN
System detects but warns
Message shown: "Warning: This content may contain sensitive data. Do you want to continue?"
User chooses: [Continue] [Remove Data]
If Continue: Action logged, admin alerted
Monitoring & Alerts
Set up alerts for suspicious activity:
alerts:
  - name: "high_dlp_violations"
    condition: "dlp_blocks > 10 per hour"
    action: "email_security_team"
  - name: "attempted_bypass"
    condition: "user approves after warning"
    action: "send_to_slack:#security"
  - name: "pattern_evasion"
    condition: "repeated same pattern from same user"
    action: "escalate_to_admin"
Performance
DLP scanning adds minimal overhead:
- Per-keystroke pattern matching: <5ms
- Clipboard monitoring: Negligible overhead
- Total agent memory: ~200MB typical
Troubleshooting
DLP Not Blocking
- Check if enabled: Settings → DLP → Enabled ✓
- Verify policy deployed: Check last sync time
- Test pattern: Try exact pattern from config
- Check exemptions: Pattern may be exempted
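When a value is unexpectedly allowed through, it can help to reproduce the last two checks offline. The sketch below assumes that exemption patterns are compared as literal strings, that `"*"` exempts everything for the named user, and that expiry dates are ignored; the `Exemption` type and both functions are hypothetical and say nothing about how the agent evaluates policies.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exemption:
    pattern: str                 # literal value, or "*" for everything
    user: Optional[str] = None   # None means the exemption applies to all users


def is_exempt(value: str, user: str, exemptions: list[Exemption]) -> bool:
    """Return True if `value` is covered by any exemption for `user`."""
    for ex in exemptions:
        if ex.user is not None and ex.user != user:
            continue  # user-scoped exemption for somebody else
        if ex.pattern == "*" or ex.pattern == value:
            return True
    return False


def would_block(text: str, user: str, rule: str, exemptions: list[Exemption]) -> bool:
    """Return True if `rule` matches a value in `text` that is not exempted."""
    return any(
        not is_exempt(m.group(0), user, exemptions)
        for m in re.finditer(rule, text)
    )
```

Running `would_block` with the exact sample text, the user's address, and the configured rule shows whether the rule fails to match or an exemption is swallowing the match.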
Too Many False Positives
- Adjust severity threshold
- Add exemptions for common data
- Refine regex patterns to be more specific
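As an example of making a pattern more specific, the employee ID pattern `EMP-\d{8}` can be tightened with word boundaries so it no longer fires inside longer tokens. The sketch uses Python's `re` module for illustration; the regex dialect the product uses is an assumption, so confirm that `\b` is supported before relying on it.

```python
import re

LOOSE = re.compile(r"EMP-\d{8}")       # matches anywhere, even inside longer tokens
STRICT = re.compile(r"\bEMP-\d{8}\b")  # requires word boundaries on both sides

samples = ["EMP-12345678", "TEMP-12345678", "EMP-123456789"]

# The loose pattern flags all three samples; the strict one only the real ID.
loose_hits = [s for s in samples if LOOSE.search(s)]
strict_hits = [s for s in samples if STRICT.search(s)]
```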
Performance Issues
- Reduce number of custom patterns
- Simplify regex patterns
- Disable pattern types not needed
- Reduce check frequency
Best Practices
- Start Strict: Block first, loosen if needed
- Regular Review: Check violation logs weekly
- Tune Exemptions: Add as you find false positives
- Test Patterns: Verify regex before deployment
- Communicate: Tell users what’s blocked and why
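One way to test patterns before deployment is a small offline check that compiles each regex and runs it against samples that should and should not match. The sketch below uses Python's `re` module and the four custom pattern examples from this page; the negative samples and the `check_patterns` helper are made up for illustration, and the product's regex dialect may differ from Python's.

```python
import re

# (pattern, samples that must match, samples that must not match)
CASES = [
    (r"@company\.com$", ["john@company.com"], ["john@companyxcom"]),
    (r"(CONFIDENTIAL|SECRET|PROPRIETARY)",
     ["This is CONFIDENTIAL information"], ["public notes"]),
    (r"EMP-\d{8}", ["EMP-12345678"], ["EMP-1234"]),
    (r"(password\s*=|api_key\s*=|secret\s*=)",
     ['password="secret123"'], ["password reset link"]),
]


def check_patterns(cases) -> list[str]:
    """Return human-readable failures; an empty list means every case passed."""
    failures = []
    for pattern, should_match, should_not_match in cases:
        try:
            regex = re.compile(pattern)
        except re.error as exc:
            failures.append(f"{pattern!r} does not compile: {exc}")
            continue
        for sample in should_match:
            if not regex.search(sample):
                failures.append(f"{pattern!r} missed {sample!r}")
        for sample in should_not_match:
            if regex.search(sample):
                failures.append(f"{pattern!r} wrongly matched {sample!r}")
    return failures
```

Keeping the sample lists next to each pattern makes it easy to add a new negative sample whenever a false positive shows up in the violation log.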