Testing Policies

Overview

Test every policy before deployment. TruthVouch provides a test harness for validating that a policy behaves correctly on sample data.

Quick Test

Go to Governance → Policies → [Policy] → Test

Enter test input:

{
  "type": "request",
  "user_id": "user_123",
  "model": "gpt-4",
  "tokens": 5000,
  "text": "What is AI?"
}

Click Run Test
Check whether the policy triggers or allows the input

Test Cases

Create multiple test cases to cover scenarios:

Test Case 1: Should Block

Input:
  type: request
  text: "My SSN is 123-45-6789"
Expected: BLOCKED
Expected Message: Contains PII

Test Case 2: Should Allow

Input:
  type: request
  text: "What is machine learning?"
Expected: ALLOWED

Test Case 3: Edge Case

Input:
  type: request
  text: "The example SSN format is 123-45-6789"
Expected: ? (Check if false positive)
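
The three cases above can also be written down as data and checked in one loop. The sketch below is only an illustration: `evaluate` is a hypothetical stand-in that flags SSN-shaped strings with a regular expression, not TruthVouch's policy engine, and `PolicyCase` and `run_cases` are names made up for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a PII policy: flags any SSN-shaped string.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def evaluate(test_input: dict) -> str:
    """Return "BLOCKED" if the text contains an SSN-shaped string, else "ALLOWED"."""
    text = test_input.get("text", "")
    return "BLOCKED" if SSN_PATTERN.search(text) else "ALLOWED"


@dataclass
class PolicyCase:
    name: str
    test_input: dict
    expected: Optional[str]  # None means "undecided, inspect the result"


CASES = [
    PolicyCase("Should Block", {"type": "request", "text": "My SSN is 123-45-6789"}, "BLOCKED"),
    PolicyCase("Should Allow", {"type": "request", "text": "What is machine learning?"}, "ALLOWED"),
    PolicyCase("Edge Case", {"type": "request", "text": "The example SSN format is 123-45-6789"}, None),
]


def run_cases(cases):
    """Return (name, actual, status) per case; status is PASS, FAIL, or REVIEW."""
    results = []
    for case in cases:
        actual = evaluate(case.test_input)
        if case.expected is None:
            status = "REVIEW"  # no expectation recorded yet
        else:
            status = "PASS" if actual == case.expected else "FAIL"
        results.append((case.name, actual, status))
    return results


for name, actual, status in run_cases(CASES):
    print(f"{name}: {actual} ({status})")
```

With a pattern-only stand-in like this one, the edge case comes back BLOCKED, which is the kind of false positive Test Case 3 is meant to surface.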

Test Input Format

Structure test input to match your policy. The type field is either "request" or "response":

{
  "type": "request",
  "user_id": "user_123",
  "model": "gpt-4",
  "tokens": 5000,
  "text": "...",
  "destination": "external",
  "user_type": "internal",
  "safety_flags": {
    "toxicity": 0.2,
    "bias": 0.1
  }
}

Use fields your policy actually checks.
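
One way to catch a mismatch between test input and policy early is to compare the input against the field paths the policy reads. This is a minimal sketch: `missing_fields` and the dotted-path convention (for example `safety_flags.bias`) are assumptions made for this example, not part of the TruthVouch harness.

```python
def missing_fields(test_input: dict, required: list) -> list:
    """Return the dotted field paths that are absent from test_input."""
    missing = []
    for path in required:
        node = test_input
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


# Field paths a hypothetical policy reads.
required = ["type", "text", "safety_flags.toxicity", "safety_flags.bias", "destination"]

test_input = {
    "type": "request",
    "text": "What is AI?",
    "safety_flags": {"toxicity": 0.2},
}

print(missing_fields(test_input, required))
```

Here the check reports `safety_flags.bias` and `destination`, so a policy condition on either field could not be exercised by this input.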

Test Coverage

Aim to cover every category of input:

Happy Path: Normal, allowed input
Violation Path: Input that should trigger denial
Edge Cases: Boundary conditions
False Positives: Legitimate input that might wrongly trigger

Example for “Block API Keys” Policy:

Test 1: Normal text
Input: "How do I use the API?"
Expected: ALLOWED

Test 2: Exact API key
Input: "api_key=sk_live_abc123def456ghi789"
Expected: BLOCKED

Test 3: Partial key (false positive check)
Input: "Documentation: api_key parameter"
Expected: ALLOWED

Test 4: Encoded key
Input: "ak_prod_6f7c8d9e0a1b2c3d4e5f6a7b8c9d0e"
Expected: ? (Decide whether this key format should be blocked)
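
To see at a glance whether a suite touches all four categories, each case can carry a category tag. The sketch below assumes a tagging scheme invented for this example (`happy_path`, `violation`, `edge_case`, `false_positive`); how the four API key tests map onto those tags is one reading of the list above.

```python
CATEGORIES = ("happy_path", "violation", "edge_case", "false_positive")

# The four "Block API Keys" tests, tagged by the category each one exercises.
suite = [
    {"name": "Normal text", "category": "happy_path",
     "input": "How do I use the API?"},
    {"name": "Exact API key", "category": "violation",
     "input": "api_key=sk_live_abc123def456ghi789"},
    {"name": "Partial key", "category": "false_positive",
     "input": "Documentation: api_key parameter"},
    {"name": "Encoded key", "category": "edge_case",
     "input": "ak_prod_6f7c8d9e0a1b2c3d4e5f6a7b8c9d0e"},
]


def uncovered_categories(cases):
    """Return the categories that no test case in the suite exercises."""
    covered = {case["category"] for case in cases}
    return [category for category in CATEGORIES if category not in covered]


print(uncovered_categories(suite))       # full suite
print(uncovered_categories(suite[:3]))   # suite without the encoded-key test
```

An empty list means every category has at least one case; it says nothing about whether those cases are sufficient.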

Batch Testing

Test multiple cases at once:

Click + Add Test Case
Enter name, input, expected result
Repeat for all cases
Click Run All Tests
See pass/fail for each

Results:

Test Suite: API Key Protection
✓ Test 1: Normal text - PASSED
✓ Test 2: Exact API key - PASSED
✓ Test 3: Partial key - PASSED
⚠ Test 4: Encoded key - FAILED (expected BLOCKED, got ALLOWED)

Pass Rate: 75% (3/4)

If any fail, adjust policy logic and retest.
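
The pass rate in the results is simply the number of passing cases divided by the total. A small sketch of that tally, using the four results shown above; `summarize` and the tuple layout are illustrative names, not the harness's own output format.

```python
def summarize(suite_name, results):
    """Tally (name, expected, actual) tuples into a pass count and text report."""
    lines = [f"Test Suite: {suite_name}"]
    passed = 0
    for name, expected, actual in results:
        if expected == actual:
            passed += 1
            lines.append(f"PASSED {name}")
        else:
            lines.append(f"FAILED {name} (expected {expected}, got {actual})")
    total = len(results)
    rate = 100 * passed / total if total else 0.0
    lines.append(f"Pass Rate: {rate:.0f}% ({passed}/{total})")
    return passed, total, "\n".join(lines)


results = [
    ("Test 1: Normal text", "ALLOWED", "ALLOWED"),
    ("Test 2: Exact API key", "BLOCKED", "BLOCKED"),
    ("Test 3: Partial key", "ALLOWED", "ALLOWED"),
    ("Test 4: Encoded key", "BLOCKED", "ALLOWED"),
]

passed, total, report = summarize("API Key Protection", results)
print(report)
```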

Dry-Run Mode

Deploy a policy without enforcement. In dry-run mode, the policy logs violations without blocking requests.

Go to Governance → Policies → [Policy]
Click Deploy
Select Dry-Run Mode
Set duration: 24 hours, 7 days, etc.
Click Deploy to Dry-Run

What happens:

Policy evaluated on all requests
Violations logged to audit trail
User request proceeds (not blocked)
You see real data on policy impact

After dry-run:

Go to Reports → Policy Impact
See how many requests would have been blocked
Check for false positives
If good: Change to enforcement mode
If issues: Adjust policy, retest
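
Conceptually, dry-run mode evaluates the policy and records the violation, but lets the request through. The sketch below illustrates that behavior only; `token_limit_policy`, `handle_request`, and the audit-entry fields are hypothetical and do not describe TruthVouch internals.

```python
from typing import Callable, List, Optional


def token_limit_policy(request: dict) -> Optional[str]:
    """Hypothetical policy: return a violation message, or None if allowed."""
    if request.get("tokens", 0) > 4000:
        return "Token limit exceeded"
    return None


def handle_request(request: dict,
                   policy: Callable[[dict], Optional[str]],
                   dry_run: bool,
                   audit_log: List[dict]) -> bool:
    """Return True if the request proceeds, False if it is blocked."""
    violation = policy(request)
    if violation is None:
        return True
    # Violations are always logged; they only block when enforcement is on.
    audit_log.append({
        "user_id": request.get("user_id"),
        "violation": violation,
        "enforced": not dry_run,
    })
    return dry_run


requests = [
    {"user_id": "user_123", "tokens": 5000},
    {"user_id": "user_456", "tokens": 1200},
    {"user_id": "user_789", "tokens": 9000},
]

audit_log: List[dict] = []
proceeded = [handle_request(r, token_limit_policy, True, audit_log) for r in requests]
print(f"{len(audit_log)} of {len(requests)} requests would have been blocked")
```

Counting the logged entries gives the same kind of figure the Policy Impact report is described as showing: how many requests enforcement would have stopped.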

Test with Real Data

The best test uses real requests from your system:

Go to Governance → Audit Trail
Find recent requests

Export as test data:

[
  { "user_id": "user_123", "model": "gpt-4", "text": "..." },
  { "user_id": "user_456", "model": "claude", "text": "..." }
]

Upload to policy test
Run tests against real data
Validate policy works correctly
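
Exported audit records may need light reshaping before they serve as test input, for example adding the `type` field used elsewhere on this page. The sketch below assumes the export is a JSON array like the one shown above; the sample texts, the `to_test_cases` helper, and the output layout are made up for illustration, and the format the upload step accepts may differ.

```python
import json

# Sample export with made-up text values.
exported = """
[
  {"user_id": "user_123", "model": "gpt-4", "text": "What is AI?"},
  {"user_id": "user_456", "model": "claude", "text": "Summarize this report"}
]
"""


def to_test_cases(export_json: str, request_type: str = "request") -> list:
    """Turn exported audit records into test cases with no expectation set."""
    cases = []
    for index, record in enumerate(json.loads(export_json), start=1):
        cases.append({
            "name": f"Audit record {index}",
            "input": {"type": request_type, **record},
            "expected": None,  # fill in after reviewing each record
        })
    return cases


cases = to_test_cases(exported)
print(json.dumps(cases, indent=2))
```

Real requests can contain sensitive data, so it is worth reviewing exported records before storing them in a test suite.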

Regression Testing

Before modifying a policy, save a baseline:

Create test suite with 10-20 representative cases
Run tests, note all pass
Modify policy
Run same tests again
If any now fail (regression), fix the issue
Once all pass, commit changes
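
The comparison in the last steps amounts to diffing two sets of outcomes. A minimal sketch, assuming results are kept as a mapping from test name to outcome; `find_regressions` and the sample results are illustrative.

```python
def find_regressions(baseline: dict, current: dict) -> dict:
    """Map each test whose outcome changed to its (before, after) pair."""
    regressions = {}
    for name, before in baseline.items():
        after = current.get(name, "MISSING")  # test dropped from the new run
        if after != before:
            regressions[name] = (before, after)
    return regressions


# Outcomes recorded before the policy change.
baseline = {
    "Normal text": "ALLOWED",
    "Exact API key": "BLOCKED",
    "Partial key": "ALLOWED",
}

# Outcomes after the change: the partial-key case is now wrongly blocked.
current = {
    "Normal text": "ALLOWED",
    "Exact API key": "BLOCKED",
    "Partial key": "BLOCKED",
}

for name, (before, after) in find_regressions(baseline, current).items():
    print(f"REGRESSION {name}: {before} -> {after}")
```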

Performance Testing

Check if policy adds excessive latency:

Go to Reports → Policy Performance

See latency added by policy:

Policy: Block PII
Avg latency: 8ms
P95 latency: 25ms
P99 latency: 50ms

If latency exceeds 100 ms, optimize the policy:
- Simplify regex patterns
- Cache data lookups
- Disable unnecessary checks
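
If you collect your own timing samples, the same three figures can be computed locally. The sketch below uses the nearest-rank method for percentiles, which may not match how the Policy Performance report calculates them; the sample values are made up.

```python
def percentile(samples, pct: int):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, kept within [1, n].
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Made-up per-request latencies in milliseconds.
latencies_ms = [5, 6, 7, 8, 8, 9, 10, 12, 25, 50]

avg = sum(latencies_ms) / len(latencies_ms)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(f"Avg latency: {avg:.0f}ms")
print(f"P95 latency: {p95}ms")
print(f"P99 latency: {p99}ms")
print("Needs optimization" if p99 > 100 else "Within 100ms")
```

With only ten samples, P95 and P99 both land on the slowest request, so tail percentiles are only meaningful over a larger sample.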

Testing Rego Policies

For complex Rego policies, test thoroughly:

package policies.complex_rule

deny[msg] {
    # Complex logic with multiple conditions
    user_id := input.user_id
    user_dept := data.departments[user_id]
    monthly_tokens := data.monthly_usage[user_dept]
    budget := data.dept_budgets[user_dept]

    monthly_tokens + input.tokens > budget
    msg := "Budget exceeded"
}

Test Cases:

Case 1: User under budget
  Input: dept=eng, tokens=1000, used=40000, budget=100000
  Expected: ALLOWED

Case 2: User would exceed budget
  Input: dept=eng, tokens=70000, used=40000, budget=100000
  Expected: BLOCKED

Case 3: Different department
  Input: dept=marketing, tokens=5000, used=50000, budget=100000
  Expected: ALLOWED
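
When working out expected results for a rule like this, it can help to mirror its logic in a few lines of ordinary code. The Python sketch below follows the same lookups and comparison as the Rego rule; it is an aid for reasoning about test cases, not a substitute for evaluating the Rego itself, and the user-to-department mapping is invented for the example.

```python
def deny(input_doc: dict, data: dict) -> list:
    """Mirror of the Rego rule: return ["Budget exceeded"] or an empty list."""
    user_dept = data["departments"].get(input_doc["user_id"])
    monthly_tokens = data["monthly_usage"].get(user_dept)
    budget = data["dept_budgets"].get(user_dept)
    # In Rego, an undefined lookup makes the rule body undefined, so no denial.
    if user_dept is None or monthly_tokens is None or budget is None:
        return []
    if monthly_tokens + input_doc["tokens"] > budget:
        return ["Budget exceeded"]
    return []


# Hypothetical data document.
data = {
    "departments": {"user_123": "eng", "user_456": "marketing"},
    "monthly_usage": {"eng": 40000, "marketing": 50000},
    "dept_budgets": {"eng": 100000, "marketing": 100000},
}

print(deny({"user_id": "user_123", "tokens": 1000}, data))   # under budget
print(deny({"user_id": "user_123", "tokens": 70000}, data))  # over budget
print(deny({"user_id": "user_123", "tokens": 60000}, data))  # exactly at budget
print(deny({"user_id": "user_456", "tokens": 5000}, data))   # other department
```

The comparison is a strict `>`, so a request that lands exactly on the budget is not denied. A user with no department entry is not denied either, which is worth covering with its own test case.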

API Testing

Test policies via API:

curl -X POST http://localhost:5000/api/v1/governance/policies/test \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "policy_id": "policy_123",
    "test_input": {
      "type": "request",
      "user_id": "user_123",
      "text": "..."
    }
  }'

Response:

{
  "policy_id": "policy_123",
  "triggered": false,
  "message": null,
  "latency_ms": 8
}
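
The same call can be scripted, for example to run a suite from CI. The sketch below builds the request with Python's standard library and does not send it; the endpoint path and body come from the curl example above, while the `Content-Type` header and the `build_test_request` helper are assumptions of this sketch.

```python
import json
import urllib.request


def build_test_request(base_url: str, token: str, policy_id: str,
                       test_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the policy test endpoint."""
    body = json.dumps({"policy_id": policy_id, "test_input": test_input})
    return urllib.request.Request(
        url=f"{base_url}/api/v1/governance/policies/test",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed; body is JSON
        },
    )


req = build_test_request(
    "http://localhost:5000",
    "TOKEN",  # placeholder; use a real token in practice
    "policy_123",
    {"type": "request", "user_id": "user_123", "text": "What is AI?"},
)
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and decode the body with `json.load`; the `triggered` field of the response shown above indicates whether the policy fired.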

Deployment Checklist

Before deploying a policy:

Common Testing Issues

Policy Not Triggering

Problem: Policy should fire but doesn’t.

Solution:

Check test input matches policy fields
Verify condition logic in policy
Use debug print() statements
Check data dependencies exist

False Positives

Problem: Legitimate input wrongly blocked.

Solution:

Add to allowlist
Adjust pattern/threshold
Add exceptions for specific cases
Test with more edge cases

Performance Issues

Problem: Policy adds too much latency.

Solution:

Simplify regex patterns
Cache expensive lookups
Reduce scope (don’t apply to all users)
Profile the policy
