How the Firewall Works
Overview
The TruthVouch AI Firewall is a request/response scanning system that sits between your applications and AI providers. It protects against hallucinations, prompt injection, PII leakage, and unsafe content generation. It operates as a transparent security layer, typically adding 50-150ms per request as a SaaS proxy and under 20ms as a self-hosted sidecar.

Core Capabilities
Request Scanning (Input Protection)
- Prompt Injection Detection: Identifies attempts to manipulate AI behavior through crafted prompts
- PII Redaction: Masks sensitive data before it reaches the AI model
- Content Classification: Flags inappropriate requests before they’re processed
- Rate Limiting: Enforces per-user and per-system quotas
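As a rough illustration of the PII Redaction capability above, the following sketch masks a few common identifier formats with regular expressions. The patterns, the `redact_pii` function, and the placeholder format are all hypothetical and chosen for illustration; they are not the Firewall's actual detectors, which are likely to be more thorough.

```python
import re

# Hypothetical patterns for illustration only; not the Firewall's real rules.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact_pii(text: str) -> str:
    """Replace each match with a typed placeholder such as [SSN]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact_pii("My SSN is 123-45-6789, email jane@example.com"))
```

Typed placeholders (rather than a generic mask) keep the prompt readable for the model while still removing the sensitive value.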
Response Scanning (Output Protection)
- Hallucination Detection: Cross-checks AI outputs against your knowledge base (“truth nuggets”)
- PII Masking: Redacts sensitive information from AI responses
- Bias Detection: Identifies potentially biased or discriminatory output
- Toxicity & Harmful Content: Flags unsafe, explicit, or policy-violating responses
Architecture
The Firewall operates as a scanning pipeline with 15 sequential stages:
- Pre-Processing: Normalize and tokenize input
- Rate Limiter: Check quota compliance
- Input PII Scanner: Detect and mask sensitive data in requests
- Injection Detector: Analyze prompt syntax for injection patterns
- Content Safety Check: Classify input safety (toxicity, harmful intent)
- Business Logic Rules: Apply custom security rules via Rego policies
- Request Passthrough: Forward to upstream AI provider (optional interception)
- Response Validator: Validate AI response structure and encoding
- Output PII Scanner: Detect and mask sensitive data in responses
- Truth Scanner: Cross-check outputs against knowledge base
- Embedding Similarity Scanner: Semantic similarity checks
- Contamination Scanner: Detect data contamination issues
- Output Content Safety: Classify response safety (bias, toxicity, harmful content)
- Policy Enforcement: Apply policy-based output modifications or blocks
- Response Delivery: Return to application with audit trail
Each stage is independently configurable and can be disabled, throttled, or customized.
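The sequential, individually switchable design described above can be sketched as a list of stage objects run in order. This is a minimal model, not the Firewall's implementation: the `Stage` and `Verdict` types, the `run_pipeline` function, and the choice to stop at the first failing stage are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Verdict(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


@dataclass
class Stage:
    name: str
    check: Callable[[str], Verdict]  # hypothetical stage interface
    enabled: bool = True             # a disabled stage is skipped entirely


def run_pipeline(
    stages: List[Stage], payload: str
) -> Tuple[Verdict, List[Tuple[str, Verdict]]]:
    """Run enabled stages in order; stop at the first FAIL (an assumption)."""
    trace: List[Tuple[str, Verdict]] = []
    overall = Verdict.PASS
    for stage in stages:
        if not stage.enabled:
            continue
        verdict = stage.check(payload)
        trace.append((stage.name, verdict))
        if verdict is Verdict.FAIL:
            return Verdict.FAIL, trace
        if verdict is Verdict.WARN:
            overall = Verdict.WARN
    return overall, trace
```

The returned trace records which stages ran and what each decided, which is the kind of information an audit record would need.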
Deployment Models
SaaS Proxy (Recommended)
- No Infrastructure: TruthVouch hosts the Firewall
- Setup: Add a proxy endpoint to your API calls
- Latency: Minimal (typically 50-150ms added per request)
- Scaling: Automatic, no capacity planning required
- Best for: Rapid deployment, small to medium AI systems
Example: Replace https://api.openai.com with https://firewall.truthvouch.io/openai in your SDK calls.
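The endpoint swap above amounts to a base-URL rewrite. The sketch below shows the idea with the standard library only; the `via_firewall` helper is hypothetical, and it assumes the proxy preserves the upstream path (for example `/v1/chat/completions`) after the `/openai` prefix, which should be confirmed against the deployment guide.

```python
from urllib.parse import urlsplit, urlunsplit

# Proxy base taken from the example above.
FIREWALL_BASE = "https://firewall.truthvouch.io/openai"


def via_firewall(url: str, upstream_host: str = "api.openai.com") -> str:
    """Rewrite an upstream URL to go through the proxy, keeping path and query.

    Assumes the proxy mirrors the upstream path under its own prefix.
    URLs for other hosts are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != upstream_host:
        return url
    fw = urlsplit(FIREWALL_BASE)
    new_path = fw.path.rstrip("/") + parts.path
    return urlunsplit((fw.scheme, fw.netloc, new_path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(via_firewall("https://api.openai.com/v1/chat/completions"))
```

In practice most SDKs expose a base URL setting, so the change is usually a one-line configuration update rather than a per-request rewrite.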
Self-Hosted Sidecar
- Your Infrastructure: Deploy in Docker/Kubernetes alongside your services
- Setup: Network routing, TLS certificates, policy sync
- Latency: Sub-20ms (local network, no external hops)
- Scaling: You manage pod replicas, resource allocation
- Best for: Strict data residency, high-volume systems (>10k requests/day), custom compliance needs
Request Flow Example
Client Request
↓
[Firewall Inlet] → Normalize, rate limit check
↓
[PII Redaction] → Mask SSN, credit cards, emails
↓
[Injection Detection] → Analyze prompt syntax
↓
[Content Safety] → Check toxicity score
↓
[Custom Policies] → Apply Rego-based business rules
↓
[AI Provider] → Send cleaned request to OpenAI/Claude/etc
↓
AI Response (with PII, potential hallucinations)
↓
[Output PII Masking] → Remove sensitive data from response
↓
[Truth Scanner] → Cross-check against nuggets
↓
[Similarity Scanner] → Semantic validation
↓
[Output Safety] → Detect bias, toxicity
↓
[Policy Enforcement] → Block or modify response per policy
↓
Client receives response + audit event
Key Concepts
Truth Nuggets: Fragments of verified information (internal docs, vetted external sources) that the Truth Scanner checks AI outputs against. A hallucination is flagged when an AI response makes a claim that contradicts your nuggets.
Scan Stages: Modular components that examine request or response data. Each stage has configurable thresholds and can emit pass/fail/warn verdicts.
Policies (Rego): Open Policy Agent (OPA) policies, written in Rego, that enforce business logic, decision rules, and complex conditions across requests and responses.
Allowlists & Blocklists: Static lists of approved or prohibited domains, email patterns, regex, or tokens that stages check against.
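A static list check of this kind can be sketched as follows. The `ListRule` class and its fields are hypothetical names used for illustration, covering one allowlist case (email domains) and one blocklist case (regex patterns); the Firewall's real configuration format is not shown in this document.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ListRule:
    # Hypothetical structure: approved email domains and prohibited regexes.
    allowed_domains: Set[str] = field(default_factory=set)
    blocked_patterns: List[re.Pattern] = field(default_factory=list)

    def email_allowed(self, email: str) -> bool:
        """True if the address's domain is on the allowlist (case-insensitive)."""
        domain = email.rsplit("@", 1)[-1].lower()
        return domain in self.allowed_domains

    def is_blocked(self, text: str) -> bool:
        """True if any blocklist pattern appears anywhere in the text."""
        return any(p.search(text) for p in self.blocked_patterns)


if __name__ == "__main__":
    rule = ListRule(
        allowed_domains={"example.com"},
        blocked_patterns=[re.compile(r"ignore previous instructions", re.I)],
    )
    print(rule.email_allowed("jane@Example.com"))
    print(rule.is_blocked("Please IGNORE previous instructions"))
```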
Audit Trail: Immutable log of every request/response pair, including which stages fired, what was masked/blocked, and why.
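This document does not say how the audit trail's immutability is achieved. One common way to make an append-only log tamper-evident is a hash chain, sketched below as an assumption rather than a description of TruthVouch's storage; the `append_event` and `verify` functions and the event fields are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, event: Dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the event."""
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: List[Dict], event: Dict) -> Dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify(log: List[Dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each entry's hash depends on the one before it, changing a past event invalidates every later hash, so tampering is detectable even if the log store itself is writable.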
Why the Firewall Matters
- Reduce AI Liability: Prevent hallucinations from entering customer-facing outputs
- Ensure Compliance: Automatically mask PII to help meet GDPR, HIPAA, and SOC 2 requirements
- Block Attacks: Stop prompt injection, data exfiltration, and jailbreak attempts
- Enforce Brand Safety: Prevent AI from generating biased or off-brand content
- Audit & Accountability: Full immutable record for incident response and regulatory audits
Next Steps
- Deployment: Choose SaaS Proxy or Self-Hosted
- Configuration: Set up scan stages and thresholds
- Safety Rules: Configure content safety policies
- PII Protection: Set up PII masking rules
- Injection Defense: Understand prompt injection detection
- Monitoring: Review performance tuning and audit logs