How the Firewall Works

Overview

The TruthVouch AI Firewall is a request/response scanning system that sits between your applications and AI providers. It scans traffic in both directions to protect against hallucinations, prompt injections, PII leakage, and unsafe content generation. It operates as a transparent security layer; the latency it adds depends on the deployment model (see Deployment Models).

[Figure: Truth Firewall protecting LLM integrations with automated scanning]

Core Capabilities

Request Scanning (Input Protection)

Prompt Injection Detection: Identifies attempts to manipulate AI behavior through crafted prompts
PII Redaction: Masks sensitive data before it reaches the AI model
Content Classification: Flags inappropriate requests before they’re processed
Rate Limiting: Enforces per-user and per-system quotas
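
To make the rate-limiting item above concrete, here is a minimal sketch of a per-user quota using a fixed-window counter. The source does not say which algorithm the Firewall uses, so the technique, the FixedWindowLimiter class, and its parameters are all assumptions made for illustration.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Hypothetical per-user quota: at most `limit` requests per time window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        # Maps (user_id, window index) -> number of requests seen so far.
        # Old windows are never evicted here; a real limiter would expire them.
        self._counts = defaultdict(int)

    def allow(self, user_id: str, now: float) -> bool:
        """Return True if the request fits the quota, and count it if so."""
        key = (user_id, int(now // self.window))
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


limiter = FixedWindowLimiter(limit=2, window_seconds=60)
decisions = [limiter.allow("alice", t) for t in (0.0, 1.0, 2.0)]
print(decisions)
```

The timestamp is passed in explicitly so the behavior is easy to test. A fixed window allows short bursts at window boundaries, which is one reason production systems often prefer sliding windows or token buckets.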

Response Scanning (Output Protection)

Hallucination Detection: Cross-checks AI outputs against your knowledge base (“truth nuggets”)
PII Masking: Redacts sensitive information from AI responses
Bias Detection: Identifies potentially biased or discriminatory output
Toxicity & Harmful Content: Flags unsafe, explicit, or policy-violating responses

Architecture

The Firewall operates as a scanning pipeline with 15 sequential stages:

1. Pre-Processing: Normalize and tokenize input
2. Rate Limiter: Check quota compliance
3. Input PII Scanner: Detect and mask sensitive data in requests
4. Injection Detector: Analyze prompt syntax for injection patterns
5. Content Safety Check: Classify input safety (toxicity, harmful intent)
6. Business Logic Rules: Apply custom security rules via Rego policies
7. Request Passthrough: Forward to upstream AI provider (optional interception)
8. Response Validator: Validate AI response structure and encoding
9. Output PII Scanner: Detect and mask sensitive data in responses
10. Truth Scanner: Cross-check outputs against knowledge base
11. Embedding Similarity Scanner: Semantic similarity checks
12. Contamination Scanner: Detect data contamination issues
13. Output Content Safety: Classify response safety (bias, toxicity, harmful content)
14. Policy Enforcement: Apply policy-based output modifications or blocks
15. Response Delivery: Return to application with audit trail

Each stage is independently configurable and can be disabled, throttled, or customized.
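
As a rough illustration of how sequential, independently configurable stages might compose, consider the sketch below. The Stage class, the run_pipeline function, and the toy stages are hypothetical names invented for this example; they are not TruthVouch APIs, and the real pipeline may handle blocking differently.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stage signature: takes text, returns (possibly modified text, allowed?).
StageFn = Callable[[str], Tuple[str, bool]]


@dataclass
class Stage:
    name: str
    run: StageFn
    enabled: bool = True  # each stage can be switched off independently


def run_pipeline(stages: List[Stage], text: str) -> Tuple[str, List[str], bool]:
    """Run enabled stages in order and stop at the first stage that blocks."""
    trace: List[str] = []
    for stage in stages:
        if not stage.enabled:
            trace.append(f"{stage.name}:skipped")
            continue
        text, allowed = stage.run(text)
        trace.append(f"{stage.name}:{'pass' if allowed else 'block'}")
        if not allowed:
            return text, trace, False
    return text, trace, True


# Toy stages for illustration only.
def normalize(text: str) -> Tuple[str, bool]:
    return " ".join(text.split()), True


def length_limit(text: str) -> Tuple[str, bool]:
    return text, len(text) <= 50


stages = [
    Stage("pre_processing", normalize),
    Stage("rate_limiter", lambda t: (t, True), enabled=False),
    Stage("length_limit", length_limit),
]

result, trace, allowed = run_pipeline(stages, "  hello   world ")
print(result, trace, allowed)
```

The returned trace records which stages ran, were skipped, or blocked, which is the kind of information an audit trail would need.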

Deployment Models

SaaS Proxy (Recommended)

No Infrastructure: TruthVouch hosts the Firewall
Setup: Add a proxy endpoint to your API calls
Latency: Typically 50-150 ms added per request
Scaling: Automatic, no capacity planning required
Best for: Rapid deployment, small to medium AI systems

Example: Replace https://api.openai.com with https://firewall.truthvouch.io/openai in your SDK calls.
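
The URL swap above can be sketched as a small helper. This is an illustration under assumptions: it assumes the proxy preserves the provider's path (for example /v1/chat/completions) under the /openai prefix, which the source does not confirm. The to_firewall_url function and the PROVIDER_PREFIXES table are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

FIREWALL_HOST = "firewall.truthvouch.io"

# Assumption: each provider host maps to one path prefix on the proxy.
# Only the OpenAI mapping appears in the documentation.
PROVIDER_PREFIXES = {"api.openai.com": "/openai"}


def to_firewall_url(url: str) -> str:
    """Rewrite a known provider URL so the request goes through the proxy."""
    parts = urlsplit(url)
    prefix = PROVIDER_PREFIXES.get(parts.netloc)
    if prefix is None:
        return url  # unknown host: leave the URL untouched
    return urlunsplit(
        (parts.scheme, FIREWALL_HOST, prefix + parts.path, parts.query, parts.fragment)
    )


print(to_firewall_url("https://api.openai.com"))
print(to_firewall_url("https://api.openai.com/v1/chat/completions"))
```

In practice most SDKs expose a base URL setting, so you would usually configure the proxy endpoint once on the client rather than rewrite individual request URLs.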

Self-Hosted Sidecar

Your Infrastructure: Deploy in Docker/Kubernetes alongside your services
Setup: Network routing, TLS certificates, policy sync
Latency: Sub-20ms (local network, no external hops)
Scaling: You manage pod replicas, resource allocation
Best for: Strict data residency, high-volume systems (>10k requests/day), custom compliance needs

Request Flow Example

Client Request
    ↓
[Firewall Inlet] → Normalize, rate limit check
    ↓
[PII Redaction] → Mask SSN, credit cards, emails
    ↓
[Injection Detection] → Analyze prompt syntax
    ↓
[Content Safety] → Check toxicity score
    ↓
[Custom Policies] → Apply Rego-based business rules
    ↓
[AI Provider] → Send cleaned request to the upstream provider (OpenAI, Anthropic, etc.)
    ↓
AI Response (may contain PII or hallucinations)
    ↓
[Output PII Masking] → Remove sensitive data from response
    ↓
[Truth Scanner] → Cross-check against nuggets
    ↓
[Similarity Scanner] → Semantic validation
    ↓
[Output Safety] → Detect bias, toxicity
    ↓
[Policy Enforcement] → Block or modify response per policy
    ↓
Client receives response + audit event
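
The [PII Redaction] step in the flow above masks SSNs, credit cards, and emails. A minimal regex-based sketch of that idea follows. The patterns, the placeholder labels, and the mask_pii function are assumptions for illustration; the source does not describe the Firewall's actual detection method, and production detectors are typically far more thorough than these expressions.

```python
import re

# Illustrative patterns only: US-style SSNs, 13-16 digit card numbers with
# optional space or hyphen separators, and simple email addresses.
PATTERNS = [
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
]


def mask_pii(text):
    """Replace each match with a [LABEL] placeholder and report what was found."""
    found = []
    for label, pattern in PATTERNS:
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


masked, found = mask_pii(
    "SSN 123-45-6789, card 4111 1111 1111 1111, mail jane.doe@example.com"
)
print(masked)
print(found)
```

Returning the list of matched types alongside the masked text lets a caller record what was redacted without storing the sensitive values themselves.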

Key Concepts

Truth Nuggets: Validated facts drawn from internal docs or vetted external sources, which the Truth Scanner uses as ground truth when checking AI outputs. A hallucination is flagged when a response makes a claim that contradicts your nuggets.

Scan Stages: Modular components that examine request or response data. Each stage has configurable thresholds and can emit pass/fail/warn verdicts.
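
One plausible way to turn a stage's score into a pass/fail/warn verdict is a pair of thresholds, sketched below. The Verdict enum, the verdict_for function, and the assumption that scores fall in the range 0 to 1 with higher meaning riskier are all hypothetical; the source only states that thresholds are configurable.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


def verdict_for(score: float, warn_at: float, fail_at: float) -> Verdict:
    """Map a risk score in [0, 1] to a verdict using two thresholds."""
    if not 0.0 <= warn_at <= fail_at <= 1.0:
        raise ValueError("expected 0 <= warn_at <= fail_at <= 1")
    if score >= fail_at:
        return Verdict.FAIL
    if score >= warn_at:
        return Verdict.WARN
    return Verdict.PASS


for score in (0.2, 0.5, 0.9):
    print(score, verdict_for(score, warn_at=0.5, fail_at=0.8).value)
```

Keeping a warn band between the two thresholds allows a stage to be tuned in observation mode before it starts blocking traffic.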

Policies (Rego): Open Policy Agent (OPA) policies, written in the Rego language, that enforce business logic, decision rules, and complex conditions across requests and responses.

Allowlists & Blocklists: Static lists of approved or prohibited domains, email patterns, regex, or tokens that stages check against.
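
A stage that consults static lists might look roughly like the sketch below. The ListPolicy class, its field names, and the rule that a blocklist match takes precedence over the allowlist are assumptions for illustration, not documented Firewall behavior.

```python
import re
from dataclasses import dataclass, field
from typing import List, Pattern, Set


@dataclass
class ListPolicy:
    """Hypothetical static lists that a scan stage checks against."""

    allowed_domains: Set[str] = field(default_factory=set)
    blocked_patterns: List[Pattern[str]] = field(default_factory=list)

    def check(self, domain: str, text: str) -> str:
        # Assumption: a blocklist hit wins even if the domain is allowed.
        for pattern in self.blocked_patterns:
            if pattern.search(text):
                return "blocked"
        # An empty allowlist means "no domain restriction" in this sketch.
        if self.allowed_domains and domain.lower() not in self.allowed_domains:
            return "not_allowed"
        return "ok"


policy = ListPolicy(
    allowed_domains={"example.com"},
    blocked_patterns=[re.compile(r"internal-only", re.IGNORECASE)],
)
print(policy.check("example.com", "hello"))
print(policy.check("evil.test", "hello"))
print(policy.check("example.com", "This is INTERNAL-ONLY"))
```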

Audit Trail: Immutable log of every request/response pair, including which stages fired, what was masked/blocked, and why.
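
The source does not say how the audit trail is made immutable. One common technique for making a log tamper-evident is hash chaining, where each entry stores a hash covering the previous entry's hash and its own content. The sketch below illustrates that technique only; append_event, verify, and the event fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> dict:
    """Append an event, linking it to the previous entry by hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_event(audit_log, {"stage": "input_pii_scanner", "action": "masked"})
append_event(audit_log, {"stage": "policy_enforcement", "action": "blocked"})
print(verify(audit_log))
```

A hash chain detects tampering but does not prevent it; preventing modification also requires write-once storage or externally anchored hashes.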

Why Firewall Matters

Reduce AI Liability: Catch hallucinations before they reach customer-facing outputs
Support Compliance: Automatically mask PII to help meet GDPR, HIPAA, and SOC 2 requirements
Block Attacks: Stop prompt injection, data exfiltration, and jailbreak attempts
Enforce Brand Safety: Keep biased or off-brand AI content from reaching users
Audit & Accountability: Full immutable record for incident response and regulatory audits

Next Steps

Deployment: Choose SaaS Proxy or Self-Hosted
Configuration: Set up scan stages and thresholds
Safety Rules: Configure content safety policies
PII Protection: Set up PII masking rules
Injection Defense: Understand prompt injection detection
Monitoring: Review performance tuning and audit logs