How the Firewall Works

Overview

The TruthVouch AI Firewall scans every request and response that passes between your applications and AI providers, protecting against hallucinations, prompt injection, PII leakage, and unsafe content generation. It operates as a transparent security layer, typically adding 50-150 ms per request when hosted by TruthVouch and under 20 ms when self-hosted.

[Figure: Truth Firewall protecting LLM integrations with automated scanning]

Core Capabilities

Request Scanning (Input Protection)

  • Prompt Injection Detection: Identifies attempts to manipulate AI behavior through crafted prompts
  • PII Redaction: Masks sensitive data before it reaches the AI model
  • Content Classification: Flags inappropriate requests before they’re processed
  • Rate Limiting: Enforces per-user and per-system quotas

Response Scanning (Output Protection)

  • Hallucination Detection: Cross-checks AI outputs against your knowledge base (“truth nuggets”)
  • PII Masking: Redacts sensitive information from AI responses
  • Bias Detection: Identifies potentially biased or discriminatory output
  • Toxicity & Harmful Content: Flags unsafe, explicit, or policy-violating responses

Architecture

The Firewall operates as a scanning pipeline with 15 sequential stages:

  1. Pre-Processing: Normalize and tokenize input
  2. Rate Limiter: Check quota compliance
  3. Input PII Scanner: Detect and mask sensitive data in requests
  4. Injection Detector: Analyze prompt syntax for injection patterns
  5. Content Safety Check: Classify input safety (toxicity, harmful intent)
  6. Business Logic Rules: Apply custom security rules via Rego policies
  7. Request Passthrough: Forward to upstream AI provider (optional interception)
  8. Response Validator: Validate AI response structure and encoding
  9. Output PII Scanner: Detect and mask sensitive data in responses
  10. Truth Scanner: Cross-check outputs against knowledge base
  11. Embedding Similarity Scanner: Semantic similarity checks
  12. Contamination Scanner: Detect data contamination issues
  13. Output Content Safety: Classify response safety (bias, toxicity, harmful content)
  14. Policy Enforcement: Apply policy-based output modifications or blocks
  15. Response Delivery: Return to application with audit trail

Each stage is independently configurable and can be disabled, throttled, or customized.
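
The sequential, independently configurable design described above can be sketched as follows. This is a minimal illustration, not the TruthVouch implementation: the `Stage` class, `run_pipeline`, and the toy stage functions are hypothetical names, and stopping at the first "fail" verdict is an assumption about how a blocking stage might behave.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; not the TruthVouch API.
Verdict = str  # "pass", "warn", or "fail"


@dataclass
class Stage:
    name: str
    check: Callable[[str], Tuple[Verdict, str]]  # returns (verdict, possibly modified text)
    enabled: bool = True


def run_pipeline(stages: List[Stage], text: str) -> Tuple[str, List[Tuple[str, Verdict]]]:
    """Run enabled stages in order and record each verdict.

    Assumption: a "fail" verdict stops the pipeline immediately.
    """
    trail = []
    for stage in stages:
        if not stage.enabled:
            continue  # disabled stages are skipped entirely
        verdict, text = stage.check(text)
        trail.append((stage.name, verdict))
        if verdict == "fail":
            break
    return text, trail


def normalize(text: str) -> Tuple[Verdict, str]:
    # Collapse runs of whitespace into single spaces.
    return "pass", " ".join(text.split())


def injection_detector(text: str) -> Tuple[Verdict, str]:
    # Toy heuristic; a real detector would be far more thorough.
    if "ignore previous instructions" in text.lower():
        return "fail", text
    return "pass", text


def content_safety(text: str) -> Tuple[Verdict, str]:
    return "pass", text


stages = [
    Stage("Pre-Processing", normalize),
    Stage("Injection Detector", injection_detector),
    Stage("Content Safety Check", content_safety, enabled=False),
]

clean_text, trail = run_pipeline(stages, "  What is   our refund policy? ")
print(clean_text)
print(trail)
```

Because each stage is just an entry in a list with its own `enabled` flag, turning a stage off or reordering stages does not require touching the others.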

Deployment Models

SaaS Proxy

  • No Infrastructure: TruthVouch hosts the Firewall
  • Setup: Add a proxy endpoint to your API calls
  • Latency: Minimal (typically 50-150ms added per request)
  • Scaling: Automatic, no capacity planning required
  • Best for: Rapid deployment, small to medium AI systems

Example: Replace https://api.openai.com with https://firewall.truthvouch.io/openai in your SDK calls.
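
The base URL swap above can be sketched with the Python standard library. The helper name `to_firewall_url` is hypothetical, and the sketch assumes the proxy preserves the provider's original path (for example `/v1/chat/completions`) under the `/openai` prefix; confirm the exact path layout in your deployment settings.

```python
from urllib.parse import urlsplit, urlunsplit

FIREWALL_HOST = "firewall.truthvouch.io"

# Assumption: each provider host maps to a path prefix on the firewall host.
# Only the OpenAI mapping appears in the text above.
PROVIDER_PREFIXES = {"api.openai.com": "/openai"}


def to_firewall_url(url: str) -> str:
    """Rewrite a provider URL so the request goes through the firewall proxy."""
    parts = urlsplit(url)
    prefix = PROVIDER_PREFIXES.get(parts.netloc)
    if prefix is None:
        return url  # unknown provider: leave the URL unchanged
    return urlunsplit(
        (parts.scheme, FIREWALL_HOST, prefix + parts.path, parts.query, parts.fragment)
    )


print(to_firewall_url("https://api.openai.com/v1/chat/completions"))
```

In practice most SDKs expose a base URL setting, so a one-line configuration change is usually enough and no per-request rewriting is needed.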

Self-Hosted Sidecar

  • Your Infrastructure: Deploy in Docker/Kubernetes alongside your services
  • Setup: Network routing, TLS certificates, policy sync
  • Latency: Sub-20ms (local network, no external hops)
  • Scaling: You manage pod replicas, resource allocation
  • Best for: Strict data residency, high-volume systems (>10k requests/day), custom compliance needs

Request Flow Example

Client Request
[Firewall Inlet] → Normalize, rate limit check
[PII Redaction] → Mask SSN, credit cards, emails
[Injection Detection] → Analyze prompt syntax
[Content Safety] → Check toxicity score
[Custom Policies] → Apply Rego-based business rules
[AI Provider] → Send cleaned request to OpenAI/Claude/etc
AI Response (with PII, potential hallucinations)
[Output PII Masking] → Remove sensitive data from response
[Truth Scanner] → Cross-check against nuggets
[Similarity Scanner] → Semantic validation
[Output Safety] → Detect bias, toxicity
[Policy Enforcement] → Block or modify response per policy
Client receives response + audit event
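
To make the [PII Redaction] step in the flow above concrete, here is a minimal regex-based sketch that masks SSNs, credit card numbers, and email addresses. The patterns, placeholder labels, and `mask_pii` function are illustrative assumptions, not the detection logic the Firewall actually uses; production PII detection typically needs checksum validation and many more formats.

```python
import re

# Illustrative patterns only; real PII detection covers far more formats.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def mask_pii(text: str):
    """Replace each match with a [LABEL] placeholder and count replacements."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


masked, counts = mask_pii(
    "Email jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
)
print(masked)  # Email [EMAIL], SSN [SSN], card [CREDIT_CARD].
print(counts)
```

The per-label counts are the kind of detail an audit event could record without storing the sensitive values themselves.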

Key Concepts

Truth Nuggets: Fragments of validated information (internal docs, vetted external sources) that the Truth Scanner checks AI outputs against. A hallucination is flagged when the AI makes a claim that contradicts your nuggets.

Scan Stages: Modular components that examine request or response data. Each stage has configurable thresholds and can emit pass/fail/warn verdicts.
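
As a rough illustration of how a configurable threshold could map to a pass/warn/fail verdict, consider the sketch below. The `Thresholds` class, the `verdict_for` function, and the 0.5 and 0.8 values are hypothetical and chosen only for the example; it also assumes a higher score means higher risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    warn: float  # scores at or above this produce "warn"
    fail: float  # scores at or above this produce "fail"


def verdict_for(score: float, thresholds: Thresholds) -> str:
    """Map a stage's risk score to a verdict (assumes higher score = riskier)."""
    if score >= thresholds.fail:
        return "fail"
    if score >= thresholds.warn:
        return "warn"
    return "pass"


# Example values only; not defaults from the product.
toxicity = Thresholds(warn=0.5, fail=0.8)

for score in (0.2, 0.6, 0.9):
    print(score, verdict_for(score, toxicity))
```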

Policies (Rego): Open Policy Agent (OPA) policies, written in Rego, that enforce business logic, decision rules, and complex conditions across requests and responses.

Allowlists & Blocklists: Static lists of approved or prohibited domains, email patterns, regex, or tokens that stages check against.
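
A stage that consults static lists might look like the following sketch. The list contents, the `check_lists` function, and the rule that a blocklist match takes precedence over the allowlist are all assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical list contents for the example.
ALLOWED_DOMAINS = {"example.com", "docs.example.com"}
BLOCKED_PATTERNS = [re.compile(r"\bapi[_-]?key\b", re.IGNORECASE)]

# Simple URL matcher; good enough for a sketch, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s)>\"']+")


def check_lists(text: str):
    """Return (verdict, reason). Blocklist hits are checked first."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            return "fail", f"blocked pattern: {pattern.pattern}"
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname or ""
        if host not in ALLOWED_DOMAINS:
            return "fail", f"domain not allowlisted: {host}"
    return "pass", "ok"


print(check_lists("See https://docs.example.com/guide for details"))
print(check_lists("Visit http://evil.test/x"))
```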

Audit Trail: Immutable log of every request/response pair, including which stages fired, what was masked/blocked, and why.

Why Firewall Matters

  • Reduce AI Liability: Prevent hallucinations from entering customer-facing outputs
  • Ensure Compliance: Automatically mask PII to support GDPR, HIPAA, and SOC 2 requirements
  • Block Attacks: Stop prompt injection, data exfiltration, and jailbreak attempts
  • Enforce Brand Safety: Prevent AI from generating biased or off-brand content
  • Audit & Accountability: Full immutable record for incident response and regulatory audits

Next Steps

  1. Deployment: Choose SaaS Proxy or Self-Hosted Sidecar
  2. Configuration: Set up scan stages and thresholds
  3. Safety Rules: Configure content safety policies
  4. PII Protection: Set up PII masking rules
  5. Injection Defense: Understand prompt injection detection
  6. Monitoring: Review performance tuning and audit logs