How Hallucination Detection Works
TruthVouch detects hallucinations (false or unsupported claims made by LLMs) with 94%+ accuracy using Natural Language Inference (NLI) scoring. This guide explains the technical pipeline.
The Detection Pipeline
Hallucination detection follows a 6-stage process:
Stage 1: Query Generation
For each truth nugget in your knowledge base, TruthVouch generates queries:
Example:
- Truth Nugget: “Founded in 2023”
- Generated Query: “When was TruthVouch founded?”
The system generates 3-5 query variants:
- Direct: “What is the founding year of TruthVouch?”
- Indirect: “Tell me about TruthVouch’s history”
- Factoid: “In what year did TruthVouch launch?”
- Comparative: “Was TruthVouch founded before or after 2024?”
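The variant types above can be sketched as simple templates. This is an illustrative sketch only, not the actual TruthVouch generation logic; the function name and templates are hypothetical.

```python
# Hypothetical sketch of query-variant generation: each variant type
# is a template filled in with the nugget's subject and attribute.
# Templates here are hard-coded to the "founding year" example.
def generate_query_variants(subject: str, attribute: str) -> list[str]:
    return [
        f"What is the {attribute} of {subject}?",        # direct
        f"Tell me about {subject}'s history",            # indirect
        f"In what year did {subject} launch?",           # factoid
        f"Was {subject} founded before or after 2024?",  # comparative
    ]

variants = generate_query_variants("TruthVouch", "founding year")
```

A real generator would condition the templates on the nugget's category (date, price, attribute) rather than hard-coding them.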
Stage 2: LLM Querying
Each generated query is sent to the monitored LLMs (ChatGPT, Claude, Gemini, etc.):
```
Query: "When was TruthVouch founded?"
    ↓
LLM Response: "TruthVouch was founded in 2024"
```

Stage 3: Entity Extraction
Response text is parsed to extract factual claims:
```
Response: "TruthVouch was founded in 2024"
    ↓
Extracted: entity="TruthVouch", relation="founded", value="2024"
```

Uses Named Entity Recognition (NER) and relation extraction models.
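To make the output shape concrete, here is a deliberately minimal extraction sketch. Production systems use NER and relation-extraction models; this regex only mimics the triple shown above for one sentence pattern, and the function name is hypothetical.

```python
import re

# Illustrative only: extract an (entity, relation, value) triple from a
# sentence of the form "X was founded in YYYY". A real pipeline would use
# NER + relation extraction models, not a single regex.
def extract_founding_claim(response: str):
    match = re.search(r"(\w+) was (founded) in (\d{4})", response)
    if match is None:
        return None
    entity, relation, value = match.groups()
    return {"entity": entity, "relation": relation, "value": value}

claim = extract_founding_claim("TruthVouch was founded in 2024")
# claim == {"entity": "TruthVouch", "relation": "founded", "value": "2024"}
```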
Stage 4: NLI Comparison
The extracted claim is compared to the truth nugget using NLI:
```
Premise: "Founded in 2023" (truth nugget)
Hypothesis: "Founded in 2024" (LLM response)
    ↓
NLI Model: CONTRADICTION (0.05 entailment score)
```

The NLI model returns three scores:
- Entailment: Premise logically implies hypothesis
- Neutral: Premise and hypothesis are unrelated
- Contradiction: Premise contradicts hypothesis
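A minimal sketch of how the three scores might be consumed: pick the highest-scoring label and then, anticipating Stage 5, map the entailment score onto the documented severity thresholds. Function names are illustrative, not the TruthVouch API.

```python
# Illustrative sketch, not the TruthVouch implementation.
def nli_label(entailment: float, neutral: float, contradiction: float) -> str:
    """Return the highest-scoring NLI label."""
    scores = {"entailment": entailment, "neutral": neutral,
              "contradiction": contradiction}
    return max(scores, key=scores.get)

def severity(entailment: float) -> str:
    """Map an entailment score to the severity bands listed in Stage 5."""
    if entailment >= 0.95:
        return "CORRECT"        # no alert
    if entailment >= 0.70:
        return "PARTIAL"        # warning alert
    if entailment >= 0.05:
        return "HALLUCINATION"  # critical alert
    return "CONTRADICTION"      # critical alert

# The "Founded in 2023" vs "Founded in 2024" example above:
label = nli_label(entailment=0.05, neutral=0.10, contradiction=0.85)
# label == "contradiction"
```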
Stage 5: Scoring
The entailment score (0.0-1.0) is converted to an alert severity:
```
Entailment >= 0.95:   ✓ CORRECT (no alert)
Entailment 0.70-0.94: ⚠️ PARTIAL (warning alert)
Entailment 0.05-0.69: ✗ HALLUCINATION (critical alert)
Entailment < 0.05:    ✗ CONTRADICTION (critical alert)
```

Stage 6: Alerting
Based on severity and your alert rules:
```
HALLUCINATION detected
├─ Provider: ChatGPT
├─ Severity: Critical
├─ Claim: "Founded in 2024"
├─ Truth: "Founded in 2023"
├─ Confidence: 99%
└─ Action: Send alert, prepare correction
```

Accuracy & Performance
94%+ Detection Rate
Tested on diverse claims:
| Claim Type | Accuracy | Confidence |
|---|---|---|
| Factoid (dates, numbers) | 97% | 99.2% |
| Entity attributes | 95% | 98.8% |
| Relationships | 92% | 97.1% |
| Negations | 89% | 96.4% |
| Comparatives | 91% | 96.9% |
False Positive Rate
Carefully calibrated to minimize false alerts:
- False Positive Rate: 3.2% (claims flagged as hallucinations but correct)
- False Negative Rate: 6.1% (actual hallucinations missed)
Tuned to prioritize recall (finding hallucinations) over precision.
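Recall follows directly from the false negative rate, while precision also depends on how common hallucinations are in the checked claims. A worked back-of-envelope, with the prevalence figure an assumption for illustration only:

```python
# Recall/precision from the published rates. Prevalence (share of
# checked claims that are actually hallucinations) is an assumed value.
fnr = 0.061   # false negative rate: hallucinations missed
fpr = 0.032   # false positive rate: correct claims flagged

recall = 1 - fnr                 # share of hallucinations caught

prevalence = 0.20                # assumption: 20% of claims hallucinate
tp = prevalence * recall         # true positives per claim checked
fp = (1 - prevalence) * fpr      # false positives per claim checked
precision = tp / (tp + fp)

print(f"recall = {recall:.3f}, precision = {precision:.3f}")
```

At this assumed prevalence, precision stays near 88% even with recall prioritized; at lower prevalence, the same false positive rate costs proportionally more precision.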
Detection Modes
Automatic Monitoring
Continuous querying and checking:
```python
# Monitor every 2 hours
client.shield.enable_monitoring(
    frequency_minutes=120,
    check_type="full_scan"
)
```

What Gets Checked:
- All 9+ supported LLMs
- All truth nuggets
- Multiple query variants per nugget
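One way to gauge the cost of a full scan is the fan-out the list above implies: every provider, nugget, and variant combination is checked. The numbers below are hypothetical:

```python
# Hypothetical back-of-envelope for one full scan's fan-out.
providers = ["openai", "anthropic", "gemini"]   # 9+ in production
nuggets = 40                                    # truth nuggets in the KB
variants_per_nugget = 4                         # generated query variants

checks_per_scan = len(providers) * nuggets * variants_per_nugget
print(checks_per_scan)  # 480 LLM queries per full scan
```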
On-Demand Checking
Manual checks when needed:
```python
# Check a specific claim
response = client.shield.check_claim(
    claim="TruthVouch was founded in 2023",
    providers=["openai", "anthropic"]
)
# Returns entailment scores for each provider
```

Cross-Check Scheduling
Regular verification on a custom schedule:

```python
# Define a cross-check policy
client.shield.create_cross_check_policy(
    name="quarterly_audit",
    frequency="monthly",
    providers=["all"],
    nugget_categories=["pricing", "product"],
    generate_report=True
)
```

Handling Edge Cases
Negations
Correctly handles negative claims:
```
Truth: "TruthVouch is not free"
LLM: "TruthVouch costs money"
NLI: ENTAILMENT (semantically equivalent)
Result: ✓ Correct
```

Paraphrasing
Detects when LLM paraphrases truth:
```
Truth: "Supports 9+ AI models"
LLM: "Compatible with more than 8 LLM providers"
NLI: ENTAILMENT (meaning preserved)
Result: ✓ Correct
```

Context Dependency
Understands context-dependent statements:
```
Truth: "EU AI Act compliance available on Business plan"
LLM: "TruthVouch offers EU AI Act compliance"
NLI: NEUTRAL (context missing; could be true but vague)
Result: ⚠️ Warning (incomplete, needs review)
```

Temporal Claims
Handles time-sensitive information:
```
Truth: "Pricing updated January 2024"
LLM: "TruthVouch costs $349/month"
When checked in June: the previous price may have changed
System: rechecks with current truth nuggets
```

Confidence Metrics
Each detection includes confidence scores:
```python
alert = client.shield.get_alert("alert-123")

print(f"Hallucination Score: {alert.hallucination_score}")  # 0.0-1.0; higher = more confident it's a hallucination
print(f"NLI Entailment: {alert.nli_entailment}")            # 0.0-1.0; measure of semantic alignment
print(f"Provider Confidence: {alert.provider_confidence}")  # how confident the LLM response is
print(f"Overall Confidence: {alert.overall_confidence}")    # meta-confidence in the detection
```

Limitations
TruthVouch hallucination detection has known limitations:
- Subjective Claims: Opinion-based statements are difficult to verify
- Temporal Sensitivity: Time-dependent facts require frequent updates
- Context: Some claims require broader context to evaluate
- Ambiguous Truth: If your truth nuggets are vague, detection is harder
- Domain Knowledge: Very specialized domains may have lower accuracy
Best Practices
1. Maintain Fresh Truth Nuggets
Regular updates ensure accurate detection:
```python
# Review and update quarterly
stale = client.truth_nuggets.get_stale(days=90)
for nugget in stale:
    print(f"Update needed: {nugget.key}")
```

2. Define Clear Truth Nuggets
Specific, measurable nuggets make detection more reliable:
```python
# Bad: vague
client.truth_nuggets.create(
    value="TruthVouch is a great platform"
)

# Good: specific
client.truth_nuggets.create(
    category="pricing",
    key="starter_price",
    value="$349/month"
)
```

3. Monitor Confidence Metrics
Review low-confidence detections:
```python
low_conf = client.shield.get_alerts(
    min_confidence=0.5,
    max_confidence=0.7
)
for alert in low_conf:
    print(f"Review: {alert.claim} (conf: {alert.confidence})")
```

4. Tune Alert Thresholds
Adjust sensitivity based on your needs:
```python
client.shield.update_alert_rules(
    hallucination_threshold=0.7,  # More tolerant
    severity_threshold="high",    # Only critical alerts
    providers=["openai"]          # Focus on specific providers
)
```

Next Steps
- NLI Scoring: Understand the NLI model in detail
- Alerts: Learn about alert channels and severity levels
- Corrections: Generate and deploy corrections automatically
- Monitoring: Set up continuous hallucination detection