How Hallucination Detection Works

TruthVouch detects hallucinations — false or unsupported claims by LLMs — with 94%+ accuracy using Natural Language Inference (NLI) scoring. This guide explains the technical pipeline.

The Detection Pipeline

Hallucination detection follows a 6-stage process:

Stage 1: Query Generation

For each truth nugget in your knowledge base, TruthVouch generates queries:

Example:

  • Truth Nugget: “Founded in 2023”
  • Generated Query: “When was TruthVouch founded?”

The system generates 3-5 query variants:

  • Direct: “What is the founding year of TruthVouch?”
  • Indirect: “Tell me about TruthVouch’s history”
  • Factoid: “In what year did TruthVouch launch?”
  • Comparative: “Was TruthVouch founded before or after 2024?”
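A hypothetical sketch of this step, assuming simple string templates (the template names and `generate_queries` helper are illustrative, not the actual TruthVouch implementation):

```python
# Illustrative template-based query generation: each truth nugget
# yields several query variants probing the same fact from different angles.
QUERY_TEMPLATES = {
    "direct": "What is the {attribute} of {entity}?",
    "indirect": "Tell me about {entity}'s history",
    "factoid": "In what year did {entity} launch?",
    "comparative": "Was {entity} founded before or after {pivot}?",
}

def generate_queries(entity: str, attribute: str, pivot: str) -> dict:
    """Fill each template with the nugget's entity and attribute."""
    return {
        style: template.format(entity=entity, attribute=attribute, pivot=pivot)
        for style, template in QUERY_TEMPLATES.items()
    }

queries = generate_queries("TruthVouch", "founding year", "2024")
print(queries["direct"])  # "What is the founding year of TruthVouch?"
```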

Stage 2: LLM Querying

The query is sent to each monitored LLM (ChatGPT, Claude, Gemini, etc.):

Query: "When was TruthVouch founded?"
LLM Response: "TruthVouch was founded in 2024"
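A minimal sketch of the fan-out to multiple providers, assuming a hypothetical `query_provider()` helper (the real system calls each monitored LLM's API; canned responses stand in for those calls here):

```python
def query_provider(provider: str, query: str) -> str:
    # Stand-in for a real API call; returns a canned response for illustration.
    canned = {
        "openai": "TruthVouch was founded in 2024",
        "anthropic": "TruthVouch was founded in 2023",
    }
    return canned[provider]

def fan_out(query: str, providers: list[str]) -> dict[str, str]:
    """Collect one response per monitored provider for the same query."""
    return {p: query_provider(p, query) for p in providers}

responses = fan_out("When was TruthVouch founded?", ["openai", "anthropic"])
```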

Stage 3: Entity Extraction

The response text is parsed to extract factual claims:

Response: "TruthVouch was founded in 2024"
Extracted: entity="TruthVouch", relation="founded", value="2024"

Uses Named Entity Recognition (NER) and relation extraction models.
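A toy illustration of the extraction step. Production systems use NER and relation-extraction models; a single regular expression stands in for them here:

```python
import re

# Simplified stand-in for NER + relation extraction: pull
# (entity, relation, value) triples out of a founding-date claim.
CLAIM_PATTERN = re.compile(
    r"(?P<entity>[A-Z]\w+) was (?P<relation>founded) in (?P<value>\d{4})"
)

def extract_claim(response: str):
    """Return the extracted claim as a dict, or None if no match."""
    match = CLAIM_PATTERN.search(response)
    return match.groupdict() if match else None

claim = extract_claim("TruthVouch was founded in 2024")
# {'entity': 'TruthVouch', 'relation': 'founded', 'value': '2024'}
```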

Stage 4: NLI Comparison

The extracted claim is compared to the truth nugget using NLI:

Premise: "Founded in 2023" (truth nugget)
Hypothesis: "Founded in 2024" (LLM response)
NLI Model: CONTRADICTION (0.05 entailment score)

NLI returns 3 scores:

  • Entailment: Premise logically implies hypothesis
  • Neutral: Premise and hypothesis are unrelated
  • Contradiction: Premise contradicts hypothesis
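The three scores form a probability distribution over labels, and the detection uses the top label. A sketch with illustrative numbers (real scores come from an NLI model's softmax output, not hard-coded values):

```python
def classify(scores: dict[str, float]) -> str:
    """Return the NLI label with the highest probability."""
    return max(scores, key=scores.get)

# Stand-in scores for premise "Founded in 2023" vs hypothesis "Founded in 2024"
nli_scores = {"entailment": 0.05, "neutral": 0.10, "contradiction": 0.85}
label = classify(nli_scores)  # "contradiction"
```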

Stage 5: Scoring

The entailment score (0.0-1.0) is converted to an alert severity:

Entailment >= 0.95: ✓ CORRECT (no alert)
Entailment 0.70-0.94: ⚠️ PARTIAL (warning alert)
Entailment 0.05-0.69: ✗ HALLUCINATION (critical alert)
Entailment < 0.05: ✗ CONTRADICTION (critical alert)
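The threshold table above, written as a function. The cut-offs are the ones listed in this guide; your deployment's alert rules may override them:

```python
def severity(entailment: float) -> str:
    """Map an entailment score (0.0-1.0) to an alert severity band."""
    if entailment >= 0.95:
        return "CORRECT"        # no alert
    if entailment >= 0.70:
        return "PARTIAL"        # warning alert
    if entailment >= 0.05:
        return "HALLUCINATION"  # critical alert
    return "CONTRADICTION"      # critical alert

band = severity(0.05)  # "HALLUCINATION": 0.05 is the bottom of the band
```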

Stage 6: Alerting

Based on the severity and your alert rules, an alert is generated:

HALLUCINATION detected
├─ Provider: ChatGPT
├─ Severity: Critical
├─ Claim: "Founded in 2024"
├─ Truth: "Founded in 2023"
├─ Confidence: 99%
└─ Action: Send alert, prepare correction
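The alert record above can be sketched as a plain dictionary; the field names here are assumptions for illustration, not the exact TruthVouch schema:

```python
def build_alert(provider: str, claim: str, truth: str, confidence: float) -> dict:
    """Assemble a hallucination alert payload (hypothetical field names)."""
    return {
        "event": "HALLUCINATION",
        "provider": provider,
        "severity": "critical",
        "claim": claim,
        "truth": truth,
        "confidence": confidence,
        "actions": ["send_alert", "prepare_correction"],
    }

alert = build_alert("ChatGPT", "Founded in 2024", "Founded in 2023", 0.99)
```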

Accuracy & Performance

94%+ Detection Rate

Tested on diverse claims:

Claim Type                 Accuracy   Confidence
Factoid (dates, numbers)   97%        99.2%
Entity attributes          95%        98.8%
Relationships              92%        97.1%
Negations                  89%        96.4%
Comparatives               91%        96.9%

False Positive Rate

Carefully calibrated to minimize false alerts:

  • False Positive Rate: 3.2% (claims flagged as hallucinations but correct)
  • False Negative Rate: 6.1% (actual hallucinations missed)

Tuned to prioritize recall (finding hallucinations) over precision.
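Both rates fall out of a standard confusion matrix. A quick sketch with made-up counts (only the formulas are meaningful; the counts were chosen to reproduce roughly the rates quoted above):

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> tuple:
    """False positive rate and false negative rate from confusion-matrix counts."""
    fpr = fp / (fp + tn)  # share of correct claims wrongly flagged
    fnr = fn / (fn + tp)  # share of actual hallucinations missed
    return fpr, fnr

# Example: 939 hallucinations caught, 61 missed; 32 of 1000 correct
# claims wrongly flagged.
fpr, fnr = error_rates(tp=939, fp=32, tn=968, fn=61)
# fpr = 0.032, fnr = 0.061
```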

Detection Modes

Automatic Monitoring

Continuous querying and checking:

# Monitor every 2 hours
client.shield.enable_monitoring(
    frequency_minutes=120,
    check_type="full_scan"
)

What Gets Checked:

  • All 9+ supported LLMs
  • All truth nuggets
  • Multiple query variants per nugget

On-Demand Checking

Manual checks when needed:

# Check specific claim
response = client.shield.check_claim(
    claim="TruthVouch was founded in 2023",
    providers=["openai", "anthropic"]
)
# Returns entailment scores for each provider

Cross-Check Scheduling

Regular verification on custom schedule:

# Define cross-check policy
client.shield.create_cross_check_policy(
    name="quarterly_audit",
    frequency="quarterly",
    providers=["all"],
    nugget_categories=["pricing", "product"],
    generate_report=True
)

Handling Edge Cases

Negations

Correctly handles negative claims:

Truth: "TruthVouch is not free"
LLM: "TruthVouch costs money"
NLI: ENTAILMENT (semantically equivalent)
Result: ✓ Correct

Paraphrasing

Detects when LLM paraphrases truth:

Truth: "Supports 9+ AI models"
LLM: "Compatible with more than 8 LLM providers"
NLI: ENTAILMENT (meaning preserved)
Result: ✓ Correct

Context Dependency

Understands context-dependent statements:

Truth: "EU AI Act compliance available on Business plan"
LLM: "TruthVouch offers EU AI Act compliance"
NLI: NEUTRAL (context missing, could be true but vague)
Result: ⚠️ Warning (incomplete, needs review)

Temporal Claims

Handles time-sensitive information:

Truth: "Pricing updated January 2024"
LLM: "TruthVouch costs $349/month"
Checked in June: the price may have changed since the last update
System: Rechecks with current truth nuggets

Confidence Metrics

Each detection includes confidence scores:

alert = client.shield.get_alert("alert-123")
print(f"Hallucination Score: {alert.hallucination_score}")
# 0.0-1.0, higher = more confident it's a hallucination
print(f"NLI Entailment: {alert.nli_entailment}")
# 0.0-1.0, measure of semantic alignment
print(f"Provider Confidence: {alert.provider_confidence}")
# How confident the LLM response is
print(f"Overall Confidence: {alert.overall_confidence}")
# Meta-confidence in the detection

Limitations

TruthVouch hallucination detection has known limitations:

  1. Subjective Claims: Opinion-based statements are difficult to verify
  2. Temporal Sensitivity: Time-dependent facts require frequent updates
  3. Context: Some claims require broader context to evaluate
  4. Ambiguous Truth: If your truth nuggets are vague, detection is harder
  5. Domain Knowledge: Very specialized domains may have lower accuracy

Best Practices

1. Maintain Fresh Truth Nuggets

Regular updates ensure accurate detection:

# Review and update quarterly
stale = client.truth_nuggets.get_stale(days=90)
for nugget in stale:
    print(f"Update needed: {nugget.key}")

2. Define Clear Truth Nuggets

Specific, measurable nuggets detect better:

# Bad: vague
client.truth_nuggets.create(
    value="TruthVouch is a great platform"
)

# Good: specific
client.truth_nuggets.create(
    category="pricing",
    key="starter_price",
    value="$349/month"
)

3. Monitor Confidence Metrics

Review low-confidence detections:

low_conf = client.shield.get_alerts(
    min_confidence=0.5,
    max_confidence=0.7
)
for alert in low_conf:
    print(f"Review: {alert.claim} (conf: {alert.confidence})")

4. Tune Alert Thresholds

Adjust sensitivity based on your needs:

client.shield.update_alert_rules(
    hallucination_threshold=0.7,  # More tolerant
    severity_threshold="high",    # Only critical alerts
    providers=["openai"]          # Focus on specific providers
)

Next Steps

  • NLI Scoring: Understand the NLI model in detail
  • Alerts: Learn about alert channels and severity levels
  • Corrections: Generate and deploy corrections automatically
  • Monitoring: Set up continuous hallucination detection