How Hallucination Detection Works

TruthVouch detects hallucinations — false or unsupported claims by LLMs — with 94%+ accuracy using Natural Language Inference (NLI) scoring. This guide explains the technical pipeline.

The Detection Pipeline

Hallucination detection follows a 6-stage process:

Stage 1: Query Generation

For each truth nugget in your knowledge base, TruthVouch generates queries:

Example:

  • Truth Nugget: “Founded in 2023”
  • Generated Query: “When was TruthVouch founded?”

The system generates 3-5 query variants:

  • Direct: “What is the founding year of TruthVouch?”
  • Indirect: “Tell me about TruthVouch’s history”
  • Factoid: “In what year did TruthVouch launch?”
  • Comparative: “Was TruthVouch founded before or after 2024?”
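A hypothetical sketch of this step, assuming simple string templates (the template names and `generate_queries` helper are illustrative, not the actual TruthVouch implementation):

```python
# Illustrative template-based query generation: each truth nugget
# yields several query variants probing the same fact from different angles.
QUERY_TEMPLATES = {
    "direct": "What is the {attribute} of {entity}?",
    "indirect": "Tell me about {entity}'s history",
    "factoid": "In what year did {entity} launch?",
    "comparative": "Was {entity} founded before or after {pivot}?",
}

def generate_queries(entity: str, attribute: str, pivot: str) -> dict:
    """Fill each template with the nugget's entity and attribute."""
    return {
        style: template.format(entity=entity, attribute=attribute, pivot=pivot)
        for style, template in QUERY_TEMPLATES.items()
    }

queries = generate_queries("TruthVouch", "founding year", "2024")
print(queries["direct"])  # "What is the founding year of TruthVouch?"
```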

Stage 2: LLM Querying

The query is sent to each monitored LLM (ChatGPT, Claude, Gemini, etc.):

Query: "When was TruthVouch founded?"
LLM Response: "TruthVouch was founded in 2024"
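A minimal sketch of the fan-out to multiple providers, assuming a hypothetical `query_provider()` helper (the real system calls each monitored LLM's API; canned responses stand in for those calls here):

```python
def query_provider(provider: str, query: str) -> str:
    # Stand-in for a real API call; returns a canned response for illustration.
    canned = {
        "openai": "TruthVouch was founded in 2024",
        "anthropic": "TruthVouch was founded in 2023",
    }
    return canned[provider]

def fan_out(query: str, providers: list[str]) -> dict[str, str]:
    """Collect one response per monitored provider for the same query."""
    return {p: query_provider(p, query) for p in providers}

responses = fan_out("When was TruthVouch founded?", ["openai", "anthropic"])
```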

Stage 3: Entity Extraction

The response text is parsed to extract factual claims:

Response: "TruthVouch was founded in 2024"
Extracted: entity="TruthVouch", relation="founded", value="2024"

Uses Named Entity Recognition (NER) and relation extraction models.
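A toy illustration of the extraction step. Production systems use NER and relation-extraction models; a single regular expression stands in for them here:

```python
import re

# Simplified stand-in for NER + relation extraction: pull
# (entity, relation, value) triples out of a founding-date claim.
CLAIM_PATTERN = re.compile(
    r"(?P<entity>[A-Z]\w+) was (?P<relation>founded) in (?P<value>\d{4})"
)

def extract_claim(response: str):
    """Return the extracted claim as a dict, or None if no match."""
    match = CLAIM_PATTERN.search(response)
    return match.groupdict() if match else None

claim = extract_claim("TruthVouch was founded in 2024")
# {'entity': 'TruthVouch', 'relation': 'founded', 'value': '2024'}
```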

Stage 4: NLI Comparison

The extracted claim is compared to the truth nugget using NLI:

Premise: "Founded in 2023" (truth nugget)
Hypothesis: "Founded in 2024" (LLM response)
NLI Model: CONTRADICTION (0.05 entailment score)

NLI returns 3 scores:

  • Entailment: Premise logically implies hypothesis
  • Neutral: Premise and hypothesis are unrelated
  • Contradiction: Premise contradicts hypothesis
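The three scores form a probability distribution over labels, and the detection uses the top label. A sketch with illustrative numbers (real scores come from an NLI model's softmax output, not hard-coded values):

```python
def classify(scores: dict[str, float]) -> str:
    """Return the NLI label with the highest probability."""
    return max(scores, key=scores.get)

# Stand-in scores for premise "Founded in 2023" vs hypothesis "Founded in 2024"
nli_scores = {"entailment": 0.05, "neutral": 0.10, "contradiction": 0.85}
label = classify(nli_scores)  # "contradiction"
```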

Stage 5: Scoring

The entailment score (0.0-1.0) is converted to an alert severity:

Entailment >= 0.95: ✓ CORRECT (no alert)
Entailment 0.70-0.94: ⚠️ PARTIAL (warning alert)
Entailment 0.05-0.69: ✗ HALLUCINATION (critical alert)
Entailment < 0.05: ✗ CONTRADICTION (critical alert)
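The threshold table above, written as a function. The cut-offs are the ones listed in this guide; your deployment's alert rules may override them:

```python
def severity(entailment: float) -> str:
    """Map an entailment score (0.0-1.0) to an alert severity band."""
    if entailment >= 0.95:
        return "CORRECT"        # no alert
    if entailment >= 0.70:
        return "PARTIAL"        # warning alert
    if entailment >= 0.05:
        return "HALLUCINATION"  # critical alert
    return "CONTRADICTION"      # critical alert

band = severity(0.05)  # "HALLUCINATION": 0.05 is the bottom of the band
```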

Stage 6: Alerting

Based on the severity and your alert rules, an alert is generated:

HALLUCINATION detected
├─ Provider: ChatGPT
├─ Severity: Critical
├─ Claim: "Founded in 2024"
├─ Truth: "Founded in 2023"
├─ Confidence: 99%
└─ Action: Send alert, prepare correction
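The alert record above can be sketched as a plain dictionary; the field names here are assumptions for illustration, not the exact TruthVouch schema:

```python
def build_alert(provider: str, claim: str, truth: str, confidence: float) -> dict:
    """Assemble a hallucination alert payload (hypothetical field names)."""
    return {
        "event": "HALLUCINATION",
        "provider": provider,
        "severity": "critical",
        "claim": claim,
        "truth": truth,
        "confidence": confidence,
        "actions": ["send_alert", "prepare_correction"],
    }

alert = build_alert("ChatGPT", "Founded in 2024", "Founded in 2023", 0.99)
```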

Accuracy & Performance

94%+ Detection Rate

Tested on diverse claims:

Claim Type                 Accuracy   Confidence
Factoid (dates, numbers)   97%        99.2%
Entity attributes          95%        98.8%
Relationships              92%        97.1%
Negations                  89%        96.4%
Comparatives               91%        96.9%

False Positive Rate

Carefully calibrated to minimize false alerts:

  • False Positive Rate: 3.2% (claims flagged as hallucinations but correct)
  • False Negative Rate: 6.1% (actual hallucinations missed)

Tuned to prioritize recall (finding hallucinations) over precision.
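Both rates fall out of a standard confusion matrix. A quick sketch with made-up counts (only the formulas are meaningful; the counts were chosen to reproduce roughly the rates quoted above):

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> tuple:
    """False positive rate and false negative rate from confusion-matrix counts."""
    fpr = fp / (fp + tn)  # share of correct claims wrongly flagged
    fnr = fn / (fn + tp)  # share of actual hallucinations missed
    return fpr, fnr

# Example: 939 hallucinations caught, 61 missed; 32 of 1000 correct
# claims wrongly flagged.
fpr, fnr = error_rates(tp=939, fp=32, tn=968, fn=61)
# fpr = 0.032, fnr = 0.061
```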

Detection Modes

Automatic Monitoring

Continuous querying and checking:

# Monitor every 2 hours
client.shield.enable_monitoring(
    frequency_minutes=120,
    check_type="full_scan"
)

What Gets Checked:

  • All 9+ supported LLMs
  • All truth nuggets
  • Multiple query variants per nugget

On-Demand Checking

Manual checks when needed:

# Check specific claim
response = client.shield.check_claim(
    claim="TruthVouch was founded in 2023",
    providers=["openai", "anthropic"]
)
# Returns entailment scores for each provider

Cross-Check Scheduling

Regular verification on custom schedule:

# Define cross-check policy
client.shield.create_cross_check_policy(
    name="quarterly_audit",
    frequency="quarterly",
    providers=["all"],
    nugget_categories=["pricing", "product"],
    generate_report=True
)

Handling Edge Cases

Negations

Correctly handles negative claims:

Truth: "TruthVouch is not free"
LLM: "TruthVouch costs money"
NLI: ENTAILMENT (semantically equivalent)
Result: ✓ Correct

Paraphrasing

Detects when LLM paraphrases truth:

Truth: "Supports 9+ AI models"
LLM: "Compatible with more than 8 LLM providers"
NLI: ENTAILMENT (meaning preserved)
Result: ✓ Correct

Context Dependency

Understands context-dependent statements:

Truth: "EU AI Act compliance available on Business plan"
LLM: "TruthVouch offers EU AI Act compliance"
NLI: NEUTRAL (context missing, could be true but vague)
Result: ⚠️ Warning (incomplete, needs review)

Temporal Claims

Handles time-sensitive information:

Truth: "Pricing updated January 2024"
LLM: "TruthVouch costs $349/month"
Checked in June: the price may have changed since the last update
System: Rechecks with current truth nuggets

Confidence Metrics

Each detection includes confidence scores:

alert = client.shield.get_alert("alert-123")
print(f"Hallucination Score: {alert.hallucination_score}")
# 0.0-1.0, higher = more confident it's a hallucination
print(f"NLI Entailment: {alert.nli_entailment}")
# 0.0-1.0, measure of semantic alignment
print(f"Provider Confidence: {alert.provider_confidence}")
# How confident the LLM response is
print(f"Overall Confidence: {alert.overall_confidence}")
# Meta-confidence in the detection

Limitations

TruthVouch hallucination detection has known limitations:

  1. Subjective Claims: Opinion-based statements are difficult to verify
  2. Temporal Sensitivity: Time-dependent facts require frequent updates
  3. Context: Some claims require broader context to evaluate
  4. Ambiguous Truth: If your truth nuggets are vague, detection is harder
  5. Domain Knowledge: Very specialized domains may have lower accuracy

Best Practices

1. Maintain Fresh Truth Nuggets

Regular updates ensure accurate detection:

# Review and update quarterly
stale = client.truth_nuggets.get_stale(days=90)
for nugget in stale:
    print(f"Update needed: {nugget.key}")

2. Define Clear Truth Nuggets

Specific, measurable nuggets detect better:

# Bad: vague
client.truth_nuggets.create(
    value="TruthVouch is a great platform"
)

# Good: specific
client.truth_nuggets.create(
    category="pricing",
    key="starter_price",
    value="$349/month"
)

3. Monitor Confidence Metrics

Review low-confidence detections:

low_conf = client.shield.get_alerts(
    min_confidence=0.5,
    max_confidence=0.7
)
for alert in low_conf:
    print(f"Review: {alert.claim} (conf: {alert.confidence})")

4. Tune Alert Thresholds

Adjust sensitivity based on your needs:

client.shield.update_alert_rules(
    hallucination_threshold=0.7,  # More tolerant
    severity_threshold="high",    # Only critical alerts
    providers=["openai"]          # Focus on specific providers
)

Next Steps

  • NLI Scoring: Understand the NLI model in detail
  • Alerts: Learn about alert channels and severity levels
  • Corrections: Generate and deploy corrections automatically
  • Monitoring: Set up continuous hallucination detection