NLI Scoring Deep Dive
Natural Language Inference (NLI) is the core technology behind TruthVouch's hallucination detection. NLI measures the logical relationship between two text segments; TruthVouch's model reaches 94%+ accuracy on its test set. This guide explains how it works and how to interpret scores.
What is NLI?
NLI (also called Textual Entailment) answers: “Does Sentence A logically imply Sentence B?”
Three possible relationships:
Entailment (Implication)
Premise logically implies hypothesis:
```
Premise:    "TruthVouch was founded in 2023"
Hypothesis: "TruthVouch was founded more than one year ago"
Score:      ENTAILMENT (1.0)
```
Neutral (Unrelated)
Premise and hypothesis have no logical connection:
```
Premise:    "TruthVouch monitors AI hallucinations"
Hypothesis: "The sky is blue"
Score:      NEUTRAL (0.5)
```
Contradiction (Negation)
Premise contradicts hypothesis:
```
Premise:    "TruthVouch is a SaaS platform"
Hypothesis: "TruthVouch is installed on-premises only"
Score:      CONTRADICTION (0.05)
```
Scoring Scale
NLI scores range from 0.0 to 1.0:
```
1.0 ─────────────────────────────────── ENTAILMENT
0.7 ─ Semantic alignment, high confidence
0.5 ─ NEUTRAL / Ambiguous relationship
0.3 ─ Likely contradiction, low alignment
0.0 ─────────────────────────────────── CONTRADICTION
```
Interpretation
| Score | Interpretation | Action |
|---|---|---|
| 0.95-1.0 | Definite entailment | Correct |
| 0.85-0.94 | Strong alignment | Likely correct |
| 0.70-0.84 | Moderate alignment | Review needed |
| 0.50-0.69 | Weak/neutral | Unclear relationship |
| 0.30-0.49 | Likely contradiction | Probable hallucination |
| 0.0-0.29 | Definite contradiction | Definite hallucination |
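The interpretation table above can be expressed as a small helper. This is a sketch only: the band boundaries come straight from the table, but the function name and tuple return type are illustrative, not part of the TruthVouch SDK.

```python
def interpret_score(score: float) -> tuple[str, str]:
    """Map an NLI score (0.0-1.0) to the (interpretation, action) bands above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("NLI scores range from 0.0 to 1.0")
    # Bands from the interpretation table, highest lower-bound first
    bands = [
        (0.95, "Definite entailment", "Correct"),
        (0.85, "Strong alignment", "Likely correct"),
        (0.70, "Moderate alignment", "Review needed"),
        (0.50, "Weak/neutral", "Unclear relationship"),
        (0.30, "Likely contradiction", "Probable hallucination"),
        (0.00, "Definite contradiction", "Definite hallucination"),
    ]
    for lower, interpretation, action in bands:
        if score >= lower:
            return interpretation, action

print(interpret_score(0.97))  # ('Definite entailment', 'Correct')
print(interpret_score(0.12))  # ('Definite contradiction', 'Definite hallucination')
```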
How NLI Works
Semantic Encoding
Both premise and hypothesis are converted to semantic vectors:
```
Text Input
    ↓
Tokenization (break into words)
    ↓
Embedding (convert to semantic vectors)
    ↓
Contextual Encoding (bidirectional transformer)
    ↓
Semantic Vector (384 dimensions)
```
Example:
- “TruthVouch was founded in 2023” → [0.234, -0.891, 0.123, …]
- “Founded in 2024” → [0.221, -0.876, 0.098, …]
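As a toy illustration of how close these two vectors are, here is cosine similarity computed over just the three dimensions shown above. This is a sketch for intuition only: the real model compares full 384-dimensional vectors, and the classifier does more than plain cosine similarity.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Truncated 3-dimensional prefixes of the example vectors above
v_2023 = [0.234, -0.891, 0.123]
v_2024 = [0.221, -0.876, 0.098]

# Close to 1.0: the vectors point in nearly the same direction
print(cosine_similarity(v_2023, v_2024))
```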
Relation Classification
A neural network classifies relationship between vectors:
```
Premise Vector:    [0.234, -0.891, ...]
Hypothesis Vector: [0.221, -0.876, ...]
    ↓
Concatenate & Feed to Classifier
    ↓
Output: [P(entail)=0.97, P(neutral)=0.02, P(contra)=0.01]
    ↓
Result: ENTAILMENT (97%)
```
Confidence Estimation
The model’s confidence in its classification:
- High Confidence: Model is certain about the relationship (e.g., very similar vectors clearly indicate entailment)
- Low Confidence: Model is uncertain (e.g., ambiguous text, missing context)
Model Characteristics
TruthVouch NLI Model
TruthVouch uses a fine-tuned RoBERTa-large model trained on:
- MNLI (Multi-Genre Natural Language Inference, 433K examples)
- Custom hallucination detection data (50K+ real-world examples)
- Domain-specific fine-tuning for business/technical text
Strengths
- High Accuracy: 94%+ on test set
- Fast: <10ms per comparison
- Domain-Optimized: Fine-tuned for AI/business claims
- Context-Aware: Bidirectional attention captures context
- Robust: Handles paraphrasing, negation, temporal claims
Limitations
- Ambiguous Input: Unclear premise/hypothesis reduces accuracy
- Subjective Claims: Opinion-based statements difficult to classify
- Implicit Context: Requires explicit information
- Domain Drift: Performs best on business/technical text
- Entailment Chains: Doesn’t follow multi-step logical chains
Alert Thresholds
Convert NLI scores to alert severity:
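A minimal sketch of that conversion, assuming the default thresholds (0.4 / 0.6 / 0.85). Note that this guide does not name a severity for the 0.6-0.85 band, so the `REVIEW` label below is an assumption, and the function itself is illustrative rather than SDK code.

```python
def score_to_severity(score: float,
                      contradiction_threshold: float = 0.4,
                      neutral_threshold: float = 0.6,
                      entailment_threshold: float = 0.85) -> str:
    """Map an NLI score to an alert severity using the default thresholds."""
    if score < contradiction_threshold:
        return "CRITICAL"   # hallucination detected
    if score < neutral_threshold:
        return "WARNING"    # unclear relationship, may need review
    if score > entailment_threshold:
        return "OK"         # content verified
    return "REVIEW"         # 0.6-0.85: unnamed in this guide (assumption)

print(score_to_severity(0.12))  # CRITICAL
print(score_to_severity(0.55))  # WARNING
print(score_to_severity(0.97))  # OK
```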
Default Configuration
```python
client.shield.update_nli_thresholds(
    contradictionThreshold=0.4,   # < 0.4 = contradiction
    neutralThreshold=0.6,         # 0.4-0.6 = neutral
    entailmentThreshold=0.85,     # > 0.85 = entailment
)
```
Alerts trigger when scores fall outside these ranges:
```
Score < 0.4:  CRITICAL ─ Hallucination detected
0.4-0.6:      WARNING  ─ Unclear relationship, may need review
Score > 0.85: OK       ─ Content verified
```
Customizing Thresholds
Adjust sensitivity based on risk tolerance:
Strict Mode (low false negatives, high false positives):
```python
client.shield.update_nli_thresholds(
    contradictionThreshold=0.5,
    neutralThreshold=0.7,
    entailmentThreshold=0.95,
)
# Only accept very high confidence matches
```
Permissive Mode (high false negatives, low false positives):
```python
client.shield.update_nli_thresholds(
    contradictionThreshold=0.2,
    neutralThreshold=0.5,
    entailmentThreshold=0.75,
)
# Allow some ambiguity, fewer alerts
```
Common Score Patterns
Exact Match
Identical or nearly identical texts:
```
Premise:    "TruthVouch costs $349/month"
Hypothesis: "TruthVouch costs $349/month"
Score:      0.99 (ENTAILMENT)
```
Paraphrase
Different wording, same meaning:
```
Premise:    "TruthVouch was founded in 2023"
Hypothesis: "TruthVouch's founding year is 2023"
Score:      0.94 (ENTAILMENT)
```
Partial Match
Subset relationship:
```
Premise:    "TruthVouch monitors 9+ LLM models"
Hypothesis: "TruthVouch monitors ChatGPT"
Score:      0.78 (Weak entailment, not guaranteed)
```
Negation
Opposite meaning:
```
Premise:    "TruthVouch is cloud-based"
Hypothesis: "TruthVouch is on-premises"
Score:      0.08 (CONTRADICTION)
```
Temporal Shift
Different time period:
```
Premise:    "Founded in 2023"
Hypothesis: "Founded in 2024"
Score:      0.12 (CONTRADICTION)
```
Unrelated
No semantic connection:
```
Premise:    "TruthVouch monitors AI"
Hypothesis: "The Earth orbits the Sun"
Score:      0.51 (NEUTRAL)
```
Improving NLI Accuracy
1. Precise Truth Nuggets
Clear, specific nuggets improve matching:
```python
# Poor: too vague
client.truth_nuggets.create(value="Good product")

# Good: specific and measurable
client.truth_nuggets.create(
    category="pricing",
    key="starter_tier",
    value="Starter plan: $349/month, includes basic features",
)
```
2. Query Variant Coverage
Multiple query types catch more hallucinations:
```python
# System generates queries:
# - "What is TruthVouch's founding year?"
# - "When was TruthVouch founded?"
# - "TruthVouch founding: [year]?"
# - "Is TruthVouch older than 2020?"
```
3. Truth Nugget Versioning
Track updates and their context:
```python
client.truth_nuggets.update(
    nugget_id="pricing_starter",
    value="$399/month",            # Changed from $349
    version="2.0",
    effective_date="2024-01-01",
    reason="Annual price increase",
)
```
4. Monitor Confidence
Review low-confidence detections:
```python
low_conf = client.shield.get_alerts(
    min_confidence=0.5,
    max_confidence=0.7,
)

for alert in low_conf:
    # Manually review and adjust thresholds if needed
    print(f"Review: {alert.claim}")
```
Performance Metrics
Speed
- Per Comparison: <10ms
- Batch Processing: 1000 comparisons in 8 seconds
- Streaming: Sub-100ms latency for real-time checks
Throughput
- API: 1000+ requests/second
- Batch: 10,000+ comparisons per minute
- Monitoring: Continuous checks on all models without lag
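For capacity planning, the quoted batch throughput can be turned into a rough scan-time estimate. This is a back-of-the-envelope sketch only; the helper function is illustrative, and the default rate is the "10,000+ comparisons per minute" figure above.

```python
def batch_scan_minutes(num_comparisons: int,
                       comparisons_per_minute: int = 10_000) -> float:
    """Rough time (in minutes) to scan a batch at the quoted batch throughput."""
    return num_comparisons / comparisons_per_minute

# e.g., checking 250,000 generated claims against truth nuggets
print(batch_scan_minutes(250_000))  # 25.0
```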
Cost
NLI scoring is included in all TruthVouch tiers (no additional cost).
Next Steps
- Hallucination Detection: Learn the full detection pipeline
- Alert Configuration: Customize alert thresholds for your needs
- Truth Nuggets: Create precise nuggets for better detection
- Monitoring: Set up continuous checks