
NLI Scoring Deep Dive

Natural Language Inference (NLI) is the core technology behind hallucination detection. It measures the semantic relationship between two text segments; TruthVouch's fine-tuned model achieves 94%+ accuracy on this task. This guide explains how NLI works and how to interpret its scores.

What is NLI?

NLI (also called Textual Entailment) answers: “Does Sentence A logically imply Sentence B?”

Three possible relationships:

Entailment (Implication)

Premise logically implies hypothesis:

Premise: "TruthVouch was founded in 2023"
Hypothesis: "TruthVouch was founded more than one year ago"
Score: ENTAILMENT (1.0)

Neutral (Unrelated)

Premise and hypothesis have no logical connection:

Premise: "TruthVouch monitors AI hallucinations"
Hypothesis: "The sky is blue"
Score: NEUTRAL (0.5)

Contradiction (Negation)

Premise contradicts hypothesis:

Premise: "TruthVouch is a SaaS platform"
Hypothesis: "TruthVouch is installed on-premises only"
Score: CONTRADICTION (0.05)

Scoring Scale

NLI scores range from 0.0 to 1.0:

1.0 ─────────────────────────────────── ENTAILMENT
0.7 ─ Semantic alignment, high confidence
0.5 ─ NEUTRAL / Ambiguous relationship
0.3 ─ Likely contradiction, low alignment
0.0 ─────────────────────────────────── CONTRADICTION
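Internally, NLI models output probabilities for the three relationships, and the single 0.0–1.0 score can be seen as collapsing that distribution into one number. A minimal sketch, assuming a plausible weighting (full credit for entailment mass, half credit for neutral mass) rather than TruthVouch's documented formula:

```python
def nli_score(p_entail: float, p_neutral: float, p_contra: float) -> float:
    """Collapse a 3-way NLI distribution into a single 0.0-1.0 score.

    The weighting below is a plausible convention, not TruthVouch's
    documented formula: entailment mass counts fully, neutral mass counts
    half, contradiction mass counts nothing.
    """
    assert abs(p_entail + p_neutral + p_contra - 1.0) < 1e-6
    return p_entail + 0.5 * p_neutral

print(nli_score(0.97, 0.02, 0.01))  # ≈ 0.98, clear entailment
print(nli_score(0.02, 0.96, 0.02))  # ≈ 0.50, neutral
print(nli_score(0.01, 0.08, 0.91))  # ≈ 0.05, contradiction
```

Under this mapping, the example scores earlier in the guide (1.0, 0.5, 0.05) fall out naturally from near-certain classifier outputs.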

Interpretation

Score      Interpretation           Action
0.95-1.0   Definite entailment      Correct
0.85-0.94  Strong alignment         Likely correct
0.70-0.84  Moderate alignment       Review needed
0.50-0.69  Weak/neutral             Unclear relationship
0.30-0.49  Likely contradiction     Probable hallucination
0.0-0.29   Definite contradiction   Definite hallucination
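One way to translate the table above into code, for routing scores programmatically (a sketch, not a TruthVouch API):

```python
def interpret(score: float) -> tuple[str, str]:
    """Map an NLI score to the (interpretation, action) bands in the table above."""
    bands = [
        (0.95, "Definite entailment", "Correct"),
        (0.85, "Strong alignment", "Likely correct"),
        (0.70, "Moderate alignment", "Review needed"),
        (0.50, "Weak/neutral", "Unclear relationship"),
        (0.30, "Likely contradiction", "Probable hallucination"),
        (0.00, "Definite contradiction", "Definite hallucination"),
    ]
    for lower, interpretation, action in bands:
        if score >= lower:
            return interpretation, action
    raise ValueError("score must be in [0.0, 1.0]")

print(interpret(0.92))  # ('Strong alignment', 'Likely correct')
print(interpret(0.12))  # ('Definite contradiction', 'Definite hallucination')
```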

How NLI Works

Semantic Encoding

Both premise and hypothesis are converted to semantic vectors:

Text Input
  → Tokenization (break into words)
  → Embedding (convert to semantic vectors)
  → Contextual Encoding (bidirectional transformer)
  → Semantic Vector (384 dimensions)

Example:

  • “TruthVouch was founded in 2023” → [0.234, -0.891, 0.123, …]
  • “Founded in 2024” → [0.221, -0.876, 0.098, …]
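Note that the two example vectors are nearly identical even though the facts differ (2023 vs. 2024). A quick cosine-similarity check on the truncated 3-dimensional snippets above illustrates why raw vector similarity alone cannot separate entailment from contradiction, and a trained classifier is needed:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Truncated example vectors from above ("founded in 2023" vs "founded in 2024")
premise = [0.234, -0.891, 0.123]
hypothesis = [0.221, -0.876, 0.098]

print(round(cosine(premise, hypothesis), 4))  # > 0.99: very similar vectors
```

The two sentences embed almost identically, yet one contradicts the other, which is exactly the gap the relation classifier in the next step closes.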

Relation Classification

A neural network classifies the relationship between the two vectors:

Premise Vector: [0.234, -0.891, ...]
Hypothesis Vector: [0.221, -0.876, ...]
Concatenate & Feed to Classifier
Output: [P(entail)=0.97, P(neutral)=0.02, P(contra)=0.01]
Result: ENTAILMENT (97%)
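The classification step can be sketched as a linear head plus softmax over the concatenated vectors. The weights below are hand-picked for illustration, not learned, and real NLI heads sit on top of a transformer and typically also use the element-wise difference and product of the two vectors:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

LABELS = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION"]

def classify(premise_vec, hypothesis_vec, weights, biases):
    """Toy linear classifier head over the concatenated vector pair.

    Illustrates only the final classification step, not the full model.
    """
    features = premise_vec + hypothesis_vec  # concatenate
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs

# Hand-picked toy weights (not learned) that favor ENTAILMENT for this input.
weights = [
    [1, -1, 1, 1, -1, 1],    # entailment row
    [0, 0, 0, 0, 0, 0],      # neutral row
    [-1, 1, -1, -1, 1, -1],  # contradiction row
]
biases = [0.0, 0.0, 0.0]

label, probs = classify([0.234, -0.891, 0.123], [0.221, -0.876, 0.098],
                        weights, biases)
print(label)  # ENTAILMENT
```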

Confidence Estimation

The model’s confidence in its classification:

High Confidence: Model is certain about relationship
(e.g., very similar vectors clearly indicate entailment)
Low Confidence: Model is uncertain
(e.g., ambiguous text, missing context)
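A common convention (assumed here, not confirmed as TruthVouch's metric) is to take the top class probability as the confidence:

```python
def confidence(probs: list[float]) -> float:
    """Confidence as the top class probability.

    A common convention, not necessarily TruthVouch's documented metric;
    a sharply peaked distribution means the model is certain, a flat one
    means it is uncertain.
    """
    return max(probs)

print(confidence([0.97, 0.02, 0.01]))  # 0.97 -> high confidence, clear relationship
print(confidence([0.40, 0.35, 0.25]))  # 0.40 -> low confidence, ambiguous input
```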

Model Characteristics

TruthVouch NLI Model

TruthVouch uses a fine-tuned RoBERTa-large model trained on:

  • MNLI (Multi-Genre Natural Language Inference, 433K examples)
  • Custom hallucination detection data (50K+ real-world examples)
  • Domain-specific fine-tuning for business/technical text

Strengths

  • High Accuracy: 94%+ on test set
  • Fast: <10ms per comparison
  • Domain-Optimized: Fine-tuned for AI/business claims
  • Context-Aware: Bidirectional attention captures context
  • Robust: Handles paraphrasing, negation, temporal claims

Limitations

  • Ambiguous Input: Unclear premise/hypothesis reduces accuracy
  • Subjective Claims: Opinion-based statements difficult to classify
  • Implicit Context: Requires explicit information
  • Domain Drift: Performs best on business/technical text
  • Entailment Chains: Doesn’t follow multi-step logical chains

Alert Thresholds

Convert NLI scores to alert severity:

Default Configuration

client.shield.update_nli_thresholds(
    contradictionThreshold=0.4,   # < 0.4 = contradiction
    neutralThreshold=0.6,         # 0.4-0.6 = neutral
    entailmentThreshold=0.85,     # > 0.85 = entailment
)

Alert severity is assigned based on where the score falls:

Score < 0.4: CRITICAL ─ Hallucination detected
0.4-0.6: WARNING ─ Unclear relationship, may need review
Score > 0.85: OK ─ Content verified
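The severity mapping above can be sketched as a small function. Treating the 0.6–0.85 grey zone as WARNING is an assumption; only the three ranges shown are specified here:

```python
def severity(score: float,
             contradiction: float = 0.4,
             neutral: float = 0.6,
             entailment: float = 0.85) -> str:
    """Map an NLI score to an alert level using the default thresholds above.

    Scores between `neutral` and `entailment` fall in a grey zone; treating
    them as WARNING is an assumption, not documented behavior.
    """
    if score < contradiction:
        return "CRITICAL"   # hallucination detected
    if score > entailment:
        return "OK"         # content verified
    return "WARNING"        # unclear relationship, may need review

print(severity(0.12))  # CRITICAL
print(severity(0.55))  # WARNING
print(severity(0.93))  # OK
```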

Customizing Thresholds

Adjust sensitivity based on risk tolerance:

Strict Mode (low false negatives, high false positives):

client.shield.update_nli_thresholds(
    contradictionThreshold=0.5,
    neutralThreshold=0.7,
    entailmentThreshold=0.95,
)
# Only accept very high confidence matches

Permissive Mode (high false negatives, low false positives):

client.shield.update_nli_thresholds(
    contradictionThreshold=0.2,
    neutralThreshold=0.5,
    entailmentThreshold=0.75,
)
# Allow some ambiguity, fewer alerts

Common Score Patterns

Exact Match

Identical or nearly identical texts:

Premise: "TruthVouch costs $349/month"
Hypothesis: "TruthVouch costs $349/month"
Score: 0.99 (ENTAILMENT)

Paraphrase

Different wording, same meaning:

Premise: "TruthVouch was founded in 2023"
Hypothesis: "TruthVouch's founding year is 2023"
Score: 0.94 (ENTAILMENT)

Partial Match

Subset relationship:

Premise: "TruthVouch monitors 9+ LLM models"
Hypothesis: "TruthVouch monitors ChatGPT"
Score: 0.78 (Weak entailment, not guaranteed)

Negation

Opposite meaning:

Premise: "TruthVouch is cloud-based"
Hypothesis: "TruthVouch is on-premises"
Score: 0.08 (CONTRADICTION)

Temporal Shift

Different time period:

Premise: "Founded in 2023"
Hypothesis: "Founded in 2024"
Score: 0.12 (CONTRADICTION)

Unrelated

No semantic connection:

Premise: "TruthVouch monitors AI"
Hypothesis: "The Earth orbits the Sun"
Score: 0.51 (NEUTRAL)

Improving NLI Accuracy

1. Precise Truth Nuggets

Clear, specific nuggets improve matching:

# Poor: too vague
client.truth_nuggets.create(value="Good product")

# Good: specific and measurable
client.truth_nuggets.create(
    category="pricing",
    key="starter_tier",
    value="Starter plan: $349/month, includes basic features"
)

2. Query Variant Coverage

Multiple query types catch more hallucinations:

# System generates queries:
# - "What is TruthVouch's founding year?"
# - "When was TruthVouch founded?"
# - "TruthVouch founding: [year]?"
# - "Is TruthVouch older than 2020?"

3. Truth Nugget Versioning

Track updates and their context:

client.truth_nuggets.update(
    nugget_id="pricing_starter",
    value="$399/month",  # Changed from $349
    version="2.0",
    effective_date="2024-01-01",
    reason="Annual price increase"
)

4. Monitor Confidence

Review low-confidence detections:

low_conf = client.shield.get_alerts(
    min_confidence=0.5,
    max_confidence=0.7
)
for alert in low_conf:
    # Manually review and adjust thresholds if needed
    print(f"Review: {alert.claim}")

Performance Metrics

Speed

  • Per Comparison: <10ms
  • Batch Processing: 1000 comparisons in 8 seconds
  • Streaming: Sub-100ms latency for real-time checks
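A quick sanity check on the figures above: 1000 comparisons in 8 seconds works out to 8 ms per comparison, consistent with the <10 ms per-comparison figure.

```python
# Batch figure from above: 1000 comparisons in 8 seconds
batch_seconds = 8
batch_size = 1000

per_comparison_ms = batch_seconds * 1000 / batch_size
print(per_comparison_ms)            # 8.0 ms each, under the 10 ms budget
print(batch_size / batch_seconds)   # 125.0 comparisons/second in batch mode
```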

Throughput

  • API: 1000+ requests/second
  • Batch: 10,000+ comparisons per minute
  • Monitoring: Continuous checks on all models without lag

Cost

NLI scoring is included in all TruthVouch tiers (no additional cost).

Next Steps

  • Hallucination Detection: Learn the full detection pipeline
  • Alert Configuration: Customize alert thresholds for your needs
  • Truth Nuggets: Create precise nuggets for better detection
  • Monitoring: Set up continuous checks