
NLI Scoring Deep Dive

Natural Language Inference (NLI) is the core technology behind hallucination detection. It measures the semantic relationship between two text segments; TruthVouch's fine-tuned model achieves 94%+ accuracy on this task. This guide explains how NLI works and how to interpret its scores.

What is NLI?

NLI (also called Textual Entailment) answers: “Does Sentence A logically imply Sentence B?”

Three possible relationships:

Entailment (Implication)

Premise logically implies hypothesis:

Premise: "TruthVouch was founded in 2023"
Hypothesis: "TruthVouch was founded more than one year ago"
Score: ENTAILMENT (1.0)

Neutral (Unrelated)

Premise and hypothesis have no logical connection:

Premise: "TruthVouch monitors AI hallucinations"
Hypothesis: "The sky is blue"
Score: NEUTRAL (0.5)

Contradiction (Negation)

Premise contradicts hypothesis:

Premise: "TruthVouch is a SaaS platform"
Hypothesis: "TruthVouch is installed on-premises only"
Score: CONTRADICTION (0.05)

Scoring Scale

NLI scores range from 0.0 to 1.0:

1.0 ─────────────────────────────────── ENTAILMENT
0.7 ─ Semantic alignment, high confidence
0.5 ─ NEUTRAL / Ambiguous relationship
0.3 ─ Likely contradiction, low alignment
0.0 ─────────────────────────────────── CONTRADICTION
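Internally, NLI models output probabilities for the three relationships, and the single 0.0–1.0 score can be seen as collapsing that distribution into one number. A minimal sketch, assuming a plausible weighting (full credit for entailment mass, half credit for neutral mass) rather than TruthVouch's documented formula:

```python
def nli_score(p_entail: float, p_neutral: float, p_contra: float) -> float:
    """Collapse a 3-way NLI distribution into a single 0.0-1.0 score.

    The weighting below is a plausible convention, not TruthVouch's
    documented formula: entailment mass counts fully, neutral mass counts
    half, contradiction mass counts nothing.
    """
    assert abs(p_entail + p_neutral + p_contra - 1.0) < 1e-6
    return p_entail + 0.5 * p_neutral

print(nli_score(0.97, 0.02, 0.01))  # ≈ 0.98, clear entailment
print(nli_score(0.02, 0.96, 0.02))  # ≈ 0.50, neutral
print(nli_score(0.01, 0.08, 0.91))  # ≈ 0.05, contradiction
```

Under this mapping, the example scores earlier in the guide (1.0, 0.5, 0.05) fall out naturally from near-certain classifier outputs.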

Interpretation

Score      Interpretation           Action
0.95-1.0   Definite entailment      Correct
0.85-0.94  Strong alignment         Likely correct
0.70-0.84  Moderate alignment       Review needed
0.50-0.69  Weak/neutral             Unclear relationship
0.30-0.49  Likely contradiction     Probable hallucination
0.0-0.29   Definite contradiction   Definite hallucination
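One way to translate the table above into code, for routing scores programmatically (a sketch, not a TruthVouch API):

```python
def interpret(score: float) -> tuple[str, str]:
    """Map an NLI score to the (interpretation, action) bands in the table above."""
    bands = [
        (0.95, "Definite entailment", "Correct"),
        (0.85, "Strong alignment", "Likely correct"),
        (0.70, "Moderate alignment", "Review needed"),
        (0.50, "Weak/neutral", "Unclear relationship"),
        (0.30, "Likely contradiction", "Probable hallucination"),
        (0.00, "Definite contradiction", "Definite hallucination"),
    ]
    for lower, interpretation, action in bands:
        if score >= lower:
            return interpretation, action
    raise ValueError("score must be in [0.0, 1.0]")

print(interpret(0.92))  # ('Strong alignment', 'Likely correct')
print(interpret(0.12))  # ('Definite contradiction', 'Definite hallucination')
```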

How NLI Works

Semantic Encoding

Both premise and hypothesis are converted to semantic vectors:

Text Input
  → Tokenization (break into words)
  → Embedding (convert to semantic vectors)
  → Contextual Encoding (bidirectional transformer)
  → Semantic Vector (384 dimensions)

Example:

  • “TruthVouch was founded in 2023” → [0.234, -0.891, 0.123, …]
  • “Founded in 2024” → [0.221, -0.876, 0.098, …]
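Note that the two example vectors are nearly identical even though the facts differ (2023 vs. 2024). A quick cosine-similarity check on the truncated 3-dimensional snippets above illustrates why raw vector similarity alone cannot separate entailment from contradiction, and a trained classifier is needed:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Truncated example vectors from above ("founded in 2023" vs "founded in 2024")
premise = [0.234, -0.891, 0.123]
hypothesis = [0.221, -0.876, 0.098]

print(round(cosine(premise, hypothesis), 4))  # > 0.99: very similar vectors
```

The two sentences embed almost identically, yet one contradicts the other, which is exactly the gap the relation classifier in the next step closes.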

Relation Classification

A neural network classifies the relationship between the two vectors:

Premise Vector: [0.234, -0.891, ...]
Hypothesis Vector: [0.221, -0.876, ...]
Concatenate & Feed to Classifier
Output: [P(entail)=0.97, P(neutral)=0.02, P(contra)=0.01]
Result: ENTAILMENT (97%)
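The classification step can be sketched as a linear head plus softmax over the concatenated vectors. The weights below are hand-picked for illustration, not learned, and real NLI heads sit on top of a transformer and typically also use the element-wise difference and product of the two vectors:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

LABELS = ["ENTAILMENT", "NEUTRAL", "CONTRADICTION"]

def classify(premise_vec, hypothesis_vec, weights, biases):
    """Toy linear classifier head over the concatenated vector pair.

    Illustrates only the final classification step, not the full model.
    """
    features = premise_vec + hypothesis_vec  # concatenate
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs

# Hand-picked toy weights (not learned) that favor ENTAILMENT for this input.
weights = [
    [1, -1, 1, 1, -1, 1],    # entailment row
    [0, 0, 0, 0, 0, 0],      # neutral row
    [-1, 1, -1, -1, 1, -1],  # contradiction row
]
biases = [0.0, 0.0, 0.0]

label, probs = classify([0.234, -0.891, 0.123], [0.221, -0.876, 0.098],
                        weights, biases)
print(label)  # ENTAILMENT
```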

Confidence Estimation

The model’s confidence in its classification:

High Confidence: Model is certain about relationship
(e.g., very similar vectors clearly indicate entailment)
Low Confidence: Model is uncertain
(e.g., ambiguous text, missing context)
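A common convention (assumed here, not confirmed as TruthVouch's metric) is to take the top class probability as the confidence:

```python
def confidence(probs: list[float]) -> float:
    """Confidence as the top class probability.

    A common convention, not necessarily TruthVouch's documented metric;
    a sharply peaked distribution means the model is certain, a flat one
    means it is uncertain.
    """
    return max(probs)

print(confidence([0.97, 0.02, 0.01]))  # 0.97 -> high confidence, clear relationship
print(confidence([0.40, 0.35, 0.25]))  # 0.40 -> low confidence, ambiguous input
```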

Model Characteristics

TruthVouch NLI Model

TruthVouch uses a fine-tuned RoBERTa-large model trained on:

  • MNLI (Multi-Genre Natural Language Inference, 433K examples)
  • Custom hallucination detection data (50K+ real-world examples)
  • Domain-specific fine-tuning for business/technical text

Strengths

  • High Accuracy: 94%+ on test set
  • Fast: <10ms per comparison
  • Domain-Optimized: Fine-tuned for AI/business claims
  • Context-Aware: Bidirectional attention captures context
  • Robust: Handles paraphrasing, negation, temporal claims

Limitations

  • Ambiguous Input: Unclear premise/hypothesis reduces accuracy
  • Subjective Claims: Opinion-based statements difficult to classify
  • Implicit Context: Requires explicit information
  • Domain Drift: Performs best on business/technical text
  • Entailment Chains: Doesn’t follow multi-step logical chains

Alert Thresholds

Convert NLI scores to alert severity:

Default Configuration

client.shield.update_nli_thresholds(
    contradictionThreshold=0.4,   # < 0.4 = contradiction
    neutralThreshold=0.6,         # 0.4-0.6 = neutral
    entailmentThreshold=0.85,     # > 0.85 = entailment
)

Alert severity is assigned based on where the score falls:

Score < 0.4: CRITICAL ─ Hallucination detected
0.4-0.6: WARNING ─ Unclear relationship, may need review
Score > 0.85: OK ─ Content verified
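The severity mapping above can be sketched as a small function. Treating the 0.6–0.85 grey zone as WARNING is an assumption; only the three ranges shown are specified here:

```python
def severity(score: float,
             contradiction: float = 0.4,
             neutral: float = 0.6,
             entailment: float = 0.85) -> str:
    """Map an NLI score to an alert level using the default thresholds above.

    Scores between `neutral` and `entailment` fall in a grey zone; treating
    them as WARNING is an assumption, not documented behavior.
    """
    if score < contradiction:
        return "CRITICAL"   # hallucination detected
    if score > entailment:
        return "OK"         # content verified
    return "WARNING"        # unclear relationship, may need review

print(severity(0.12))  # CRITICAL
print(severity(0.55))  # WARNING
print(severity(0.93))  # OK
```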

Customizing Thresholds

Adjust sensitivity based on risk tolerance:

Strict Mode (low false negatives, high false positives):

client.shield.update_nli_thresholds(
    contradictionThreshold=0.5,
    neutralThreshold=0.7,
    entailmentThreshold=0.95,
)
# Only accept very high confidence matches

Permissive Mode (high false negatives, low false positives):

client.shield.update_nli_thresholds(
    contradictionThreshold=0.2,
    neutralThreshold=0.5,
    entailmentThreshold=0.75,
)
# Allow some ambiguity, fewer alerts

Common Score Patterns

Exact Match

Identical or nearly identical texts:

Premise: "TruthVouch costs $349/month"
Hypothesis: "TruthVouch costs $349/month"
Score: 0.99 (ENTAILMENT)

Paraphrase

Different wording, same meaning:

Premise: "TruthVouch was founded in 2023"
Hypothesis: "TruthVouch's founding year is 2023"
Score: 0.94 (ENTAILMENT)

Partial Match

Subset relationship:

Premise: "TruthVouch monitors 9+ LLM models"
Hypothesis: "TruthVouch monitors ChatGPT"
Score: 0.78 (Weak entailment, not guaranteed)

Negation

Opposite meaning:

Premise: "TruthVouch is cloud-based"
Hypothesis: "TruthVouch is on-premises"
Score: 0.08 (CONTRADICTION)

Temporal Shift

Different time period:

Premise: "Founded in 2023"
Hypothesis: "Founded in 2024"
Score: 0.12 (CONTRADICTION)

Unrelated

No semantic connection:

Premise: "TruthVouch monitors AI"
Hypothesis: "The Earth orbits the Sun"
Score: 0.51 (NEUTRAL)

Improving NLI Accuracy

1. Precise Truth Nuggets

Clear, specific nuggets improve matching:

# Poor: too vague
client.truth_nuggets.create(value="Good product")

# Good: specific and measurable
client.truth_nuggets.create(
    category="pricing",
    key="starter_tier",
    value="Starter plan: $349/month, includes basic features"
)

2. Query Variant Coverage

Multiple query types catch more hallucinations:

# System generates queries:
# - "What is TruthVouch's founding year?"
# - "When was TruthVouch founded?"
# - "TruthVouch founding: [year]?"
# - "Is TruthVouch older than 2020?"

3. Truth Nugget Versioning

Track updates and their context:

client.truth_nuggets.update(
    nugget_id="pricing_starter",
    value="$399/month",  # Changed from $349
    version="2.0",
    effective_date="2024-01-01",
    reason="Annual price increase"
)

4. Monitor Confidence

Review low-confidence detections:

low_conf = client.shield.get_alerts(
    min_confidence=0.5,
    max_confidence=0.7
)
for alert in low_conf:
    # Manually review and adjust thresholds if needed
    print(f"Review: {alert.claim}")

Performance Metrics

Speed

  • Per Comparison: <10ms
  • Batch Processing: 1000 comparisons in 8 seconds
  • Streaming: Sub-100ms latency for real-time checks
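A quick sanity check on the figures above: 1000 comparisons in 8 seconds works out to 8 ms per comparison, consistent with the <10 ms per-comparison figure.

```python
# Batch figure from above: 1000 comparisons in 8 seconds
batch_seconds = 8
batch_size = 1000

per_comparison_ms = batch_seconds * 1000 / batch_size
print(per_comparison_ms)            # 8.0 ms each, under the 10 ms budget
print(batch_size / batch_seconds)   # 125.0 comparisons/second in batch mode
```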

Throughput

  • API: 1000+ requests/second
  • Batch: 10,000+ comparisons per minute
  • Monitoring: Continuous checks on all models without lag

Cost

NLI scoring is included in all TruthVouch tiers (no additional cost).

Next Steps

  • Hallucination Detection: Learn the full detection pipeline
  • Alert Configuration: Customize alert thresholds for your needs
  • Truth Nuggets: Create precise nuggets for better detection
  • Monitoring: Set up continuous checks