Alert Severity & Scoring

Shield automatically classifies alerts into four severity levels: Critical, High, Medium, or Low. Severity is based on impact (business risk, brand damage, compliance exposure), confidence (how certain Shield is that a hallucination occurred), and visibility (who sees the AI's response).

Severity Levels

Critical (Action Required Immediately)

Definition: Severe impact; immediate action needed to prevent business damage.

Characteristics:

  • Financial or compliance impact (money, regulatory fine)
  • Visible to customers/public (brand damage risk)
  • Affects multiple systems or departments
  • Confidence: 80%+ certainty that a hallucination occurred

Examples:

  • AI says “TruthVouch pricing is $5,000/month” (your actual: $500/month) → customer confusion
  • AI says “We’re SOX-compliant” (you’re not) → regulatory violation risk
  • AI gives medical advice on your site (not approved) → health/liability risk
  • “Our CEO said we’re hiring 200 people” (false; not announced) → stock manipulation concern

Response SLA: <1 hour (alert on-call team)

High (Investigate & Plan)

Definition: Significant impact; plan correction within hours.

Characteristics:

  • Business or brand impact (but time to fix measured in hours)
  • Affects key messages or competitive positioning
  • Confidence: 70-79% certainty
  • Could impact customer decisions if left uncorrected

Examples:

  • AI claims feature exists (doesn’t; coming next quarter)
  • AI says company size is 500 (actual: 100) — competitive positioning
  • AI claims award/certification you don’t have
  • AI says partnership with competitor (untrue)

Response SLA: <4 hours

Medium (Plan & Schedule)

Definition: Noticeable impact; plan correction within 24 hours.

Characteristics:

  • Moderate brand or operational impact
  • Affects secondary messages or details
  • Confidence: 60-69% certainty
  • Easily clarified if corrected quickly

Examples:

  • AI gets founding date wrong (says 2022, actually 2021)
  • AI describes product feature in outdated way (feature exists, description wrong)
  • AI claims office location (wrong; office closed)
  • Statistics outdated (Q3 numbers, but now Q4)

Response SLA: <24 hours

Low (Monitor & Resolve)

Definition: Minimal impact; resolve when convenient.

Characteristics:

  • Minor inaccuracy with low business impact
  • Unlikely to affect decisions
  • Confidence: 50-59% certainty (borderline)
  • Can be left as-is without significant risk

Examples:

  • AI says you have “5 products” (actually 6, but minor)
  • Minor name inconsistency (“Truth Vouch” vs “TruthVouch”)
  • Outdated example or case study (illustrative, not critical)
  • Vague claim that’s “mostly correct”

Response SLA: <1 week (batch corrections weekly)

Severity Calculation

Severity = f(Impact, Confidence, Visibility)

Impact Score (0-100)

How much business risk does the hallucination create?

| Factor | Points | Reasoning |
|---|---|---|
| **Financial Impact** | | |
| Direct revenue loss | 40 | Wrong pricing, broken payment |
| Customer churn risk | 30 | Causes a customer to leave |
| Regulatory fine risk | 35 | Compliance/legal exposure |
| No direct cost | 0 | Clarification only needed |
| **Brand/Reputation** | | |
| Public visibility (social media, news) | +25 | Mass audience sees it |
| Customer-facing (website, support) | +15 | Customers might see it |
| Internal use only | 0 | Limited exposure |
| **Decision-Affecting** | | |
| Influences purchase decision | +20 | Could change customer behavior |
| Affects trust (credibility) | +15 | Harms reputation if wrong |
| Informational only | 0 | Doesn't drive decisions |

Example calculations:

  • “AI says pricing $5K” (wrong: $500) → Financial (40) + Public (25) + Purchase Decision (20) = 85 impact
  • “AI says founding date is 2022” (wrong: 2021) → None of above = 0 impact (minor)
  • “AI claims SOX-compliant” (you’re not) → Regulatory (35) + Public (25) + Trust (15) = 75 impact
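The table above behaves like a simple additive scorer: pick one entry per factor group and sum the points. A minimal sketch (the factor keys and function name are illustrative, not Shield's real schema):

```python
# Hypothetical sketch of the additive Impact Score from the table above.
# Point values come straight from the table; key names are illustrative.

FINANCIAL = {
    "direct_revenue_loss": 40,
    "customer_churn_risk": 30,
    "regulatory_fine_risk": 35,
    "none": 0,
}
BRAND = {
    "public": 25,           # social media, news
    "customer_facing": 15,  # website, support
    "internal": 0,
}
DECISION = {
    "purchase": 20,       # influences purchase decision
    "trust": 15,          # affects credibility
    "informational": 0,
}

def impact_score(financial: str, brand: str, decision: str) -> int:
    """Sum one point value per factor group, capped at 100."""
    total = FINANCIAL[financial] + BRAND[brand] + DECISION[decision]
    return min(total, 100)

# "AI says pricing $5K" -> Financial 40 + Public 25 + Purchase 20 = 85
print(impact_score("direct_revenue_loss", "public", "purchase"))  # 85
```

This reproduces the example calculations: the SOX-compliance case is `impact_score("regulatory_fine_risk", "public", "trust")` = 75.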

Confidence Score (0-100)

How confident is Shield that a hallucination occurred?

Method 1: Natural Language Inference (NLI)

  • Compare AI response to Truth Nugget
  • Measure semantic contradiction strength
  • Higher score = stronger contradiction
  • Range: 50-99%

Example:

  • AI: “Founded in 2022”; Nugget: “Founded in 2021” → 85% confidence (clear factual contradiction)
  • AI: “Strong innovation culture”; Nugget: “Focus on reliability” → 70% confidence (cultural nuance, softer contradiction)
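One way to picture the NLI method: take the contradiction probability emitted by an NLI model and rescale it into the 50-99% band stated above. The linear rescaling below is an assumption for illustration; the doc specifies only the output range, not the scaling function.

```python
def nli_confidence(contradiction_prob: float) -> int:
    """Map an NLI contradiction probability (0.0-1.0) onto the
    50-99% confidence range.

    The linear mapping is illustrative, not Shield's documented formula.
    """
    if not 0.0 <= contradiction_prob <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return round(50 + contradiction_prob * 49)

print(nli_confidence(0.0))  # 50 (no contradiction signal)
print(nli_confidence(1.0))  # 99 (maximal contradiction)
```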

Method 2: Cross-Check Confidence

  • Query multiple AI engines (ChatGPT, Claude, Gemini)
  • If all agree AI said X but Truth Nugget says Y → higher confidence
  • If engines disagree with each other → lower confidence

Example:

  • All 3 engines say “$50M funding” but Nugget says “$40M” → 90% confidence
  • 2 engines say “$50M”, 1 says “$40M”, Nugget says “$40M” → 70% confidence
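The cross-check method can be sketched as counting how many engines contradict the Truth Nugget. The 90% and 70% anchors follow the examples above; the function name and the 55% floor for weak agreement are assumptions.

```python
def cross_check_confidence(engine_answers: list, nugget: str) -> int:
    """Confidence that a hallucination occurred, based on how many
    engines contradict the Truth Nugget.

    Anchor values (90/70/55) are illustrative, matching the doc's
    examples; Shield's real formula is not published.
    """
    contradicting = sum(1 for a in engine_answers if a != nugget)
    ratio = contradicting / len(engine_answers)
    if ratio == 1.0:
        return 90   # all engines agree with each other, against the nugget
    if ratio >= 2 / 3:
        return 70   # majority contradicts, but engines disagree
    return 55       # engines mostly match the nugget; weak signal

print(cross_check_confidence(["$50M", "$50M", "$50M"], "$40M"))  # 90
print(cross_check_confidence(["$50M", "$50M", "$40M"], "$40M"))  # 70
```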

Method 3: Source Authority

  • If Truth Nugget sourced from official docs (earnings report, SEC filing) → higher confidence
  • If sourced from team knowledge → moderate confidence
  • If based on assumption → lower confidence

Example:

  • Nugget from SEC 10-Q: 95% confidence
  • Nugget from internal wiki: 75% confidence
  • Nugget from “we think…”: 60% confidence
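As a sketch, the source-authority tiers reduce to a lookup using the anchor values from the examples above; the tier names are assumptions, not Shield's real schema.

```python
# Illustrative mapping of Truth Nugget provenance to base confidence.
SOURCE_CONFIDENCE = {
    "official_filing": 95,  # e.g. SEC 10-Q, earnings report
    "internal_docs": 75,    # e.g. internal wiki, team knowledge
    "assumption": 60,       # "we think..." statements
}

def source_confidence(source_tier: str) -> int:
    return SOURCE_CONFIDENCE[source_tier]

print(source_confidence("official_filing"))  # 95
```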

Visibility Score (0-100)

Who sees the AI’s response?

| Channel | Score | Reasoning |
|---|---|---|
| Public web/social | 100 | Millions of people |
| Customer-facing product | 80 | All customers see it |
| Internal use only | 20 | Employees only |
| Trusted environment (sandbox) | 10 | Limited scope |

Severity Matrix

Severity combines the Impact, Confidence, and Visibility scores against the thresholds below:

Critical (≥70):

  • Impact ≥50 AND Confidence ≥75 AND Visibility ≥60, OR
  • Impact ≥80 (regardless of confidence/visibility)

High (50-69):

  • (Impact ≥40 AND Confidence ≥70) OR (Impact ≥60 AND Confidence ≥50)

Medium (35-49):

  • (Impact ≥25 AND Confidence ≥60) OR (Impact ≥40 AND Confidence ≥40)

Low (<35):

  • Default for lower-impact items
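The matrix translates directly into a classifier. This sketch implements only the explicit threshold conditions stated above; how Shield weights the three scores internally is not specified here.

```python
def severity(impact: int, confidence: int, visibility: int) -> str:
    """Classify an alert using the severity matrix thresholds.

    A sketch of the published rules only; not Shield's actual code.
    """
    if (impact >= 50 and confidence >= 75 and visibility >= 60) or impact >= 80:
        return "Critical"
    if (impact >= 40 and confidence >= 70) or (impact >= 60 and confidence >= 50):
        return "High"
    if (impact >= 25 and confidence >= 60) or (impact >= 40 and confidence >= 40):
        return "Medium"
    return "Low"

# Wrong-pricing case: high impact, high confidence, public
print(severity(85, 92, 95))  # Critical
# Founding-date case: trivial impact despite high confidence
print(severity(5, 88, 30))   # Low
```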

Examples

Example 1: Wrong Pricing (Critical)

  • AI said: “TruthVouch costs $5,000/month”
  • Truth Nugget: “Standard plan is $500/month”
  • Impact: 85 (direct revenue loss + customer confusion + purchase decision)
  • Confidence: 92 (factual contradiction; confirmed by pricing page)
  • Visibility: 95 (on public website)
  • Severity: CRITICAL ✓

Example 2: Founding Date Wrong (Low)

  • AI said: “Founded in 2023”
  • Truth Nugget: “Founded in 2021”
  • Impact: 5 (minor factual error; doesn’t drive decisions)
  • Confidence: 88 (clear factual contradiction)
  • Visibility: 30 (mentioned in background, not decision-affecting)
  • Severity: LOW ✓

Example 3: Feature Doesn’t Exist Yet (High)

  • AI said: “Shield includes automated corrections for all hallucinations”
  • Truth Nugget: “Manual approval required for all corrections” (feature coming Q2 2024)
  • Impact: 65 (customer expectation mismatch; support cost)
  • Confidence: 95 (documented feature limitation)
  • Visibility: 80 (customer facing)
  • Severity: HIGH ✓

Custom Severity Rules

Customize how Shield calculates severity for your org:

Example Rule 1: Financial Warnings Extra Critical

“Any hallucination about our financial metrics should be Critical, regardless of impact/confidence”

  • Hallucination about revenue, funding, growth = Auto-Critical
  • Reason: Misleading financial information carries high regulatory risk

Example Rule 2: Competitor Claims High Priority

“Claims about competitors should be elevated one level (Low → Medium, Medium → High, High → Critical)”

  • Reason: Competitor false claims affect market perception

Example Rule 3: Lower Sensitivity for Internal Use

“Decrease severity for internal-use-only hallucinations (reduce by 1 level)”

  • Reason: Limited brand exposure; time to fix not urgent

To create custom rules:

  1. Go to Shield Settings → Severity Rules
  2. Click Create Rule
  3. Define trigger (fact category, visibility, impact threshold)
  4. Set action (override severity to X)
  5. Save
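The three example rules behave like post-processing overrides on the computed severity. The rule engine, category names, and function signatures below are hypothetical, for illustration only.

```python
# Hypothetical sketch of custom severity rules as post-processing overrides.
LEVELS = ["Low", "Medium", "High", "Critical"]

def bump(level: str, steps: int) -> str:
    """Move a severity up or down the ladder, clamped at the ends."""
    i = max(0, min(len(LEVELS) - 1, LEVELS.index(level) + steps))
    return LEVELS[i]

def apply_rules(level: str, category: str, audience: str) -> str:
    """Apply the three example rules above:
    financial facts -> auto-Critical; competitor claims -> elevate
    one level; internal-only -> reduce one level.
    """
    if category == "financial":
        return "Critical"
    if category == "competitor":
        level = bump(level, +1)
    if audience == "internal":
        level = bump(level, -1)
    return level

print(apply_rules("Medium", "competitor", "public"))  # High
print(apply_rules("High", "other", "internal"))       # Medium
print(apply_rules("Low", "financial", "internal"))    # Critical
```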

Handling Borderline Cases

What if an impact or confidence score lands between levels?

Shield takes a conservative approach:

  • Round UP in severity if uncertain
  • Better to over-alert than under-alert
  • You can downgrade manually if false positive

Examples:

  • Impact 58 (High-Medium boundary) → Classify as High
  • Confidence 72 (High-Medium boundary) → Classify as High

Next Steps

  1. Review severity classifications — Do they align with your business?
  2. Set response SLAs — How quickly must each severity be reviewed?
  3. Create custom rules — Any overrides for your org?
  4. Configure routing — Route by severity to appropriate teams
  5. Monitor for accuracy — Are severity levels assigned correctly?