Alert Severity & Scoring
Shield automatically classifies alerts into severity levels: Critical, High, Medium, or Low. Severity is based on impact (business risk, brand damage, compliance exposure), confidence (how certain Shield is that a hallucination occurred), and visibility (who sees the AI's response).
Severity Levels
Critical (Action Required Immediately)
Definition: Severe impact; immediate action needed to prevent business damage.
Characteristics:
- Financial or compliance impact (money, regulatory fine)
- Visible to customers/public (brand damage risk)
- Affects multiple systems or departments
- Likelihood: 80%+ certainty that a hallucination occurred
Examples:
- AI says “TruthVouch pricing is $5,000/month” (your actual: $500/month) → customer confusion
- AI says “We’re SOX-compliant” (you’re not) → regulatory violation risk
- AI gives medical advice on your site (not approved) → health/liability risk
- “Our CEO said we’re hiring 200 people” (false; not announced) → stock manipulation concern
Response SLA: <1 hour (alert on-call team)
High (Investigate & Plan)
Definition: Significant impact; plan correction within hours.
Characteristics:
- Business or brand impact (but time to fix measured in hours)
- Affects key messages or competitive positioning
- Likelihood: 70-79% certainty
- Could impact customer decisions if left uncorrected
Examples:
- AI claims feature exists (doesn’t; coming next quarter)
- AI says company size is 500 (actual: 100) — competitive positioning
- AI claims award/certification you don’t have
- AI says partnership with competitor (untrue)
Response SLA: <4 hours
Medium (Plan & Schedule)
Definition: Noticeable impact; plan correction within 24 hours.
Characteristics:
- Moderate brand or operational impact
- Affects secondary messages or details
- Likelihood: 60-69% certainty
- Easily clarified if corrected quickly
Examples:
- AI gets founding date wrong (says 2022, actually 2021)
- AI describes product feature in outdated way (feature exists, description wrong)
- AI claims office location (wrong; office closed)
- Statistics outdated (Q3 numbers, but now Q4)
Response SLA: <24 hours
Low (Monitor & Resolve)
Definition: Minimal impact; resolve when convenient.
Characteristics:
- Minor inaccuracy with low business impact
- Unlikely to affect decisions
- Likelihood: 50-59% certainty (borderline)
- Can be left as-is without significant risk
Examples:
- AI says you have “5 products” (actually 6, but minor)
- Minor name inconsistency (“Truth Vouch” vs “TruthVouch”)
- Outdated example or case study (illustrative, not critical)
- Vague claim that’s “mostly correct”
Response SLA: <1 week (batch corrections weekly)
Severity Calculation
Severity = f(Impact, Confidence, Visibility)
Impact Score (0-100)
How much business risk does the hallucination create?
| Factor | Points | Reasoning |
|---|---|---|
| Financial Impact | | |
| Direct revenue loss | 40 | Wrong pricing, broken payment |
| Customer churn risk | 30 | Causes customer to leave |
| Regulatory fine risk | 35 | Compliance/legal exposure |
| No direct cost | 0 | Clarification only needed |
| Brand/Reputation | | |
| Public visibility (social media, news) | +25 | Mass audience sees it |
| Customer-facing (website, support) | +15 | Customers might see it |
| Internal use only | 0 | Limited exposure |
| Decision-Affecting | | |
| Influences purchase decision | +20 | Could change customer behavior |
| Affects trust (credibility) | +15 | Harms reputation if wrong |
| Informational only | 0 | Doesn’t drive decisions |
Example calculations:
- “AI says pricing $5K” (wrong: $500) → Financial (40) + Public (25) + Purchase Decision (20) = 85 impact
- “AI says founding date is 2022” (wrong: 2021) → none of the above factors apply = 0 impact (minor)
- “AI claims SOX-compliant” (you’re not) → Regulatory (35) + Public (25) + Trust (15) = 75 impact
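As a rough illustration, the impact score can be read off the table above as a sum of whichever factors apply. The factor keys, the helper name impact_score, and the cap at 100 below are assumptions for illustration, not Shield's internal API.

```python
# Illustrative only: factor points taken from the impact table above.
IMPACT_POINTS = {
    # Financial impact
    "direct_revenue_loss": 40,
    "customer_churn_risk": 30,
    "regulatory_fine_risk": 35,
    # Brand/reputation
    "public_visibility": 25,
    "customer_facing": 15,
    # Decision-affecting
    "influences_purchase": 20,
    "affects_trust": 15,
}

def impact_score(factors: set[str]) -> int:
    """Sum the points for every factor that applies, capped to the 0-100 range."""
    return min(100, sum(IMPACT_POINTS.get(f, 0) for f in factors))

# Wrong pricing on the public website:
# direct revenue loss (40) + public visibility (25) + purchase decision (20) = 85
print(impact_score({"direct_revenue_loss", "public_visibility", "influences_purchase"}))  # 85
```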
Confidence Score (0-100)
How confident is Shield that a hallucination occurred?
Method 1: Natural Language Inference (NLI)
- Compare AI response to Truth Nugget
- Measure semantic contradiction strength
- Higher score = stronger contradiction
- Range: 50-99%
Example:
- AI: “Founded in 2022”; Nugget: “Founded in 2021” → 85% confidence (clear factual contradiction)
- AI: “Strong innovation culture”; Nugget: “Focus on reliability” → 70% confidence (cultural nuance, softer contradiction)
Method 2: Cross-Check Confidence
- Query multiple AI engines (ChatGPT, Claude, Gemini)
- If all agree AI said X but Truth Nugget says Y → higher confidence
- If engines disagree with each other → lower confidence
Example:
- All 3 engines say “$50M funding” but Nugget says “$40M” → 90% confidence
- 2 engines say “$50M”, 1 says “$40M”, Nugget says “$40M” → 70% confidence
Method 3: Source Authority
- If Truth Nugget sourced from official docs (earnings report, SEC filing) → higher confidence
- If sourced from team knowledge → moderate confidence
- If based on assumption → lower confidence
Example:
- Nugget from SEC 10-Q: 95% confidence
- Nugget from internal wiki: 75% confidence
- Nugget from “we think…”: 60% confidence
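How Shield blends these three signals isn't documented here, so the sketch below simply expresses each method as a function and takes a naive average as the combined confidence. The function names, weights, and source categories are illustrative assumptions.

```python
def nli_confidence(contradiction_strength: float) -> float:
    """Method 1: map an NLI contradiction strength (0-1) into the 50-99% range."""
    return 50 + 49 * contradiction_strength

def cross_check_confidence(engines_agreeing: int, engines_queried: int) -> float:
    """Method 2: more engines repeating the same claim -> higher confidence."""
    return 30 + 60 * (engines_agreeing / engines_queried)  # 3/3 -> 90, 2/3 -> 70

def source_authority(source: str) -> float:
    """Method 3: confidence based on where the Truth Nugget came from."""
    return {"official_filing": 95, "internal_wiki": 75, "assumption": 60}.get(source, 70)

def combined_confidence(contradiction: float, agreeing: int, queried: int, source: str) -> float:
    """Naive average of the three signals; the real blending is not specified."""
    signals = [
        nli_confidence(contradiction),
        cross_check_confidence(agreeing, queried),
        source_authority(source),
    ]
    return round(sum(signals) / len(signals), 1)

# "$50M funding" repeated by all 3 engines, Nugget ($40M) sourced from an official filing:
print(combined_confidence(0.8, agreeing=3, queried=3, source="official_filing"))  # 91.4
```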
Visibility Score (0-100)
Who sees the AI’s response?
| Channel | Score | Reasoning |
|---|---|---|
| Public web/social | 100 | Millions of people |
| Customer-facing product | 80 | All customers see it |
| Internal use only | 20 | Employees only |
| Trusted environment (sandbox) | 10 | Limited scope |
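The channel scores are a straightforward lookup. The channel keys and the conservative default for unknown channels below are assumptions for illustration.

```python
# Illustrative channel-to-score mapping from the visibility table above.
VISIBILITY_SCORES = {
    "public_web": 100,       # public web/social
    "customer_product": 80,  # customer-facing product
    "internal": 20,          # internal use only
    "sandbox": 10,           # trusted environment
}

def visibility_score(channel: str) -> int:
    # Assumed behavior: default to the highest score when the channel is unknown.
    return VISIBILITY_SCORES.get(channel, 100)
```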
Severity Matrix
Severity combines the Impact, Confidence, and Visibility scores against the following thresholds:
Critical (≥70): (Impact ≥50 AND Confidence ≥75 AND Visibility ≥60) OR Impact ≥80 (regardless of confidence/visibility)
High (50-69): (Impact ≥40 AND Confidence ≥70) OR (Impact ≥60 AND Confidence ≥50)
Medium (35-49): (Impact ≥25 AND Confidence ≥60) OR (Impact ≥40 AND Confidence ≥40)
Low (<35): Default for lower-impact items
Examples
Example 1: Wrong Pricing (Critical)
- AI said: “TruthVouch costs $5,000/month”
- Truth Nugget: “Standard plan is $500/month”
- Impact: 85 (direct revenue loss + public visibility + purchase decision)
- Confidence: 92 (factual contradiction; confirmed by pricing page)
- Visibility: 95 (on public website)
- Severity: CRITICAL ✓
Example 2: Founding Date Wrong (Low)
- AI said: “Founded in 2023”
- Truth Nugget: “Founded in 2021”
- Impact: 5 (minor factual error; doesn’t drive decisions)
- Confidence: 88 (clear factual contradiction)
- Visibility: 30 (mentioned in background, not decision-affecting)
- Severity: LOW ✓
Example 3: Feature Doesn’t Exist Yet (High)
- AI said: “Shield includes automated corrections for all hallucinations”
- Truth Nugget: “Manual approval required for all corrections” (feature coming Q2 2024)
- Impact: 65 (customer expectation mismatch; support cost)
- Confidence: 95 (documented feature limitation)
- Visibility: 80 (customer-facing)
- Severity: HIGH ✓
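The matrix thresholds translate directly into code. The sketch below is not Shield's implementation, just the rules as written above, and it reproduces Examples 1 and 2.

```python
def classify_severity(impact: float, confidence: float, visibility: float) -> str:
    """Apply the severity matrix thresholds in order, most severe first."""
    if (impact >= 50 and confidence >= 75 and visibility >= 60) or impact >= 80:
        return "Critical"
    if (impact >= 40 and confidence >= 70) or (impact >= 60 and confidence >= 50):
        return "High"
    if (impact >= 25 and confidence >= 60) or (impact >= 40 and confidence >= 40):
        return "Medium"
    return "Low"

print(classify_severity(85, 92, 95))  # Example 1 (wrong pricing)  -> Critical
print(classify_severity(5, 88, 30))   # Example 2 (founding date)  -> Low
```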
Custom Severity Rules
Customize how Shield calculates severity for your org:
Example Rule 1: Financial Warnings Extra Critical
“Any hallucination about our financial metrics should be Critical, regardless of impact/confidence”
- Hallucination about revenue, funding, growth = Auto-Critical
- Reason: Misleading financial claims carry high regulatory risk
Example Rule 2: Competitor Claims High Priority
“Claims about competitors should be elevated one level (Low → Medium, Medium → High, High → Critical)”
- Reason: False claims about competitors affect market perception
Example Rule 3: Lower Sensitivity for Internal Use
“Decrease severity for internal-use-only hallucinations (reduce by 1 level)”
- Reason: Limited brand exposure; time to fix not urgent
To create custom rules:
- Go to Shield Settings → Severity Rules
- Click Create Rule
- Define trigger (fact category, visibility, impact threshold)
- Set action (override severity to X)
- Save
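Custom rules are configured in the UI rather than in code, but the logic of Example Rules 1-3 can be sketched as a post-processing override on the computed severity. The category names, channel values, and function below are hypothetical, not Shield's rule engine.

```python
SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]

def apply_custom_rules(severity: str, fact_category: str, visibility_channel: str) -> str:
    """Illustrative overrides mirroring Example Rules 1-3 above."""
    idx = SEVERITY_ORDER.index(severity)

    # Rule 1: hallucinations about financial metrics are always Critical.
    if fact_category == "financial_metrics":
        return "Critical"

    # Rule 2: claims about competitors are elevated one level.
    if fact_category == "competitor_claims":
        idx = min(idx + 1, len(SEVERITY_ORDER) - 1)

    # Rule 3: internal-use-only alerts are reduced one level.
    if visibility_channel == "internal":
        idx = max(idx - 1, 0)

    return SEVERITY_ORDER[idx]

print(apply_custom_rules("Medium", "competitor_claims", "customer_product"))  # High
```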
Handling Borderline Cases
What if an impact or confidence score lands between levels?
Shield uses a conservative approach:
- Round UP in severity if uncertain
- Better to over-alert than under-alert
- You can downgrade manually if false positive
Examples:
- Impact 58 (High-Medium boundary) → Classify as High
- Confidence 72 (High-Medium boundary) → Classify as High
Related Topics
- Alert Channels — Route alerts based on severity
- Alert Workflows — Escalation based on severity
- Corrections — Approval workflow varies by severity
Next Steps
- Review severity classifications — Do they align with your business?
- Set response SLAs — How quickly must each severity be reviewed?
- Create custom rules — Any overrides for your org?
- Configure routing — Route by severity to appropriate teams
- Monitor for accuracy — Are severity levels assigned correctly?