Alert Severity & Scoring
Shield automatically classifies alerts into severity levels: Critical, High, Medium, or Low. Severity is based on impact (business risk, brand damage, compliance exposure), confidence (how certain Shield is that a hallucination occurred), and visibility (who sees the AI's response).
Severity Levels
Critical (Action Required Immediately)
Definition: Severe impact; immediate action needed to prevent business damage.
Characteristics:
- Financial or compliance impact (money, regulatory fine)
- Visible to customers/public (brand damage risk)
- Affects multiple systems or departments
- Likelihood: 80%+ certainty that a hallucination occurred
Examples:
- AI says “TruthVouch pricing is $5,000/month” (your actual: $500/month) → customer confusion
- AI says “We’re SOX-compliant” (you’re not) → regulatory violation risk
- AI gives medical advice on your site (not approved) → health/liability risk
- “Our CEO said we’re hiring 200 people” (false; not announced) → stock manipulation concern
Response SLA: <1 hour (alert on-call team)
High (Investigate & Plan)
Definition: Significant impact; plan correction within hours.
Characteristics:
- Business or brand impact (but time to fix measured in hours)
- Affects key messages or competitive positioning
- Likelihood: 70-79% certainty
- Could impact customer decisions if left uncorrected
Examples:
- AI claims feature exists (doesn’t; coming next quarter)
- AI says company size is 500 (actual: 100) — competitive positioning
- AI claims award/certification you don’t have
- AI says partnership with competitor (untrue)
Response SLA: <4 hours
Medium (Plan & Schedule)
Definition: Noticeable impact; plan correction within 24 hours.
Characteristics:
- Moderate brand or operational impact
- Affects secondary messages or details
- Likelihood: 60-69% certainty
- Easily clarified if corrected quickly
Examples:
- AI gets founding date wrong (says 2022, actually 2021)
- AI describes product feature in outdated way (feature exists, description wrong)
- AI claims office location (wrong; office closed)
- Statistics outdated (Q3 numbers, but now Q4)
Response SLA: <24 hours
Low (Monitor & Resolve)
Definition: Minimal impact; resolve when convenient.
Characteristics:
- Minor inaccuracy with low business impact
- Unlikely to affect decisions
- Likelihood: 50-59% certainty (borderline)
- Can be left as-is without significant risk
Examples:
- AI says you have “5 products” (actually 6, but minor)
- Minor name inconsistency (“Truth Vouch” vs “TruthVouch”)
- Outdated example or case study (illustrative, not critical)
- Vague claim that’s “mostly correct”
Response SLA: <1 week (batch corrections weekly)
Severity Calculation
Severity = f(Impact, Confidence, Visibility)
Impact Score (0-100)
How much business risk does the hallucination create?
| Factor | Points | Reasoning |
|---|---|---|
| Financial Impact | | |
| Direct revenue loss | 40 | Wrong pricing, broken payment |
| Customer churn risk | 30 | Causes customer to leave |
| Regulatory fine risk | 35 | Compliance/legal exposure |
| No direct cost | 0 | Clarification only needed |
| Brand/Reputation | | |
| Public visibility (social media, news) | +25 | Mass audience sees it |
| Customer-facing (website, support) | +15 | Customers might see it |
| Internal use only | 0 | Limited exposure |
| Decision-Affecting | | |
| Influences purchase decision | +20 | Could change customer behavior |
| Affects trust (credibility) | +15 | Harms reputation if wrong |
| Informational only | 0 | Doesn’t drive decisions |
Example calculations:
- “AI says pricing $5K” (wrong: $500) → Financial (40) + Public (25) + Purchase Decision (20) = 85 impact
- “AI says founding date is 2022” (wrong: 2021) → none of the above factors apply = 0 impact (minor)
- “AI claims SOX-compliant” (you’re not) → Regulatory (35) + Public (25) + Trust (15) = 75 impact
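As a rough illustration, the impact score can be read off the table above as a sum of whichever factors apply. The factor keys, the helper name impact_score, and the cap at 100 below are assumptions for illustration, not Shield's internal API.

```python
# Illustrative only: factor points taken from the impact table above.
IMPACT_POINTS = {
    # Financial impact
    "direct_revenue_loss": 40,
    "customer_churn_risk": 30,
    "regulatory_fine_risk": 35,
    # Brand/reputation
    "public_visibility": 25,
    "customer_facing": 15,
    # Decision-affecting
    "influences_purchase": 20,
    "affects_trust": 15,
}

def impact_score(factors: set[str]) -> int:
    """Sum the points for every factor that applies, capped to the 0-100 range."""
    return min(100, sum(IMPACT_POINTS.get(f, 0) for f in factors))

# Wrong pricing on the public website:
# direct revenue loss (40) + public visibility (25) + purchase decision (20) = 85
print(impact_score({"direct_revenue_loss", "public_visibility", "influences_purchase"}))  # 85
```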
Confidence Score (0-100)
How confident is Shield that a hallucination occurred?
Method 1: Natural Language Inference (NLI)
- Compare AI response to Truth Nugget
- Measure semantic contradiction strength
- Higher score = stronger contradiction
- Range: 50-99%
Example:
- AI: “Founded in 2022”; Nugget: “Founded in 2021” → 85% confidence (clear factual contradiction)
- AI: “Strong innovation culture”; Nugget: “Focus on reliability” → 70% confidence (cultural nuance, softer contradiction)
Method 2: Cross-Check Confidence
- Query multiple AI engines (ChatGPT, Claude, Gemini)
- If all agree AI said X but Truth Nugget says Y → higher confidence
- If engines disagree with each other → lower confidence
Example:
- All 3 engines say “$50M funding” but Nugget says “$40M” → 90% confidence
- 2 engines say “$50M”, 1 says “$40M”, Nugget says “$40M” → 70% confidence
Method 3: Source Authority
- If Truth Nugget sourced from official docs (earnings report, SEC filing) → higher confidence
- If sourced from team knowledge → moderate confidence
- If based on assumption → lower confidence
Example:
- Nugget from SEC 10-Q: 95% confidence
- Nugget from internal wiki: 75% confidence
- Nugget from “we think…”: 60% confidence
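How Shield blends these three signals isn't documented here, so the sketch below simply expresses each method as a function and takes a naive average as the combined confidence. The function names, weights, and source categories are illustrative assumptions.

```python
def nli_confidence(contradiction_strength: float) -> float:
    """Method 1: map an NLI contradiction strength (0-1) into the 50-99% range."""
    return 50 + 49 * contradiction_strength

def cross_check_confidence(engines_agreeing: int, engines_queried: int) -> float:
    """Method 2: more engines repeating the same claim -> higher confidence."""
    return 30 + 60 * (engines_agreeing / engines_queried)  # 3/3 -> 90, 2/3 -> 70

def source_authority(source: str) -> float:
    """Method 3: confidence based on where the Truth Nugget came from."""
    return {"official_filing": 95, "internal_wiki": 75, "assumption": 60}.get(source, 70)

def combined_confidence(contradiction: float, agreeing: int, queried: int, source: str) -> float:
    """Naive average of the three signals; the real blending is not specified."""
    signals = [
        nli_confidence(contradiction),
        cross_check_confidence(agreeing, queried),
        source_authority(source),
    ]
    return round(sum(signals) / len(signals), 1)

# "$50M funding" repeated by all 3 engines, Nugget ($40M) sourced from an official filing:
print(combined_confidence(0.8, agreeing=3, queried=3, source="official_filing"))  # 91.4
```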
Visibility Score (0-100)
Who sees the AI’s response?
| Channel | Score | Reasoning |
|---|---|---|
| Public web/social | 100 | Millions of people |
| Customer-facing product | 80 | All customers see it |
| Internal use only | 20 | Employees only |
| Trusted environment (sandbox) | 10 | Limited scope |
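The channel scores are a straightforward lookup. The channel keys and the conservative default for unknown channels below are assumptions for illustration.

```python
# Illustrative channel-to-score mapping from the visibility table above.
VISIBILITY_SCORES = {
    "public_web": 100,       # public web/social
    "customer_product": 80,  # customer-facing product
    "internal": 20,          # internal use only
    "sandbox": 10,           # trusted environment
}

def visibility_score(channel: str) -> int:
    # Assumed behavior: default to the highest score when the channel is unknown.
    return VISIBILITY_SCORES.get(channel, 100)
```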
Severity Matrix
Severity combines the Impact, Confidence, and Visibility scores against the following thresholds:
Critical (≥70): (Impact ≥50 AND Confidence ≥75 AND Visibility ≥60) OR Impact ≥80 (regardless of confidence/visibility)
High (50-69): (Impact ≥40 AND Confidence ≥70) OR (Impact ≥60 AND Confidence ≥50)
Medium (35-49): (Impact ≥25 AND Confidence ≥60) OR (Impact ≥40 AND Confidence ≥40)
Low (<35): Default for lower-impact items
Examples
Example 1: Wrong Pricing (Critical)
- AI said: “TruthVouch costs $5,000/month”
- Truth Nugget: “Standard plan is $500/month”
- Impact: 85 (direct revenue loss + public visibility + purchase decision)
- Confidence: 92 (factual contradiction; confirmed by pricing page)
- Visibility: 95 (on public website)
- Severity: CRITICAL ✓
Example 2: Founding Date Wrong (Low)
- AI said: “Founded in 2023”
- Truth Nugget: “Founded in 2021”
- Impact: 5 (minor factual error; doesn’t drive decisions)
- Confidence: 88 (clear factual contradiction)
- Visibility: 30 (mentioned in background, not decision-affecting)
- Severity: LOW ✓
Example 3: Feature Doesn’t Exist Yet (High)
- AI said: “Shield includes automated corrections for all hallucinations”
- Truth Nugget: “Manual approval required for all corrections” (feature coming Q2 2024)
- Impact: 65 (customer expectation mismatch; support cost)
- Confidence: 95 (documented feature limitation)
- Visibility: 80 (customer-facing)
- Severity: HIGH ✓
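The matrix thresholds translate directly into code. The sketch below is not Shield's implementation, just the rules as written above, and it reproduces Examples 1 and 2.

```python
def classify_severity(impact: float, confidence: float, visibility: float) -> str:
    """Apply the severity matrix thresholds in order, most severe first."""
    if (impact >= 50 and confidence >= 75 and visibility >= 60) or impact >= 80:
        return "Critical"
    if (impact >= 40 and confidence >= 70) or (impact >= 60 and confidence >= 50):
        return "High"
    if (impact >= 25 and confidence >= 60) or (impact >= 40 and confidence >= 40):
        return "Medium"
    return "Low"

print(classify_severity(85, 92, 95))  # Example 1 (wrong pricing)  -> Critical
print(classify_severity(5, 88, 30))   # Example 2 (founding date)  -> Low
```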
Custom Severity Rules
Customize how Shield calculates severity for your org:
Example Rule 1: Financial Warnings Extra Critical
“Any hallucination about our financial metrics should be Critical, regardless of impact/confidence”
- Hallucination about revenue, funding, growth = Auto-Critical
- Reason: Misleading financial claims carry high regulatory risk
Example Rule 2: Competitor Claims High Priority
“Claims about competitors should be elevated one level (Low → Medium, Medium → High, High → Critical)”
- Reason: False claims about competitors affect market perception
Example Rule 3: Lower Sensitivity for Internal Use
“Decrease severity for internal-use-only hallucinations (reduce by 1 level)”
- Reason: Limited brand exposure; time to fix not urgent
To create custom rules:
- Go to Shield Settings → Severity Rules
- Click Create Rule
- Define trigger (fact category, visibility, impact threshold)
- Set action (override severity to X)
- Save
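Custom rules are configured in the UI rather than in code, but the logic of Example Rules 1-3 can be sketched as a post-processing override on the computed severity. The category names, channel values, and function below are hypothetical, not Shield's rule engine.

```python
SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]

def apply_custom_rules(severity: str, fact_category: str, visibility_channel: str) -> str:
    """Illustrative overrides mirroring Example Rules 1-3 above."""
    idx = SEVERITY_ORDER.index(severity)

    # Rule 1: hallucinations about financial metrics are always Critical.
    if fact_category == "financial_metrics":
        return "Critical"

    # Rule 2: claims about competitors are elevated one level.
    if fact_category == "competitor_claims":
        idx = min(idx + 1, len(SEVERITY_ORDER) - 1)

    # Rule 3: internal-use-only alerts are reduced one level.
    if visibility_channel == "internal":
        idx = max(idx - 1, 0)

    return SEVERITY_ORDER[idx]

print(apply_custom_rules("Medium", "competitor_claims", "customer_product"))  # High
```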
Handling Borderline Cases
What if an impact or confidence score lands between levels?
Shield uses a conservative approach:
- Round UP in severity if uncertain
- Better to over-alert than under-alert
- You can downgrade manually if false positive
Examples:
- Impact 58 (High-Medium boundary) → Classify as High
- Confidence 72 (High-Medium boundary) → Classify as High
Related Topics
- Alert Channels — Route alerts based on severity
- Alert Workflows — Escalation based on severity
- Corrections — Approval workflow varies by severity
Next Steps
- Review severity classifications — Do they align with your business?
- Set response SLAs — How quickly must each severity be reviewed?
- Create custom rules — Any overrides for your org?
- Configure routing — Route by severity to appropriate teams
- Monitor for accuracy — Are severity levels assigned correctly?