
Understanding Alerts

Alerts automatically notify you when Shield detects hallucinations. Each alert represents a potential brand risk that may require action.

[Figure: Shield Alerts showing detected hallucinations with severity levels]

What Triggers an Alert

Shield triggers an alert when:

  1. A cross-check completes with a truth score below your threshold (default: 80)
  2. AI response contradicts or diverges significantly from your Truth Nugget
  3. The discrepancy represents a brand or business risk
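
The trigger conditions above can be sketched as a simple check. This is an illustration only: the `CrossCheck` record and its field names are hypothetical, and the sketch assumes all three conditions must hold together, which the list does not state explicitly.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD = 80  # default threshold from the list above


@dataclass
class CrossCheck:
    """Hypothetical record of one completed cross-check."""
    truth_score: int          # 0-100
    contradicts_nugget: bool  # response contradicts or diverges from the Truth Nugget
    is_business_risk: bool    # discrepancy is a brand or business risk


def should_alert(check: CrossCheck, threshold: int = DEFAULT_THRESHOLD) -> bool:
    # Assumption: all three conditions must hold at the same time.
    return (
        check.truth_score < threshold
        and check.contradicts_nugget
        and check.is_business_risk
    )


print(should_alert(CrossCheck(15, True, True)))  # True
print(should_alert(CrossCheck(85, True, True)))  # False
```

Raising or lowering `threshold` in this sketch mirrors what the sensitivity setting does conceptually: a higher threshold makes more cross-checks eligible to alert.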

Alert Threshold (configurable):

  • Critical: Score <60 (always alert)
  • High: Score 60-79 (alert by default)
  • Medium: Score 40-59 (optional alert)
  • Low: Score <40 (never alert by default)

Adjust thresholds in Settings → Alerts → Sensitivity.

Alert Lifecycle

Each alert follows a workflow:

New → Acknowledged → Investigating → Resolved/Dismissed

New

Alert just created. Requires attention.

  • Click to review details
  • You can approve a correction, dismiss, or investigate

Acknowledged

You’ve reviewed it and plan to act.

  • Shows team member assigned (optional)
  • Ticket created in issue tracker (if integrated)

Investigating

Assigned to team member for research.

  • Notes can be added
  • Status updates tracked

Resolved

Correction deployed and verified, or cause identified and handled.

  • Terminal state
  • Full history retained

Dismissed

You’ve determined it’s not a true hallucination.

  • Reasons: “Paraphrase OK”, “Outdated fact”, “False positive”, “Not a risk”, “Will fix separately”
  • Shield learns from dismissals
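
If you track alert status in your own tooling, the lifecycle can be modeled as a small state machine. The sketch below is not Shield's implementation. The transition table is an assumption: it allows a New alert to move straight to Investigating or Dismissed (matching the options listed under New) and treats Dismissed as terminal, like Resolved.

```python
from enum import Enum


class Status(Enum):
    NEW = "New"
    ACKNOWLEDGED = "Acknowledged"
    INVESTIGATING = "Investigating"
    RESOLVED = "Resolved"
    DISMISSED = "Dismissed"


# Assumed transition table (not taken from Shield itself).
ALLOWED = {
    Status.NEW: {Status.ACKNOWLEDGED, Status.INVESTIGATING, Status.DISMISSED},
    Status.ACKNOWLEDGED: {Status.INVESTIGATING, Status.RESOLVED, Status.DISMISSED},
    Status.INVESTIGATING: {Status.RESOLVED, Status.DISMISSED},
    Status.RESOLVED: set(),   # terminal state
    Status.DISMISSED: set(),  # assumed terminal
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move alert from {current.value} to {target.value}")
    return target


status = transition(Status.NEW, Status.ACKNOWLEDGED)
status = transition(status, Status.INVESTIGATING)
status = transition(status, Status.RESOLVED)
print(status.value)  # Resolved
```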

Alert Details

Click any alert to see:

Summary

  • Alert ID and creation time
  • Severity (critical, high, medium, low)
  • AI engine and model
  • Truth Nugget involved

Full Comparison

Your Truth: "Founded in 2024"
AI Said: "Founded in early 2023"
Entities: Date: 2023 (vs your 2024)
NLI Verdict: CONTRADICTED (96% confidence)
Truth Score: 15/100
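
To make the "Entities" line concrete, here is a toy illustration of a date comparison. It is not how Shield extracts entities or computes the NLI verdict, neither of which is described here. The `year_mismatches` helper is hypothetical and only compares four-digit years, pairing them in the order they appear.

```python
import re

# Matches four-digit years from 1900 to 2099.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def year_mismatches(truth: str, ai_said: str) -> list[tuple[str, str]]:
    """Pair up years found in each text, in order, and return pairs that differ."""
    truth_years = YEAR.findall(truth)
    ai_years = YEAR.findall(ai_said)
    return [(t, a) for t, a in zip(truth_years, ai_years) if t != a]


print(year_mismatches("Founded in 2024", "Founded in early 2023"))
# [('2024', '2023')]
```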

AI Response

Full text of what the AI generated.

Suggested Correction

Auto-generated fix (if applicable):

  • For product facts: “Update your website to mention the feature”
  • For pricing: “Publish correct pricing to your pricing page”
  • For people: “Clarify in bio or press materials”

Audit Trail

  • When detected
  • By which query
  • Who’s assigned (if any)
  • Notes added

Alert Severity

Critical (Score <60)

Major hallucination or contradiction.

Examples:

  • “Company shut down” (completely false)
  • “CEO is wrong person” (identity error)
  • “Product does X instead of Y” (wrong capability)
  • Price off by 10x

Action: Fix immediately. High brand damage.

High (Score 60-79)

Significant inaccuracy affecting perception.

Examples:

  • “Price is $200/month” (you say $349)
  • “Founded in 2020” (you say 2024)
  • “Monitors 5 engines” (you say 9)

Action: Fix within 24 hours.

Medium (Score 40-59)

Partial information or minor discrepancy.

Examples:

  • “Has some AI safety features” (vague vs your detailed list)
  • “Has thousands of customers” (you say “500+”)
  • Doesn’t mention key differentiator

Action: Fix within 48 hours, or update your fact if it is ambiguous.

Low (Score <40)

Minor misunderstanding unlikely to affect decisions.

Examples:

  • “Has a new product” (you say “launching soon”)
  • Name slightly misspelled or colloquialized
  • Missing non-critical detail

Action: Optional. Fix if you have time, or mark as false positive.

Managing Alerts

Dismiss Alert

Mark as not a true hallucination:

  • Click alert → Dismiss → Choose reason

Reasons:

  • “Paraphrase OK” — AI said something different but equivalent
  • “Outdated fact” — Your Truth Nugget is stale, not the AI
  • “False positive” — Shield’s detection was wrong
  • “Not a risk” — Inaccuracy exists but doesn’t matter
  • “Will fix separately” — Not via correction

Shield learns from dismissals to reduce future false positives.

Approve Correction

Shield suggests a fix; you approve it:

  • Click alert → Approve Correction → Choose method
  • Correction deploys within seconds
  • Shield re-polls in 24-72 hours to verify

Edit Truth Nugget

If the alert reveals your fact is wrong:

  • Click alert → Edit Nugget
  • Update fact text, confidence, or expiry
  • Save (Shield re-scores immediately)
  • Alert may auto-resolve if score improves

Assign to Team Member

Delegate investigation:

  • Click alert → Assign
  • Choose team member and due date
  • They are notified by Slack or email

Add Note

Document investigation findings:

  • Click alert → Add Note
  • Type investigation details
  • Visible to team and in audit trail

Filtering Alerts

Go to Shield → Alerts to see all alerts. You can filter by:

  • Status: New, Acknowledged, Investigating, Resolved, Dismissed
  • Severity: Critical, High, Medium, Low
  • Engine: ChatGPT, Claude, Gemini, Perplexity, etc.
  • Category: Product, Financial, Leadership, etc.
  • Time: Last 24h, 7d, 30d, custom range
  • Assigned: Unassigned, assigned to me, assigned to specific person

Common Views

Action Items (unresolved):

  • Status: New, Acknowledged, Investigating
  • Severity: Critical, High

Recently Resolved:

  • Status: Resolved
  • Time: Last 7 days

False Positives:

  • Status: Dismissed
  • Reason: False positive
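
If you work with exported alert data, the same views can be reproduced with plain filters. The sketch below assumes a hypothetical `Alert` record whose field names are chosen for illustration. Only the status, severity, and reason values come from the lists above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alert record; field names are illustrative."""
    alert_id: str
    status: str
    severity: str
    dismiss_reason: Optional[str] = None


def action_items(alerts):
    """Unresolved alerts with Critical or High severity."""
    open_statuses = {"New", "Acknowledged", "Investigating"}
    urgent = {"Critical", "High"}
    return [a for a in alerts if a.status in open_statuses and a.severity in urgent]


def false_positives(alerts):
    """Alerts dismissed with the reason 'False positive'."""
    return [
        a for a in alerts
        if a.status == "Dismissed" and a.dismiss_reason == "False positive"
    ]


alerts = [
    Alert("a1", "New", "Critical"),
    Alert("a2", "Resolved", "High"),
    Alert("a3", "Dismissed", "Low", "False positive"),
    Alert("a4", "Investigating", "Medium"),
]
print([a.alert_id for a in action_items(alerts)])     # ['a1']
print([a.alert_id for a in false_positives(alerts)])  # ['a3']
```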

Alert Notifications

Choose how to be notified:

Email: Digest with summary

  • Critical → immediately
  • High → morning (8 AM)
  • Medium → daily (5 PM)
  • Low → weekly (Sunday 5 PM)

Slack: Real-time messages

  • Critical → @channel mention in #security
  • High → @you in thread
  • Medium/Low → disabled

Teams: Direct messages

  • Critical → urgent flag
  • High → normal
  • Medium/Low → daily digest

PagerDuty: On-call escalation

  • Critical → page on-call
  • High/Medium → auto-incident, no page
  • Low → disabled

Configure in Settings → Notifications.
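
The routing rules above amount to a lookup table keyed by channel and severity. The sketch below encodes them as a plain dictionary to show which channels fire for a given severity. The structure and names are illustrative and do not reflect Shield's configuration format.

```python
# Delivery rule per channel and severity; None means the channel is disabled.
ROUTING = {
    "email": {
        "Critical": "immediately",
        "High": "morning digest (8 AM)",
        "Medium": "daily digest (5 PM)",
        "Low": "weekly digest (Sunday 5 PM)",
    },
    "slack": {
        "Critical": "@channel mention in #security",
        "High": "@you in thread",
        "Medium": None,
        "Low": None,
    },
    "teams": {
        "Critical": "direct message, urgent flag",
        "High": "direct message, normal",
        "Medium": "daily digest",
        "Low": "daily digest",
    },
    "pagerduty": {
        "Critical": "page on-call",
        "High": "auto-incident, no page",
        "Medium": "auto-incident, no page",
        "Low": None,
    },
}


def deliveries(severity: str) -> dict:
    """Return the enabled channels and their delivery rule for one severity."""
    return {
        channel: rules[severity]
        for channel, rules in ROUTING.items()
        if rules[severity] is not None
    }


print(deliveries("Low"))
# {'email': 'weekly digest (Sunday 5 PM)', 'teams': 'daily digest'}
```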

Bulk Operations

Select multiple alerts to act on together:

  1. Click checkboxes on multiple alerts
  2. Actions appear at top:
    • Mark as Resolved
    • Approve Corrections (one by one)
    • Assign to Person
    • Add Tag
    • Export to CSV

Example: “Select all ChatGPT pricing alerts → Approve Corrections → all deploy at once”
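
For a sense of what a CSV export of selected alerts might contain, here is a minimal sketch using Python's standard `csv` module. The column names and the dictionary shape are assumptions for illustration. Shield's actual export columns are not listed here.

```python
import csv
import io


def export_csv(alerts, fields=("alert_id", "severity", "engine", "status")):
    """Serialize alert dictionaries to CSV text, keeping only the given fields."""
    buf = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in `fields`.
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(alerts)
    return buf.getvalue()


selected = [
    {"alert_id": "a1", "severity": "Critical", "engine": "ChatGPT", "status": "New", "note": "x"},
    {"alert_id": "a2", "severity": "High", "engine": "Claude", "status": "Acknowledged"},
]
print(export_csv(selected))
```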

Next Steps