
Monitoring AI Engines

Brand Intelligence monitors 9+ AI engines for brand accuracy. Each engine has different training data, model architectures, and response patterns — so they each tell a different story about your brand.

Supported Engines

ChatGPT (OpenAI)

Models monitored: GPT-4, GPT-4o, o1

Market share: 65% of enterprise AI usage
Training data cutoff: April 2024 (GPT-4o), varies by model
Key quirks:

  • Most likely to hallucinate recent information (training data is older)
  • Web browsing in ChatGPT sometimes includes outdated cached content
  • Very sensitive to framing — different prompts can produce very different answers

What we monitor:

  • Direct product/company questions
  • Market positioning claims
  • Leadership information
  • Pricing and features
  • Historical facts about your company

Typical questions:

  • “What does [company] do?”
  • “Who are [company]’s competitors?”
  • “What is [product]’s pricing?”
  • “When was [company] founded?”

Claude (Anthropic)

Models monitored: Claude 3, Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 4

Market share: 20% enterprise, 35% in developer tools
Training data cutoff: April 2024 (Claude 3.5), varies by model
Key quirks:

  • More conservative — refuses some queries other models answer
  • Better at citing sources in responses
  • Strong on technical accuracy (better than GPT for engineering claims)
  • Longer context window means more detailed responses

What we monitor:

  • Technical capability claims
  • Integration and API documentation references
  • Engineering talent/credentials
  • Compliance and security certifications

Typical questions:

  • “Is [company] built on strong engineering principles?”
  • “What are [product]’s API capabilities?”
  • “Does [company] comply with SOC 2?”

Google Gemini

Models monitored: Gemini 1.5 Pro, Flash

Market share: 10% enterprise, integrated into Gmail/Workspace
Training data cutoff: Late 2023 (varies by model)
Key quirks:

  • Often returns very different results day-to-day (dynamic search integration)
  • Excellent at web search integration — tends to have more recent info than ChatGPT
  • More verbose responses
  • Sometimes over-confident in citations

What we monitor:

  • Recent news or announcements about your company
  • Blog mentions and press coverage
  • Product reviews and comparisons
  • Market trends involving your category

Typical questions:

  • “What’s new with [company]?”
  • “What do people say about [product]?”
  • “Is [company] growing?”

Perplexity AI

Models monitored: Perplexity Pro (Claude-based), Perplexity Free (experimental)

Market share: 5%, but rapidly growing in research/analyst use cases
Training data cutoff: None (real-time web search)
Key quirks:

  • Most current information of any engine (retrieves answers via real-time search rather than static training data)
  • High accuracy for current events
  • Excellent at providing sources
  • Heavy reliance on search results can sometimes include misinformation from forums

What we monitor:

  • Real-time brand mentions across web
  • Recent product announcements
  • Industry news involving your company
  • Analyst coverage and reviews

Typical questions:

  • “What’s [company] doing this week?”
  • “Where is [company] mentioned in the news?”
  • “What do industry analysts say about [company]?”

Microsoft Copilot

Models monitored: Enterprise (GPT-4 Turbo), Consumer (GPT-4o)

Market share: 25% within enterprises (Office integration), growing rapidly
Training data cutoff: Varies by model, often newer than ChatGPT
Key quirks:

  • Integrated with Microsoft products (Word, Excel, Teams) — sees different context
  • Enterprise version has access to company’s own data
  • Sometimes references Microsoft partnerships more heavily

What we monitor:

  • Enterprise software claims (integrations, compatibility)
  • Microsoft partnership mentions
  • B2B positioning
  • Technical documentation references

Typical questions:

  • “Does [product] integrate with Microsoft Teams?”
  • “Is [company] a Microsoft partner?”
  • “Can [product] run on Azure?”

Other Engines (Coming Soon)

We’re adding monitoring for:

  • Amazon Q — AWS integration focus
  • Hugging Face Chat — Open-source focused discussions
  • Mistral Chat — European regulations, privacy-focused
  • Poe — Multi-engine aggregator
  • And more as new engines become mainstream

How We Monitor Each Engine

Query Process

  1. Formula-based queries: We ask 20-40 predefined questions about your brand for consistency
  2. Structured prompts: Each question is asked 2-3 times with slight variations to test reliability
  3. Fresh queries: Each monitoring cycle, we start from scratch (no chat history)
  4. Consistent user agent: All queries appear from the same “user” type to avoid variation
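The query process above can be sketched as a simple expansion step. The question list, variation templates, and `build_queries` helper below are illustrative placeholders, not the product's actual prompt set:

```python
# Hypothetical sketch of the query process: each predefined question is
# expanded into 2-3 phrasing variations, and every monitoring cycle starts
# from a fresh session (no chat history is carried over).

QUESTIONS = [
    "What does {company} do?",
    "Who are {company}'s competitors?",
]

VARIATIONS = [
    "{q}",
    "Briefly: {q}",
    "{q} Answer factually.",
]

def build_queries(company: str) -> list[str]:
    """Expand each predefined question into its phrasing variations."""
    queries = []
    for question in QUESTIONS:
        base = question.format(company=company)
        for template in VARIATIONS:
            queries.append(template.format(q=base))
    return queries
```

Asking the same fact several ways is what makes the reliability signal meaningful: an engine that answers correctly only under one phrasing is flagged differently from one that is consistently accurate.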

Scoring

Each response is evaluated against your truth nuggets:

Accuracy Score = (Correctly verified claims / Total claims in response) × 100

Verification happens through:

  • Exact string matching (e.g., “Founded in 2020” matches “Founded in 2020”)
  • Semantic similarity (e.g., “Founded in early 2020” matches “Founded in 2020”)
  • Range matching (e.g., “500-600 employees” matches “550 employees”)
  • Date tolerance (e.g., “In 2020” matches “March 2020”)
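The formula and verification methods above can be sketched as follows. This is a minimal illustration with simplified matchers; semantic similarity, which typically requires an embedding model, is omitted:

```python
import re

def years_in(text: str) -> set[str]:
    """Extract four-digit years (1900s/2000s) from a string."""
    return set(re.findall(r"\b(?:19|20)\d{2}\b", text))

def claim_matches(claim: str, nugget: str) -> bool:
    """Check one extracted claim against one truth nugget."""
    # Exact string matching (case-insensitive)
    if claim.strip().lower() == nugget.strip().lower():
        return True
    # Date tolerance: "In 2020" matches "March 2020" (same year in both)
    claim_years = years_in(claim)
    if claim_years and claim_years == years_in(nugget):
        return True
    # Range matching: a claimed "500-600 employees" covers a verified "550 employees"
    rng = re.search(r"(\d+)\s*-\s*(\d+)", claim)
    num = re.search(r"\d+", nugget)
    if rng and num:
        lo, hi = int(rng.group(1)), int(rng.group(2))
        if lo <= int(num.group()) <= hi:
            return True
    return False

def accuracy_score(claims: list[str], nuggets: list[str]) -> float:
    """Accuracy = (correctly verified claims / total claims) x 100."""
    if not claims:
        return 0.0
    verified = sum(any(claim_matches(c, n) for n in nuggets) for c in claims)
    return verified / len(claims) * 100
```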

Response Handling

What happens to each response:

  1. Claims extraction: NER (Named Entity Recognition) extracts factual claims
  2. Truth nugget matching: Each claim is compared to your verified facts
  3. Scoring: Accuracy is calculated per-engine, per-query
  4. Trend tracking: Week-over-week changes are monitored
  5. Alert generation: Critical inaccuracies trigger alerts
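The five steps above can be sketched as one per-response pipeline. Here `extract_claims` is a hypothetical stand-in for the NER step, and the alert threshold is an assumed value, not a documented default:

```python
from dataclasses import dataclass, field

@dataclass
class ResponseReport:
    engine: str
    query: str
    claims: list[str]
    accuracy: float
    alerts: list[str] = field(default_factory=list)

def extract_claims(text: str) -> list[str]:
    # Placeholder for the NER step: treat each sentence as one "claim".
    return [s.strip() for s in text.split(".") if s.strip()]

def process_response(engine: str, query: str, text: str,
                     nuggets: list[str], threshold: float = 50.0) -> ResponseReport:
    claims = extract_claims(text)                        # 1. claims extraction
    matched = [c for c in claims if c in nuggets]        # 2. truth nugget matching
    accuracy = 100 * len(matched) / len(claims) if claims else 0.0  # 3. scoring
    alerts = []                                          # 5. alert generation
    if accuracy < threshold:
        alerts.append(f"{engine}: accuracy {accuracy:.0f}% below threshold")
    return ResponseReport(engine, query, claims, accuracy, alerts)
```

Step 4 (trend tracking) operates across reports, so it would live outside this function: each `ResponseReport` is stored and compared week over week.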

Engine-Specific Recommendations

If Your Score is High Across All Engines

You have excellent brand representation. Focus on maintaining currency:

  • Update truth nuggets when things change
  • Monitor narratives for emerging issues
  • Quarterly deep dives into negative narratives

If Your Score is High in Some, Low in Others

Common patterns:

  • ChatGPT low, Gemini/Perplexity high: Your website is recent/updated but ChatGPT’s training data is old. No immediate action needed — ChatGPT will catch up at next model update.
  • Perplexity low, others high: Your information isn’t in search results yet. Usually means very recent changes. Add more links or press releases to get indexed.
  • Copilot low, others high: Your info isn’t in Microsoft-accessible sources. Check if you’re listed in directories Microsoft crawls.

If Your Score is Low Across All Engines

This suggests either:

  1. Your truth nuggets aren’t well-represented on your website
  2. Your website content is outdated
  3. Competing information exists in AI training data

Action:

  1. Review your top 3 “missing information” alerts
  2. Check if those facts are clearly on your website
  3. Add them if missing, or rewrite existing content for clarity
  4. Wait 1-2 weeks for AI models to re-crawl

Monitoring Frequency

By default, each engine is monitored weekly. You can customize:

  • Daily: Better for critical brands (health/safety impact) or during high-visibility events
  • Weekly: Standard frequency, best for most companies
  • Monthly: Cost-effective for non-critical brands
  • Custom: Define your own schedule

Higher frequency = higher API costs, but faster detection of inaccuracies.
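As a sketch, a per-engine schedule like the one described might look like the following. The keys and structure are illustrative, not the product's actual configuration format:

```python
# Hypothetical per-engine monitoring schedule. Engine names mirror the
# supported engines above; frequency values mirror the documented options.

MONITORING_SCHEDULE = {
    "chatgpt":    "daily",    # critical brand: fastest detection, highest API cost
    "claude":     "weekly",   # standard frequency, best for most companies
    "gemini":     "weekly",
    "perplexity": "daily",    # real-time search engine: changes surface quickly
    "copilot":    "monthly",  # cost-effective for non-critical surfaces
}

VALID_FREQUENCIES = {"daily", "weekly", "monthly", "custom"}

def validate_schedule(schedule: dict[str, str]) -> bool:
    """Every engine's frequency must be one of the documented options."""
    return all(freq in VALID_FREQUENCIES for freq in schedule.values())
```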

Engine Comparison

Engine       Recency      Reliability   Popularity   Coverage
ChatGPT      Good         Excellent     Highest      Broad
Claude       Good         Excellent     High         Technical focus
Gemini       Very Good    Good          Medium       Broad
Perplexity   Excellent    Good          Growing      Current events
Copilot      Good         Good          Growing      Enterprise

Next Steps