Monitoring AI Engines
Brand Intelligence monitors 9+ AI engines for brand accuracy. Each engine has different training data, model architectures, and response patterns — so they each tell a different story about your brand.
Supported Engines
ChatGPT (OpenAI)
Models monitored: GPT-4, GPT-4o, o1
Market share: 65% of enterprise AI usage
Training data cutoff: April 2024 (GPT-4o), varies by model
Key quirks:
- Most likely to hallucinate recent information (training data is older)
- Web browsing in ChatGPT sometimes includes outdated cached content
- Very sensitive to framing — different prompts can produce very different answers
What we monitor:
- Direct product/company questions
- Market positioning claims
- Leadership information
- Pricing and features
- Historical facts about your company
Typical questions (filled in per brand; a sketch follows this list):
- “What does [company] do?”
- “Who are [company]’s competitors?”
- “What is [product]’s pricing?”
- “When was [company] founded?”
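The bracketed placeholders are expanded per brand before each query. A minimal sketch of that expansion, assuming hypothetical names (`TEMPLATES` and `expand_templates` are illustrative, not Brand Intelligence’s actual API):

```python
# Sketch of expanding the question templates per brand.
# TEMPLATES and expand_templates are illustrative names,
# not Brand Intelligence's actual API.
TEMPLATES = [
    "What does {company} do?",
    "Who are {company}'s competitors?",
    "What is {product}'s pricing?",
    "When was {company} founded?",
]

def expand_templates(company: str, product: str) -> list[str]:
    """Substitute one brand's names into the standard question set."""
    return [t.format(company=company, product=product) for t in TEMPLATES]

print(expand_templates("Acme Corp", "Acme Cloud"))
```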
Claude (Anthropic)
Models monitored: Claude 3, Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 4
Market share: 20% enterprise, 35% in developer tools
Training data cutoff: April 2024 (Claude 3.5), varies by model
Key quirks:
- More conservative — refuses some queries other models answer
- Better at citing sources in responses
- Strong on technical accuracy (better than GPT for engineering claims)
- Longer context window means more detailed responses
What we monitor:
- Technical capability claims
- Integration and API documentation references
- Engineering talent/credentials
- Compliance and security certifications
Typical questions:
- “Is [company] built on strong engineering principles?”
- “What are [product]’s API capabilities?”
- “Does [company] comply with SOC 2?”
Google Gemini
Models monitored: Gemini 1.5 Pro, Flash
Market share: 10% enterprise, integrated into Gmail/Workspace
Training data cutoff: Late 2023 (varies by model)
Key quirks:
- Often returns very different results day-to-day (dynamic search integration)
- Excellent at web search integration — tends to have more recent info than ChatGPT
- More verbose responses
- Sometimes over-confident in citations
What we monitor:
- Recent news or announcements about your company
- Blog mentions and press coverage
- Product reviews and comparisons
- Market trends involving your category
Typical questions:
- “What’s new with [company]?”
- “What do people say about [product]?”
- “Is [company] growing?”
Perplexity AI
Models monitored: Perplexity Pro (Claude-based), Perplexity Free (experimental)
Market share: 5%, but rapidly growing in research/analyst use cases
Training data cutoff: Not applicable (real-time web search)
Key quirks:
- Most current information of any engine (real-time search rather than a static training cutoff)
- High accuracy for current events
- Excellent at providing sources
- Heavy reliance on search results means responses can pick up misinformation from forums
What we monitor:
- Real-time brand mentions across web
- Recent product announcements
- Industry news involving your company
- Analyst coverage and reviews
Typical questions:
- “What’s [company] doing this week?”
- “Where is [company] mentioned in the news?”
- “What do industry analysts say about [company]?”
Microsoft Copilot
Models monitored: Enterprise (GPT-4 Turbo), Consumer (GPT-4o)
Market share: 25% within enterprises (Office integration), growing rapidly
Training data cutoff: Varies by model, often newer than ChatGPT
Key quirks:
- Integrated with Microsoft products (Word, Excel, Teams) — sees different context
- Enterprise version has access to company’s own data
- Sometimes references Microsoft partnerships more heavily
What we monitor:
- Enterprise software claims (integrations, compatibility)
- Microsoft partnership mentions
- B2B positioning
- Technical documentation references
Typical questions:
- “Does [product] integrate with Microsoft Teams?”
- “Is [company] a Microsoft partner?”
- “Can [product] run on Azure?”
Other Engines (Coming Soon)
We’re adding monitoring for:
- Amazon Q — AWS integration focus
- Hugging Face Chat — Open-source focused discussions
- Mistral Chat — European regulations, privacy-focused
- Poe — Multi-engine aggregator
- And more as new engines become mainstream
How We Monitor Each Engine
Query Process
- Formula-based queries: We ask 20-40 predefined questions about your brand for consistency
- Structured prompts: Each question is asked 2-3 times with slight variations to test reliability
- Fresh queries: Each monitoring cycle, we start from scratch (no chat history)
- Consistent user agent: All queries appear from the same “user” type to avoid variation (the full cycle is sketched below)
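A minimal sketch of one monitoring cycle, assuming a hypothetical `query_engine` client (the real per-engine integration layer is not shown in this doc):

```python
import itertools

QUESTIONS = [
    "What does Acme Corp do?",
    "When was Acme Corp founded?",
]

# Slight rephrasings of each question, used to test reliability.
VARIANTS = ["{q}", "Briefly: {q}", "Answer factually: {q}"]

def query_engine(engine: str, prompt: str) -> str:
    """Stand-in for a per-engine API client (not shown here)."""
    raise NotImplementedError("plug in the engine's client")

def run_monitoring_cycle(engine: str) -> list[dict]:
    """One cycle: every question x every variant, each sent as a
    brand-new conversation with no chat history carried over."""
    results = []
    for q, v in itertools.product(QUESTIONS, VARIANTS):
        prompt = v.format(q=q)
        results.append({
            "engine": engine,
            "prompt": prompt,
            "response": query_engine(engine, prompt),  # fresh session
        })
    return results
```

Sending each prompt in a fresh session is what makes week-over-week scores comparable: no accumulated chat context can steer the answers.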
Scoring
Each response is evaluated against your truth nuggets:
Accuracy Score = (Correctly verified claims / Total claims in response) × 100

Verification happens through four matching strategies (sketched in code after this list):
- Exact string matching (e.g., “Founded in 2020” matches “Founded in 2020”)
- Semantic similarity (e.g., “Founded in early 2020” matches “Founded in 2020”)
- Range matching (e.g., “500-600 employees” matches “550 employees”)
- Date tolerance (e.g., “In 2020” matches “March 2020”)
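Here is a minimal sketch of those four strategies. The function names are hypothetical, and the semantic matcher is approximated with character-level similarity; a production system would presumably use embeddings:

```python
import re
from difflib import SequenceMatcher

def exact_match(claim: str, nugget: str) -> bool:
    return claim.strip().lower() == nugget.strip().lower()

def semantic_match(claim: str, nugget: str, threshold: float = 0.8) -> bool:
    # Crude stand-in for embedding-based similarity.
    return SequenceMatcher(None, claim.lower(), nugget.lower()).ratio() >= threshold

def range_match(claim: str, verified_value: float) -> bool:
    # "500-600 employees" should match a verified value of 550.
    m = re.search(r"(\d+)\s*-\s*(\d+)", claim)
    return bool(m) and float(m.group(1)) <= verified_value <= float(m.group(2))

def date_tolerance_match(claim: str, nugget: str) -> bool:
    # "In 2020" should match "March 2020": compare on the year alone.
    c, n = re.search(r"\b(\d{4})\b", claim), re.search(r"\b(\d{4})\b", nugget)
    return bool(c and n) and c.group(1) == n.group(1)

def accuracy_score(verified_claims: int, total_claims: int) -> float:
    """(Correctly verified claims / Total claims in response) x 100."""
    return 100.0 * verified_claims / total_claims if total_claims else 0.0

print(round(accuracy_score(7, 9), 1))  # 7 of 9 claims verified -> 77.8
```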
Response Handling
What happens to each response (a pipeline sketch follows the list):
- Claims extraction: NER (Named Entity Recognition) extracts factual claims
- Truth nugget matching: Each claim is compared to your verified facts
- Scoring: Accuracy is calculated per-engine, per-query
- Trend tracking: Week-over-week changes are monitored
- Alert generation: Critical inaccuracies trigger alerts
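As a rough sketch of that five-stage pipeline (all function names are illustrative; the real pipeline uses a trained NER model for stage 1, approximated here with sentence splitting):

```python
import re

def extract_claims(response: str) -> list[str]:
    """Stage 1: claims extraction. The real pipeline uses NER;
    sentence splitting is a crude placeholder."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]

def matches_a_nugget(claim: str, nuggets: list[str]) -> bool:
    """Stage 2: truth nugget matching (see the matcher sketch above)."""
    return any(n.lower() in claim.lower() for n in nuggets)

def handle_response(response: str, nuggets: list[str], alert_below: float = 70.0) -> float:
    """Stages 3-5: score the response, keep the result for trend
    tracking, and raise an alert on a critical miss."""
    claims = extract_claims(response)
    verified = sum(matches_a_nugget(c, nuggets) for c in claims)
    score = 100.0 * verified / len(claims) if claims else 0.0
    if score < alert_below:  # Stage 5: alert generation
        print(f"ALERT: accuracy {score:.0f}% is below {alert_below:.0f}%")
    return score  # Stage 4: stored per engine/query, compared week-over-week
```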
Engine-Specific Recommendations
If Your Score is High Across All Engines
You have excellent brand representation. Focus on maintaining currency:
- Update truth nuggets when things change
- Monitor narratives for emerging issues
- Quarterly deep dives into negative narratives
If Your Score is High in Some, Low in Others
Common patterns:
- ChatGPT low, Gemini/Perplexity high: Your website is recent/updated but ChatGPT’s training data is old. No immediate action needed — ChatGPT will catch up at next model update.
- Perplexity low, others high: Your information isn’t in search results yet. Usually means very recent changes. Add more links or press releases to get indexed.
- Copilot low, others high: Your info isn’t in Microsoft-accessible sources. Check if you’re listed in directories Microsoft crawls.
If Your Score is Low Across All Engines
This suggests one or more of the following:
- Your truth nuggets aren’t well-represented on your website
- Your website content is outdated
- Competing information exists in AI training data
Action:
- Review your top 3 “missing information” alerts
- Check if those facts are clearly on your website
- Add them if missing, or rewrite existing content for clarity
- Wait 1-2 weeks for search-backed engines to re-crawl your updated content
Monitoring Frequency
By default, each engine is monitored weekly. You can customize:
- Daily: Better for critical brands (health/safety impact) or during high-visibility events
- Weekly: Standard frequency, best for most companies
- Monthly: Cost-effective for non-critical brands
- Custom: Define your own schedule
Higher frequency = higher API costs, but faster detection of inaccuracies.
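As an illustration of how such a schedule might be expressed (hypothetical field names, not the product’s actual configuration schema):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schedule settings; field names are illustrative,
# not the product's actual configuration schema.
@dataclass
class MonitoringSchedule:
    engine: str
    frequency: str               # "daily" | "weekly" | "monthly" | "custom"
    cron: Optional[str] = None   # only used when frequency == "custom"

schedules = [
    MonitoringSchedule("chatgpt", "daily"),     # high-visibility launch week
    MonitoringSchedule("claude", "weekly"),     # standard default
    MonitoringSchedule("gemini", "monthly"),    # non-critical brand
    MonitoringSchedule("perplexity", "custom", cron="0 6 * * 1,4"),  # Mon + Thu
]
```

Weekly is the sensible default; reserve daily schedules for the engines where a stale or wrong answer carries real risk.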
Engine Comparison
| Engine | Recency | Reliability | Popularity | Coverage |
|---|---|---|---|---|
| ChatGPT | Good | Excellent | Highest | Broad |
| Claude | Good | Excellent | High | Technical focus |
| Gemini | Very Good | Good | Medium | Broad |
| Perplexity | Excellent | Good | Growing | Current events |
| Copilot | Good | Good | Growing | Enterprise |
Next Steps
- Accuracy Score Deep Dive → Understand how scores are calculated
- Brand Dashboard → See per-engine scores in real-time
- Narrative Tracking → Track emerging narratives across engines
- GEO Optimization → Improve visibility across all engines