
Monitoring AI Engines

Brand Intelligence monitors 9+ AI engines for brand accuracy. Each engine has different training data, model architectures, and response patterns — so they each tell a different story about your brand.

Supported Engines

ChatGPT (OpenAI)

Models monitored: GPT-4, GPT-4o, o1

Market share: 65% of enterprise AI usage
Training data cutoff: April 2024 (GPT-4o), varies by model
Key quirks:

  • Most likely to hallucinate recent information (training data is older)
  • Web browsing in ChatGPT sometimes includes outdated cached content
  • Very sensitive to framing — different prompts can produce very different answers

What we monitor:

  • Direct product/company questions
  • Market positioning claims
  • Leadership information
  • Pricing and features
  • Historical facts about your company

Typical questions:

  • “What does [company] do?”
  • “Who are [company]’s competitors?”
  • “What is [product]’s pricing?”
  • “When was [company] founded?”

Claude (Anthropic)

Models monitored: Claude 3, Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 4

Market share: 20% enterprise, 35% in developer tools
Training data cutoff: April 2024 (Claude 3.5), varies by model
Key quirks:

  • More conservative — refuses some queries other models answer
  • Better at citing sources in responses
  • Strong on technical accuracy (better than GPT for engineering claims)
  • Longer context window means more detailed responses

What we monitor:

  • Technical capability claims
  • Integration and API documentation references
  • Engineering talent/credentials
  • Compliance and security certifications

Typical questions:

  • “Is [company] built on strong engineering principles?”
  • “What are [product]’s API capabilities?”
  • “Does [company] comply with SOC 2?”

Google Gemini

Models monitored: Gemini 1.5 Pro, Flash

Market share: 10% enterprise, integrated into Gmail/Workspace
Training data cutoff: Late 2023 (varies by model)
Key quirks:

  • Often returns very different results day-to-day (dynamic search integration)
  • Excellent at web search integration — tends to have more recent info than ChatGPT
  • More verbose responses
  • Sometimes over-confident in citations

What we monitor:

  • Recent news or announcements about your company
  • Blog mentions and press coverage
  • Product reviews and comparisons
  • Market trends involving your category

Typical questions:

  • “What’s new with [company]?”
  • “What do people say about [product]?”
  • “Is [company] growing?”

Perplexity AI

Models monitored: Perplexity Pro (Claude-based), Perplexity Free (experimental)

Market share: 5%, but rapidly growing in research/analyst use cases
Training data cutoff: None (real-time web search)
Key quirks:

  • Most current information of any engine (retrieves answers via real-time search rather than static training data)
  • High accuracy for current events
  • Excellent at providing sources
  • Heavy reliance on search results can sometimes include misinformation from forums

What we monitor:

  • Real-time brand mentions across web
  • Recent product announcements
  • Industry news involving your company
  • Analyst coverage and reviews

Typical questions:

  • “What’s [company] doing this week?”
  • “Where is [company] mentioned in the news?”
  • “What do industry analysts say about [company]?”

Microsoft Copilot

Models monitored: Enterprise (GPT-4 Turbo), Consumer (GPT-4o)

Market share: 25% within enterprises (Office integration), growing rapidly
Training data cutoff: Varies by model, often newer than ChatGPT
Key quirks:

  • Integrated with Microsoft products (Word, Excel, Teams) — sees different context
  • Enterprise version has access to company’s own data
  • Sometimes references Microsoft partnerships more heavily

What we monitor:

  • Enterprise software claims (integrations, compatibility)
  • Microsoft partnership mentions
  • B2B positioning
  • Technical documentation references

Typical questions:

  • “Does [product] integrate with Microsoft Teams?”
  • “Is [company] a Microsoft partner?”
  • “Can [product] run on Azure?”

Other Engines (Coming Soon)

We’re adding monitoring for:

  • Amazon Q — AWS integration focus
  • Hugging Face Chat — Open-source focused discussions
  • Mistral Chat — European regulations, privacy-focused
  • Poe — Multi-engine aggregator
  • And more as new engines become mainstream

How We Monitor Each Engine

Query Process

  1. Formula-based queries: We ask 20-40 predefined questions about your brand for consistency
  2. Structured prompts: Each question is asked 2-3 times with slight variations to test reliability
  3. Fresh queries: Each monitoring cycle, we start from scratch (no chat history)
  4. Consistent user agent: All queries appear from the same “user” type to avoid variation
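The query process above can be sketched as a simple expansion step. The question list, variation templates, and `build_queries` helper below are illustrative placeholders, not the product's actual prompt set:

```python
# Hypothetical sketch of the query process: each predefined question is
# expanded into 2-3 phrasing variations, and every monitoring cycle starts
# from a fresh session (no chat history is carried over).

QUESTIONS = [
    "What does {company} do?",
    "Who are {company}'s competitors?",
]

VARIATIONS = [
    "{q}",
    "Briefly: {q}",
    "{q} Answer factually.",
]

def build_queries(company: str) -> list[str]:
    """Expand each predefined question into its phrasing variations."""
    queries = []
    for question in QUESTIONS:
        base = question.format(company=company)
        for template in VARIATIONS:
            queries.append(template.format(q=base))
    return queries
```

Asking the same fact several ways is what makes the reliability signal meaningful: an engine that answers correctly only under one phrasing is flagged differently from one that is consistently accurate.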

Scoring

Each response is evaluated against your truth nuggets:

Accuracy Score = (Correctly verified claims / Total claims in response) × 100

Verification happens through:

  • Exact string matching (e.g., “Founded in 2020” matches “Founded in 2020”)
  • Semantic similarity (e.g., “Founded in early 2020” matches “Founded in 2020”)
  • Range matching (e.g., “500-600 employees” matches “550 employees”)
  • Date tolerance (e.g., “In 2020” matches “March 2020”)
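The formula and verification methods above can be sketched as follows. This is a minimal illustration with simplified matchers; semantic similarity, which typically requires an embedding model, is omitted:

```python
import re

def years_in(text: str) -> set[str]:
    """Extract four-digit years (1900s/2000s) from a string."""
    return set(re.findall(r"\b(?:19|20)\d{2}\b", text))

def claim_matches(claim: str, nugget: str) -> bool:
    """Check one extracted claim against one truth nugget."""
    # Exact string matching (case-insensitive)
    if claim.strip().lower() == nugget.strip().lower():
        return True
    # Date tolerance: "In 2020" matches "March 2020" (same year in both)
    claim_years = years_in(claim)
    if claim_years and claim_years == years_in(nugget):
        return True
    # Range matching: a claimed "500-600 employees" covers a verified "550 employees"
    rng = re.search(r"(\d+)\s*-\s*(\d+)", claim)
    num = re.search(r"\d+", nugget)
    if rng and num:
        lo, hi = int(rng.group(1)), int(rng.group(2))
        if lo <= int(num.group()) <= hi:
            return True
    return False

def accuracy_score(claims: list[str], nuggets: list[str]) -> float:
    """Accuracy = (correctly verified claims / total claims) x 100."""
    if not claims:
        return 0.0
    verified = sum(any(claim_matches(c, n) for n in nuggets) for c in claims)
    return verified / len(claims) * 100
```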

Response Handling

What happens to each response:

  1. Claims extraction: NER (Named Entity Recognition) extracts factual claims
  2. Truth nugget matching: Each claim is compared to your verified facts
  3. Scoring: Accuracy is calculated per-engine, per-query
  4. Trend tracking: Week-over-week changes are monitored
  5. Alert generation: Critical inaccuracies trigger alerts
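The five steps above can be sketched as one per-response pipeline. Here `extract_claims` is a hypothetical stand-in for the NER step, and the alert threshold is an assumed value, not a documented default:

```python
from dataclasses import dataclass, field

@dataclass
class ResponseReport:
    engine: str
    query: str
    claims: list[str]
    accuracy: float
    alerts: list[str] = field(default_factory=list)

def extract_claims(text: str) -> list[str]:
    # Placeholder for the NER step: treat each sentence as one "claim".
    return [s.strip() for s in text.split(".") if s.strip()]

def process_response(engine: str, query: str, text: str,
                     nuggets: list[str], threshold: float = 50.0) -> ResponseReport:
    claims = extract_claims(text)                        # 1. claims extraction
    matched = [c for c in claims if c in nuggets]        # 2. truth nugget matching
    accuracy = 100 * len(matched) / len(claims) if claims else 0.0  # 3. scoring
    alerts = []                                          # 5. alert generation
    if accuracy < threshold:
        alerts.append(f"{engine}: accuracy {accuracy:.0f}% below threshold")
    return ResponseReport(engine, query, claims, accuracy, alerts)
```

Step 4 (trend tracking) operates across reports, so it would live outside this function: each `ResponseReport` is stored and compared week over week.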

Engine-Specific Recommendations

If Your Score is High Across All Engines

You have excellent brand representation. Focus on maintaining currency:

  • Update truth nuggets when things change
  • Monitor narratives for emerging issues
  • Quarterly deep dives into negative narratives

If Your Score is High in Some, Low in Others

Common patterns:

  • ChatGPT low, Gemini/Perplexity high: Your website is recent/updated but ChatGPT’s training data is old. No immediate action needed — ChatGPT will catch up at next model update.
  • Perplexity low, others high: Your information isn’t in search results yet. Usually means very recent changes. Add more links or press releases to get indexed.
  • Copilot low, others high: Your info isn’t in Microsoft-accessible sources. Check if you’re listed in directories Microsoft crawls.

If Your Score is Low Across All Engines

This suggests either:

  1. Your truth nuggets aren’t well-represented on your website
  2. Your website content is outdated
  3. Competing information exists in AI training data

Action:

  1. Review your top 3 “missing information” alerts
  2. Check if those facts are clearly on your website
  3. Add them if missing, or rewrite existing content for clarity
  4. Wait 1-2 weeks for AI models to re-crawl

Monitoring Frequency

By default, each engine is monitored weekly. You can customize:

  • Daily: Better for critical brands (health/safety impact) or during high-visibility events
  • Weekly: Standard frequency, best for most companies
  • Monthly: Cost-effective for non-critical brands
  • Custom: Define your own schedule

Higher frequency = higher API costs, but faster detection of inaccuracies.
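As a sketch, a per-engine schedule like the one described might look like the following. The keys and structure are illustrative, not the product's actual configuration format:

```python
# Hypothetical per-engine monitoring schedule. Engine names mirror the
# supported engines above; frequency values mirror the documented options.

MONITORING_SCHEDULE = {
    "chatgpt":    "daily",    # critical brand: fastest detection, highest API cost
    "claude":     "weekly",   # standard frequency, best for most companies
    "gemini":     "weekly",
    "perplexity": "daily",    # real-time search engine: changes surface quickly
    "copilot":    "monthly",  # cost-effective for non-critical surfaces
}

VALID_FREQUENCIES = {"daily", "weekly", "monthly", "custom"}

def validate_schedule(schedule: dict[str, str]) -> bool:
    """Every engine's frequency must be one of the documented options."""
    return all(freq in VALID_FREQUENCIES for freq in schedule.values())
```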

Engine Comparison

Engine       Recency      Reliability   Popularity   Coverage
ChatGPT      Good         Excellent     Highest      Broad
Claude       Good         Excellent     High         Technical focus
Gemini       Very Good    Good          Medium       Broad
Perplexity   Excellent    Good          Growing      Current events
Copilot      Good         Good          Growing      Enterprise

Next Steps