
Auto-Discovery

Auto-Discovery automatically finds facts about your organization from your website, public sources, and news. Instead of entering each fact manually, you let TruthVouch scan your digital presence and suggest candidate facts, which become verified ground truth only after you approve them.

How Auto-Discovery Works

The discovery process runs in five stages:

  1. Web Crawling — Scan your website (homepage, about page, press room, blog, etc.)
  2. Named Entity Recognition (NER) — Extract entities: company names, people, dates, numbers, locations, products
  3. Fact Scoring — Evaluate confidence in each extracted fact (is it a verified claim or just mentioned in passing?)
  4. Deduplication — Group similar facts together
  5. Human Review — You approve, edit, or reject each suggestion

The process typically takes 10-30 minutes depending on site size, and surfaces 3-5x more facts than manual entry.
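The deduplication stage (step 4) can be pictured as grouping near-identical strings. The sketch below is an illustration only: it assumes plain character-level similarity via Python's standard `difflib.SequenceMatcher`. The function name `group_similar_facts` and the 0.8 similarity threshold are hypothetical and do not describe TruthVouch's actual matching logic.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def group_similar_facts(facts: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Greedily group facts whose normalized text is at least `threshold`
    similar to the first fact of an existing group (hypothetical heuristic)."""
    groups: list[list[str]] = []
    for fact in facts:
        for group in groups:
            ratio = SequenceMatcher(None, normalize(group[0]), normalize(fact)).ratio()
            if ratio >= threshold:
                group.append(fact)
                break
        else:
            # No existing group was similar enough: start a new one.
            groups.append([fact])
    return groups


facts = [
    "Founded in 2024",
    "founded in  2024",
    "CEO: Sarah Chen",
    "CEO Sarah Chen",
    "1000+ customers",
]
for group in group_similar_facts(facts):
    print(group)
```

A real pipeline would more likely compare extracted entities and values than raw strings, but the shape is the same: each group becomes one suggestion for human review.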

Setting Up Auto-Discovery

Initial Setup (During Onboarding)

The onboarding wizard includes auto-discovery:

  1. Enter your website domain
  2. Select pages to scan (or scan all)
  3. Wait for analysis
  4. Review suggestions
  5. Approve/edit/reject

Run Discovery Anytime

From Knowledge Base → Truth Nuggets, click Run Auto-Discovery:

  1. Enter domain — defaults to your registered company domain
  2. Select pages — choose which sections to scan:
    • Homepage
    • About / Company info
    • Products / Services
    • Press / News
    • Blog
    • Custom URLs
  3. Set confidence threshold — by default, only facts with confidence of 75% or higher are suggested (adjustable)
  4. Start scan

Most scans complete within about 15 minutes; larger sites can take up to 30.
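The scan options and confidence threshold can be modeled as a small configuration object. This is a sketch under stated assumptions: `ScanConfig`, `Suggestion`, and `keep` are hypothetical names, not TruthVouch's API, and treating the threshold as inclusive (75% passes a 75% threshold) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    text: str
    source: str
    confidence: float  # 0.0 to 1.0


@dataclass
class ScanConfig:
    """Hypothetical mirror of the Run Auto-Discovery form."""
    domain: str
    sections: list[str] = field(default_factory=lambda: ["Homepage", "About / Company info"])
    custom_urls: list[str] = field(default_factory=list)
    min_confidence: float = 0.75  # default threshold, adjustable

    def __post_init__(self) -> None:
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0 and 1")

    def keep(self, suggestions: list[Suggestion]) -> list[Suggestion]:
        """Return suggestions at or above the threshold (inclusive, assumed)."""
        return [s for s in suggestions if s.confidence >= self.min_confidence]


found = [
    Suggestion("Founded in 2024", "/about", 0.98),
    Suggestion("CEO: Sarah Chen", "/team", 0.92),
    Suggestion("1000+ customers", "/homepage", 0.75),
    Suggestion("Aims to double headcount", "/blog", 0.55),
]
config = ScanConfig(domain="example.com")
print([s.text for s in config.keep(found)])
```

Raising `min_confidence` to 0.80 would drop the 75% suggestion as well, which is the trade-off the threshold controls.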

Reviewing Suggestions

Once the scan completes, you see a list of suggested nuggets:

| Fact | Source | Confidence | Action |
|---|---|---|---|
| “Founded in 2024” | /about → “founded in 2024” | 98% | Approve |
| “CEO: Sarah Chen” | /team → “led by Sarah Chen” | 92% | Edit/Approve |
| “1000+ customers” | /homepage | 75% | Reject (outdated) |

Approve

Accept the suggestion as-is. It becomes an active nugget immediately.

Edit

Modify the text, category, or confidence before approving. Use this when:

  • The extracted text needs refinement
  • You want to adjust confidence (e.g., for a marketing claim that is really an estimate)
  • You want to set an expiry date
  • You want to add a specific source URL

Reject

Skip the suggestion. Useful for:

  • Outdated information
  • Marketing hyperbole you don’t want in ground truth
  • Competitive information that shouldn’t be stored
  • Temporary claims

Request Verification

For facts you’re unsure about, mark for manual verification. Team members can investigate and approve/reject.
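The four review actions behave like status transitions on a suggestion. The sketch below assumes a simple immutable record; `Nugget`, `Status`, and the function names are hypothetical illustrations of the workflow, not TruthVouch's data model.

```python
from dataclasses import dataclass, replace
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    SUGGESTED = "suggested"
    ACTIVE = "active"
    REJECTED = "rejected"
    NEEDS_VERIFICATION = "needs_verification"


@dataclass(frozen=True)
class Nugget:
    text: str
    confidence: float  # 0.0 to 1.0
    source_url: Optional[str] = None
    expires_on: Optional[date] = None
    status: Status = Status.SUGGESTED


def approve(nugget: Nugget, **edits) -> Nugget:
    """Approve as-is, or apply edits (text, confidence, source_url, expires_on) first."""
    if nugget.status not in (Status.SUGGESTED, Status.NEEDS_VERIFICATION):
        raise ValueError(f"cannot approve a {nugget.status.value} nugget")
    return replace(nugget, status=Status.ACTIVE, **edits)


def reject(nugget: Nugget) -> Nugget:
    return replace(nugget, status=Status.REJECTED)


def request_verification(nugget: Nugget) -> Nugget:
    """Park the suggestion until a team member investigates."""
    return replace(nugget, status=Status.NEEDS_VERIFICATION)


founded = approve(Nugget("Founded in 2024", 0.98))
ceo = approve(Nugget("CEO: Sarah Chen", 0.92), text="Sarah Chen is CEO",
              source_url="https://example.com/team")
customers = reject(Nugget("1000+ customers", 0.75))
```

Here an edit is just an approval with changed fields; the URL is a placeholder.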

Example: Discovering Product Facts

Your homepage says: “TruthVouch Shield detects hallucinations in 9+ AI engines with 94% accuracy using NLI.”

Auto-discovery extracts:

  • Fact: “TruthVouch Shield monitors 9+ AI engines”
  • Fact: “Detection accuracy is 94%”
  • Fact: “Uses NLI-based detection”

You can approve all three as-is so they become active nuggets, or curate them individually:

  1. Edit the first two → Set the /benchmarks page as the source URL and lower confidence to 92% (these are benchmark figures, not marketing claims)
  2. Reject the third → Too technical/implementation-specific

Discovery Accuracy

Overall, auto-discovery is 85-92% accurate, but accuracy varies with how clearly the content states each fact:

| Content Type | Accuracy | Examples |
|---|---|---|
| Structured data | 95%+ | Schema.org markup, tables, lists |
| Clear statements | 90-95% | “Founded in 2024”, “1000+ customers” |
| Implicit facts | 75-85% | “Led by Sarah Chen” from name/title |
| Contextual | 60-75% | Market position, subtle claims |
| Speculation | <60% | “Could be”, “aims to”, “in development” |

Confidence scores reflect these patterns. Facts from highly structured content get higher scores.
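To illustrate how content type and hedging language could feed a confidence score, here is a toy heuristic. It is not TruthVouch's scoring model: the base scores reuse the lower end of each range in the accuracy table, the hedge phrases come from the Speculation row, and the 0.55 cap is an arbitrary assumption.

```python
# Hypothetical base scores: the lower end of each range in the accuracy table.
BASE_SCORES = {
    "structured": 0.95,  # Schema.org markup, tables, lists
    "clear": 0.90,       # explicit statements such as "Founded in 2024"
    "implicit": 0.75,    # inferred, e.g. a role from a name/title pairing
    "contextual": 0.60,  # market position, subtle claims
}

# Phrases the table lists as signs of speculation.
HEDGE_PHRASES = ("could be", "aims to", "in development")

# Hypothetical cap that keeps speculative wording below the 60% band.
SPECULATION_CAP = 0.55


def score_fact(text: str, content_type: str) -> float:
    """Toy confidence score: start from the content type, then cap hedged claims."""
    if content_type not in BASE_SCORES:
        raise ValueError(f"unknown content type: {content_type!r}")
    score = BASE_SCORES[content_type]
    lowered = text.lower()
    if any(phrase in lowered for phrase in HEDGE_PHRASES):
        score = min(score, SPECULATION_CAP)
    return score


print(score_fact("Founded in 2024", "clear"))
print(score_fact("Version 2 is in development", "clear"))
```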

Handling Low-Confidence Suggestions

If you lower the confidence threshold, discovery also shows low-confidence suggestions. Handle them by band:

  • 75-85%: Review carefully, adjust confidence if approved
  • 60-75%: Probably reject unless you explicitly approve the claim
  • <60%: Usually noise, easy to skip
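The bands above translate directly into a triage rule. In this sketch, treating each lower bound as inclusive is an assumption, and the "standard review" label for scores of 85% and above is hypothetical, since the guidance only covers lower scores.

```python
def triage(confidence: float) -> str:
    """Map a confidence score (0.0-1.0) to a review recommendation.

    Lower bounds are treated as inclusive (an assumption).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.85:
        return "standard review"  # hypothetical label; not covered by the bands
    if confidence >= 0.75:
        return "review carefully; adjust confidence if approved"
    if confidence >= 0.60:
        return "probably reject unless explicitly confirmed"
    return "likely noise; skip"


for score in (0.98, 0.80, 0.65, 0.40):
    print(f"{score:.0%}: {triage(score)}")
```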

Recurring Discovery

Set auto-discovery to run on a schedule:

  • Weekly: Catch new blog posts, product announcements
  • Monthly: Catch quarterly updates to financials, headcount, and partnerships
  • Ad hoc: Run manually when you launch a new product or rebrand

From Knowledge Base → Settings → Auto-Discovery Schedule, choose a frequency. For recurring scans, new suggestions are batched and sent as a weekly digest.
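Batching suggestions into a weekly digest amounts to grouping them by calendar week. A minimal sketch, assuming each suggestion carries the date it was found and that weeks follow ISO 8601 numbering; `batch_weekly` and the sample texts are hypothetical.

```python
from collections import defaultdict
from datetime import date


def batch_weekly(found: list[tuple[date, str]]) -> dict[tuple[int, int], list[str]]:
    """Group (discovery date, fact text) pairs by ISO (year, week) for a weekly digest."""
    digests: dict[tuple[int, int], list[str]] = defaultdict(list)
    for found_on, text in found:
        iso_year, iso_week, _ = found_on.isocalendar()
        digests[(iso_year, iso_week)].append(text)
    return dict(digests)


found = [
    (date(2024, 6, 3), "New product announced on /blog"),
    (date(2024, 6, 5), "Pricing page updated"),
    (date(2024, 6, 10), "New VP of Engineering on /team"),
]
for (year, week), items in sorted(batch_weekly(found).items()):
    print(f"{year}-W{week:02d}: {items}")
```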

Competitive Discovery (Professional+ Tier)

Optionally, discover facts about competitors:

  1. Add competitor domains
  2. Run the same discovery process
  3. Review their claimed facts
  4. Monitor them for divergence from your truth

(See Competitive Intelligence for details)

Privacy & Data

Auto-discovery:

  • Scans only public URLs
  • Does NOT submit your facts to external databases
  • Stores suggested facts in your secure Knowledge Base
  • Complies with robots.txt and site crawl policies

No third party sees your private information.
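Because discovery honors robots.txt, you can check ahead of time which pages a compliant crawler would skip. This sketch uses Python's standard `urllib.robotparser`; the user agent `ExampleDiscoveryBot` is a placeholder (this page doesn't name the crawler's real user agent), so only the rules for `*` apply to it, and the robots.txt content is an example.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; replace with your site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /drafts/
"""


def crawlable(robots_txt: str, urls: list[str], user_agent: str = "ExampleDiscoveryBot") -> list[str]:
    """Return the URLs that robots.txt allows `user_agent` to fetch.

    "ExampleDiscoveryBot" is a placeholder, so only the "*" rules apply to it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


pages = [
    "https://example.com/about",
    "https://example.com/admin/settings",
    "https://example.com/drafts/q3-launch",
    "https://example.com/blog/launch",
]
print(crawlable(ROBOTS_TXT, pages))
```

If an important page shows up as disallowed, auto-discovery will not find facts on it either.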

Troubleshooting

Scan takes too long

If a scan hangs or runs much longer than expected:

  • Confirm the website is publicly accessible (no login required)
  • Check for unusually large files that may be blocking the crawl
  • Scan specific pages instead of the whole site

Low-quality suggestions

If suggestions are mostly noise:

  • Increase confidence threshold to 80%+
  • Select specific pages with structured data
  • Edit your website to add Schema.org markup (improves discovery)
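Schema.org markup is commonly embedded as a JSON-LD `<script>` block in the page's HTML. A minimal sketch that generates one for an `Organization`, using the Schema.org properties `name`, `url`, and `foundingDate`; the company name and URL are placeholders, and exactly which properties auto-discovery reads is an assumption.

```python
import json


def organization_jsonld(name: str, url: str, founding_date: str) -> str:
    """Build a JSON-LD <script> tag describing an Organization."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "foundingDate": founding_date,  # ISO 8601 date or year, e.g. "2024"
    }
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


print(organization_jsonld("Example Corp", "https://www.example.com", "2024"))
```

Paste the output into the `<head>` of your About page so the founding date appears as structured data rather than only in prose.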

Missing facts

If discovery misses important facts:

  • Create them manually (takes 2 minutes per nugget)
  • Ensure facts appear on public pages with clear language
  • Consider adding them to a prominent page, such as your About page

Best Practices

  1. Run after major updates — After launching a product, updating pricing, or changing leadership
  2. Review and curate — Don’t approve everything; enforce quality standards
  3. Set source URLs — Auto-discovery suggests URLs; keep them for auditability
  4. Combine with manual — Use discovery for bulk facts, manually add niche details
  5. Schedule recurring — Let the system keep you updated as your site evolves

Next Steps