Auto-Discovery
Auto-Discovery finds facts about your organization on your website, in public sources, and in news coverage. Instead of entering each fact manually, you let TruthVouch scan your digital presence and suggest facts for you to approve.
How Auto-Discovery Works
The discovery process runs in five stages:
- Web Crawling — Scan your website (homepage, about page, press room, blog, etc.)
- Named Entity Recognition (NER) — Extract entities: company names, people, dates, numbers, locations, products
- Fact Scoring — Evaluate confidence in each extracted fact (is it a verified claim or just mentioned in passing?)
- Deduplication — Group similar facts together
- Human Review — You approve, edit, or reject each suggestion
The process typically takes 10-30 minutes depending on site size, and surfaces 3-5x more facts than manual entry.
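As an illustration of the deduplication stage, the sketch below groups near-identical facts using string similarity from Python's standard library. This is not TruthVouch's actual algorithm: the `group_similar` function and the 0.8 similarity cutoff are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def group_similar(facts: list[str], min_ratio: float = 0.8) -> list[list[str]]:
    """Greedily place each fact in the first group whose first member is similar enough.

    min_ratio is a hypothetical cutoff, not a documented product setting.
    """
    groups: list[list[str]] = []
    for fact in facts:
        for group in groups:
            ratio = SequenceMatcher(None, normalize(fact), normalize(group[0])).ratio()
            if ratio >= min_ratio:
                group.append(fact)
                break
        else:
            # No existing group matched, so this fact starts a new one.
            groups.append([fact])
    return groups


facts = ["Founded in 2024", "founded in  2024", "CEO: Sarah Chen"]
groups = group_similar(facts)
```

Here the two "founded" variants land in one group and the CEO fact in another, so a reviewer would see two suggestions instead of three.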
Setting Up Auto-Discovery
Initial Setup (During Onboarding)
The onboarding wizard includes auto-discovery:
- Enter your website domain
- Select pages to scan (or scan all)
- Wait for analysis
- Review suggestions
- Approve/edit/reject
Run Discovery Anytime
From Knowledge Base → Truth Nuggets, click Run Auto-Discovery:
- Enter domain — defaults to your registered company domain
- Select pages — choose which sections to scan:
  - Homepage
  - About / Company info
  - Products / Services
  - Press / News
  - Blog
  - Custom URLs
- Set confidence threshold — only suggest facts with >75% confidence (adjustable)
- Start scan
Most scans complete in about 15 minutes; larger sites can take up to 30.
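The scan options above can be pictured as a small configuration object. The sketch below is only a mental model: `ScanRequest`, its field names, and the sample scores are hypothetical and do not describe a TruthVouch API. It assumes the threshold is strict, matching the ">75%" wording.

```python
from dataclasses import dataclass, field

# Section names mirror the page choices listed above; the identifiers are made up.
SECTIONS = {"homepage", "about", "products", "press", "blog"}


@dataclass
class ScanRequest:
    domain: str
    sections: set[str] = field(default_factory=lambda: set(SECTIONS))
    custom_urls: list[str] = field(default_factory=list)
    min_confidence: float = 0.75  # the adjustable threshold, as a fraction

    def __post_init__(self) -> None:
        unknown = self.sections - SECTIONS
        if unknown:
            raise ValueError(f"unknown sections: {sorted(unknown)}")
        if not 0.0 <= self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be between 0 and 1")

    def keeps(self, confidence: float) -> bool:
        """Only facts strictly above the threshold are suggested."""
        return confidence > self.min_confidence


request = ScanRequest(domain="example.com", sections={"about", "press"})

# Invented sample scores, for illustration only.
scores = {"Founded in 2024": 0.98, "CEO: Sarah Chen": 0.92, "Market leader": 0.62}
suggested = [fact for fact, score in scores.items() if request.keeps(score)]
```

Raising `min_confidence` shrinks the suggestion list; lowering it surfaces more candidates at the cost of more noise.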
Reviewing Suggestions
Once the scan completes, you see a list of suggested nuggets:
| Fact | Source | Confidence | Action |
|---|---|---|---|
| “Founded in 2024” | /about → “founded in 2024” | 98% | Approve |
| “CEO: Sarah Chen” | /team → “led by Sarah Chen” | 92% | Edit/Approve |
| “1000+ customers” | /homepage | 75% | Reject (outdated) |
Approve
Accept the suggestion as-is. It becomes an active nugget immediately.
Edit
Modify the text, category, or confidence before approving. Use this when:
- The extracted text needs refinement
- You want to adjust confidence (e.g., a marketing claim that’s an estimate)
- You want to set an expiry date
- You want to add a specific source URL
Reject
Skip the suggestion. Useful for:
- Outdated information
- Marketing hyperbole you don’t want in ground truth
- Competitive information that shouldn’t be stored
- Temporary claims
Request Verification
For facts you’re unsure about, mark for manual verification. Team members can investigate and approve/reject.
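The four review actions amount to a small state machine. The sketch below models it under stated assumptions: the `Status` values, the `Nugget` fields, and the rule that only pending suggestions can be approved are illustrative, not a description of how TruthVouch stores nuggets.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Status(Enum):
    SUGGESTED = "suggested"
    ACTIVE = "active"
    REJECTED = "rejected"
    NEEDS_VERIFICATION = "needs_verification"


@dataclass(frozen=True)
class Nugget:
    text: str
    confidence: float
    source_url: Optional[str] = None
    status: Status = Status.SUGGESTED


PENDING = (Status.SUGGESTED, Status.NEEDS_VERIFICATION)


def approve(nugget: Nugget, **edits) -> Nugget:
    """Approve as-is, or apply field edits (text, confidence, source_url) first."""
    if nugget.status not in PENDING:
        raise ValueError(f"cannot approve a nugget that is {nugget.status.value}")
    return replace(nugget, **edits, status=Status.ACTIVE)


def reject(nugget: Nugget) -> Nugget:
    if nugget.status not in PENDING:
        raise ValueError(f"cannot reject a nugget that is {nugget.status.value}")
    return replace(nugget, status=Status.REJECTED)


def request_verification(nugget: Nugget) -> Nugget:
    """Park a suggestion until a team member investigates it."""
    if nugget.status is not Status.SUGGESTED:
        raise ValueError("only fresh suggestions can be sent for verification")
    return replace(nugget, status=Status.NEEDS_VERIFICATION)


suggestion = Nugget("CEO: Sarah Chen", 0.92)
edited = approve(suggestion, source_url="https://example.com/team")
pending = request_verification(Nugget("1000+ customers", 0.75))
rejected = reject(pending)
```

Because `Nugget` is frozen, each action returns a new object and leaves the original suggestion untouched, which keeps an audit trail easy to build.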
Example: Discovering Product Facts
Your homepage says: “TruthVouch Shield detects hallucinations in 9+ AI engines with 94% accuracy using NLI.”
Auto-discovery extracts:
- Fact: “TruthVouch Shield monitors 9+ AI engines”
- Fact: “Detection accuracy is 94%”
- Fact: “Uses NLI-based detection”
You can:
- Approve all → All become active nuggets
- Edit first two → Add the /benchmarks page as the source URL and lower confidence to 92% (these are benchmark results, not marketing copy)
- Reject third → Too technical/implementation-specific
Discovery Accuracy
Overall, auto-discovery is 85-92% accurate; accuracy varies with content clarity:
| Content Type | Accuracy | Examples |
|---|---|---|
| Structured data | 95%+ | Schema.org markup, tables, lists |
| Clear statements | 90-95% | “Founded in 2024”, “1000+ customers” |
| Implicit facts | 75-85% | “Led by Sarah Chen” from name/title |
| Contextual | 60-75% | Market position, subtle claims |
| Speculation | <60% | “Could be”, “aims to”, “in development” |
Confidence scores reflect these patterns. Facts from highly structured content get higher scores.
Handling Low-Confidence Suggestions
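Discovery lists every suggestion that clears your confidence threshold, so lowering the threshold brings low-confidence suggestions into the list. Handle them by band:
- 75-85%: Review carefully, adjust confidence if approved
- 60-75%: Reject unless you can confirm the claim yourself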
- <60%: Usually noise, easy to skip
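The bands above can be written as a simple triage function. This is a sketch: the returned labels are invented for the example, the boundaries are assumed to be inclusive at the lower end (so exactly 75% falls in the "review carefully" band), and the label for scores above 85% is an assumption because the list does not cover that range.

```python
def triage(confidence_pct: float) -> str:
    """Map a confidence percentage to a suggested review action."""
    if not 0 <= confidence_pct <= 100:
        raise ValueError("confidence must be a percentage between 0 and 100")
    if confidence_pct < 60:
        return "skip"                     # usually noise
    if confidence_pct < 75:
        return "reject_unless_confirmed"  # needs independent confirmation
    if confidence_pct <= 85:
        return "review_carefully"         # adjust confidence if approved
    return "standard_review"              # assumed label for high-confidence facts


actions = {pct: triage(pct) for pct in (98, 80, 75, 62, 40)}
```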
Recurring Discovery
Set auto-discovery to run on a schedule:
- Weekly: Catch new blog posts, product announcements
- Monthly: Catch slower-moving changes to financials, headcount, partnerships
- Ad-hoc: When you launch a new product or rebrand
From Knowledge Base → Settings → Auto-Discovery Schedule, choose a frequency. When discovery runs on a schedule, suggestions are batched and sent as a weekly digest.
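To make the weekly and monthly options concrete, the sketch below computes when the next scheduled scan would fall. It is purely illustrative: `next_run` is a hypothetical helper, and clamping to the last day of a shorter month is an assumed policy, not documented product behavior.

```python
import calendar
from datetime import date, timedelta


def next_run(last_run: date, frequency: str) -> date:
    """Return the next scan date for a 'weekly' or 'monthly' schedule."""
    if frequency == "weekly":
        return last_run + timedelta(days=7)
    if frequency == "monthly":
        # Roll over to January of the next year after December.
        year = last_run.year + last_run.month // 12
        month = last_run.month % 12 + 1
        # Clamp the day so that, e.g., the 31st maps to the end of a shorter month.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(last_run.day, last_day))
    raise ValueError(f"unsupported frequency: {frequency!r}")


weekly = next_run(date(2025, 1, 1), "weekly")
monthly = next_run(date(2025, 1, 31), "monthly")
```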
Competitive Discovery (Professional+ Tier)
Optionally, discover facts about competitors:
- Add competitor domains
- Run the same discovery process
- Review their claimed facts
- Monitor them for divergence from your truth
(See Competitive Intelligence for details)
Privacy & Data
Auto-discovery:
- Scans only public URLs
- Does NOT submit your facts to external databases
- Stores suggested facts in your secure Knowledge Base
- Complies with robots.txt and site crawl policies
No third party sees your private information.
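If you want to check which of your pages a robots.txt-compliant crawler may fetch, Python's standard `urllib.robotparser` module can evaluate your rules locally. The robots.txt content and the `example-discovery-bot` user agent below are placeholders; the actual user agent of the TruthVouch crawler is not stated here.

```python
from urllib.robotparser import RobotFileParser

# Placeholder rules; paste in the contents of your own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, user_agent: str = "example-discovery-bot") -> bool:
    """True if the given rules let this (hypothetical) user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


about_ok = allowed("https://example.com/about")
internal_ok = allowed("https://example.com/internal/roadmap")
```

Pages that return `False` here would be skipped by any crawler that honors robots.txt, so facts that appear only on those pages will not be discovered.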
Troubleshooting
Scan takes too long
If a scan hangs:
- Check that the website is publicly accessible (no login required)
- Check that no unusually large files are blocking the crawl
- Try scanning specific pages instead of the whole site
Low-quality suggestions
If suggestions are mostly noise:
- Increase confidence threshold to 80%+
- Select specific pages with structured data
- Edit your website to add Schema.org markup (improves discovery)
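As a sketch of what Schema.org markup can look like, the snippet below builds a JSON-LD `Organization` block that you could embed in a page's HTML. The organization name and URL are placeholders, the founding year and CEO reuse the sample facts from this page, and `to_script_tag` is a hypothetical helper. Whether this exact shape maximizes discovery accuracy is an assumption.

```python
import json

# Placeholder organization data; replace with your own facts.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Corp",
    "url": "https://example.com",
    "foundingDate": "2024",
    "employee": {"@type": "Person", "name": "Sarah Chen", "jobTitle": "CEO"},
}


def to_script_tag(data: dict) -> str:
    """Serialize data as a JSON-LD script element for embedding in HTML."""
    body = json.dumps(data, indent=2)
    # Escape "</" so the JSON cannot close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


tag = to_script_tag(organization)
```

Placing the resulting tag in the page head states each fact in a structured form, so a crawler does not have to infer it from prose.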
Missing facts
If discovery misses important facts:
- Create them manually (about 2 minutes per nugget)
- Ensure the facts appear on public pages in clear language
- Consider adding them to a prominent location such as the About page
Best Practices
- Run after major updates — After launching a product, updating pricing, or changing leadership
- Review and curate — Don’t approve everything; enforce quality standards
- Set source URLs — Auto-discovery suggests URLs; keep them for auditability
- Combine with manual — Use discovery for bulk facts, manually add niche details
- Schedule recurring — Let the system keep you updated as your site evolves