Configuration
Configure TruthOps settings to customize monitoring, set performance thresholds, define escalation policies, and manage agent policies across your organization.
Configuration Areas
1. Monitoring Settings
Performance Metrics to Track:
For each agent, define what to monitor:
- Accuracy: % of decisions meeting quality threshold
- Latency: Response time (p50, p95, p99)
- Availability: Uptime % (vs. SLA)
- Cost: Monthly LLM API spend
- Throughput: Decisions per minute/hour
- Error Rate: % of decisions causing issues
- User Satisfaction: CSAT or NPS (if available)
Configuration:
1. Go to Settings → Monitoring → Select Metrics
2. Choose metrics to display on dashboard per agent
3. Set refresh frequency (real-time, 1-min, 5-min, 1-hour)

Example configuration for Customer Service Agent:
- Accuracy: Track (target: ≥85%)
- Latency: Track p95 (target: <2 sec)
- Availability: Track (target: ≥99%)
- Cost: Track (budget: $5K/month)
- Throughput: Track (expected: 100-200 queries/min)
- Error rate: Track (target: <2%)
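A configuration like the one above could also be expressed as data, e.g. for infrastructure-as-code workflows. The structure below is an illustrative sketch, not the actual TruthOps schema:

```python
# Illustrative monitoring config for the Customer Service Agent.
# Keys and structure are hypothetical -- TruthOps' real schema may differ.
customer_service_metrics = {
    "accuracy":     {"track": True, "target": 0.85, "unit": "ratio"},
    "latency_p95":  {"track": True, "target": 2.0,  "unit": "seconds"},
    "availability": {"track": True, "target": 0.99, "unit": "ratio"},
    "cost":         {"track": True, "budget": 5000, "unit": "usd_per_month"},
    "throughput":   {"track": True, "expected": (100, 200), "unit": "queries_per_min"},
    "error_rate":   {"track": True, "target": 0.02, "unit": "ratio"},
    "refresh": "1-min",
}
```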
2. Alert Thresholds
Define when to alert if metrics breach their targets:
For each metric, set:
- Warning threshold (yellow alert; triggers slightly before the target is breached)
- Critical threshold (red alert; triggers when the metric breaches the target)
- Alert window (how long a metric must breach the threshold before alerting; e.g., 5-minute average)
Example thresholds for Customer Service Agent:
| Metric | Target | Warning | Critical | Window |
|---|---|---|---|---|
| Accuracy | ≥85% | <87% | <85% | 1-hour average |
| Latency (p95) | <2 sec | >1.8 sec | >2 sec | 5-minute average |
| Availability | ≥99% | <99.1% | <99% | 1-hour average |
| Cost | $5K/month | $5.5K | $6K | 24-hour forecast |
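The evaluation logic behind a table like this can be sketched as a simple classifier. Note that "better" points in different directions per metric (accuracy should stay high, latency should stay low); the function below is an illustrative sketch, not TruthOps internals:

```python
def classify_metric(name, value, warning, critical, higher_is_better):
    """Return 'ok', 'warning', or 'critical' for a windowed metric value.

    `value` should already be averaged over the alert window (e.g. a
    1-hour mean for accuracy) so short spikes don't trigger alerts.
    """
    if higher_is_better:  # accuracy, availability: alert when value drops
        if value < critical:
            return "critical"
        if value < warning:
            return "warning"
    else:                 # latency, cost: alert when value rises
        if value > critical:
            return "critical"
        if value > warning:
            return "warning"
    return "ok"

# Thresholds from the table above:
print(classify_metric("accuracy", 0.86, warning=0.87, critical=0.85, higher_is_better=True))   # warning
print(classify_metric("latency_p95", 1.5, warning=1.8, critical=2.0, higher_is_better=False))  # ok
```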
Configuration:
1. Go to Settings → Alerts → Select Agent
2. Set threshold values and windows
3. Choose notification channels (email, Slack, PagerDuty)
4. Test alert to verify setup

3. Escalation Policies
Define how to escalate issues when thresholds are breached:
Escalation chain (example):
Level 1: Warning alert (e.g., accuracy dropping) → Notify agent owner via email/Slack → Wait 30 minutes for response
Level 2: Critical alert (e.g., accuracy < 85%) → Page on-call engineer (PagerDuty) → Wait 15 minutes for acknowledgment
Level 3: Incident escalation (e.g., accuracy < 80% for 1 hour) → Escalate to VP Engineering → Consider disabling agent or reducing autonomy → Manual intervention required

Configuration:
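The three-level chain above can be modeled as ordered escalation levels with wait times. This is a sketch of the concept; field names are illustrative, not a TruthOps API:

```python
# Illustrative model of the three-level escalation chain above.
ESCALATION_CHAIN = [
    {"level": 1, "trigger": "warning",  "notify": ["email", "slack"],  "wait_min": 30},
    {"level": 2, "trigger": "critical", "notify": ["pagerduty"],       "wait_min": 15},
    {"level": 3, "trigger": "incident", "notify": ["vp_engineering"],  "wait_min": 0},
]

def next_level(current_level, acknowledged):
    """Escalate to the next level if the current one went unacknowledged."""
    if acknowledged or current_level >= len(ESCALATION_CHAIN):
        return current_level
    return current_level + 1

print(next_level(1, acknowledged=False))  # 2
```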
1. Go to Settings → Escalation Policies
2. Define levels and notification channels
3. Set time windows per level
4. Assign escalation contacts (email, Slack handle, phone)

4. Agent Policies
Define rules for how agents should behave:
Policy Examples:
Policy: Autonomy Gates
Customer Service Agent autonomy rules:
- IF confidence > 90% THEN handle query independently
- IF confidence 70-90% THEN send to supervisor for approval
- IF confidence < 70% THEN escalate to expert
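The confidence gates above translate directly into routing logic. A minimal sketch (function name and return values are illustrative):

```python
def route_query(confidence):
    """Confidence-gated routing for the Customer Service Agent,
    using the thresholds from the autonomy rules above."""
    if confidence > 0.90:
        return "handle_independently"
    if confidence >= 0.70:
        return "supervisor_approval"
    return "escalate_to_expert"

print(route_query(0.95))  # handle_independently
print(route_query(0.80))  # supervisor_approval
print(route_query(0.50))  # escalate_to_expert
```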
Data Classifier policy:
- IF confidence > 95% THEN auto-categorize
- IF confidence 80-95% THEN auto-categorize + audit (5% sample)
- IF confidence < 80% THEN route to human

Policy: Rate Limiting
Customer Service Agent:
- Max 2,000 queries/hour (prevent API quota issues)
- If exceeded: Queue overflow; respond with "Please wait..."
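A sliding-window limiter with an overflow queue captures this behavior. The class below is an illustrative sketch (not a TruthOps component), shown with a small limit so the queueing is visible:

```python
from collections import deque
import time

class HourlyRateLimiter:
    """Sliding-window limiter: at most `limit` queries per window.
    Overflowing queries are queued rather than dropped, matching the
    'queue overflow' behavior described above."""
    def __init__(self, limit=2000, window_sec=3600):
        self.limit = limit
        self.window_sec = window_sec
        self.timestamps = deque()
        self.overflow_queue = []

    def submit(self, query, now=None):
        now = time.time() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_sec:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return "processed"
        self.overflow_queue.append(query)
        return "queued"  # caller responds with "Please wait..."

limiter = HourlyRateLimiter(limit=2, window_sec=3600)
print(limiter.submit("q1", now=0.0))  # processed
print(limiter.submit("q2", now=1.0))  # processed
print(limiter.submit("q3", now=2.0))  # queued
```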
Resume Screener:
- Max 100 evaluations/day (cost control)
- If exceeded: New evaluations queued; process next day

Policy: Data Retention
Customer Service Agent:
- Retain conversation logs 30 days
- Delete after 30 days (compliance)
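A retention sweep boils down to a date comparison. A minimal sketch of the 30-day rule above (the helper is illustrative):

```python
from datetime import datetime, timedelta, timezone

def expired(log_timestamp, retention_days=30, now=None):
    """True if a conversation log is past its retention window
    and should be deleted."""
    now = now or datetime.now(timezone.utc)
    return now - log_timestamp > timedelta(days=retention_days)

ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expired(ts, now=datetime(2024, 2, 15, tzinfo=timezone.utc)))  # True
print(expired(ts, now=datetime(2024, 1, 15, tzinfo=timezone.utc)))  # False
```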
Hiring Screener:
- Retain candidate scores 1 year (legal hold)
- Delete after 1 year

Policy: Fallback Behavior
Customer Service Agent:
- If LLM API unavailable: Respond with "Please try again later"
- If knowledge base unavailable: Escalate to human
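The fallback rules above map onto ordinary exception handling. A sketch, with the LLM and knowledge-base calls injected so failures can be simulated (all names are illustrative):

```python
def answer_query(query, llm_call, kb_lookup):
    """Fallback behavior for the Customer Service Agent.
    `llm_call` and `kb_lookup` are injected dependencies."""
    try:
        context = kb_lookup(query)
    except Exception:
        return "ESCALATE_TO_HUMAN"       # knowledge base unavailable
    try:
        return llm_call(query, context)
    except Exception:
        return "Please try again later"  # LLM API unavailable

# Simulated outage of the LLM API:
def broken_llm(query, context):
    raise RuntimeError("API unavailable")

print(answer_query("reset password?", broken_llm, kb_lookup=lambda q: "faq text"))
# Please try again later
```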
Data Classifier:
- If processing fails: Default to "Uncategorized"
- If accuracy drops: Fall back to previous model version

Configuration:
1. Go to Settings → Agent Policies → Select Agent
2. Add/edit policies (conditional rules)
3. Set enforcement (how strictly policies are enforced)
4. Test policy behavior

5. Compliance Settings
Audit & Logging:
- All decisions logged? YES (regulatory requirement)
- Logs retained for: 7 years (SOX requirement)
- Log access restricted to: Compliance team only
- Logs encrypted? YES (at rest and in transit)

Data Privacy:
- Agent accesses PII? YES
- PII types: Customer names, emails, phone numbers
- Compliance frameworks: GDPR, CCPA
- Data residency: US only (customer requirement)
- Anonymization: Remove PII after 30 days

Bias & Fairness:
- Agent makes decisions affecting protected classes? YES (hiring)
- Fairness metrics tracked? YES (disparate impact by gender/race)
- Fairness audit frequency? Monthly
- Threshold for action: >5% disparate impact

Configuration:
1. Go to Settings → Compliance
2. Set audit and logging requirements
3. Define privacy controls (data retention, anonymization)
4. Set fairness thresholds and audit frequency

6. Integration Settings
Connect External Systems:
- LLM APIs: OpenAI, Anthropic, Google, Azure
- Data Warehouses: Snowflake, BigQuery, Redshift
- Communication: Email, Slack, Teams, PagerDuty
- Knowledge Bases: Vector DBs, RAG systems
- Monitoring: Datadog, New Relic, CloudWatch
Configuration:
1. Go to Settings → Integrations
2. Authenticate and connect each service
3. Set sync frequency (real-time, hourly, daily)
4. Test connection

Example for Customer Service Agent:
- LLM: OpenAI GPT-4 (API key in vault)
- Knowledge Base: Pinecone (vector DB for product FAQ)
- Communication: Slack (notifications to #customer-support)
- Monitoring: Datadog (performance metrics)
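The integration set above could be represented as a config structure like the sketch below. Keys and values such as the Pinecone index name are illustrative assumptions, and secrets should stay in a vault rather than in config files:

```python
# Illustrative integration config for the Customer Service Agent.
# All keys are hypothetical; the real TruthOps schema may differ.
integrations = {
    "llm":        {"provider": "openai", "model": "gpt-4"},  # API key resolved from vault at runtime
    "knowledge":  {"provider": "pinecone", "index": "product-faq"},  # index name is an assumption
    "comms":      {"provider": "slack", "channel": "#customer-support"},
    "monitoring": {"provider": "datadog", "sync": "real-time"},
}
```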
7. Cost Management
Set spending limits and alerts:
Agent cost budget (Customer Service Agent):
- $5,000/month limit
- Alert at 80% ($4,000)
- Alert at 100% ($5,000)
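The budget-alert arithmetic above is a threshold check against current spend. A minimal sketch (the helper is illustrative):

```python
def budget_alerts(spend, budget=5000, alert_pcts=(80, 100)):
    """Return the budget alert percentages that current spend has crossed.
    Defaults match the Customer Service Agent budget above."""
    return [p for p in alert_pcts if spend >= budget * p / 100]

print(budget_alerts(4200))  # [80]
print(budget_alerts(5100))  # [80, 100]
print(budget_alerts(3000))  # []
```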
Global budget:
- All agents: $50,000/month total
- Alert at 90% ($45,000)

Optimization rules:
- Rate limiting: Max 200 queries/min per agent (prevent API overages)
- Batch processing: Use cheaper batch APIs when latency allows
- Model switching: Use GPT-3.5 for simple queries (save 80% vs. GPT-4)
- Fallback: Use cached responses when possible

Configuration:
1. Go to Settings → Cost Management
2. Set per-agent budgets
3. Set global budget
4. Define optimization rules
5. Enable/disable cost-saving strategies

Best Practices
1. Start Conservative
When configuring new agent:
- Set autonomy level lower than needed (safer)
- Monitor performance for 2-4 weeks
- Gradually increase autonomy as confidence grows
2. Alert Fatigue
Too many alerts = ignored alerts. Balance:
- Warning threshold: a 10-20% buffer before the target (catches drift without noise)
- Critical threshold: right at the target (only urgent issues)
- Window: Average over time (ignore temporary spikes)
3. Escalation SLA
Set realistic escalation windows:
- Critical (page on-call): <15 min response
- High (email owner): <1 hour response
- Medium (batch review): <24 hours
4. Regular Reviews
Review configurations quarterly:
- Are thresholds still appropriate? (update targets)
- Are escalation policies working? (adjust if needed)
- Are alerts actionable? (reduce noise; increase signal)
- Are cost budgets realistic? (adjust for growth)
Related Topics
- Agent Inventory — Register and classify agents
- Autonomy Levels — Define autonomy and controls
- Monitoring — Real-time agent health dashboard
Next Steps
- List all agents — What agents do you need to configure?
- Define metrics — What KPIs matter for each agent?
- Set thresholds — What’s acceptable vs. concerning?
- Create policies — What rules should agents follow?
- Configure integrations — Connect to your systems
- Set budgets — Define cost limits
- Test configuration — Verify alerts and escalations work