RAG Pipeline Integration
Retrieval-Augmented Generation (RAG) systems combine real-time information retrieval with LLM inference. TruthVouch Shield integrates seamlessly into RAG pipelines to verify retrieved documents and detect hallucinated or contaminated content before it reaches the user.
Why Verify RAG Outputs
RAG systems reduce hallucinations but introduce new risks:
- Retrieved documents may contain outdated, inaccurate, or biased information
- LLMs may misinterpret or distort retrieved facts
- Generated summaries may contradict source materials
- Cross-domain contamination can spread misinformation
Integrating Shield ensures that every RAG response is fact-checked against your knowledge base and external sources.
Architecture Patterns
Pattern 1: Verification After Retrieval
Verify documents immediately after retrieval, before they enter the LLM context:
```python
from truthvouch.shield import VerificationClient

client = VerificationClient(api_key="your-api-key")

def retrieve_and_verify(query: str) -> list[dict]:
    # Retrieve documents from the vector store
    documents = vector_db.search(query, top_k=5)

    # Verify each document before it enters the LLM context
    verified = []
    for doc in documents:
        result = client.verify_fact(
            text=doc["content"],
            context=query,
            source_url=doc.get("source_url"),
        )
        if result.confidence > 0.7:  # Keep only high-confidence documents
            verified.append({**doc, "verification": result})

    return verified
```

Pattern 2: Verification After Generation
Verify the final LLM response against retrieved sources:
```python
def rag_with_final_check(query: str) -> dict:
    # Retrieve verified documents and generate a response
    documents = retrieve_and_verify(query)
    response = llm.generate(query=query, context=documents)

    # Cross-check the generated response against the retrieved sources
    fact_checks = client.cross_check(
        query=response["text"],
        sources=[d["content"] for d in documents],
    )

    return {
        "response": response["text"],
        "sources": documents,
        "fact_checks": fact_checks,
        "is_verified": all(fc["verified"] for fc in fact_checks),
    }
```

Pattern 3: Streaming Verification
For real-time RAG with streaming responses:
```python
from truthvouch.shield import StreamVerificationClient

stream_client = StreamVerificationClient(api_key="your-api-key")

async def stream_rag_response(query: str):
    documents = retrieve_and_verify(query)

    # Stream the LLM response, verifying each chunk as it arrives
    async for chunk in llm.stream_generate(query=query, context=documents):
        result = await stream_client.verify_streaming_fact(
            text=chunk,
            context_docs=[d["content"] for d in documents],
        )

        yield {
            "chunk": chunk,
            "confidence": result.confidence,
            "flags": result.flags if result.confidence < 0.6 else [],
        }
```

Language-Specific Examples
Python with LangChain
```python
from langchain.callbacks import TruthVouchVerificationCallback
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Create a RAG chain with automatic verification
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=vector_store.as_retriever(),
    callbacks=[
        TruthVouchVerificationCallback(
            api_key="your-api-key",
            min_confidence=0.7,
        )
    ],
)

# Responses are automatically fact-checked
response = qa.run("What is the capital of France?")
```

Python with LlamaIndex
```python
from llama_index.core import VectorStoreIndex
from llama_index.core.callbacks import TruthVouchCallback

# Wrap the index with verification
index = VectorStoreIndex.from_documents(documents)
index.add_callback(TruthVouchCallback(api_key="your-api-key"))

# Query responses include verification results
query_engine = index.as_query_engine(
    callback_manager=index.callback_manager,
)

response = query_engine.query("What is the capital of France?")
```

TypeScript with LangChain.js
```typescript
import { TruthVouchVerificationHandler } from "@truthvouch/langchain";
import { RetrievalQAChain } from "langchain/chains";
import { OpenAI } from "langchain/llms/openai";

const chain = RetrievalQAChain.fromLLM(new OpenAI(), retriever, {
  callbacks: [
    new TruthVouchVerificationHandler({
      apiKey: "your-api-key",
      minConfidence: 0.7,
    }),
  ],
});

const result = await chain.call({
  query: "What is the capital of France?",
});
```

Best Practices
Source Management
- Include `source_url` and `source_metadata` with every retrieved document
- Store a document hash for integrity verification
- Track retrieval timestamps for freshness checks
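The hashing and timestamp bookkeeping above can be sketched as follows. This is a minimal helper using only the standard library; the field names `content_hash` and `retrieved_at` are illustrative, not part of the Shield SDK:

```python
import hashlib
import time

def with_integrity_metadata(doc: dict) -> dict:
    """Attach a content hash and retrieval timestamp to a retrieved document."""
    return {
        **doc,
        "content_hash": hashlib.sha256(doc["content"].encode("utf-8")).hexdigest(),
        "retrieved_at": time.time(),  # enables freshness checks later
    }

def is_unmodified(doc: dict) -> bool:
    """Re-hash the content and compare against the stored hash."""
    current = hashlib.sha256(doc["content"].encode("utf-8")).hexdigest()
    return current == doc.get("content_hash")
```

Running documents through `with_integrity_metadata` at ingestion time lets you detect silent edits to cached or re-served documents before verification.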
Confidence Thresholds
- Use 0.9+ for critical information (healthcare, legal, financial)
- Use 0.7-0.8 for general use cases
- Flag uncertain facts with confidence 0.5-0.7 for human review
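The tiers above can be encoded as a small routing helper. This is a sketch under the stated thresholds; the decision labels (`accept`, `human_review`, `reject`) are assumptions, not SDK constants:

```python
def route_by_confidence(confidence: float, critical: bool = False) -> str:
    """Map a verification confidence score to a handling decision.

    Follows the tiers above: 0.9+ for critical domains (healthcare,
    legal, financial), 0.7 for general use, 0.5-0.7 queued for review.
    """
    threshold = 0.9 if critical else 0.7
    if confidence >= threshold:
        return "accept"
    if confidence >= 0.5:
        return "human_review"  # uncertain facts go to a reviewer
    return "reject"
```

Keeping the routing logic in one place makes it easy to tune thresholds per deployment without touching the retrieval or generation code.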
Caching
- Cache verification results for identical queries (24-hour TTL)
- Cache document-level scores for frequently retrieved documents
- Use Redis or similar for distributed caching
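A query-level cache with the recommended 24-hour TTL might look like the sketch below. It uses an in-process dict for clarity; in a distributed deployment you would swap the dict for Redis with an expiring key. The `verify_fn` parameter is a hypothetical stand-in for a Shield verification call:

```python
import hashlib
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL as recommended above
_cache: dict = {}

def cached_verify(text: str, verify_fn):
    """Return a cached verification result for identical text, else compute and store it."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    entry = _cache.get(key)
    if entry is not None and time.time() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]  # cache hit: skip the API call and its credit cost
    result = verify_fn(text)
    _cache[key] = (time.time(), result)
    return result
```

Hashing the text rather than using it directly as the key keeps cache keys short and uniform, which matters once you move to an external store.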
Monitoring
- Log all verification results with query ID for audit trail
- Alert on sudden drops in confidence scores
- Track retriever precision/recall vs verification results
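One way to detect the "sudden drops in confidence scores" mentioned above is a rolling-average monitor. This is a minimal sketch; the window size and alert threshold are illustrative defaults, not Shield recommendations:

```python
from collections import deque

class ConfidenceMonitor:
    """Track a rolling window of confidence scores and flag sustained drops."""

    def __init__(self, window: int = 100, alert_threshold: float = 0.65):
        self.scores = deque(maxlen=window)
        self.alert_threshold = alert_threshold

    def record(self, confidence: float) -> bool:
        """Record a score; return True when the rolling average falls below the alert threshold."""
        self.scores.append(confidence)
        avg = sum(self.scores) / len(self.scores)
        return avg < self.alert_threshold
```

Feeding every verification result through `record` and wiring the `True` case to your alerting system catches retriever or source regressions without alerting on single low-confidence outliers.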
Rate Limits and Costs
Each verification request counts toward your API quota:
- Cross-check: 1 credit per external source (max 10 sources)
- Streaming verification: 1 credit per 1,000 tokens
- Batch verification: 0.1 credit per document (min 100 docs)
Estimate costs for large RAG deployments and use batch APIs where possible.
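A back-of-envelope estimate using the credit prices above can be expressed directly. The traffic figures passed in are hypothetical; only the per-unit prices come from the table:

```python
def estimate_monthly_credits(
    queries_per_day: int,
    external_sources_per_query: int = 3,   # cross-check: 1 credit per source
    streamed_tokens_per_query: int = 500,  # streaming: 1 credit per 1,000 tokens
    batch_docs_per_day: int = 0,           # batch: 0.1 credit per document
) -> float:
    """Estimate monthly credit usage from the pricing listed above (30-day month)."""
    per_query = external_sources_per_query * 1.0 + streamed_tokens_per_query / 1000.0
    daily = queries_per_day * per_query + batch_docs_per_day * 0.1
    return daily * 30
```

For example, 1,000 queries per day with 3 external sources and ~500 streamed tokens each comes to 3.5 credits per query, or about 105,000 credits per month.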
Troubleshooting
Q: Verification is slow with large documents
- Use document chunking (512-1024 tokens per chunk)
- Verify only the top-5 retrieved documents
- Cache results for similar queries
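The chunking advice above can be sketched as a simple overlapping-window splitter. It uses whitespace tokens as a stand-in for a real tokenizer, and the overlap parameter is an illustrative addition that keeps facts spanning a chunk boundary verifiable:

```python
def chunk_document(text: str, chunk_size: int = 512, overlap: int = 64) -> list:
    """Split a document into overlapping windows of roughly chunk_size tokens.

    512-1024 tokens per chunk keeps each verification request small
    enough to stay fast; overlapping windows avoid splitting a claim
    across a chunk boundary.
    """
    tokens = text.split()
    if not tokens:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each chunk is then verified independently, and the document-level score can be taken as the minimum or mean of its chunk scores.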
Q: False positives flagging correct information
- Lower the confidence threshold by 0.1 so borderline-correct facts are no longer flagged
- Add domain-specific context to verification request
- Use custom verification rules for your industry
Q: Integration failing with streaming responses
- Upgrade to SDK version 2.0 or later; earlier versions do not support streaming
- Add error handling for timeout (30s default)
- Use fallback verification on stream completion
Next Steps
- Review Chatbot Fact-Checking for real-time use cases
- Explore Verification API for advanced options
- Check SDK Quickstarts for your language