RAG Pipeline Integration

Retrieval-Augmented Generation (RAG) systems combine real-time information retrieval with LLM inference. TruthVouch Shield integrates seamlessly into RAG pipelines to verify retrieved documents and detect hallucinated or contaminated content before it reaches the user.

Why Verify RAG Outputs

RAG systems reduce hallucinations but introduce new risks:

  • Retrieved documents may contain outdated, inaccurate, or biased information
  • LLMs may misinterpret or distort retrieved facts
  • Generated summaries may contradict source materials
  • Cross-domain contamination can spread misinformation

Integrating Shield ensures that every RAG response is fact-checked against your knowledge base and external sources.

Architecture Patterns

Pattern 1: Verification After Retrieval

Verify documents immediately after retrieval, before they enter the LLM context:

from truthvouch.shield import VerificationClient

client = VerificationClient(api_key="your-api-key")

def retrieve_and_verify(query: str) -> list[dict]:
    # Retrieve documents from vector store
    documents = vector_db.search(query, top_k=5)

    # Verify each document
    verified = []
    for doc in documents:
        result = client.verify_fact(
            text=doc["content"],
            context=query,
            source_url=doc.get("source_url")
        )
        if result.confidence > 0.7:  # High confidence
            verified.append({
                **doc,
                "verification": result
            })
    return verified

Pattern 2: Verification After Generation

Verify the final LLM response against retrieved sources:

def rag_with_final_check(query: str) -> dict:
    # Retrieve and generate
    documents = retrieve_and_verify(query)
    response = llm.generate(
        query=query,
        context=documents
    )

    # Verify generated response
    fact_checks = client.cross_check(
        query=response["text"],
        sources=[d["content"] for d in documents]
    )
    return {
        "response": response["text"],
        "sources": documents,
        "fact_checks": fact_checks,
        "is_verified": all(fc["verified"] for fc in fact_checks)
    }

Pattern 3: Streaming Verification

For real-time RAG with streaming responses:

from truthvouch.shield import StreamVerificationClient

stream_client = StreamVerificationClient(api_key="your-api-key")

async def stream_rag_response(query: str):
    documents = retrieve_and_verify(query)

    # Stream LLM response with real-time verification
    async for chunk in llm.stream_generate(query=query, context=documents):
        # Verify chunk in parallel
        result = await stream_client.verify_streaming_fact(
            text=chunk,
            context_docs=[d["content"] for d in documents]
        )
        yield {
            "chunk": chunk,
            "confidence": result.confidence,
            "flags": result.flags if result.confidence < 0.6 else []
        }

Language-Specific Examples

Python with LangChain

from langchain.callbacks import TruthVouchVerificationCallback
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Create RAG chain
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=vector_store.as_retriever(),
    callbacks=[TruthVouchVerificationCallback(
        api_key="your-api-key",
        min_confidence=0.7
    )]
)

# Responses are automatically fact-checked
response = qa.run("What is the capital of France?")

Python with LlamaIndex

from llama_index.core import VectorStoreIndex
from llama_index.core.callbacks import TruthVouchCallback

# Wrap index with verification
index = VectorStoreIndex.from_documents(documents)
index.add_callback(TruthVouchCallback(api_key="your-api-key"))

# Queries include verification results
query_engine = index.as_query_engine(
    callback_manager=index.callback_manager
)
response = query_engine.query("What is the capital of France?")

TypeScript with LangChain.js

import { TruthVouchVerificationHandler } from "@truthvouch/langchain";
import { RetrievalQAChain } from "langchain/chains";
import { OpenAI } from "langchain/llms/openai";

const chain = RetrievalQAChain.fromLLM(
  new OpenAI(),
  retriever,
  {
    callbacks: [new TruthVouchVerificationHandler({
      apiKey: "your-api-key",
      minConfidence: 0.7
    })]
  }
);

const result = await chain.call({
  query: "What is the capital of France?"
});

Best Practices

Source Management

  • Include source_url and source_metadata with every retrieved document
  • Store document hash for integrity verification
  • Track retrieval timestamps for freshness checks
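The hash-and-timestamp bookkeeping above can be sketched as a pair of small helpers. The `content_sha256` and `retrieved_at` field names are illustrative, not part of the Shield document schema:

```python
import hashlib
import time

def with_integrity_metadata(doc: dict) -> dict:
    """Attach a content hash and retrieval timestamp to a retrieved document."""
    content = doc["content"].encode("utf-8")
    return {
        **doc,
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "retrieved_at": time.time(),  # used later for freshness checks
    }

def integrity_ok(doc: dict) -> bool:
    """Re-hash the content and compare against the stored digest."""
    digest = hashlib.sha256(doc["content"].encode("utf-8")).hexdigest()
    return digest == doc["content_sha256"]
```

Run `with_integrity_metadata` once at ingestion, then `integrity_ok` before each verification call to detect silent mutations in the store.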

Confidence Thresholds

  • Use 0.9+ for critical information (healthcare, legal, financial)
  • Use 0.7-0.8 for general use cases
  • Flag uncertain facts with confidence 0.5-0.7 for human review
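These thresholds can be encoded as a simple routing function; the decision labels are illustrative, not part of the SDK:

```python
def route_by_confidence(confidence: float, critical: bool = False) -> str:
    """Map a verification confidence score to a handling decision.

    Critical domains (healthcare, legal, financial) use the stricter
    0.9 threshold; everything else uses 0.7. Scores between 0.5 and
    the threshold are queued for human review.
    """
    threshold = 0.9 if critical else 0.7
    if confidence >= threshold:
        return "accept"
    if confidence >= 0.5:
        return "human_review"  # uncertain: queue for a reviewer
    return "reject"
```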

Caching

  • Cache verification results for identical queries (24-hour TTL)
  • Cache document-level scores for frequently retrieved documents
  • Use Redis or similar for distributed caching
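A minimal in-process sketch of the 24-hour TTL cache, standing in for Redis on a single node; the class and its interface are assumptions, not part of the SDK:

```python
import time

class VerificationCache:
    """In-process TTL cache keyed by query text.

    A stand-in for Redis in single-node deployments; swap in a Redis
    client with SETEX for distributed setups.
    """

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, query: str):
        entry = self._store.get(query)
        if entry is None:
            return None
        stored_at, result = entry
        if time.time() - stored_at > self.ttl:
            del self._store[query]  # expired: evict and report a miss
            return None
        return result

    def put(self, query: str, result) -> None:
        self._store[query] = (time.time(), result)
```

Check the cache before calling `verify_fact` or `cross_check`, and `put` the result on a miss.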

Monitoring

  • Log all verification results with query ID for audit trail
  • Alert on sudden drops in confidence scores
  • Track retriever precision/recall vs verification results
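One way to alert on sudden confidence drops is a rolling-baseline check; this sketch is an assumption about how you might wire it, not a built-in Shield feature:

```python
from collections import deque

class ConfidenceMonitor:
    """Track a rolling mean of confidence scores and flag sudden drops."""

    def __init__(self, window: int = 100, drop_threshold: float = 0.15):
        self.scores = deque(maxlen=window)
        self.drop_threshold = drop_threshold

    def observe(self, confidence: float) -> bool:
        """Record a score; return True if it should trigger an alert."""
        alert = False
        if len(self.scores) >= 10:  # need a baseline before alerting
            baseline = sum(self.scores) / len(self.scores)
            alert = (baseline - confidence) > self.drop_threshold
        self.scores.append(confidence)
        return alert
```

Feed every verification result's confidence into `observe` alongside your query-ID audit logging.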

Rate Limits and Costs

Each verification request counts toward your API quota:

  • Cross-check: 1 credit per external source (max 10 sources)
  • Streaming verification: 1 credit per 1,000 tokens
  • Batch verification: 0.1 credit per document (min 100 docs)

Estimate costs for large RAG deployments and use batch APIs where possible.
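A rough cost estimator based on the rates above; note it interprets the 100-document batch minimum as a billing floor, which you should confirm against your plan:

```python
def estimate_credits(
    cross_check_sources: int = 0,
    streaming_tokens: int = 0,
    batch_documents: int = 0,
) -> float:
    """Estimate API credits for one request mix, using the published rates:
    1 credit per cross-check source (capped at 10), 1 credit per 1,000
    streamed tokens, 0.1 credit per batch document (100-document minimum,
    interpreted here as a billing floor)."""
    credits = float(min(cross_check_sources, 10))  # cross-check cap
    credits += streaming_tokens / 1000             # streaming rate
    if batch_documents:
        credits += 0.1 * max(batch_documents, 100)  # batch billing floor
    return credits
```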

Troubleshooting

Q: Verification is slow with large documents

  • Use document chunking (512-1024 tokens per chunk)
  • Verify only the top-5 retrieved documents
  • Cache results for similar queries
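A naive whitespace-based chunker illustrates the 512-token split; substitute your model's actual tokenizer for accurate counts:

```python
def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    """Split a document into chunks of at most max_tokens words.

    Counts whitespace-separated words as a cheap token proxy; real token
    counts depend on the tokenizer your model uses.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

Verify each chunk independently, then aggregate the per-chunk results (e.g. take the minimum confidence) for the document-level decision.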

Q: False positives flagging correct information

  • Lower the confidence threshold by 0.1 so borderline-correct facts pass
  • Add domain-specific context to verification request
  • Use custom verification rules for your industry

Q: Integration failing with streaming responses

  • Ensure SDK version >= 2.0 supports streaming
  • Add error handling for timeout (30s default)
  • Use fallback verification on stream completion
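A sketch of fallback verification on stream completion, with the verification calls injected as parameters so it stays SDK-agnostic; the function name and its signature are illustrative:

```python
import asyncio

async def stream_with_fallback(chunks, verify_chunk, full_check, timeout_s=30.0):
    """Yield (chunk, result) pairs; if any per-chunk verification times
    out, run one full-text check after the stream completes instead."""
    text, need_fallback = [], False
    async for chunk in chunks:
        text.append(chunk)
        try:
            result = await asyncio.wait_for(verify_chunk(chunk), timeout=timeout_s)
        except asyncio.TimeoutError:
            need_fallback = True  # keep streaming; verify everything at the end
            result = None
        yield chunk, result
    if need_fallback:
        await full_check("".join(text))
```

Pass `stream_client.verify_streaming_fact` (wrapped to take just the chunk) as `verify_chunk` and a `client.cross_check` call as `full_check`.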

Next Steps