RAG Pipeline Integration

Retrieval-Augmented Generation (RAG) systems combine real-time information retrieval with LLM inference. TruthVouch Shield integrates seamlessly into RAG pipelines to verify retrieved documents and detect hallucinated or contaminated content before it reaches the user.

Why Verify RAG Outputs

RAG systems reduce hallucinations but introduce new risks:

  • Retrieved documents may contain outdated, inaccurate, or biased information
  • LLMs may misinterpret or distort retrieved facts
  • Generated summaries may contradict source materials
  • Cross-domain contamination can spread misinformation

Integrating Shield ensures that every RAG response is fact-checked against your knowledge base and external sources.

Architecture Patterns

Pattern 1: Verification After Retrieval

Verify documents immediately after retrieval, before they enter the LLM context:

from truthvouch.shield import VerificationClient

client = VerificationClient(api_key="your-api-key")

def retrieve_and_verify(query: str) -> list[dict]:
    # Retrieve documents from vector store
    documents = vector_db.search(query, top_k=5)

    # Verify each document
    verified = []
    for doc in documents:
        result = client.verify_fact(
            text=doc["content"],
            context=query,
            source_url=doc.get("source_url")
        )
        if result.confidence > 0.7:  # High confidence
            verified.append({
                **doc,
                "verification": result
            })
    return verified

Pattern 2: Verification After Generation

Verify the final LLM response against retrieved sources:

def rag_with_final_check(query: str) -> dict:
    # Retrieve and generate
    documents = retrieve_and_verify(query)
    response = llm.generate(
        query=query,
        context=documents
    )

    # Verify generated response
    fact_checks = client.cross_check(
        query=response["text"],
        sources=[d["content"] for d in documents]
    )
    return {
        "response": response["text"],
        "sources": documents,
        "fact_checks": fact_checks,
        "is_verified": all(fc["verified"] for fc in fact_checks)
    }

Pattern 3: Streaming Verification

For real-time RAG with streaming responses:

from truthvouch.shield import StreamVerificationClient

stream_client = StreamVerificationClient(api_key="your-api-key")

async def stream_rag_response(query: str):
    documents = retrieve_and_verify(query)

    # Stream LLM response with real-time verification
    async for chunk in llm.stream_generate(query=query, context=documents):
        # Verify chunk in parallel
        result = await stream_client.verify_streaming_fact(
            text=chunk,
            context_docs=[d["content"] for d in documents]
        )
        yield {
            "chunk": chunk,
            "confidence": result.confidence,
            "flags": result.flags if result.confidence < 0.6 else []
        }

Language-Specific Examples

Python with LangChain

from langchain.callbacks import TruthVouchVerificationCallback
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Create RAG chain
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=vector_store.as_retriever(),
    callbacks=[TruthVouchVerificationCallback(
        api_key="your-api-key",
        min_confidence=0.7
    )]
)

# Responses are automatically fact-checked
response = qa.run("What is the capital of France?")

Python with LlamaIndex

from llama_index.core import VectorStoreIndex
from llama_index.core.callbacks import TruthVouchCallback

# Wrap index with verification
index = VectorStoreIndex.from_documents(documents)
index.add_callback(TruthVouchCallback(api_key="your-api-key"))

# Queries include verification results
query_engine = index.as_query_engine(
    callback_manager=index.callback_manager
)
response = query_engine.query("What is the capital of France?")

TypeScript with LangChain.js

import { TruthVouchVerificationHandler } from "@truthvouch/langchain";
import { RetrievalQAChain } from "langchain/chains";
import { OpenAI } from "langchain/llms/openai";

const chain = RetrievalQAChain.fromLLM(
  new OpenAI(),
  retriever,
  {
    callbacks: [new TruthVouchVerificationHandler({
      apiKey: "your-api-key",
      minConfidence: 0.7
    })]
  }
);

const result = await chain.call({
  query: "What is the capital of France?"
});

Best Practices

Source Management

  • Include source_url and source_metadata with every retrieved document
  • Store document hash for integrity verification
  • Track retrieval timestamps for freshness checks
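The hash-and-timestamp bookkeeping above can be sketched as a pair of small helpers. The `content_sha256` and `retrieved_at` field names are illustrative, not part of the Shield document schema:

```python
import hashlib
import time

def with_integrity_metadata(doc: dict) -> dict:
    """Attach a content hash and retrieval timestamp to a retrieved document."""
    content = doc["content"].encode("utf-8")
    return {
        **doc,
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "retrieved_at": time.time(),  # used later for freshness checks
    }

def integrity_ok(doc: dict) -> bool:
    """Re-hash the content and compare against the stored digest."""
    digest = hashlib.sha256(doc["content"].encode("utf-8")).hexdigest()
    return digest == doc["content_sha256"]
```

Run `with_integrity_metadata` once at ingestion, then `integrity_ok` before each verification call to detect silent mutations in the store.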

Confidence Thresholds

  • Use 0.9+ for critical information (healthcare, legal, financial)
  • Use 0.7-0.8 for general use cases
  • Flag uncertain facts with confidence 0.5-0.7 for human review
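These thresholds can be encoded as a simple routing function; the decision labels are illustrative, not part of the SDK:

```python
def route_by_confidence(confidence: float, critical: bool = False) -> str:
    """Map a verification confidence score to a handling decision.

    Critical domains (healthcare, legal, financial) use the stricter
    0.9 threshold; everything else uses 0.7. Scores between 0.5 and
    the threshold are queued for human review.
    """
    threshold = 0.9 if critical else 0.7
    if confidence >= threshold:
        return "accept"
    if confidence >= 0.5:
        return "human_review"  # uncertain: queue for a reviewer
    return "reject"
```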

Caching

  • Cache verification results for identical queries (24-hour TTL)
  • Cache document-level scores for frequently retrieved documents
  • Use Redis or similar for distributed caching
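A minimal in-process sketch of the 24-hour TTL cache, standing in for Redis on a single node; the class and its interface are assumptions, not part of the SDK:

```python
import time

class VerificationCache:
    """In-process TTL cache keyed by query text.

    A stand-in for Redis in single-node deployments; swap in a Redis
    client with SETEX for distributed setups.
    """

    def __init__(self, ttl_seconds: float = 24 * 3600):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, query: str):
        entry = self._store.get(query)
        if entry is None:
            return None
        stored_at, result = entry
        if time.time() - stored_at > self.ttl:
            del self._store[query]  # expired: evict and report a miss
            return None
        return result

    def put(self, query: str, result) -> None:
        self._store[query] = (time.time(), result)
```

Check the cache before calling `verify_fact` or `cross_check`, and `put` the result on a miss.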

Monitoring

  • Log all verification results with query ID for audit trail
  • Alert on sudden drops in confidence scores
  • Track retriever precision/recall vs verification results
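One way to alert on sudden confidence drops is a rolling-baseline check; this sketch is an assumption about how you might wire it, not a built-in Shield feature:

```python
from collections import deque

class ConfidenceMonitor:
    """Track a rolling mean of confidence scores and flag sudden drops."""

    def __init__(self, window: int = 100, drop_threshold: float = 0.15):
        self.scores = deque(maxlen=window)
        self.drop_threshold = drop_threshold

    def observe(self, confidence: float) -> bool:
        """Record a score; return True if it should trigger an alert."""
        alert = False
        if len(self.scores) >= 10:  # need a baseline before alerting
            baseline = sum(self.scores) / len(self.scores)
            alert = (baseline - confidence) > self.drop_threshold
        self.scores.append(confidence)
        return alert
```

Feed every verification result's confidence into `observe` alongside your query-ID audit logging.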

Rate Limits and Costs

Each verification request counts toward your API quota:

  • Cross-check: 1 credit per external source (max 10 sources)
  • Streaming verification: 1 credit per 1,000 tokens
  • Batch verification: 0.1 credit per document (min 100 docs)

Estimate costs for large RAG deployments and use batch APIs where possible.
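A rough cost estimator based on the rates above; note it interprets the 100-document batch minimum as a billing floor, which you should confirm against your plan:

```python
def estimate_credits(
    cross_check_sources: int = 0,
    streaming_tokens: int = 0,
    batch_documents: int = 0,
) -> float:
    """Estimate API credits for one request mix, using the published rates:
    1 credit per cross-check source (capped at 10), 1 credit per 1,000
    streamed tokens, 0.1 credit per batch document (100-document minimum,
    interpreted here as a billing floor)."""
    credits = float(min(cross_check_sources, 10))  # cross-check cap
    credits += streaming_tokens / 1000             # streaming rate
    if batch_documents:
        credits += 0.1 * max(batch_documents, 100)  # batch billing floor
    return credits
```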

Troubleshooting

Q: Verification is slow with large documents

  • Use document chunking (512-1024 tokens per chunk)
  • Verify only the top-5 retrieved documents
  • Cache results for similar queries
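A naive whitespace-based chunker illustrates the 512-token split; substitute your model's actual tokenizer for accurate counts:

```python
def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    """Split a document into chunks of at most max_tokens words.

    Counts whitespace-separated words as a cheap token proxy; real token
    counts depend on the tokenizer your model uses.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

Verify each chunk independently, then aggregate the per-chunk results (e.g. take the minimum confidence) for the document-level decision.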

Q: False positives flagging correct information

  • Lower the confidence threshold by 0.1 so borderline-correct facts pass
  • Add domain-specific context to verification request
  • Use custom verification rules for your industry

Q: Integration failing with streaming responses

  • Ensure SDK version >= 2.0 supports streaming
  • Add error handling for timeout (30s default)
  • Use fallback verification on stream completion
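A sketch of fallback verification on stream completion, with the verification calls injected as parameters so it stays SDK-agnostic; the function name and its signature are illustrative:

```python
import asyncio

async def stream_with_fallback(chunks, verify_chunk, full_check, timeout_s=30.0):
    """Yield (chunk, result) pairs; if any per-chunk verification times
    out, run one full-text check after the stream completes instead."""
    text, need_fallback = [], False
    async for chunk in chunks:
        text.append(chunk)
        try:
            result = await asyncio.wait_for(verify_chunk(chunk), timeout=timeout_s)
        except asyncio.TimeoutError:
            need_fallback = True  # keep streaming; verify everything at the end
            result = None
        yield chunk, result
    if need_fallback:
        await full_check("".join(text))
```

Pass `stream_client.verify_streaming_fact` (wrapped to take just the chunk) as `verify_chunk` and a `client.cross_check` call as `full_check`.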

Next Steps