LlamaIndex Integration
Integrate Brokle with LlamaIndex to trace query engines, indexes, retrievers, and all LLM interactions within your RAG applications.
Supported Features
| Feature | Supported | Notes |
|---|---|---|
| Query Engines | ✅ | All query engine types |
| Indexes | ✅ | Vector, list, tree, keyword |
| Retrievers | ✅ | With relevance scores |
| LLM Calls | ✅ | All LLM providers |
| Embeddings | ✅ | Embedding generation |
| Node Postprocessing | ✅ | Reranking, filtering |
| Agents | ✅ | ReAct, OpenAI agents |
Quick Start
Install Dependencies
pip install brokle llama-index llama-index-llms-openai
Set Up Callback Handler
from brokle import Brokle
from brokle.integrations.llamaindex import BrokleCallbackHandler
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
# Initialize Brokle
brokle = Brokle(api_key="bk_...")
# Create callback handler
handler = BrokleCallbackHandler(brokle=brokle)
# Set as global callback
Settings.callback_manager.add_handler(handler)
Build and Query Index
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Load documents
documents = SimpleDirectoryReader("data").load_data()
# Build index (traced)
index = VectorStoreIndex.from_documents(documents)
# Query (traced)
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response)
brokle.flush()
Integration Methods
Method 1: Global Callback (Recommended)
Set callbacks globally for all LlamaIndex operations:
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
# Create callback manager with Brokle handler
callback_manager = CallbackManager([handler])
Settings.callback_manager = callback_manager
# All operations are now traced
index = VectorStoreIndex.from_documents(documents)
response = index.as_query_engine().query("question")
Method 2: Per-Index Callback
Set callbacks for specific indexes:
from llama_index.core.callbacks import CallbackManager
callback_manager = CallbackManager([handler])
index = VectorStoreIndex.from_documents(
documents,
callback_manager=callback_manager
)
Method 3: Context Manager
Use within a specific context:
from brokle.integrations.llamaindex import BrokleTracer
with BrokleTracer(brokle=brokle, name="rag_query") as tracer:
response = query_engine.query("What is AI?")Tracing Query Engines
Basic Query Engine
from llama_index.core import VectorStoreIndex
# Build index
index = VectorStoreIndex.from_documents(documents)
# Create query engine
query_engine = index.as_query_engine(
similarity_top_k=5,
response_mode="compact"
)
# Query is fully traced
response = query_engine.query("Explain the key concepts")Trace structure:
query
├── retriever
│   ├── embedding (embed query)
│   └── vector_search
├── node_postprocessor (if any)
├── response_synthesizer
│   └── llm_call (generation)
└── response
Chat Engine
from llama_index.core.chat_engine import SimpleChatEngine
chat_engine = index.as_chat_engine(
chat_mode="condense_question",
verbose=True
)
# Multi-turn conversation traced
response = chat_engine.chat("Hello, what can you help me with?")
response = chat_engine.chat("Tell me more about that")Custom Query Engine
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.llms import LLM
# CustomQueryEngine is a Pydantic model, so declare its components as fields
class MyQueryEngine(CustomQueryEngine):
    retriever: BaseRetriever
    llm: LLM
    def custom_query(self, query_str: str):
        # Custom logic - all LLM calls are traced
        nodes = self.retriever.retrieve(query_str)
        response = self.llm.complete(f"Answer: {query_str}\nContext: {nodes}")
        return str(response)
query_engine = MyQueryEngine(retriever=retriever, llm=llm)
response = query_engine.query("My question")Tracing Indexes
Index Building
Index construction is traced:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Document loading traced
documents = SimpleDirectoryReader("data").load_data()
# Index building traced (including embeddings)
index = VectorStoreIndex.from_documents(
documents,
show_progress=True
)
Build traces include:
- Document parsing
- Chunking/node creation (see the sketch after this list)
- Embedding generation
- Vector store insertion
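The chunking and embedding steps above are driven by the node parser, so changing it changes how many nodes and embedding calls appear in the build trace. A minimal sketch using LlamaIndex's SentenceSplitter (the chunk sizes here are illustrative, not recommendations):
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
# Smaller chunks produce more nodes, and therefore more embedding calls per build
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
index = VectorStoreIndex.from_documents(documents, show_progress=True)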
Different Index Types
from llama_index.core import (
VectorStoreIndex,
ListIndex,
TreeIndex,
KeywordTableIndex
)
# Vector index (with embeddings)
vector_index = VectorStoreIndex.from_documents(docs)
# List index (sequential)
list_index = ListIndex.from_documents(docs)
# Tree index (hierarchical)
tree_index = TreeIndex.from_documents(docs)
# Keyword index (text-based)
keyword_index = KeywordTableIndex.from_documents(docs)
Tracing Retrievers
Vector Retriever
from llama_index.core.retrievers import VectorIndexRetriever
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=10
)
# Retrieval traced with scores
nodes = retriever.retrieve("What is the topic?")
for node in nodes:
    print(f"Score: {node.score}, Text: {node.text[:100]}")
Retrieval traces capture:
- Query embedding
- Vector similarity search
- Retrieved nodes with scores
- Retrieval latency
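To see these retrieval details nested under a complete query trace rather than as a standalone retrieve call, the same retriever can be wrapped in a query engine. A minimal sketch, assuming the retriever created above:
from llama_index.core.query_engine import RetrieverQueryEngine
# Retrieval spans now appear as children of the query span
query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("What is the topic?")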
Hybrid Retriever
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever  # pip install llama-index-retrievers-bm25
# Combine vector and keyword search
vector_retriever = index.as_retriever(similarity_top_k=5)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=5)
fusion_retriever = QueryFusionRetriever(
[vector_retriever, bm25_retriever],
similarity_top_k=10,
num_queries=4
)
# All retrievers traced
nodes = fusion_retriever.retrieve("complex query")
Node Processing
Reranking
from llama_index.core.postprocessor import SentenceTransformerRerank
reranker = SentenceTransformerRerank(
model="cross-encoder/ms-marco-MiniLM-L-6-v2",
top_n=3
)
query_engine = index.as_query_engine(
node_postprocessors=[reranker]
)
# Reranking traced
response = query_engine.query("What is important?")Filtering
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
processor = MetadataReplacementPostProcessor(
target_metadata_key="window"
)
query_engine = index.as_query_engine(
node_postprocessors=[processor]
)
Tracing Agents
ReAct Agent
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
# Create tools
query_tool = QueryEngineTool.from_defaults(
query_engine=query_engine,
name="knowledge_base",
description="Search the knowledge base"
)
# Create agent
agent = ReActAgent.from_tools(
[query_tool],
llm=llm,
verbose=True
)
# Agent execution fully traced
response = agent.chat("Find information about X and summarize it")Agent traces include:
- Thought process
- Tool selection (see the multi-tool sketch after this list)
- Tool execution
- Observation processing
- Final answer
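Tool selection is easiest to inspect when the agent has more than one tool to choose from. A minimal sketch that registers a plain Python function as a second tool (word_count is an illustrative helper, not part of Brokle or LlamaIndex):
from llama_index.core.tools import FunctionTool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())
count_tool = FunctionTool.from_defaults(fn=word_count)
# With two tools, the selection step shows up explicitly in the agent trace
agent = ReActAgent.from_tools([query_tool, count_tool], llm=llm, verbose=True)
response = agent.chat("How many words are in the summary of topic X?")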
OpenAI Agent
from llama_index.agent.openai import OpenAIAgent
agent = OpenAIAgent.from_tools(
tools,
llm=llm,
verbose=True
)
response = agent.chat("Complex multi-step task")Streaming
Streaming responses are traced:
query_engine = index.as_query_engine(streaming=True)
# Stream response
streaming_response = query_engine.query("Tell me about...")
for text in streaming_response.response_gen:
    print(text, end="", flush=True)
Adding Context
User Context
handler = BrokleCallbackHandler(
brokle=brokle,
user_id="user_123",
session_id="session_456"
)
Trace Metadata
handler = BrokleCallbackHandler(
brokle=brokle,
metadata={
"application": "document_qa",
"index_name": "company_docs",
"version": "2.0"
}
)
Dynamic Context
# Update context before query
handler.set_trace_context(
user_id="user_789",
metadata={"query_type": "summarization"}
)
response = query_engine.query("Summarize the document")LLM Configuration
Using Different LLMs
from llama_index.llms.openai import OpenAI
from llama_index.llms.anthropic import Anthropic
# OpenAI
llm_openai = OpenAI(model="gpt-4", temperature=0.7)
# Anthropic
llm_anthropic = Anthropic(model="claude-3-sonnet")
# Set globally
Settings.llm = llm_openai
# Or per-query engine
query_engine = index.as_query_engine(llm=llm_anthropic)
Embedding Models
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# OpenAI embeddings (traced)
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# Local embeddings (traced)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.embed_model = embed_model
Async Support
Full async support:
import asyncio
async def async_query():
    response = await query_engine.aquery("Async question")
    return response
# Run async
response = asyncio.run(async_query())
Async Streaming
async def async_stream():
    streaming_response = await query_engine.aquery("Tell me about...")
    async for text in streaming_response.async_response_gen():
        print(text, end="", flush=True)
Error Handling
Errors are captured with context:
try:
    response = query_engine.query("Question")
except Exception as e:
    # Error captured in trace:
    # - Exception type
    # - Error message
    # - Stack trace
    # - Query that caused error
    print(f"Query failed: {e}")
Configuration Options
handler = BrokleCallbackHandler(
brokle=brokle,
# Trace configuration
trace_name="llamaindex_app",
capture_input=True,
capture_output=True,
# Context
user_id="user_123",
session_id="session_456",
# Metadata
metadata={
"environment": "production"
},
# What to trace
trace_embeddings=True, # Trace embedding calls
trace_retrievals=True, # Trace retrieval operations
trace_llm_calls=True, # Trace LLM calls
# Privacy
mask_node_content=False, # Mask retrieved node content
)
Best Practices
1. Set Global Callbacks Early
# At application startup
from llama_index.core import Settings
Settings.callback_manager = CallbackManager([handler])
2. Add Index Metadata
# Tag traces by index
handler = BrokleCallbackHandler(
brokle=brokle,
metadata={
"index_name": "product_docs",
"index_version": "2024-01"
}
)
3. Handle Shutdown
import atexit
atexit.register(brokle.shutdown)
4. Monitor Embedding Costs
Embedding generation can be expensive. Track with:
handler = BrokleCallbackHandler(
brokle=brokle,
trace_embeddings=True,
metadata={"track_embedding_cost": True}
)
Troubleshooting
Traces Not Appearing
- Verify the Brokle handler is registered in Settings.callback_manager (see the check below)
- Call brokle.flush() before exiting
- Enable debug mode: Brokle(debug=True)
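A quick way to confirm the first point is to print the registered handlers; the list should include your BrokleCallbackHandler instance:
from llama_index.core import Settings
# The Brokle handler should appear in this list
print(Settings.callback_manager.handlers)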
Missing Embedding Traces
Ensure embeddings are traced:
handler = BrokleCallbackHandler(
brokle=brokle,
trace_embeddings=True
)
Incomplete Node Content
By default, only node metadata is captured. For full content:
handler = BrokleCallbackHandler(
brokle=brokle,
capture_node_content=True
)
Capturing full node content can significantly increase trace size. Enable only when needed for debugging.
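One way to keep trace size under control is to gate the flag on your deployment environment, so full content is captured only where you actually debug. A sketch that reuses only the options shown above (BROKLE_API_KEY and APP_ENV are assumed environment variable names):
import os
from brokle import Brokle
from brokle.integrations.llamaindex import BrokleCallbackHandler
brokle = Brokle(api_key=os.environ["BROKLE_API_KEY"])
# Capture full node content outside production only, to keep traces small
handler = BrokleCallbackHandler(
    brokle=brokle,
    capture_node_content=os.environ.get("APP_ENV") != "production",
)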