Integrations
Connect Brokle with your LLM stack - OpenAI, Anthropic, LangChain, LlamaIndex, and more
Brokle integrates with popular LLM providers and frameworks to provide comprehensive observability with minimal code changes.
How Integrations Work
Brokle integrations use wrapper functions that intercept LLM calls and automatically:
- Capture request data: Model, messages, parameters
- Record timing: Latency, time to first token
- Track usage: Token counts, costs
- Capture responses: Full output, streaming chunks
- Handle errors: Exception details, retry attempts
# Before: No observability
response = openai.chat.completions.create(...)
# After: Full observability with one line
openai = wrap_openai(openai, brokle=client)
response = openai.chat.completions.create(...)  # Automatically traced
LLM Providers
Integrate directly with LLM provider APIs for the most control.
Supported Providers
| Provider | Python | JavaScript | Features |
|---|---|---|---|
| OpenAI | ✅ | ✅ | Chat, embeddings, streaming |
| Anthropic | ✅ | ✅ | Messages, streaming |
| Azure OpenAI | ✅ | ✅ | Same as OpenAI |
| Google AI | 🔜 | 🔜 | Coming soon |
| AWS Bedrock | 🔜 | 🔜 | Coming soon |
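For providers other than OpenAI, the wrapping pattern is assumed to be the same: wrap the provider's client once and call it normally. The sketch below uses the real Anthropic SDK, but the wrap_anthropic helper name is an assumption by analogy with wrap_openai, not a confirmed Brokle API.
from brokle import Brokle, wrap_anthropic  # wrap_anthropic assumed by analogy with wrap_openai
import anthropic

client = Brokle(api_key="bk_...")
claude = wrap_anthropic(anthropic.Anthropic(), brokle=client)

# Calls on the wrapped client are traced like the OpenAI example above
response = claude.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}]
)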
Framework Integrations
Use these integrations with higher-level frameworks that orchestrate multiple LLM calls.
Framework Support
| Framework | Python | JavaScript | Features |
|---|---|---|---|
| LangChain | ✅ | ✅ | Chains, agents, callbacks |
| LlamaIndex | ✅ | ❌ | Query engines, indexes |
| Vercel AI SDK | ❌ | 🔜 | Coming soon |
| DSPy | 🔜 | ❌ | Coming soon |
Integration Patterns
Pattern 1: Wrapper Functions
The simplest approach is to wrap your client once and use it everywhere:
from brokle import Brokle, wrap_openai
import openai
client = Brokle(api_key="bk_...")
openai = wrap_openai(openai.OpenAI(), brokle=client)
# All calls are now traced
response = openai.chat.completions.create(...)
Pattern 2: Callback Handlers
For frameworks like LangChain that support callbacks:
from brokle.integrations.langchain import BrokleCallbackHandler
handler = BrokleCallbackHandler(brokle=client)
chain.invoke(input, callbacks=[handler])
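As a fuller sketch of the callback pattern, here is the handler attached to a small LCEL chain. Only BrokleCallbackHandler comes from the snippet above; the prompt, model name, and chain itself are illustrative.
from brokle import Brokle
from brokle.integrations.langchain import BrokleCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

client = Brokle(api_key="bk_...")
handler = BrokleCallbackHandler(brokle=client)

prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

# Passing the handler in the run config traces every step of the chain
result = chain.invoke(
    {"text": "LLM observability records each model call as a trace."},
    config={"callbacks": [handler]}
)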
Pattern 3: Manual Instrumentation
For custom integrations or unsupported providers:
with client.start_as_current_generation(
    name="custom_llm_call",
    model="custom-model"
) as gen:
    response = custom_llm.generate(prompt)
    gen.update(
        output=response.text,
        usage={"prompt_tokens": 100, "completion_tokens": 50}
    )
What Gets Captured
Request Data
| Field | Description |
|---|---|
| model | Model identifier |
| messages | Chat messages array |
| temperature | Temperature setting |
| max_tokens | Token limit |
| tools | Function/tool definitions |
| system | System prompt |
Response Data
| Field | Description |
|---|---|
| output | Generated text |
| finish_reason | Why generation stopped |
| tool_calls | Function calls made |
| usage | Token counts |
Metadata
| Field | Description |
|---|---|
| latency | Total request time |
| time_to_first_token | Streaming time to first token |
| cost | Calculated cost |
| provider | Provider name |
| error | Error details if failed |
Privacy Controls
Control what data is captured:
from brokle import Brokle
client = Brokle(
    api_key="bk_...",
    privacy={
        "mask_inputs": True,   # Mask message contents
        "mask_outputs": True,  # Mask response contents
        "mask_patterns": [     # Custom PII patterns
            r"\b\d{3}-\d{2}-\d{4}\b",  # SSN
            r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"  # Email
        ]
    }
)
When masking is enabled, Brokle still captures metadata (tokens, cost, latency) but not the actual content.
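The mask_patterns entries are ordinary regular expressions, so you can sanity-check a pattern with the standard re module before adding it to the client. This check is plain Python and independent of how Brokle applies the pattern internally.
import re

ssn_pattern = r"\b\d{3}-\d{2}-\d{4}\b"
sample = "My SSN is 123-45-6789 and my email is jane@example.com"

# Confirm the pattern matches what you expect before shipping it
print(re.findall(ssn_pattern, sample))  # ['123-45-6789']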
Streaming Support
All integrations support streaming:
stream = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
# The trace is automatically finalized when the stream completes
Streaming traces capture:
- Time to first token
- Total streaming duration
- All chunks aggregated
- Token usage (when available)
Error Handling
Errors are automatically captured:
try:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except openai.RateLimitError as e:
    # Error is captured in the trace with details:
    #   status: "error"
    #   error: "Rate limit exceeded"
    raise
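Because the wrapper records the exception before it propagates, a plain retry loop needs no extra instrumentation. This sketch reuses the wrapped openai client from above; the attempt count and backoff values are illustrative.
import time

def chat_with_retry(messages, attempts=3):
    for attempt in range(attempts):
        try:
            return openai.chat.completions.create(model="gpt-4", messages=messages)
        except openai.RateLimitError:
            # Each failed attempt is captured in its trace with status "error"
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff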
Best Practices
1. Initialize Once
# Good: Single client instance
client = Brokle(api_key="bk_...")
openai = wrap_openai(openai.OpenAI(), brokle=client)
# Bad: Multiple instances
def process():
    client = Brokle(...)  # Don't do this
2. Use Environment Variables
export BROKLE_API_KEY=bk_...
export OPENAI_API_KEY=sk_...
client = Brokle()  # Reads from env
3. Add Context
with client.start_as_current_span(name="chat") as span:
    span.update_trace(user_id="user_123", session_id="session_456")
    response = openai.chat.completions.create(...)
4. Graceful Shutdown
import atexit
atexit.register(client.shutdown)
Troubleshooting
Traces Not Appearing
- Verify API key is correct
- Check network connectivity
- Enable debug mode: `Brokle(debug=True)`
- Ensure `client.flush()` is called before exit (see the snippet below)
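A minimal way to combine both checks while debugging locally; the arguments are the ones shown elsewhere on this page.
from brokle import Brokle

client = Brokle(api_key="bk_...", debug=True)  # verbose output for troubleshooting
# ... run your instrumented code ...
client.flush()  # send any buffered traces before the process exits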
Missing Token Counts
Some providers don't return usage in streaming mode. Brokle estimates token counts when they are missing.
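For OpenAI specifically, you can ask the API itself to report usage on streamed responses instead of relying on estimation. stream_options is an OpenAI request parameter, not a Brokle setting; this reuses the wrapped openai client from the examples above, and the final chunk carries the usage with an empty choices list.
stream = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True}  # exact token counts arrive in the last chunk
)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")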
High Latency
The integration adds less than 1 ms of overhead. If you are still seeing high latency:
- Enable sampling: `Brokle(sample_rate=0.1)`
- Reduce batch size: `Brokle(flush_at=20)` (see the combined example below)
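Both options can be set on the same client; 0.1 and 20 are just the example values from the list above, not recommended defaults.
client = Brokle(
    api_key="bk_...",
    sample_rate=0.1,  # trace roughly 10% of requests
    flush_at=20       # smaller export batches
)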