Cohere Integration
Trace and monitor Cohere API calls with Brokle
Integrate Brokle with Cohere to capture traces, monitor performance, and track costs across all your Cohere API calls including Chat, Embed, and Rerank.
Supported Features
| Feature | Supported | Notes |
|---|---|---|
| Chat | ✅ | Full support |
| Chat Streaming | ✅ | With TTFT metrics |
| Embed | ✅ | Text embeddings |
| Rerank | ✅ | Document reranking |
| Tool Use | ✅ | Function calling |
| RAG | ✅ | Retrieval-augmented generation |
| Token Counting | ✅ | Billed tokens tracked |
| Cost Tracking | ✅ | Automatic calculation |
Quick Start
Install Dependencies
```bash
pip install brokle cohere
```

```bash
npm install brokle cohere-ai
```

Wrap the Client
```python
from brokle import Brokle
from brokle.wrappers import wrap_cohere
import cohere

# Initialize Brokle
brokle = Brokle(api_key="bk_...")

# Wrap Cohere client
client = wrap_cohere(
    cohere.ClientV2(api_key="your-cohere-api-key"),
    brokle=brokle
)
```

```typescript
import { Brokle } from 'brokle';
import { wrapCohere } from 'brokle/cohere';
import { CohereClientV2 } from 'cohere-ai';

// Initialize Brokle
const brokle = new Brokle({ apiKey: 'bk_...' });

// Wrap Cohere client (the V2 client matches the response shape used below)
const client = wrapCohere(
  new CohereClientV2({ token: 'your-cohere-api-key' }),
  { brokle }
);
```

Make Traced Calls
```python
# All calls are automatically traced
response = client.chat(
    model="command-r-plus",
    messages=[
        {"role": "user", "content": "What is Cohere?"}
    ]
)

print(response.message.content[0].text)

# Ensure traces are sent
brokle.flush()
```

```typescript
// All calls are automatically traced
const response = await client.chat({
  model: 'command-r-plus',
  messages: [
    { role: 'user', content: 'What is Cohere?' }
  ]
});

console.log(response.message.content[0].text);

// Ensure traces are sent
await brokle.shutdown();
```

Model Support
Chat Models
| Model | Model ID | Context | Best For |
|---|---|---|---|
| Command R+ | command-r-plus | 128K | Complex reasoning, RAG |
| Command R | command-r | 128K | General tasks |
| Command | command | 4K | Fast responses |
| Command Light | command-light | 4K | Lightweight tasks |
Embedding Models
| Model | Model ID | Dimensions | Best For |
|---|---|---|---|
| Embed English v3 | embed-english-v3.0 | 1024 | English text |
| Embed Multilingual v3 | embed-multilingual-v3.0 | 1024 | Multilingual |
| Embed English Light v3 | embed-english-light-v3.0 | 384 | Fast embeddings |
Rerank Models
| Model | Model ID | Best For |
|---|---|---|
| Rerank English v3 | rerank-english-v3.0 | English reranking |
| Rerank Multilingual v3 | rerank-multilingual-v3.0 | Multilingual reranking |
Streaming
Streaming is fully supported with time-to-first-token (TTFT) metrics:
```python
# Streaming with automatic tracing
stream = client.chat_stream(
    model="command-r-plus",
    messages=[
        {"role": "user", "content": "Write a poem about AI"}
    ]
)

for event in stream:
    if event.type == "content-delta":
        print(event.delta.message.content.text, end="", flush=True)
```

```typescript
// Streaming with automatic tracing
const stream = await client.chatStream({
  model: 'command-r-plus',
  messages: [
    { role: 'user', content: 'Write a poem about AI' }
  ]
});

for await (const event of stream) {
  if (event.type === 'content-delta') {
    process.stdout.write(event.delta?.message?.content?.text || '');
  }
}
```

Streaming traces capture:
| Metric | Description |
|---|---|
| `time_to_first_token` | Time until the first chunk arrives |
| `streaming_duration` | Total streaming time |
| `chunks_count` | Number of stream events |
| `aggregated_output` | Complete response text |
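For intuition, `time_to_first_token` is the delay before the first content event arrives. A minimal sketch that measures it by hand around the same traced stream shown above (the timing here is illustrative only; Brokle records the attribute automatically):

```python
import time

start = time.monotonic()
ttft = None

stream = client.chat_stream(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Write a poem about AI"}]
)

for event in stream:
    if event.type == "content-delta" and ttft is None:
        # Roughly what Brokle records as time_to_first_token
        ttft = time.monotonic() - start

print(f"\nTTFT: {ttft:.3f}s")
```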
Tool Use
Cohere's function calling is automatically traced:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat(
    model="command-r-plus",
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    tools=tools
)

# Tool calls are captured in traces
if response.message.tool_calls:
    for tool_call in response.message.tool_calls:
        print(f"Tool: {tool_call.function.name}")
        print(f"Args: {tool_call.function.arguments}")
```

```typescript
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: {
            type: 'string',
            description: 'City name'
          }
        },
        required: ['location']
      }
    }
  }
];

const response = await client.chat({
  model: 'command-r-plus',
  messages: [
    { role: 'user', content: "What's the weather in Paris?" }
  ],
  tools
});

// Tool calls are captured in traces
if (response.message.toolCalls) {
  response.message.toolCalls.forEach(toolCall => {
    console.log('Tool:', toolCall.function.name);
    console.log('Args:', toolCall.function.arguments);
  });
}
```
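To complete the loop, the tool result goes back to the model as a `tool` message, following Cohere's v2 tool-use message format. The weather lookup below is a hypothetical stand-in for your own implementation; the follow-up call is traced like any other:

```python
import json

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
response = client.chat(model="command-r-plus", messages=messages, tools=tools)

if response.message.tool_calls:
    # Echo the assistant turn back, then append one result per tool call
    messages.append({
        "role": "assistant",
        "tool_plan": response.message.tool_plan,
        "tool_calls": response.message.tool_calls
    })
    for tool_call in response.message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = {"location": args["location"], "temperature_c": 18}  # hypothetical stand-in
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": [{"type": "document", "document": {"data": json.dumps(result)}}]
        })

    # The follow-up call is traced as its own span
    final = client.chat(model="command-r-plus", messages=messages, tools=tools)
    print(final.message.content[0].text)
```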
Embeddings

Generate and trace embeddings:
```python
response = client.embed(
    model="embed-english-v3.0",
    texts=["Hello world", "Goodbye world"],
    input_type="search_document"
)

for i, embedding in enumerate(response.embeddings):
    print(f"Text {i}: {len(embedding)} dimensions")

# Traces include:
# - Input text count
# - Model used
# - Embedding dimensions
# - Billed units
```

```typescript
const response = await client.embed({
  model: 'embed-english-v3.0',
  texts: ['Hello world', 'Goodbye world'],
  inputType: 'search_document'
});

response.embeddings.forEach((embedding, i) => {
  console.log(`Text ${i}: ${embedding.length} dimensions`);
});
```
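Document and query embeddings share a vector space, so a quick similarity check is a natural next step. A minimal sketch in plain Python, assuming `response.embeddings` yields float lists as in the example above:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

docs = client.embed(
    model="embed-english-v3.0",
    texts=["Python is a programming language"],
    input_type="search_document"
)
query = client.embed(
    model="embed-english-v3.0",
    texts=["What is Python?"],
    input_type="search_query"
)

print(f"Similarity: {cosine(query.embeddings[0], docs.embeddings[0]):.3f}")
```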
Reranking

Rerank documents with tracing:
```python
documents = [
    "Python is a programming language",
    "Paris is the capital of France",
    "Machine learning is a subset of AI"
]

response = client.rerank(
    model="rerank-english-v3.0",
    query="What is Python?",
    documents=documents,
    top_n=2
)

for result in response.results:
    print(f"Index: {result.index}, Score: {result.relevance_score}")
    print(f"Document: {documents[result.index]}")

# Traces include:
# - Query text
# - Document count
# - Rerank scores
# - Search units billed
```

```typescript
const documents = [
  'Python is a programming language',
  'Paris is the capital of France',
  'Machine learning is a subset of AI'
];

const response = await client.rerank({
  model: 'rerank-english-v3.0',
  query: 'What is Python?',
  documents,
  topN: 2
});

response.results.forEach(result => {
  console.log(`Index: ${result.index}, Score: ${result.relevanceScore}`);
  console.log(`Document: ${documents[result.index]}`);
});
```

RAG (Retrieval-Augmented Generation)
Cohere's built-in RAG capabilities are traced:
```python
documents = [
    {"title": "Python", "text": "Python is a programming language..."},
    {"title": "AI", "text": "Artificial Intelligence is..."}
]

response = client.chat(
    model="command-r-plus",
    messages=[
        {"role": "user", "content": "What is Python?"}
    ],
    documents=documents
)

# RAG traces include:
# - Documents provided
# - Citations in response
# - Source references
print(response.message.content[0].text)

if response.message.citations:
    for citation in response.message.citations:
        print(f"Citation: {citation}")
```

```typescript
const documents = [
  { title: 'Python', text: 'Python is a programming language...' },
  { title: 'AI', text: 'Artificial Intelligence is...' }
];

const response = await client.chat({
  model: 'command-r-plus',
  messages: [
    { role: 'user', content: 'What is Python?' }
  ],
  documents
});

console.log(response.message.content[0].text);

if (response.message.citations) {
  response.message.citations.forEach(citation => {
    console.log('Citation:', citation);
  });
}
```

Cost Tracking
Brokle tracks costs based on Cohere's pricing:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Command R+ | $3.00 | $15.00 |
| Command R | $0.50 | $1.50 |
| Command | $1.00 | $2.00 |
| Embed v3 | $0.10 | - |
| Rerank v3 | $2.00 per 1K searches | - |
Cohere uses a unique billing model with "search units" for rerank and "billed tokens" for generation. Brokle captures both for accurate cost tracking.
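For a concrete sense of the math, here is the Command R+ row applied to a hypothetical call. The rates are hard-coded from the table above; check Cohere's pricing page for current values:

```python
# Command R+ rates from the table, in USD per 1M tokens
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

# Hypothetical billed token counts for one call
billed_input_tokens = 1_000
billed_output_tokens = 500

cost = (billed_input_tokens / 1_000_000) * INPUT_RATE \
     + (billed_output_tokens / 1_000_000) * OUTPUT_RATE
print(f"${cost:.4f}")  # $0.0105
```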
Cohere-Specific Metrics
Brokle captures Cohere-specific billing information:
| Attribute | Description |
|---|---|
| `cohere.billed_input_tokens` | Tokens billed for input |
| `cohere.billed_output_tokens` | Tokens billed for output |
| `cohere.search_units` | Search units for rerank |
| `cohere.generation_id` | Unique generation ID |
Error Handling
```python
from cohere.core import ApiError
from cohere.errors import TooManyRequestsError

try:
    response = client.chat(
        model="command-r-plus",
        messages=[{"role": "user", "content": "Hello"}]
    )
except TooManyRequestsError as e:
    # Rate limited - captured in trace
    print(f"Rate limited: {e}")
except ApiError as e:
    print(f"API error: {e}")
```
Configuration Options

```python
from brokle import Brokle
from brokle.wrappers import wrap_cohere
import cohere

brokle = Brokle(
    api_key="bk_...",
    environment="production",
    sample_rate=1.0,
    debug=False
)

client = wrap_cohere(
    cohere.ClientV2(api_key="..."),
    brokle=brokle,
    # Integration-specific options
    capture_input=True,      # Capture message content
    capture_output=True,     # Capture response content
    capture_documents=True   # Capture RAG documents
)
```

Best Practices
1. Use Input Types for Embeddings
```python
# Document embeddings
doc_embeddings = client.embed(
    model="embed-english-v3.0",
    texts=documents,
    input_type="search_document"
)

# Query embeddings
query_embedding = client.embed(
    model="embed-english-v3.0",
    texts=[query],
    input_type="search_query"
)
```

2. Add Context
```python
with brokle.start_as_current_span(name="cohere_rag") as span:
    span.update_trace(user_id="user_123")
    response = client.chat(...)
```

3. Graceful Shutdown
```python
import atexit

atexit.register(brokle.shutdown)
```

Troubleshooting
Missing Traces
- Verify both API keys are set
- Check that `brokle.flush()` is called
- Enable debug logging: `Brokle(debug=True)`
Billing Discrepancies
Cohere's billed token counts can differ from raw token counts because billing follows Cohere's own metering. Brokle records the officially billed amounts, so tracked costs match your Cohere invoice.
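To inspect the difference for a specific call, compare the two counts on the response's usage object. A sketch, assuming the v2 SDK's `usage.tokens` and `usage.billed_units` fields:

```python
response = client.chat(
    model="command-r-plus",
    messages=[{"role": "user", "content": "Hello"}]
)

usage = response.usage
# Raw counts vs the amounts Cohere actually bills
print(f"Raw input tokens:    {usage.tokens.input_tokens}")
print(f"Billed input tokens: {usage.billed_units.input_tokens}")
```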
Embedding Dimension Mismatch
Ensure you're using the correct model for your embedding dimensions (a quick runtime check follows this list):
- v3 models: 1024 dimensions
- Light models: 384 dimensions
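A minimal sketch of such a check, assuming `response.embeddings` yields float lists as in the examples above:

```python
# Expected dimensions per model (from the tables above)
EXPECTED_DIMS = {
    "embed-english-v3.0": 1024,
    "embed-multilingual-v3.0": 1024,
    "embed-english-light-v3.0": 384,
}

model = "embed-english-v3.0"
response = client.embed(model=model, texts=["check"], input_type="search_document")

actual = len(response.embeddings[0])
assert actual == EXPECTED_DIMS[model], f"expected {EXPECTED_DIMS[model]}, got {actual}"
```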