Cost Analytics
Track and optimize spending across AI models, providers, and features
Cost analytics gives you complete visibility into your AI spending. Track costs per request, user, feature, and model to optimize your AI investment.
Why Cost Analytics?
AI costs can grow unpredictably. Without visibility, you might discover:
- A single feature consuming 80% of your budget
- Inefficient prompts generating unnecessary tokens
- High-cost models used where cheaper alternatives suffice
- Unexpected spikes from runaway loops or errors
Brokle's cost analytics helps you understand and control spending.
How Costs are Calculated
Brokle automatically calculates costs based on:
Model Pricing
Brokle maintains current pricing for major providers:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-4 | $30.00 | $60.00 |
| GPT-3.5 Turbo | $0.50 | $1.50 |
| Claude 3 Opus | $15.00 | $75.00 |
| Claude 3 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
Pricing is updated regularly. For enterprise agreements with custom pricing, you can configure custom rates in your project settings.
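Given these per-1M-token prices, the per-request arithmetic is straightforward. A minimal sketch (prices copied from the table above; the function name is illustrative):

```python
# Illustrative cost calculation using the per-1M-token prices above.
PRICES = {  # model -> (input, output) USD per 1M tokens
    "gpt-4": (30.00, 60.00),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# 1,500 input + 500 output tokens on GPT-4:
print(round(request_cost("gpt-4", 1500, 500), 5))  # 0.075
```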
Viewing Costs
Trace-Level Costs
Every trace shows its total cost:
Trace: "rag_pipeline"
├── Span: embed_query
│   └── Cost: $0.00002
├── Span: gpt-4-call
│   ├── Input tokens: 1,500
│   ├── Output tokens: 500
│   └── Cost: $0.075
└── Total Cost: $0.07502

Dashboard Analytics
The cost dashboard provides:
┌─────────────────────────────────────────────────────────┐
│ Cost Analytics - Last 30 Days │
├─────────────────────────────────────────────────────────┤
│ │
│ Total Spend: $1,234.56 │
│ Avg Cost/Request: $0.032 │
│ Total Requests: 38,580 │
│ Total Tokens: 45.2M │
│ │
│ By Model: │
│ ├── GPT-4: $890.00 (72%) │
│ ├── GPT-3.5: $234.00 (19%) │
│ └── Embeddings: $110.56 (9%) │
│ │
│ By Feature: │
│ ├── Chat: $650.00 (53%) │
│ ├── Search: $384.00 (31%) │
│ └── Analytics: $200.56 (16%) │
│ │
└─────────────────────────────────────────────────────────┘

Cost Attribution
By User
Track spending per user for billing or rate limiting:
with client.start_as_current_span(name="chat_response") as span:
    span.update_trace(user_id=user.id)
    # Cost automatically attributed to user
    response = generate_response(message)

const span = client.startSpan({
  name: 'chat_response',
  attributes: { userId: user.id }
});
// Cost automatically attributed to user
const response = await generateResponse(message);
span.end({ output: response });

By Feature
Tag traces with features for spend analysis:
with client.start_as_current_span(name="generate_summary") as span:
    span.set_attribute("feature", "document_summarization")
    span.set_attribute("product_area", "productivity")
    summary = summarize_document(document)

By Customer Tier
Understand cost distribution across customer segments:
with client.start_as_current_span(name="premium_feature") as span:
    span.set_attribute("customer_tier", customer.tier)  # "free", "pro", "enterprise"
    span.set_attribute("customer_id", customer.id)
    result = expensive_ai_operation()

Cost Optimization
1. Model Selection
Choose the right model for each task:
def select_model(task_complexity: str) -> str:
    """Select a cost-appropriate model"""
    model_map = {
        "simple": "gpt-3.5-turbo",  # $0.50/1M input tokens
        "moderate": "gpt-4-turbo",  # $10.00/1M input tokens
        "complex": "gpt-4",         # $30.00/1M input tokens
    }
    return model_map.get(task_complexity, "gpt-3.5-turbo")

with client.start_as_current_span(name="adaptive_generation") as span:
    model = select_model(analyze_complexity(prompt))
    span.set_attribute("model_selection_reason", "complexity-based")
    response = openai.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )

2. Prompt Optimization
Reduce token usage without sacrificing quality:
# Before optimization: 2,500 tokens
system_prompt = """
You are a helpful assistant. Your job is to help users with their questions.
You should always be polite and professional. You should provide accurate information.
When you don't know something, you should say so. You should not make things up.
... (continues for 500 more words)
"""

# After optimization: ~20 tokens
system_prompt = """
Helpful assistant. Answer accurately and concisely.
Acknowledge uncertainty. Be professional.
"""

3. Caching Strategies
Avoid redundant API calls:
import hashlib
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_embedding(text: str) -> list[float]:
    """Cache embeddings to avoid repeated API calls"""
    return openai.embeddings.create(
        model="text-embedding-3-small",
        input=text
    ).data[0].embedding

# Usage with cost tracking (explicit cache so hits can be recorded on the span)
embedding_cache: dict[str, list[float]] = {}

with client.start_as_current_span(name="get_embedding") as span:
    cache_key = hashlib.md5(text.encode()).hexdigest()
    if cache_key in embedding_cache:
        span.set_attribute("cache_hit", True)
        span.set_attribute("cost_saved", True)
        embedding = embedding_cache[cache_key]
    else:
        span.set_attribute("cache_hit", False)
        embedding = cached_embedding(text)  # defined above; calls the API on a miss
        embedding_cache[cache_key] = embedding

4. Request Batching
Combine multiple requests to reduce overhead:
# Instead of N separate calls
def embed_documents_inefficient(docs: list[str]) -> list:
    embeddings = []
    for doc in docs:
        embedding = get_embedding(doc)  # N API calls
        embeddings.append(embedding)
    return embeddings

# Batch into a single call
def embed_documents_efficient(docs: list[str]) -> list:
    with client.start_as_current_span(name="batch_embed") as span:
        span.set_attribute("batch_size", len(docs))
        response = openai.embeddings.create(
            model="text-embedding-3-small",
            input=docs  # Single API call
        )
        span.set_attribute("tokens_used", response.usage.total_tokens)
        return [e.embedding for e in response.data]

Cost Alerts
Set up alerts for spending thresholds:
Budget Alerts
# Configure in dashboard or via API
budget_config = {
    "daily_limit": 100.00,     # Alert at $100/day
    "weekly_limit": 500.00,    # Alert at $500/week
    "monthly_limit": 2000.00,  # Alert at $2000/month
    "alert_threshold": 0.80,   # Alert at 80% of each limit
}

Anomaly Detection
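Anomaly alerts compare current spend against a historical baseline. A minimal sketch of that comparison (the 2x threshold is illustrative, not Brokle's actual detection logic):

```python
def spending_anomaly(current_hourly: float, baseline_hourly: float,
                     threshold: float = 2.0) -> bool:
    """Flag spend that exceeds the hourly baseline by the given multiple."""
    return current_hourly > baseline_hourly * threshold

def percent_increase(current: float, baseline: float) -> float:
    """Percentage increase over the baseline."""
    return (current - baseline) / baseline * 100

# The numbers from the alert below:
print(spending_anomaly(45.23, 12.50))            # True
print(f"+{percent_increase(45.23, 12.50):.0f}%")  # +262%
```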
Alert: Unusual spending detected
├── Current: $45.23 (last hour)
├── Average: $12.50 (hourly baseline)
├── Increase: +262%
├── Top contributor: feature/chat (89%)
└── Action: Investigate /chat endpoint

Cost Reports
Export Options
Generate cost reports for:
- Billing: Invoice customers based on usage
- Planning: Forecast future costs
- Optimization: Identify high-cost patterns
# API: Get cost breakdown
report = client.get_cost_report(
    start_date="2024-01-01",
    end_date="2024-01-31",
    group_by=["model", "feature", "user_id"],
    format="csv"
)

Report Dimensions
| Dimension | Use Case |
|---|---|
| Time | Trend analysis, seasonality |
| Model | Identify expensive models |
| Feature | Product cost allocation |
| User | Usage-based billing |
| Customer Tier | Segment profitability |
| Environment | Prod vs staging costs |
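For ad-hoc analysis, an exported report can be summed along any of these dimensions. A sketch over hypothetical report rows (the field names are illustrative, not a guaranteed export schema):

```python
from collections import defaultdict

# Hypothetical rows from a cost report export
rows = [
    {"model": "gpt-4", "feature": "chat", "cost": 0.045},
    {"model": "gpt-3.5-turbo", "feature": "chat", "cost": 0.002},
    {"model": "gpt-4", "feature": "search", "cost": 0.030},
]

def total_by(dimension: str, rows: list[dict]) -> dict[str, float]:
    """Sum cost per distinct value of one report dimension."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["cost"]
    return dict(totals)

print(total_by("model", rows))
print(total_by("feature", rows))
```

The same helper works for any column in the table above (user, tier, environment) once the rows carry that field.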
Cost vs Quality Trade-offs
Balance cost and quality based on your needs:
┌─────────────────────────────────────────────────────────┐
│ Cost vs Quality Matrix │
├─────────────────────────────────────────────────────────┤
│ │
│ GPT-4 ★★★★★ Quality $$$$$ Cost │
│ GPT-4T ★★★★☆ Quality $$$ Cost │
│ GPT-3.5 ★★★☆☆ Quality $ Cost │
│ Claude-H ★★★☆☆ Quality $ Cost │
│ │
│ Recommendation by Use Case: │
│ ├── Simple Q&A: GPT-3.5 or Claude Haiku │
│ ├── Complex reasoning: GPT-4 Turbo │
│ ├── Long documents: Claude Sonnet │
│ └── Maximum quality: GPT-4 or Claude Opus │
│ │
└─────────────────────────────────────────────────────────┘

Best Practices
1. Tag Everything
Enable granular cost analysis:
span.set_attribute("feature", "summarization")
span.set_attribute("customer_tier", "enterprise")
span.set_attribute("prompt_version", "v2.1")
span.update_trace(user_id=user.id)

2. Monitor Regularly
Check cost dashboards daily/weekly:
- Identify unexpected spikes
- Track optimization impact
- Catch runaway loops early
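A simple daily check along these lines can surface spikes before the invoice does (the 1.5x factor and the seven-day window are illustrative choices):

```python
from statistics import mean

def daily_spike(history: list[float], today: float, factor: float = 1.5) -> bool:
    """Flag today's spend if it exceeds the trailing average by `factor`."""
    return today > mean(history) * factor

# Last seven days of daily spend, in dollars
last_week = [12.0, 14.5, 11.0, 13.2, 12.8, 14.0, 12.5]

print(daily_spike(last_week, 45.0))  # True
print(daily_spike(last_week, 13.0))  # False
```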
3. Set Budgets
Prevent bill shock with limits:
# Rate limit expensive operations
if get_daily_user_cost(user_id) > 10.00:
    raise RateLimitError("Daily limit exceeded")

4. Test with Cheaper Models
Use inexpensive models during development:
model = "gpt-3.5-turbo" if os.getenv("ENV") == "development" else "gpt-4"

Always test in production-like conditions before deployment. Cheaper models may not catch edge cases that expensive models handle correctly.
Related Concepts
- Traces - Cost is tracked per trace
- Evaluations - Balance cost vs quality
- Sessions - Track session-level costs