Burki Voice AI lets you choose from the best language models (LLMs) on the market—or even use your own custom endpoint. Mix and match for reliability and performance!
## Provider Comparison
| Provider | Example Models | Speed | Quality | Cost | Notes |
|---|---|---|---|---|---|
| OpenAI | GPT-4o, GPT-3.5-turbo | Fast | Excellent | $$ | Most popular, reliable |
| Anthropic | Claude 3.5, Haiku | Fast | Advanced | $$ | Safety-focused |
| Gemini | Gemini 2.0 Flash | Fast | Multimodal | $$ | Google’s latest |
| xAI | Grok Beta, Grok 2 | Very Fast | Good | $ | Real-time, less strict |
| Groq | Llama 3.3, 3.1 | Ultra Fast | Good | $ | Speed-focused |
| Custom | Any | ? | ? | ? | Bring your own (HTTP) |
| Custom WebSocket | Any | Ultra Fast | ? | ? | Lowest latency option |
## How to Set Up Each Provider
### OpenAI (GPT)
- Sign up for OpenAI
- Go to API Keys and create a new key
- Paste your key in the LLM API Key field
- Choose a model (GPT-4o, GPT-3.5-turbo, etc.)
- (Optional) Set a custom base URL for Azure/OpenAI-compatible endpoints
GPT-4o is best for complex conversations. GPT-3.5-turbo is cheaper for simple tasks.
### Anthropic (Claude)
- Sign up for Anthropic
- Create an API key
- Paste your key in the LLM API Key field
- Choose a model (Claude 3.5 Sonnet, Haiku, etc.)
Claude models are great for advanced reasoning and safety-critical use cases.
### Google Gemini
- Sign up for Google AI Studio
- Get your API key
- Paste your key in the LLM API Key field
- Choose a model (Gemini 2.0 Flash, 1.5 Pro, etc.)
- Set the base URL (auto-filled for Gemini)
Gemini is multimodal—great for tasks involving text, images, and more.
### xAI (Grok)
- Sign up for xAI
- Get your API key
- Paste your key in the LLM API Key field
- Choose a model (Grok Beta, Grok 2, etc.)
- Set the base URL (auto-filled for xAI)
Grok is fast and less restrictive—good for real-time or experimental use.
### Groq (Llama)
- Sign up for Groq
- Get your API key
- Paste your key in the LLM API Key field
- Choose a model (Llama 3.3 70B, 3.1 8B, etc.)
Groq is all about speed—perfect for high-volume or latency-sensitive apps.
### Custom Endpoint (HTTP)
- Enter your custom API endpoint URL
- (Optional) Add an API key if required
- Specify the model name
Custom endpoints are for advanced users. Make sure your API is OpenAI-compatible!
### Custom WebSocket (Lowest Latency)
Use a persistent WebSocket connection to your own LLM endpoint for the lowest possible latency. This is ideal for self-hosted models or custom inference servers.

Key Benefits:
- Persistent Connection: Maintains connection throughout the entire call—no connection overhead per request
- Pre-warming: Sends initialization data before the first transcript, allowing your server to pre-load context
- Automatic Reconnection: Exponential backoff with configurable retry attempts
- HTTP Fallback: Falls back to HTTP if WebSocket disconnects
- Enter your WebSocket URL (e.g., `wss://api.example.com/ws/completions`)
- (Optional) Add a fallback HTTP URL for reliability
- (Optional) Add an API key if required
## WebSocket Protocol Specification
### Connection
Burki connects to `{ws_url}/{call_id}` for each call. For example, if your `ws_url` is `wss://api.example.com/ws/completions` and the call ID is `call_abc123`, the connection URL will be `wss://api.example.com/ws/completions/call_abc123`.

### Messages FROM Burki (Your Server Receives)
Your server will receive these JSON messages:

#### 1. Init Message (Pre-warming)
Sent immediately after connection, before any transcripts. Use this to pre-load context, do DB lookups, and prepare for fast responses.
Your server should respond with a `session_ready` message (see *Session Ready* below). The init message contains:

| Field | Description |
|---|---|
| `type` | Always `"init"` |
| `call_id` | Unique identifier for this call |
| `phoneNumber.number` | The phone number being called (your assistant’s number) |
| `call.customer.number` | The caller’s phone number |
| `call.phoneCallProviderId` | Provider-specific call ID |
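Based on the field table above, an init message plausibly looks like the following sketch. The nested object structure is inferred from the dotted field names, and all values are placeholders:

```json
{
  "type": "init",
  "call_id": "call_abc123",
  "phoneNumber": {
    "number": "+15551234567"
  },
  "call": {
    "customer": {
      "number": "+15559876543"
    },
    "phoneCallProviderId": "CA0123456789"
  }
}
```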
#### 2. Transcript Message (LLM Request)
Sent when the caller speaks. This is the main request for LLM completion.
| Field | Description |
|---|---|
| `type` | Always `"transcript"` |
| `request_id` | Unique ID for this request (use in responses) |
| `messages` | OpenAI-compatible message array |
| `tools` | Optional array of available function tools |
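A transcript message, sketched from the field table above. The `messages` and `tools` shapes follow OpenAI chat-completions conventions, as the table indicates; the values themselves are illustrative:

```json
{
  "type": "transcript",
  "request_id": "req_001",
  "messages": [
    { "role": "system", "content": "You are a helpful voice assistant." },
    { "role": "user", "content": "Can you transfer me to support?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "transfer_call",
        "description": "Transfer to another number",
        "parameters": {
          "type": "object",
          "properties": {
            "destination": { "type": "string" }
          },
          "required": ["destination"]
        }
      }
    }
  ]
}
```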
#### 3. Ping Message (Keep-alive)
Sent periodically to keep the connection alive. Your server should respond with a pong message (see *Pong Response* below).
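Assuming the type string mirrors the section name (an assumption; the exact string is not shown on this page), a ping might be as minimal as:

```json
{ "type": "ping" }
```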
#### 4. Cancel Message
Sent when the user interrupts or the request should be cancelled. Your server should:

- Stop generating for this `request_id`
- Optionally respond with a cancellation acknowledgement
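A sketch of the cancel message and an optional acknowledgement, assuming the type strings follow the same naming pattern as the other messages (both type strings are assumptions, not confirmed by this page):

```json
{ "type": "cancel", "request_id": "req_001" }
```

An acknowledgement might simply echo the cancelled request:

```json
{ "type": "cancelled", "request_id": "req_001" }
```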
### Messages TO Burki (Your Server Sends)
Your server should send these JSON messages:

#### 1. Session Ready
Confirm that pre-warming is complete and the session is ready.
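Assuming the type string follows the section name (an assumption), a session-ready confirmation might be as minimal as:

```json
{ "type": "session_ready" }
```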
#### 2. Streaming Response Chunks (OpenAI-compatible)
Stream LLM responses using OpenAI-compatible format. Each chunk must include the `request_id`. Send text content chunks as the model generates tokens, then a final chunk to mark the end of the response.
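A sketch of the streaming chunks, modeled on the OpenAI chat-completion chunk format this section references, with the required `request_id` added at the top level (the exact placement of `request_id` is an assumption). A text content chunk:

```json
{
  "request_id": "req_001",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "One moment, " },
      "finish_reason": null
    }
  ]
}
```

Final chunk (end of response):

```json
{
  "request_id": "req_001",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
```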
#### 3. Tool Call Response
If your LLM decides to call a tool (like `transfer_call`, `end_call`, or `send_dtmf`), stream it in three stages: a tool call start chunk, one or more argument chunks (arguments can be streamed in pieces), and a tool call complete chunk.
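A sketch of the three stages, again modeled on OpenAI streaming tool-call chunks (field placement is an assumption; values are placeholders). Tool call start:

```json
{
  "request_id": "req_001",
  "choices": [
    {
      "index": 0,
      "delta": {
        "tool_calls": [
          {
            "index": 0,
            "id": "call_tool_1",
            "type": "function",
            "function": { "name": "transfer_call", "arguments": "" }
          }
        ]
      },
      "finish_reason": null
    }
  ]
}
```

Tool call arguments (can be streamed in chunks):

```json
{
  "request_id": "req_001",
  "choices": [
    {
      "index": 0,
      "delta": {
        "tool_calls": [
          { "index": 0, "function": { "arguments": "{\"destination\": \"+15550001111\"}" } }
        ]
      },
      "finish_reason": null
    }
  ]
}
```

Tool call complete:

```json
{
  "request_id": "req_001",
  "choices": [
    { "index": 0, "delta": {}, "finish_reason": "tool_calls" }
  ]
}
```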
#### 4. Pong Response
Response to ping for keep-alive.
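Assuming the type string matches the section name (an assumption), a minimal pong:

```json
{ "type": "pong" }
```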
#### 5. Error Response
If something goes wrong:
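A sketch of an error message, assuming a type string and error field consistent with the rest of the protocol (both are assumptions, not confirmed by this page):

```json
{
  "type": "error",
  "request_id": "req_001",
  "error": "Upstream model timed out"
}
```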
## Available Built-in Tools
When Burki sends a transcript request, it may include tools that your LLM can call:

| Tool Name | Description | Arguments |
|---|---|---|
| `transfer_call` | Transfer to another number | `destination` (phone number) |
| `end_call` | End the current call | None |
| `send_dtmf` | Send DTMF tones | `digits` (string of digits) |
The WebSocket connects to `{ws_url}/{call_id}` for each call, allowing your server to route and manage connections per call. The init message is sent before any audio is processed, giving you time to warm up caches.

Custom WebSocket is for advanced users running their own inference infrastructure. Your server must implement the full message protocol described above. All responses must be valid JSON and include the `request_id` for transcript responses.

## Fallback Chains
You can add up to 3 fallback LLM providers. If your main provider fails, Burki Voice AI will automatically try the next one—keeping your assistant reliable.
## Troubleshooting
### Common Issues
- Invalid API key: Double-check your key and provider dashboard
- Model not available: Make sure your account has access to the selected model
- Timeouts or errors: Try a different provider or set up fallbacks
- Custom endpoint issues: Ensure your API matches the OpenAI API spec