Burki Voice AI lets you choose from the best large language models (LLMs) on the market—or even use your own custom endpoint. Mix and match for reliability and performance!

Provider Comparison

Provider         | Example Models        | Speed      | Quality    | Cost | Notes
OpenAI           | GPT-4o, GPT-3.5-turbo | Fast       | Excellent  | $$   | Most popular, reliable
Anthropic        | Claude 3.5, Haiku     | Fast       | Advanced   | $$   | Safety-focused
Gemini           | Gemini 2.0 Flash      | Fast       | Multimodal | $$   | Google's latest
xAI              | Grok Beta, Grok 2     | Very Fast  | Good       | $    | Real-time, less strict
Groq             | Llama 3.3, 3.1        | Ultra Fast | Good       | $    | Speed-focused
Custom           | Any                   | ?          | ?          | ?    | Bring your own (HTTP)
Custom WebSocket | Any                   | Ultra Fast | ?          | ?    | Lowest latency option

How to Set Up Each Provider

OpenAI

  1. Sign up for OpenAI
  2. Go to API Keys and create a new key
  3. Paste your key in the LLM API Key field
  4. Choose a model (GPT-4o, GPT-3.5-turbo, etc.)
  5. (Optional) Set a custom base URL for Azure/OpenAI-compatible endpoints
GPT-4o is best for complex conversations. GPT-3.5-turbo is cheaper for simple tasks.
Anthropic

  1. Sign up for Anthropic
  2. Create an API key
  3. Paste your key in the LLM API Key field
  4. Choose a model (Claude 3.5 Sonnet, Haiku, etc.)
Claude models are great for advanced reasoning and safety-critical use cases.
Gemini

  1. Sign up for Google AI Studio
  2. Get your API key
  3. Paste your key in the LLM API Key field
  4. Choose a model (Gemini 2.0 Flash, 1.5 Pro, etc.)
  5. Set the base URL (auto-filled for Gemini)
Gemini is multimodal—great for tasks involving text, images, and more.
xAI

  1. Sign up for xAI
  2. Get your API key
  3. Paste your key in the LLM API Key field
  4. Choose a model (Grok Beta, Grok 2, etc.)
  5. Set the base URL (auto-filled for xAI)
Grok is fast and less restrictive—good for real-time or experimental use.
Groq

  1. Sign up for Groq
  2. Get your API key
  3. Paste your key in the LLM API Key field
  4. Choose a model (Llama 3.3 70B, 3.1 8B, etc.)
Groq is all about speed—perfect for high-volume or latency-sensitive apps.
Custom (HTTP)

  1. Enter your custom API endpoint URL
  2. (Optional) Add an API key if required
  3. Specify the model name
Custom endpoints are for advanced users. Make sure your API is OpenAI-compatible!
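To illustrate what "OpenAI-compatible" means in practice: your endpoint must answer POST requests to /v1/chat/completions with a body shaped like the official API's. A minimal sketch of building such a response body (the helper name is ours, not part of Burki):

```python
import time
import uuid


def chat_completion_response(model: str, content: str) -> dict:
    """Build a minimal OpenAI-compatible /v1/chat/completions response body."""
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": content},
            "finish_reason": "stop",
        }],
    }
```

Fields beyond these (e.g. usage counts) are optional for most clients, but the choices/message/finish_reason shape must match.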
Custom WebSocket

Use a persistent WebSocket connection to your own LLM endpoint for the lowest possible latency. This is ideal for self-hosted models or custom inference servers.

Key Benefits:
  • Persistent Connection: Maintains connection throughout the entire call—no connection overhead per request
  • Pre-warming: Sends initialization data before the first transcript, allowing your server to pre-load context
  • Automatic Reconnection: Exponential backoff with configurable retry attempts
  • HTTP Fallback: Falls back to HTTP if WebSocket disconnects
Setup:
  1. Enter your WebSocket URL (e.g., wss://api.example.com/ws/completions)
  2. (Optional) Add a fallback HTTP URL for reliability
  3. (Optional) Add an API key if required
Configuration:
{
  "llm_provider": "custom_websocket",
  "llm_provider_config": {
    "ws_url": "wss://api.example.com/ws/completions",
    "fallback_base_url": "https://api.example.com/v1",
    "api_key": "your_api_key",
    "reconnect_attempts": 3,
    "reconnect_delay": 1.0,
    "ping_interval": 30
  }
}

WebSocket Protocol Specification

Connection

Burki connects to {ws_url}/{call_id} for each call. For example, if your ws_url is wss://api.example.com/ws/completions and the call ID is call_abc123, the connection URL will be:
wss://api.example.com/ws/completions/call_abc123
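The URL scheme above can be expressed as two small helpers — one for the URL Burki dials, one your server might use to recover the call ID from the request path (both names are illustrative, not Burki APIs):

```python
def connection_url(ws_url: str, call_id: str) -> str:
    """Burki appends the call ID as the final path segment of ws_url."""
    return f"{ws_url.rstrip('/')}/{call_id}"


def call_id_from_path(path: str) -> str:
    """Server side: recover the call ID from the WebSocket request path."""
    return path.rstrip("/").rsplit("/", 1)[-1]
```

Routing on the final path segment lets one server process manage many concurrent calls, each on its own connection.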

Messages FROM Burki (Your Server Receives)

Your server will receive these JSON messages:
init

Sent immediately after connection, before any transcripts. Use this to pre-load context, do DB lookups, and prepare for fast responses.
{
  "type": "init",
  "call_id": "call_abc123",
  "phoneNumber": { "number": "+1234567890" },
  "call": {
    "customer": { "number": "+0987654321" },
    "phoneCallProviderId": "call_abc123"
  }
}
Field                    | Description
type                     | Always "init"
call_id                  | Unique identifier for this call
phoneNumber.number       | The phone number being called (your assistant's number)
call.customer.number     | The caller's phone number
call.phoneCallProviderId | Provider-specific call ID
Your server should respond with:
{
  "type": "session_ready",
  "call_id": "call_abc123"
}
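The init/session_ready exchange amounts to a handler like the following sketch (the function name is ours; a real server would do its cache and database warm-up before replying):

```python
def handle_init(message: dict) -> dict:
    """Acknowledge Burki's init message once pre-warming is done.

    A real server would use message["call"]["customer"]["number"] here to
    look up the caller before the first transcript arrives; this sketch
    only echoes the call_id back.
    """
    assert message["type"] == "init"
    return {"type": "session_ready", "call_id": message["call_id"]}
```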
transcript

Sent when the caller speaks. This is the main request for an LLM completion.
{
  "type": "transcript",
  "request_id": "req_call_abc123_1_1706832000000",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant..." },
    { "role": "user", "content": "Hello, I need help with my order." }
  ],
  "phoneNumber": { "number": "+1234567890" },
  "call": {
    "phoneCallProviderId": "call_abc123",
    "customer": { "number": "+0987654321" }
  },
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "transfer_call",
        "description": "Transfer the call to a human agent",
        "parameters": { "type": "object", "properties": {} }
      }
    }
  ]
}
Field      | Description
type       | Always "transcript"
request_id | Unique ID for this request (use it in responses)
messages   | OpenAI-compatible message array
tools      | Optional array of available function tools
ping

Sent periodically to keep the connection alive.
{
  "type": "ping",
  "timestamp": 1706832000000
}
Your server should respond with:
{
  "type": "pong"
}
cancel

Sent when the user interrupts or the request should be cancelled.
{
  "type": "cancel",
  "request_id": "req_call_abc123_1_1706832000000"
}
Your server should:
  1. Stop generating for this request_id
  2. Optionally respond with:
{
  "type": "cancelled",
  "request_id": "req_call_abc123_1_1706832000000"
}
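One simple way to honor cancellations is to record cancelled request IDs and check the set between generated tokens. A sketch under that assumption (names are ours, not part of Burki):

```python
# Requests cancelled by Burki; in-flight generation checks this set.
cancelled: set = set()


def handle_cancel(message: dict) -> dict:
    """Mark the request as cancelled and acknowledge (the ack is optional)."""
    cancelled.add(message["request_id"])
    return {"type": "cancelled", "request_id": message["request_id"]}


def should_stop(request_id: str) -> bool:
    """Call this between streamed chunks to stop promptly on interruption."""
    return request_id in cancelled
```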

Messages TO Burki (Your Server Sends)

Your server should send these JSON messages:
session_ready

Confirm that pre-warming is complete and the session is ready.
{
  "type": "session_ready",
  "call_id": "call_abc123"
}
Completion chunks

Stream LLM responses using the OpenAI-compatible format. Each chunk must include the request_id.

Text content chunk:
{
  "request_id": "req_call_abc123_1_1706832000000",
  "choices": [{
    "delta": { "content": "Hello! " },
    "finish_reason": null
  }]
}
Final chunk (end of response):
{
  "request_id": "req_call_abc123_1_1706832000000",
  "choices": [{
    "delta": {},
    "finish_reason": "stop"
  }]
}
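A generator that wraps your model's text output in this chunk format might look like the following sketch (illustrative only — your server would yield these as JSON over the socket as tokens arrive):

```python
from typing import Iterator, List


def stream_chunks(request_id: str, text_pieces: List[str]) -> Iterator[dict]:
    """Yield OpenAI-style delta chunks, ending with finish_reason "stop"."""
    for piece in text_pieces:
        yield {
            "request_id": request_id,
            "choices": [{"delta": {"content": piece}, "finish_reason": None}],
        }
    # Final chunk: empty delta, finish_reason set.
    yield {"request_id": request_id,
           "choices": [{"delta": {}, "finish_reason": "stop"}]}
```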
Tool calls

If your LLM decides to call a tool (such as transfer_call, end_call, or send_dtmf):

Tool call start:
{
  "request_id": "req_call_abc123_1_1706832000000",
  "choices": [{
    "delta": {
      "tool_calls": [{
        "id": "tool_123",
        "function": {
          "name": "transfer_call",
          "arguments": ""
        }
      }]
    },
    "finish_reason": null
  }]
}
Tool call arguments (can be streamed in chunks):
{
  "request_id": "req_call_abc123_1_1706832000000",
  "choices": [{
    "delta": {
      "tool_calls": [{
        "function": {
          "arguments": "{\"destination\": \"+1555"
        }
      }]
    },
    "finish_reason": null
  }]
}
Tool call complete:
{
  "request_id": "req_call_abc123_1_1706832000000",
  "choices": [{
    "delta": {},
    "finish_reason": "tool_calls"
  }]
}
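The three-phase tool-call sequence (start, arguments, completion) can be produced by a generator like this sketch — here the arguments are sent in one chunk for simplicity, though the protocol allows splitting them across several:

```python
import json
from typing import Iterator


def stream_tool_call(request_id: str, tool_call_id: str,
                     name: str, arguments: dict) -> Iterator[dict]:
    """Emit the start, arguments, and completion chunks of one tool call."""
    yield {  # start: tool name with empty arguments
        "request_id": request_id,
        "choices": [{"delta": {"tool_calls": [{
            "id": tool_call_id,
            "function": {"name": name, "arguments": ""},
        }]}, "finish_reason": None}],
    }
    yield {  # arguments: the protocol permits streaming these in pieces
        "request_id": request_id,
        "choices": [{"delta": {"tool_calls": [{
            "function": {"arguments": json.dumps(arguments)},
        }]}, "finish_reason": None}],
    }
    yield {  # completion: empty delta, finish_reason "tool_calls"
        "request_id": request_id,
        "choices": [{"delta": {}, "finish_reason": "tool_calls"}],
    }
```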
pong

Response to a ping, for keep-alive.
{
  "type": "pong"
}
Errors

If something goes wrong:
{
  "request_id": "req_call_abc123_1_1706832000000",
  "error": {
    "message": "Internal server error",
    "code": "internal_error"
  }
}

Available Built-in Tools

When Burki sends a transcript request, it may include tools that your LLM can call:
Tool Name     | Description                | Arguments
transfer_call | Transfer to another number | destination (phone number)
end_call      | End the current call       | None
send_dtmf     | Send DTMF tones            | digits (string of digits)

The WebSocket connects to {ws_url}/{call_id} for each call, allowing your server to route and manage connections per-call. The init message is sent before any audio is processed, giving you time to warm up caches.
Custom WebSocket is for advanced users running their own inference infrastructure. Your server must implement the full message protocol described above. All responses must be valid JSON and include the request_id for transcript responses.
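Putting the incoming message types together, a server-side dispatcher might look like this sketch (illustrative, not Burki code; transcript handling is stubbed because a real server streams completion chunks asynchronously rather than returning one reply):

```python
import json
from typing import Optional


def dispatch(raw: str) -> Optional[str]:
    """Route one incoming Burki message to a JSON reply, per the protocol."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "init":
        # Pre-warm caches here, then confirm the session is ready.
        return json.dumps({"type": "session_ready",
                           "call_id": message["call_id"]})
    if kind == "ping":
        return json.dumps({"type": "pong"})
    if kind == "cancel":
        # Stop generating for this request_id; the ack is optional.
        return json.dumps({"type": "cancelled",
                           "request_id": message["request_id"]})
    if kind == "transcript":
        # A real server starts streaming completion chunks here instead.
        return None
    return None
```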

Fallback Chains

You can add up to 3 fallback LLM providers. If your main provider fails, Burki Voice AI will automatically try the next one—keeping your assistant reliable.

Troubleshooting

  • Invalid API key: Double-check your key and provider dashboard
  • Model not available: Make sure your account has access to the selected model
  • Timeouts or errors: Try a different provider or set up fallbacks
  • Custom endpoint issues: Ensure your API matches the OpenAI API spec