Burki Voice AI lets you choose from the best language models (LLMs) on the market—or even use your own custom endpoint. Mix and match for reliability and performance!
## Provider Comparison
| Provider | Example Models | Speed | Quality | Cost | Notes |
|---|---|---|---|---|---|
| OpenAI | GPT-4o, GPT-3.5-turbo | Fast | Excellent | $$ | Most popular, reliable |
| Anthropic | Claude 3.5, Haiku | Fast | Advanced | $$ | Safety-focused |
| Gemini | Gemini 2.0 Flash | Fast | Multimodal | $$ | Google’s latest |
| xAI | Grok Beta, Grok 2 | Very Fast | Good | $ | Real-time, less strict |
| Groq | Llama 3.3, 3.1 | Ultra Fast | Good | $ | Speed-focused |
| Custom | Any | ? | ? | ? | Bring your own (HTTP) |
| Custom WebSocket | Any | Ultra Fast | ? | ? | Lowest latency option |
## How to Set Up Each Provider
### OpenAI (GPT)
- Sign up for OpenAI
- Go to API Keys and create a new key
- Paste your key in the LLM API Key field
- Choose a model (GPT-4o, GPT-3.5-turbo, etc.)
- (Optional) Set a custom base URL for Azure/OpenAI-compatible endpoints
GPT-4o is best for complex conversations. GPT-3.5-turbo is cheaper for simple tasks.
### Anthropic (Claude)
- Sign up for Anthropic
- Create an API key
- Paste your key in the LLM API Key field
- Choose a model (Claude 3.5 Sonnet, Haiku, etc.)
Claude models are great for advanced reasoning and safety-critical use cases.
### Google Gemini
- Sign up for Google AI Studio
- Get your API key
- Paste your key in the LLM API Key field
- Choose a model (Gemini 2.0 Flash, 1.5 Pro, etc.)
- Set the base URL (auto-filled for Gemini)
Gemini is multimodal—great for tasks involving text, images, and more.
### xAI (Grok)
- Sign up for xAI
- Get your API key
- Paste your key in the LLM API Key field
- Choose a model (Grok Beta, Grok 2, etc.)
- Set the base URL (auto-filled for xAI)
Grok is fast and less restrictive—good for real-time or experimental use.
### Groq (Llama)
- Sign up for Groq
- Get your API key
- Paste your key in the LLM API Key field
- Choose a model (Llama 3.3 70B, 3.1 8B, etc.)
Groq is all about speed—perfect for high-volume or latency-sensitive apps.
### Custom Endpoint (HTTP)
- Enter your custom API endpoint URL
- (Optional) Add an API key if required
- Specify the model name
Custom endpoints are for advanced users. Make sure your API is OpenAI-compatible!
### Custom WebSocket (Lowest Latency)
Use a persistent WebSocket connection to your own LLM endpoint for the lowest possible latency. This is ideal for self-hosted models or custom inference servers.

Key Benefits:
- Persistent Connection: Maintains connection throughout the entire call—no connection overhead per request
- Pre-warming: Sends initialization data before the first transcript, allowing your server to pre-load context
- Automatic Reconnection: Exponential backoff with configurable retry attempts
- HTTP Fallback: Falls back to HTTP if WebSocket disconnects
- Enter your WebSocket URL (e.g., `wss://api.example.com/ws/completions`)
- (Optional) Add a fallback HTTP URL for reliability
- (Optional) Add an API key if required
## WebSocket Protocol Specification
### Connection
Burki connects to `{ws_url}/{call_id}` for each call. For example, if your `ws_url` is `wss://api.example.com/ws/completions` and the call ID is `call_abc123`, the connection URL will be `wss://api.example.com/ws/completions/call_abc123`.

### Messages FROM Burki (Your Server Receives)
Your server will receive these JSON messages:

#### 1. Init Message (Pre-warming)
Sent immediately after connection, before any transcripts. Use this to pre-load context, do DB lookups, and prepare for fast responses.
Your server should respond with a `session_ready` message (see *Session Ready* below). The init message contains:

| Field | Description |
|---|---|
| `type` | Always `"init"` |
| `call_id` | Unique identifier for this call |
| `phoneNumber.number` | The phone number being called (your assistant’s number) |
| `call.customer.number` | The caller’s phone number |
| `call.phoneCallProviderId` | Provider-specific call ID |
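Based on the field table above, an init message plausibly looks like the following sketch. The nested object structure is inferred from the dotted field names, and all values are placeholders:

```json
{
  "type": "init",
  "call_id": "call_abc123",
  "phoneNumber": {
    "number": "+15551234567"
  },
  "call": {
    "customer": {
      "number": "+15559876543"
    },
    "phoneCallProviderId": "CA0123456789"
  }
}
```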
#### 2. Transcript Message (LLM Request)
Sent when the caller speaks. This is the main request for LLM completion.
| Field | Description |
|---|---|
| `type` | Always `"transcript"` |
| `request_id` | Unique ID for this request (use in responses) |
| `messages` | OpenAI-compatible message array |
| `tools` | Optional array of available function tools |
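A transcript message, sketched from the field table above. The `messages` and `tools` shapes follow OpenAI chat-completions conventions, as the table indicates; the values themselves are illustrative:

```json
{
  "type": "transcript",
  "request_id": "req_001",
  "messages": [
    { "role": "system", "content": "You are a helpful voice assistant." },
    { "role": "user", "content": "Can you transfer me to support?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "transfer_call",
        "description": "Transfer to another number",
        "parameters": {
          "type": "object",
          "properties": {
            "destination": { "type": "string" }
          },
          "required": ["destination"]
        }
      }
    }
  ]
}
```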
#### 3. Ping Message (Keep-alive)
Sent periodically to keep the connection alive. Your server should respond with a pong message (see *Pong Response* below).
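Assuming the type string mirrors the section name (an assumption; the exact string is not shown on this page), a ping might be as minimal as:

```json
{ "type": "ping" }
```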
#### 4. Cancel Message
Sent when the user interrupts or the request should be cancelled. Your server should:

- Stop generating for this `request_id`
- Optionally respond with a cancellation acknowledgement
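A sketch of the cancel message and an optional acknowledgement, assuming the type strings follow the same naming pattern as the other messages (both type strings are assumptions, not confirmed by this page):

```json
{ "type": "cancel", "request_id": "req_001" }
```

An acknowledgement might simply echo the cancelled request:

```json
{ "type": "cancelled", "request_id": "req_001" }
```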
### Messages TO Burki (Your Server Sends)
Your server should send these JSON messages:

#### 1. Session Ready
Confirm that pre-warming is complete and the session is ready.
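Assuming the type string follows the section name (an assumption), a session-ready confirmation might be as minimal as:

```json
{ "type": "session_ready" }
```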
#### 2. Streaming Response Chunks (OpenAI-compatible)
Stream LLM responses using OpenAI-compatible format. Each chunk must include the `request_id`. Send text content chunks as the model generates tokens, then a final chunk to mark the end of the response.
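A sketch of the streaming chunks, modeled on the OpenAI chat-completion chunk format this section references, with the required `request_id` added at the top level (the exact placement of `request_id` is an assumption). A text content chunk:

```json
{
  "request_id": "req_001",
  "choices": [
    {
      "index": 0,
      "delta": { "content": "One moment, " },
      "finish_reason": null
    }
  ]
}
```

Final chunk (end of response):

```json
{
  "request_id": "req_001",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ]
}
```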
#### 3. Tool Call Response
If your LLM decides to call a tool (like `transfer_call`, `end_call`, or `send_dtmf`), stream it in three stages: a tool call start chunk, one or more argument chunks (arguments can be streamed in pieces), and a tool call complete chunk.
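A sketch of the three stages, again modeled on OpenAI streaming tool-call chunks (field placement is an assumption; values are placeholders). Tool call start:

```json
{
  "request_id": "req_001",
  "choices": [
    {
      "index": 0,
      "delta": {
        "tool_calls": [
          {
            "index": 0,
            "id": "call_tool_1",
            "type": "function",
            "function": { "name": "transfer_call", "arguments": "" }
          }
        ]
      },
      "finish_reason": null
    }
  ]
}
```

Tool call arguments (can be streamed in chunks):

```json
{
  "request_id": "req_001",
  "choices": [
    {
      "index": 0,
      "delta": {
        "tool_calls": [
          { "index": 0, "function": { "arguments": "{\"destination\": \"+15550001111\"}" } }
        ]
      },
      "finish_reason": null
    }
  ]
}
```

Tool call complete:

```json
{
  "request_id": "req_001",
  "choices": [
    { "index": 0, "delta": {}, "finish_reason": "tool_calls" }
  ]
}
```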
#### 4. Pong Response
Response to ping for keep-alive.
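Assuming the type string matches the section name (an assumption), a minimal pong:

```json
{ "type": "pong" }
```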
#### 5. Error Response
If something goes wrong:
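A sketch of an error message, assuming a type string and error field consistent with the rest of the protocol (both are assumptions, not confirmed by this page):

```json
{
  "type": "error",
  "request_id": "req_001",
  "error": "Upstream model timed out"
}
```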
## Available Built-in Tools
When Burki sends a transcript request, it may include tools that your LLM can call:

| Tool Name | Description | Arguments |
|---|---|---|
| `transfer_call` | Transfer to another number | `destination` (phone number) |
| `end_call` | End the current call | None |
| `send_dtmf` | Send DTMF tones | `digits` (string of digits) |
The WebSocket connects to `{ws_url}/{call_id}` for each call, allowing your server to route and manage connections per call. The init message is sent before any audio is processed, giving you time to warm up caches.

Custom WebSocket is for advanced users running their own inference infrastructure. Your server must implement the full message protocol described above. All responses must be valid JSON and include the `request_id` for transcript responses.

## Fallback Chains
You can add up to 3 fallback LLM providers. If your main provider fails, Burki Voice AI will automatically try the next one—keeping your assistant reliable.
## Troubleshooting
### Common Issues
- Invalid API key: Double-check your key and provider dashboard
- Model not available: Make sure your account has access to the selected model
- Timeouts or errors: Try a different provider or set up fallbacks
- Custom endpoint issues: Ensure your API matches the OpenAI API spec