Cartesia Sonic 3 TTS - Burki Voice AI Docs

🎯 Cartesia Sonic 3: Multilingual Excellence

High-quality, low-latency TTS with support for 42 languages, voice cloning from ~5-second samples, and WebSocket streaming. Well suited to global deployments and multilingual voice agents.

Quick Setup

Get API Key

Visit Cartesia.ai and create an account
Navigate to your API settings
Copy your API key

Configure in Burki

Go to AI Configuration → TTS tab
Select Cartesia Sonic 3 as provider
Paste your API key in the TTS API Key field

Choose Voice & Model

Select your preferred model and voice from the dropdowns

Pricing: Cartesia charges approximately $0.015 per 1,000 characters. Check their website for current pricing details.
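As a rough budgeting aid, the per-character rate can be turned into a small estimator. This is a minimal sketch that assumes the approximate $0.015 per 1,000 characters figure quoted above; the function name and constant are illustrative and not part of Burki or Cartesia.

```python
# Approximate rate from the pricing note above; confirm against Cartesia's current pricing.
RATE_USD_PER_1K_CHARS = 0.015


def estimate_tts_cost(text: str, rate_per_1k: float = RATE_USD_PER_1K_CHARS) -> float:
    """Return the approximate cost in USD of synthesizing `text`."""
    return len(text) / 1000 * rate_per_1k


# A 2,000-character script costs roughly 3 cents at this rate.
print(f"${estimate_tts_cost('x' * 2000):.3f}")
```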

Available Models

⚡ sonic-3

Latest Model (Auto-updates)
Always uses the newest Sonic 3 improvements
Languages: 42 languages
Best for: Most applications, staying current

📌 sonic-3-2025-10-27

Stable Snapshot
Fixed version for consistent behavior
Languages: 42 languages
Best for: Production stability, testing

Recommendation: Use sonic-3 for development and most use cases. Use the dated snapshot for production if you need guaranteed consistency.
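One way to apply this recommendation in code is to choose the model ID from the deployment environment. The sketch below is hypothetical: the `environment` argument and the helper name are assumptions for illustration, and only the two model IDs come from this page.

```python
LATEST_MODEL = "sonic-3"               # auto-updating alias
SNAPSHOT_MODEL = "sonic-3-2025-10-27"  # fixed, dated snapshot


def pick_model(environment: str) -> str:
    """Return the pinned snapshot for production, the auto-updating alias otherwise."""
    if environment.strip().lower() == "production":
        return SNAPSHOT_MODEL
    return LATEST_MODEL


print(pick_model("development"))
print(pick_model("production"))
```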

Available Voices

Preset Voices

Female Voices

Katie

Stable & Realistic
Perfect for voice agents
Voice ID: f786b574-daa5-4673-aa0c-cbe3e8534c02

Tessa

Emotive & Expressive
Great for engaging conversations
Voice ID: 6ccbfb76-1fc6-48f7-b71d-91ac6298247b

Sarah

Clear & Professional
Ideal for business applications
Voice ID: a0e99841-438c-4a64-b679-ae501e7d6091

Male Voices

Kiefer

Stable & Realistic
Perfect for voice agents
Voice ID: 228fca29-3a0a-435c-8728-5cb483251068

Kyle

Emotive & Expressive
Great for dynamic interactions
Voice ID: c961b81c-a935-4c17-bfb3-ba2239de8c2f

All Available Voices

Voice	Gender	Style	Voice ID	Best For
Katie	Female	Stable, Realistic	`f786b574-daa5-4673-aa0c-cbe3e8534c02`	Voice agents
Kiefer	Male	Stable, Realistic	`228fca29-3a0a-435c-8728-5cb483251068`	Voice agents
Tessa	Female	Emotive, Expressive	`6ccbfb76-1fc6-48f7-b71d-91ac6298247b`	Characters
Kyle	Male	Emotive, Expressive	`c961b81c-a935-4c17-bfb3-ba2239de8c2f`	Characters
Sarah	Female	Clear, Professional	`a0e99841-438c-4a64-b679-ae501e7d6091`	Business
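If you set voices programmatically, a name-to-ID lookup avoids copying UUIDs by hand. This is a minimal sketch using the five preset IDs from the table above; the dictionary and helper names are illustrative, not part of Burki.

```python
# Preset voice IDs copied from the table above.
PRESET_VOICES = {
    "katie": "f786b574-daa5-4673-aa0c-cbe3e8534c02",
    "kiefer": "228fca29-3a0a-435c-8728-5cb483251068",
    "tessa": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b",
    "kyle": "c961b81c-a935-4c17-bfb3-ba2239de8c2f",
    "sarah": "a0e99841-438c-4a64-b679-ae501e7d6091",
}


def voice_id_for(name: str) -> str:
    """Look up a preset voice ID by name, ignoring case and surrounding spaces."""
    key = name.strip().lower()
    if key not in PRESET_VOICES:
        raise ValueError(f"Unknown preset voice: {name!r}")
    return PRESET_VOICES[key]


print(voice_id_for("Katie"))
```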

Language Support

42 Languages Supported: Cartesia Sonic 3 offers extensive multilingual support, making it ideal for global deployments.

Supported Languages

🇺🇸 English (en)
🇪🇸 Spanish (es)
🇫🇷 French (fr)
🇩🇪 German (de)
🇮🇹 Italian (it)
🇵🇹 Portuguese (pt)
🇨🇳 Chinese (zh)
🇯🇵 Japanese (ja)
🇰🇷 Korean (ko)
🇮🇳 Hindi (hi)
🇸🇦 Arabic (ar)
🇷🇺 Russian (ru)

Additional Languages: Dutch (nl), Polish (pl), Swedish (sv), Turkish (tr), Tagalog (tl), Bulgarian (bg), Romanian (ro), Czech (cs), Greek (el), Finnish (fi), Croatian (hr), Malay (ms), Slovak (sk), Danish (da), Tamil (ta), Ukrainian (uk), Hungarian (hu), Norwegian (no), Vietnamese (vi), Bengali (bn), Thai (th), Hebrew (he), Georgian (ka), Indonesian (id), Telugu (te), Gujarati (gu), Kannada (kn), Malayalam (ml), Marathi (mr), Punjabi (pa)
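To catch an unsupported language code before a call starts, the codes listed in this section can be collected into a set. This is a sketch under the assumption that the `language` setting takes the two-letter codes shown here; reducing regional tags such as `en-US` to their base code is an illustrative choice, not documented behavior.

```python
# The 42 language codes listed in this section.
SUPPORTED_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "zh", "ja", "ko", "hi", "ar", "ru",
    "nl", "pl", "sv", "tr", "tl", "bg", "ro", "cs", "el", "fi", "hr", "ms",
    "sk", "da", "ta", "uk", "hu", "no", "vi", "bn", "th", "he", "ka", "id",
    "te", "gu", "kn", "ml", "mr", "pa",
})


def is_supported_language(code: str) -> bool:
    """Check a language code, reducing tags like 'en-US' or 'pt_BR' to their base code."""
    base = code.strip().lower().replace("_", "-").split("-")[0]
    return base in SUPPORTED_LANGUAGES


print(is_supported_language("en-US"), is_supported_language("sw"))
```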

Voice Cloning

Cartesia supports creating custom voices from audio samples.

Quick Cloning: Create a custom voice from just ~5 seconds of clean audio.

Requirements

Requirement	Value
Audio Duration	~5 seconds (recommended)
Formats	MP3, WAV, FLAC, M4A, OGG
Max File Size	10MB
Sample Rate	16kHz minimum recommended
Channels	Mono preferred
Quality	Clean audio without background noise
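The format and size limits in the table can be checked locally before uploading a sample. The sketch below covers only those two requirements and assumes "10MB" means 10 × 1024 × 1024 bytes; the function name is hypothetical, and duration, sample rate, and channel checks are left out because they require decoding the audio.

```python
import os

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB


def check_clone_sample(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems with the sample; an empty list means it passes."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"Unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"File is larger than {MAX_FILE_SIZE_BYTES} bytes")
    return problems


print(check_clone_sample("sample.wav", 500_000))
```

For a file on disk, `os.path.getsize(path)` gives the value to pass as `size_bytes`.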

Clone a Voice

Prepare Audio Sample

Record or select ~5 seconds of clear speech from the target voice

Upload via UI

Go to Voice Configuration in assistant settings
Select Cartesia as provider
Click Clone Voice button
Upload your audio sample

Use Cloned Voice

The cloned voice appears in your voice dropdown and can be selected for your assistant

Voice Cloning Tips

Configuration Options

Context Continuation

Cartesia maintains natural prosody across sentences using context continuation:

# First sentence starts a new context
first_message = {
    "context_id": "ctx_0",
    "continue": False,  # New context
    "transcript": "Hello, welcome to our service."
}

# Subsequent sentences continue the same context
next_message = {
    "context_id": "ctx_0",
    "continue": True,  # Continue same context
    "transcript": "How can I help you today?"
}
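For longer replies, the same pattern can be generated from a list of sentences. This is a minimal sketch that follows the convention shown above (first message with `"continue": False`, later ones with `True`); the helper name is illustrative, and how the messages are sent over the WebSocket is outside its scope.

```python
def build_context_messages(sentences, context_id="ctx_0"):
    """Build one message per sentence, all sharing the same context ID."""
    return [
        {
            "context_id": context_id,
            "continue": index > 0,  # False only for the first sentence
            "transcript": sentence,
        }
        for index, sentence in enumerate(sentences)
    ]


messages = build_context_messages([
    "Hello, welcome to our service.",
    "How can I help you today?",
])
for message in messages:
    print(message)
```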

Flush Tags

Force immediate speech mid-sentence using the <flush/> tag:

# Text with flush tag
text = "Please hold<flush/> while I check your account."
# "Please hold" speaks immediately, then "while I check..." follows
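To preview where a piece of text will be split, you can break it on the tag yourself. This sketch only illustrates where the boundaries fall; the actual flush handling happens inside the TTS service, and the helper name is hypothetical.

```python
FLUSH_TAG = "<flush/>"


def split_on_flush(text: str) -> list[str]:
    """Return the segments of `text` separated by flush tags, dropping empty ones."""
    return [segment for segment in text.split(FLUSH_TAG) if segment]


for segment in split_on_flush("Please hold<flush/> while I check your account."):
    print(repr(segment))
```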

Audio Format

Cartesia automatically outputs the appropriate format for your telephony provider:

Twilio/Telnyx: PCM μ-law @ 8kHz
Vonage: PCM 16-bit @ 16kHz
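The two formats differ in bandwidth, which matters when sizing buffers or estimating stream throughput. The sketch below assumes mono audio; the dictionary keys and the `encoding` labels are illustrative, not Burki or Cartesia identifiers.

```python
# Formats from the list above, assuming a single (mono) channel.
PROVIDER_AUDIO_FORMATS = {
    "twilio": {"encoding": "mulaw", "sample_rate_hz": 8000, "bytes_per_sample": 1},
    "telnyx": {"encoding": "mulaw", "sample_rate_hz": 8000, "bytes_per_sample": 1},
    "vonage": {"encoding": "pcm_s16", "sample_rate_hz": 16000, "bytes_per_sample": 2},
}


def bytes_per_second(provider: str) -> int:
    """Return the raw audio data rate for a telephony provider."""
    audio_format = PROVIDER_AUDIO_FORMATS[provider.strip().lower()]
    return audio_format["sample_rate_hz"] * audio_format["bytes_per_sample"]


print(bytes_per_second("twilio"), bytes_per_second("vonage"))
```

Under these assumptions, μ-law at 8kHz carries 8,000 bytes per second and 16-bit PCM at 16kHz carries 32,000.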

API Integration

import asyncio

from app.services.tts.tts_cartesia import CartesiaTTSService


async def main():
    # Create Cartesia TTS instance
    tts = CartesiaTTSService(
        call_sid="unique_call_id",
        api_key="your_cartesia_api_key",
        voice_id="f786b574-daa5-4673-aa0c-cbe3e8534c02",  # Katie
        model_id="sonic-3",
        language="en",
    )

    # Start session (your_callback is your own handler for the synthesized audio)
    await tts.start_session(audio_callback=your_callback)

    # Process text
    await tts.process_text("Hello, this is a test.")

    # End session
    await tts.end_session()


# The await calls must run inside a coroutine
asyncio.run(main())

Performance Comparison

Feature	Cartesia Sonic 3	ElevenLabs	Deepgram Aura
Latency	~150ms	~250ms	~75ms
Languages	42	70+	English + Spanish
Voice Quality	High	Premium	Good
Voice Cloning	Yes (~5 sec)	Yes (1-25 min)	No
Best For	Multilingual agents	Premium quality	Speed-focused

Common Issues & Solutions

Connection Issues

Problem: WebSocket connection fails
Solutions:

Verify your CARTESIA_API_KEY is set correctly
Check network connectivity to api.cartesia.ai
Ensure API key has sufficient credits
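The first check can be automated at startup. This is a minimal sketch that assumes the key is supplied through the `CARTESIA_API_KEY` environment variable named above; the helper name is illustrative, and the check confirms only that a value is present, not that Cartesia accepts it.

```python
import os


def find_api_key(env=None):
    """Return the configured API key, or None if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("CARTESIA_API_KEY", "").strip()
    return key or None


if find_api_key() is None:
    print("CARTESIA_API_KEY is not set or is empty")
```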

Voice Not Found

Problem: Selected voice doesn’t work
Solutions:

Verify you’re using a valid voice ID (UUID format)
If using a custom cloned voice, ensure it exists in your Cartesia account
Try one of the preset voices to test connectivity
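A quick format check can rule out malformed IDs before you look further. The sketch below uses Python's standard `uuid` module and accepts only the hyphenated form used by the preset voices on this page; it cannot tell whether the voice actually exists in your Cartesia account.

```python
import uuid


def is_valid_voice_id(value) -> bool:
    """Return True if `value` is a UUID string in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(is_valid_voice_id("f786b574-daa5-4673-aa0c-cbe3e8534c02"))
print(is_valid_voice_id("katie"))
```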

Audio Quality Issues

Problem: Output sounds distorted or unclear
Solutions:

Keep in mind that Cartesia outputs 8kHz μ-law for Twilio and Telnyx compatibility, so telephone-band audio is expected on those providers
For higher quality, ensure you’re using the correct output format for your use case
Check that your telephony provider supports the audio format

See Also

⚡ Need Speed?

Deepgram Aura - Ultra-low ~75ms latency for real-time calls

🎭 Premium Quality?

ElevenLabs - Industry-leading voice quality with 70+ languages

🔗 Additional Resources

Official Documentation: Cartesia Docs
Voice Cloning: Clone API Reference
WebSocket API: TTS WebSocket Guide

🚀 Ready to Use Cartesia?

Head back to your assistant configuration and set up Cartesia Sonic 3 for multilingual voice experiences!
