🎯 Cartesia Sonic 3: Multilingual Excellence

High-quality, low-latency TTS with support for 42 languages, voice cloning from ~5 second samples, and WebSocket streaming. Perfect for global deployments and multilingual voice agents.

Quick Setup

Step 1: Get API Key

  1. Visit Cartesia.ai and create an account
  2. Navigate to your API settings
  3. Copy your API key

Step 2: Configure in Burki

  1. Go to AI Configuration → TTS tab
  2. Select Cartesia Sonic 3 as provider
  3. Paste your API key in the TTS API Key field

Step 3: Choose Voice & Model

Select your preferred model and voice from the dropdowns.

Pricing: Cartesia charges approximately $0.015 per 1,000 characters. Check their website for current pricing details.
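
For budgeting, the approximate rate above can be turned into a quick estimate. This is a minimal sketch that assumes the ~$0.015 per 1,000 characters figure quoted on this page; `estimate_tts_cost` is a hypothetical helper, not part of Burki or Cartesia, and actual billing may differ.

```python
def estimate_tts_cost(text: str, usd_per_1k_chars: float = 0.015) -> float:
    """Rough TTS cost in USD, using the approximate rate quoted on this page."""
    return len(text) / 1000 * usd_per_1k_chars


# Estimate for a 2,000-character script at the default (assumed) rate
script = "x" * 2000
print(f"Estimated cost: ${estimate_tts_cost(script):.3f}")
```

Pass a different `usd_per_1k_chars` once you have confirmed the current rate on Cartesia's pricing page.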

Available Models

⚡ sonic-3

Latest Model (Auto-updates): always uses the newest Sonic 3 improvements.
  • Languages: 42 languages
  • Best for: Most applications, staying current

📌 sonic-3-2025-10-27

Stable Snapshot: fixed version for consistent behavior.
  • Languages: 42 languages
  • Best for: Production stability, testing

Recommendation: Use sonic-3 for development and most use cases. Use the dated snapshot for production if you need guaranteed consistency.
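
The recommendation above can be expressed as a small selection rule. This is a sketch only: `pick_model` and the environment names are hypothetical, and the two model IDs are the ones listed on this page.

```python
LATEST_MODEL = "sonic-3"             # auto-updating alias
PINNED_MODEL = "sonic-3-2025-10-27"  # dated snapshot


def pick_model(environment: str) -> str:
    """Return the dated snapshot for production, the auto-updating alias otherwise."""
    if environment.strip().lower() == "production":
        return PINNED_MODEL
    return LATEST_MODEL


print(pick_model("development"))
print(pick_model("production"))
```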

Available Voices

Preset Voices

  • Katie: Stable & Realistic. Perfect for voice agents. Voice ID: f786b574-daa5-4673-aa0c-cbe3e8534c02
  • Tessa: Emotive & Expressive. Great for engaging conversations. Voice ID: 6ccbfb76-1fc6-48f7-b71d-91ac6298247b
  • Sarah: Clear & Professional. Ideal for business applications. Voice ID: a0e99841-438c-4a64-b679-ae501e7d6091
  • Kiefer: Stable & Realistic. Perfect for voice agents. Voice ID: 228fca29-3a0a-435c-8728-5cb483251068
  • Kyle: Emotive & Expressive. Great for dynamic interactions. Voice ID: c961b81c-a935-4c17-bfb3-ba2239de8c2f

| Voice  | Gender | Style               | Voice ID                             | Best For     |
|--------|--------|---------------------|--------------------------------------|--------------|
| Katie  | Female | Stable, Realistic   | f786b574-daa5-4673-aa0c-cbe3e8534c02 | Voice agents |
| Kiefer | Male   | Stable, Realistic   | 228fca29-3a0a-435c-8728-5cb483251068 | Voice agents |
| Tessa  | Female | Emotive, Expressive | 6ccbfb76-1fc6-48f7-b71d-91ac6298247b | Characters   |
| Kyle   | Male   | Emotive, Expressive | c961b81c-a935-4c17-bfb3-ba2239de8c2f | Characters   |
| Sarah  | Female | Clear, Professional | a0e99841-438c-4a64-b679-ae501e7d6091 | Business     |
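
If you configure voices in code rather than through the dropdown, a name-to-ID lookup avoids copying UUIDs by hand. The sketch below uses the preset voice IDs listed above; `PRESET_VOICES` and `voice_id_for` are hypothetical names, not part of Burki.

```python
# Preset voice IDs as listed in the table above
PRESET_VOICES = {
    "katie": "f786b574-daa5-4673-aa0c-cbe3e8534c02",
    "kiefer": "228fca29-3a0a-435c-8728-5cb483251068",
    "tessa": "6ccbfb76-1fc6-48f7-b71d-91ac6298247b",
    "kyle": "c961b81c-a935-4c17-bfb3-ba2239de8c2f",
    "sarah": "a0e99841-438c-4a64-b679-ae501e7d6091",
}


def voice_id_for(name: str) -> str:
    """Look up a preset voice ID by name, ignoring case and surrounding spaces."""
    try:
        return PRESET_VOICES[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown preset voice: {name!r}") from None


print(voice_id_for("Katie"))
```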

Language Support

42 Languages Supported: Cartesia Sonic 3 offers extensive multilingual support, making it ideal for global deployments.

  • 🇺🇸 English (en)
  • 🇪🇸 Spanish (es)
  • 🇫🇷 French (fr)
  • 🇩🇪 German (de)
  • 🇮🇹 Italian (it)
  • 🇵🇹 Portuguese (pt)
  • 🇨🇳 Chinese (zh)
  • 🇯🇵 Japanese (ja)
  • 🇰🇷 Korean (ko)
  • 🇮🇳 Hindi (hi)
  • 🇸🇦 Arabic (ar)
  • 🇷🇺 Russian (ru)

Additional Languages: Dutch (nl), Polish (pl), Swedish (sv), Turkish (tr), Tagalog (tl), Bulgarian (bg), Romanian (ro), Czech (cs), Greek (el), Finnish (fi), Croatian (hr), Malay (ms), Slovak (sk), Danish (da), Tamil (ta), Ukrainian (uk), Hungarian (hu), Norwegian (no), Vietnamese (vi), Bengali (bn), Thai (th), Hebrew (he), Georgian (ka), Indonesian (id), Telugu (te), Gujarati (gu), Kannada (kn), Malayalam (ml), Marathi (mr), Punjabi (pa)
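
When the language code comes from user input or a locale string, it can help to validate it before passing it on. The sketch below collects the 42 codes listed above; `normalize_language` is a hypothetical helper, and stripping a region suffix such as `en-US` down to `en` is an assumption about what you want, not documented Cartesia behavior.

```python
# The 42 language codes listed on this page
SUPPORTED_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "zh", "ja", "ko", "hi", "ar", "ru",
    "nl", "pl", "sv", "tr", "tl", "bg", "ro", "cs", "el", "fi", "hr", "ms",
    "sk", "da", "ta", "uk", "hu", "no", "vi", "bn", "th", "he", "ka", "id",
    "te", "gu", "kn", "ml", "mr", "pa",
})


def normalize_language(code: str) -> str:
    """Lower-case a code, drop any region suffix, and check it is supported."""
    base = code.strip().lower().replace("_", "-").split("-")[0]
    if base not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return base


print(normalize_language("pt_BR"))
```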

Voice Cloning

Cartesia supports creating custom voices from audio samples.
Quick Cloning: Create a custom voice from just ~5 seconds of clean audio.

Requirements

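| Requirement    | Value                                |
|----------------|--------------------------------------|
| Audio Duration | ~5 seconds (recommended)             |
| Formats        | MP3, WAV, FLAC, M4A, OGG             |
| Max File Size  | 10 MB                                |
| Sample Rate    | 16 kHz minimum recommended           |
| Channels       | Mono preferred                       |
| Quality        | Clean audio without background noise |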
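
Most of these requirements can be checked before uploading. The following is a minimal sketch that works on metadata you have already read from the file; `check_sample` is a hypothetical helper, and treating "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import PurePath

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_BYTES = 10 * 1024 * 1024   # "10 MB" read as 10 MiB (assumption)
MIN_SAMPLE_RATE_HZ = 16_000    # recommended minimum from the table


def check_sample(filename: str, size_bytes: int,
                 sample_rate_hz: int, channels: int) -> list[str]:
    """Return the problems found; an empty list means the sample fits the table."""
    problems = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 10 MB")
    if sample_rate_hz < MIN_SAMPLE_RATE_HZ:
        problems.append("sample rate below 16 kHz")
    if channels != 1:
        problems.append("not mono (mono is preferred)")
    return problems


# A 5-second, 16 kHz, 16-bit mono WAV is about 160 KB
print(check_sample("voice.wav", 160_000, 16_000, 1))
```

Background noise cannot be judged from metadata, so the quality requirement still needs a listen.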

Clone a Voice

Step 1: Prepare Audio Sample

Record or select ~5 seconds of clear speech from the target voice.

Step 2: Upload via UI

  1. Go to Voice Configuration in assistant settings
  2. Select Cartesia as provider
  3. Click the Clone Voice button
  4. Upload your audio sample

Step 3: Use Cloned Voice

The cloned voice appears in your voice dropdown and can be selected for your assistant.

Configuration Options

Context Continuation

Cartesia maintains natural prosody across sentences using context continuation:
# First sentence starts new context
message = {
    "context_id": "ctx_0",
    "continue": False,  # New context
    "transcript": "Hello, welcome to our service."
}

# Subsequent sentences continue the context
message = {
    "context_id": "ctx_0", 
    "continue": True,  # Continue same context
    "transcript": "How can I help you today?"
}
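
For more than two sentences, the same pattern can be generated in a loop. This sketch mirrors the flag convention of the example above (first message `False`, later messages `True`) and includes only the three fields shown there; `build_context_messages` is a hypothetical helper, and a real request would carry additional fields, so check Cartesia's WebSocket reference for the exact message schema and `continue` semantics.

```python
import json


def build_context_messages(sentences, context_id="ctx_0"):
    """Build one message per sentence, following the convention shown above."""
    return [
        {
            "context_id": context_id,
            "continue": index > 0,  # False for the first sentence, True afterwards
            "transcript": sentence,
        }
        for index, sentence in enumerate(sentences)
    ]


messages = build_context_messages(
    ["Hello, welcome to our service.", "How can I help you today?"]
)
for message in messages:
    print(json.dumps(message))
```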

Flush Tags

Force immediate speech mid-sentence using the <flush/> tag:
# Text with flush tag
text = "Please hold<flush/> while I check your account."
# "Please hold" speaks immediately, then "while I check..." follows
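
To see where a given string will break, you can split it at the tag yourself. This is purely an illustration of how the tag partitions the text; `split_on_flush` is a hypothetical helper and does not reflect how Burki or Cartesia handle the tag internally.

```python
FLUSH_TAG = "<flush/>"


def split_on_flush(text: str) -> list[str]:
    """Split text at each flush tag, trimming whitespace and dropping empty parts."""
    return [part.strip() for part in text.split(FLUSH_TAG) if part.strip()]


segments = split_on_flush("Please hold<flush/> while I check your account.")
for number, segment in enumerate(segments, start=1):
    print(f"segment {number}: {segment}")
```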

Audio Format

Cartesia automatically outputs the appropriate format for your telephony provider:
  • Twilio/Telnyx: PCM μ-law @ 8kHz
  • Vonage: PCM 16-bit @ 16kHz
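
The provider-to-format mapping above, written out as data, also makes the bandwidth difference easy to compute. This is a sketch under stated assumptions: the encoding labels are illustrative strings rather than official API values, audio is assumed to be mono, and `AudioFormat` and `format_for` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    encoding: str          # illustrative label, not an official API value
    sample_rate_hz: int
    bytes_per_sample: int  # mu-law uses 1 byte per sample, 16-bit PCM uses 2

    @property
    def bytes_per_second(self) -> int:
        # Assumes a single (mono) channel
        return self.sample_rate_hz * self.bytes_per_sample


PROVIDER_FORMATS = {
    "twilio": AudioFormat("mulaw", 8000, 1),
    "telnyx": AudioFormat("mulaw", 8000, 1),
    "vonage": AudioFormat("pcm_16bit", 16000, 2),
}


def format_for(provider: str) -> AudioFormat:
    """Return the output format listed above for a telephony provider."""
    key = provider.strip().lower()
    if key not in PROVIDER_FORMATS:
        raise ValueError(f"No audio format known for provider {provider!r}")
    return PROVIDER_FORMATS[key]


for name in ("twilio", "vonage"):
    fmt = format_for(name)
    print(name, fmt.encoding, fmt.sample_rate_hz, fmt.bytes_per_second)
```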

API Integration

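import asyncio

from app.services.tts.tts_cartesia import CartesiaTTSService


async def main():
    # Create Cartesia TTS instance
    tts = CartesiaTTSService(
        call_sid="unique_call_id",
        api_key="your_cartesia_api_key",
        voice_id="f786b574-daa5-4673-aa0c-cbe3e8534c02",  # Katie
        model_id="sonic-3",
        language="en",
    )

    # Start session (your_callback is your own handler for the synthesized audio)
    await tts.start_session(audio_callback=your_callback)

    # Process text
    await tts.process_text("Hello, this is a test.")

    # End session
    await tts.end_session()


# From synchronous code, drive the coroutine with asyncio.run;
# inside an already running event loop, use `await main()` instead
asyncio.run(main())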
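
Because the session is opened and closed explicitly, it is worth making sure `end_session()` runs even if synthesis fails. The sketch below wraps the three calls shown above in `try`/`finally`. `speak` and `FakeTTS` are hypothetical: the fake stands in for `CartesiaTTSService` so the snippet runs on its own, and the callback signature (one `bytes` chunk per call) is an assumption.

```python
import asyncio


async def speak(tts, text, audio_callback):
    """Run one start/process/end cycle, always closing the session."""
    await tts.start_session(audio_callback=audio_callback)
    try:
        await tts.process_text(text)
    finally:
        await tts.end_session()


class FakeTTS:
    """Stand-in exposing the same three methods, for local experiments only."""

    def __init__(self):
        self.calls = []
        self._callback = None

    async def start_session(self, audio_callback):
        self.calls.append("start")
        self._callback = audio_callback

    async def process_text(self, text):
        self.calls.append("process")
        await self._callback(text.encode())  # pretend the text bytes are audio

    async def end_session(self):
        self.calls.append("end")


received = []


async def collect(chunk: bytes):
    # Assumed callback shape: called once per audio chunk
    received.append(chunk)


fake = FakeTTS()
asyncio.run(speak(fake, "Hello, this is a test.", collect))
print(fake.calls, len(received))
```

Swap `FakeTTS()` for a configured `CartesiaTTSService` instance once you have confirmed the callback signature it expects.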

Performance Comparison

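| Feature       | Cartesia Sonic 3    | ElevenLabs       | Deepgram Aura     |
|---------------|---------------------|------------------|-------------------|
| Latency       | ~150ms              | ~250ms           | ~75ms             |
| Languages     | 42                  | 70+              | English + Spanish |
| Voice Quality | High                | Premium          | Good              |
| Voice Cloning | Yes (~5 sec)        | Yes (1-25 min)   | No                |
| Best For      | Multilingual agents | Premium quality  | Speed-focused     |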

Common Issues & Solutions

Problem: WebSocket connection fails
Solutions:
  • Verify your CARTESIA_API_KEY is set correctly
  • Check network connectivity to api.cartesia.ai
  • Ensure API key has sufficient credits

Problem: Selected voice doesn't work
Solutions:
  • Verify you're using a valid voice ID (UUID format)
  • If using a custom cloned voice, ensure it exists in your Cartesia account
  • Try one of the preset voices to test connectivity

Problem: Output sounds distorted or unclear
Solutions:
  • Cartesia outputs 8kHz μ-law for Twilio compatibility
  • For higher quality, ensure you're using the correct output format for your use case
  • Check that your telephony provider supports the audio format
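
Two of the checks above (API key present, voice ID in UUID format) are easy to automate before opening a connection. This is a minimal sketch: `preflight` and `is_valid_voice_id` are hypothetical helpers, and the only name taken from this page is the `CARTESIA_API_KEY` environment variable.

```python
import os
import uuid


def is_valid_voice_id(voice_id) -> bool:
    """True if voice_id is a canonical hyphenated UUID string."""
    try:
        return str(uuid.UUID(voice_id)) == voice_id.lower()
    except (ValueError, TypeError, AttributeError):
        return False


def preflight(voice_id, env=os.environ) -> list[str]:
    """Return a list of configuration problems; empty means both checks passed."""
    problems = []
    if not env.get("CARTESIA_API_KEY"):
        problems.append("CARTESIA_API_KEY is not set")
    if not is_valid_voice_id(voice_id):
        problems.append(f"voice ID is not in UUID format: {voice_id!r}")
    return problems


for problem in preflight("f786b574-daa5-4673-aa0c-cbe3e8534c02"):
    print("problem:", problem)
```

These checks only catch local misconfiguration; credit balance and the existence of a cloned voice still have to be verified in your Cartesia account.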

See Also

⚡ Need Speed?

Deepgram Aura - Ultra-low ~75ms latency for real-time calls

🎭 Premium Quality?

ElevenLabs - Industry-leading voice quality with 70+ languages

🔗 Additional Resources

  • Official Documentation: Cartesia Docs
  • Voice Cloning: Clone API Reference
  • WebSocket API: TTS WebSocket Guide

🚀 Ready to Use Cartesia?

Head back to your assistant configuration and set up Cartesia Sonic 3 for multilingual voice experiences!