Cartesia Sonic 3: Multilingual Excellence
High-quality, low-latency TTS with support for 42 languages, voice cloning from ~5-second samples, and WebSocket streaming. Perfect for global deployments and multilingual voice agents.
Quick Setup
Get API Key
- Visit Cartesia.ai and create an account
- Navigate to your API settings
- Copy your API key
Configure in Burki
- Go to AI Configuration → TTS tab
- Select Cartesia Sonic 3 as provider
- Paste your API key in the TTS API Key field
Pricing: Cartesia charges approximately $0.015 per 1,000 characters. Check their website for current pricing details.
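As a quick worked example of that rate, the sketch below converts a character count into an approximate cost. The constant and the function name `estimate_tts_cost` are illustrative assumptions, not part of Burki or Cartesia; confirm the current rate on Cartesia's pricing page before budgeting.

```python
# Approximate rate quoted above; verify against Cartesia's current pricing.
APPROX_USD_PER_1K_CHARS = 0.015


def estimate_tts_cost(num_characters: int,
                      usd_per_1k: float = APPROX_USD_PER_1K_CHARS) -> float:
    """Return the approximate USD cost of synthesizing num_characters."""
    if num_characters < 0:
        raise ValueError("num_characters must be non-negative")
    return num_characters / 1000 * usd_per_1k


# Hypothetical example: agent replies totalling 2,400 characters.
print(f"${estimate_tts_cost(2_400):.3f}")  # prints "$0.036"
```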
Available Models
sonic-3
Latest Model (Auto-updates): always uses the newest Sonic 3 improvements
Languages: 42
Best for: Most applications, staying current
sonic-3-2025-10-27
Stable Snapshot: fixed version for consistent behavior
Languages: 42
Best for: Production stability, testing
Recommendation: Use sonic-3 for development and most use cases. Use the dated snapshot for production if you need guaranteed consistency.
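If you follow that recommendation in your own tooling, one simple pattern is to pin the dated snapshot only in production. This is a minimal sketch; `choose_model` and the `APP_ENV` environment variable are hypothetical names, not Burki settings.

```python
import os

# Model IDs from the list above.
LATEST_MODEL = "sonic-3"             # auto-updates to the newest Sonic 3
PINNED_MODEL = "sonic-3-2025-10-27"  # fixed snapshot for consistent behavior


def choose_model(environment: str) -> str:
    """Pin the snapshot in production; track the latest model everywhere else."""
    return PINNED_MODEL if environment == "production" else LATEST_MODEL


# APP_ENV is a hypothetical variable; use whatever your deployment sets.
model_id = choose_model(os.environ.get("APP_ENV", "development"))
```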
Available Voices
Preset Voices
Female Voices
Katie
Stable & Realistic: perfect for voice agents
Voice ID: f786b574-daa5-4673-aa0c-cbe3e8534c02
Tessa
Emotive & Expressive: great for engaging conversations
Voice ID: 6ccbfb76-1fc6-48f7-b71d-91ac6298247b
Sarah
Clear & Professional: ideal for business applications
Voice ID: a0e99841-438c-4a64-b679-ae501e7d6091
Male Voices
Kiefer
Stable & Realistic: perfect for voice agents
Voice ID: 228fca29-3a0a-435c-8728-5cb483251068
Kyle
Emotive & Expressive: great for dynamic interactions
Voice ID: c961b81c-a935-4c17-bfb3-ba2239de8c2f
All Available Voices
| Voice | Gender | Style | Voice ID | Best For |
|---|---|---|---|---|
| Katie | Female | Stable, Realistic | f786b574-daa5-4673-aa0c-cbe3e8534c02 | Voice agents |
| Kiefer | Male | Stable, Realistic | 228fca29-3a0a-435c-8728-5cb483251068 | Voice agents |
| Tessa | Female | Emotive, Expressive | 6ccbfb76-1fc6-48f7-b71d-91ac6298247b | Characters |
| Kyle | Male | Emotive, Expressive | c961b81c-a935-4c17-bfb3-ba2239de8c2f | Characters |
| Sarah | Female | Clear, Professional | a0e99841-438c-4a64-b679-ae501e7d6091 | Business |
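If you select voices in code rather than through the UI, the table can be mirrored as data. A minimal sketch; `PresetVoice`, `PRESET_VOICES`, `VOICE_IDS_BY_NAME`, and `voices_for` are hypothetical helpers, and the IDs are copied from the table above.

```python
from typing import Dict, List, NamedTuple


class PresetVoice(NamedTuple):
    name: str
    gender: str
    style: str
    voice_id: str
    best_for: str


# Preset voices copied from the table above.
PRESET_VOICES: List[PresetVoice] = [
    PresetVoice("Katie", "Female", "Stable, Realistic",
                "f786b574-daa5-4673-aa0c-cbe3e8534c02", "Voice agents"),
    PresetVoice("Kiefer", "Male", "Stable, Realistic",
                "228fca29-3a0a-435c-8728-5cb483251068", "Voice agents"),
    PresetVoice("Tessa", "Female", "Emotive, Expressive",
                "6ccbfb76-1fc6-48f7-b71d-91ac6298247b", "Characters"),
    PresetVoice("Kyle", "Male", "Emotive, Expressive",
                "c961b81c-a935-4c17-bfb3-ba2239de8c2f", "Characters"),
    PresetVoice("Sarah", "Female", "Clear, Professional",
                "a0e99841-438c-4a64-b679-ae501e7d6091", "Business"),
]

# Case-insensitive lookup from voice name to voice ID.
VOICE_IDS_BY_NAME: Dict[str, str] = {v.name.lower(): v.voice_id for v in PRESET_VOICES}


def voices_for(use_case: str) -> List[str]:
    """Return the names of preset voices recommended for a use case."""
    return [v.name for v in PRESET_VOICES if v.best_for.lower() == use_case.lower()]
```

For example, `voices_for("voice agents")` returns `["Katie", "Kiefer"]`.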
Language Support
42 Languages Supported: Cartesia Sonic 3 offers extensive multilingual support, making it ideal for global deployments.
Supported Languages
English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Chinese (zh), Japanese (ja), Korean (ko), Hindi (hi), Arabic (ar), Russian (ru)
Also supported: Dutch (nl), Polish (pl), Swedish (sv), Turkish (tr), Tagalog (tl), Bulgarian (bg), Romanian (ro), Czech (cs), Greek (el), Finnish (fi), Croatian (hr), Malay (ms), Slovak (sk), Danish (da), Tamil (ta), Ukrainian (uk), Hungarian (hu), Norwegian (no), Vietnamese (vi), Bengali (bn), Thai (th), Hebrew (he), Georgian (ka), Indonesian (id), Telugu (te), Gujarati (gu), Kannada (kn), Malayalam (ml), Marathi (mr), Punjabi (pa)
Voice Cloning
Cartesia supports creating custom voices from audio samples.
Quick Cloning: Create a custom voice from just ~5 seconds of clean audio.
Requirements
| Requirement | Value |
|---|---|
| Audio Duration | ~5 seconds (recommended) |
| Formats | MP3, WAV, FLAC, M4A, OGG |
| Max File Size | 10MB |
| Sample Rate | 16kHz minimum recommended |
| Channels | Mono preferred |
| Quality | Clean audio without background noise |
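Before uploading, a WAV sample can be checked locally against the requirements table with Python's standard `wave` module. This is a sketch under stated assumptions: it handles only PCM WAV files (MP3, FLAC, M4A, and OGG need a separate decoder), it reads "10MB" as 10 MiB, the 4-second lower bound is a hypothetical tolerance around the ~5-second recommendation, and it cannot detect background noise. `check_wav_sample` is a hypothetical helper name.

```python
import os
import wave
from typing import List

MAX_FILE_BYTES = 10 * 1024 * 1024  # "10MB" read as 10 MiB (assumption)
MIN_SAMPLE_RATE_HZ = 16_000        # 16kHz minimum recommended
MIN_SECONDS = 4.0                  # hypothetical tolerance around ~5 s


def check_wav_sample(path: str) -> List[str]:
    """Return warnings for a WAV cloning sample; an empty list means no issues found."""
    warnings: List[str] = []
    if os.path.getsize(path) > MAX_FILE_BYTES:
        warnings.append("file is larger than 10MB")
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    if rate < MIN_SAMPLE_RATE_HZ:
        warnings.append(f"sample rate is {rate} Hz; 16kHz or higher is recommended")
    if channels != 1:
        warnings.append(f"audio has {channels} channels; mono is preferred")
    if seconds < MIN_SECONDS:
        warnings.append(f"clip is {seconds:.1f} s; about 5 seconds is recommended")
    return warnings


# Usage (hypothetical file name):
# for problem in check_wav_sample("my_voice_sample.wav"):
#     print("warning:", problem)
```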
Clone a Voice
Upload via UI
- Go to Voice Configuration in assistant settings
- Select Cartesia as provider
- Click Clone Voice button
- Upload your audio sample
Voice Cloning Tips
Configuration Options
Context Continuation
Cartesia maintains natural prosody across sentences using context continuation.
Flush Tags
Force immediate speech mid-sentence using the <flush/> tag.
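To picture how context continuation and flush tags can fit together, the sketch below splits text at <flush/> tags and wraps each segment in a Cartesia WebSocket request that shares one `context_id`, with `continue` set on every segment except the last. This is an illustration, not Burki's implementation: `split_on_flush` and `continuation_messages` are hypothetical helpers, and the message fields reflect our reading of Cartesia's WebSocket TTS format, so verify them against the TTS WebSocket guide.

```python
import uuid
from typing import Dict, List


def split_on_flush(text: str) -> List[str]:
    """Split text at <flush/> tags, keeping whitespace so segments rejoin cleanly."""
    return [part for part in text.split("<flush/>") if part.strip()]


def continuation_messages(segments: List[str], voice_id: str,
                          model_id: str = "sonic-3") -> List[Dict[str, object]]:
    """Build one WebSocket request per segment, all in the same context.

    "continue": True signals that more text will follow in this context, so
    prosody can carry across segments; the final segment sets it to False.
    """
    context_id = str(uuid.uuid4())
    return [
        {
            "model_id": model_id,
            "transcript": segment,
            "voice": {"mode": "id", "id": voice_id},
            "output_format": {"container": "raw", "encoding": "pcm_mulaw",
                              "sample_rate": 8000},
            "context_id": context_id,
            "continue": index < len(segments) - 1,
        }
        for index, segment in enumerate(segments)
    ]


segments = split_on_flush("Let me check that for you.<flush/> One moment, please.")
messages = continuation_messages(segments, "f786b574-daa5-4673-aa0c-cbe3e8534c02")
# Each message would be serialized with json.dumps() and sent over the socket.
```

Keeping the leading space on the second segment means the transcripts concatenate back into the original sentence without the tag.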
Audio Format
Cartesia automatically outputs the appropriate format for your telephony provider:
- Twilio/Telnyx: PCM μ-law @ 8kHz
- Vonage: PCM 16-bit @ 16kHz
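The provider-to-format mapping above can be expressed as Cartesia `output_format` objects. In this sketch the encoding names `pcm_mulaw` and `pcm_s16le` follow Cartesia's API naming, while `output_format_for` and `bytes_per_second` are hypothetical helpers. The bandwidth helper shows that 16kHz 16-bit audio carries four times the data of 8kHz μ-law.

```python
from typing import Dict

OutputFormat = Dict[str, object]

# Provider -> Cartesia output_format, per the list above.
OUTPUT_FORMATS: Dict[str, OutputFormat] = {
    "twilio": {"container": "raw", "encoding": "pcm_mulaw", "sample_rate": 8000},
    "telnyx": {"container": "raw", "encoding": "pcm_mulaw", "sample_rate": 8000},
    "vonage": {"container": "raw", "encoding": "pcm_s16le", "sample_rate": 16000},
}

# Bytes per sample for each raw encoding.
BYTES_PER_SAMPLE = {"pcm_mulaw": 1, "pcm_s16le": 2}


def output_format_for(provider: str) -> OutputFormat:
    """Return a copy of the output format for a telephony provider."""
    try:
        return dict(OUTPUT_FORMATS[provider.lower()])
    except KeyError:
        raise ValueError(f"unknown telephony provider: {provider!r}") from None


def bytes_per_second(fmt: OutputFormat) -> int:
    """Mono audio data rate: sample rate times bytes per sample."""
    return int(fmt["sample_rate"]) * BYTES_PER_SAMPLE[str(fmt["encoding"])]


print(bytes_per_second(output_format_for("Twilio")))  # 8000
print(bytes_per_second(output_format_for("Vonage")))  # 32000
```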
API Integration
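For a direct, non-streaming call outside Burki, a request to Cartesia's REST API can be sketched with only the standard library. Treat these as assumptions to verify against Cartesia's API reference: the `/tts/bytes` endpoint, the `X-API-Key` and `Cartesia-Version` headers, and the version string shown. `build_tts_request` and `synthesize` are hypothetical helper names.

```python
import json
import os
import urllib.request
from typing import Optional

CARTESIA_TTS_URL = "https://api.cartesia.ai/tts/bytes"
CARTESIA_VERSION = "2024-06-10"  # assumption: use the version your docs specify


def build_tts_request(text: str, voice_id: str, model_id: str = "sonic-3",
                      language: str = "en",
                      api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request for raw 8kHz mu-law audio (Twilio-style output)."""
    body = {
        "model_id": model_id,
        "transcript": text,
        "voice": {"mode": "id", "id": voice_id},
        "output_format": {"container": "raw", "encoding": "pcm_mulaw",
                          "sample_rate": 8000},
        "language": language,
    }
    return urllib.request.Request(
        CARTESIA_TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-API-Key": api_key or os.environ["CARTESIA_API_KEY"],
            "Cartesia-Version": CARTESIA_VERSION,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, voice_id: str, **kwargs) -> bytes:
    """Send the request and return the audio bytes (needs network and a valid key)."""
    request = build_tts_request(text, voice_id, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

For example, `synthesize("Hello!", "f786b574-daa5-4673-aa0c-cbe3e8534c02")` would return Katie's voice as raw μ-law bytes, provided the request format above matches your API version.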
Performance Comparison
| Feature | Cartesia Sonic 3 | ElevenLabs | Deepgram Aura |
|---|---|---|---|
| Latency | ~150ms | ~250ms | ~75ms |
| Languages | 42 | 70+ | English + Spanish |
| Voice Quality | High | Premium | Good |
| Voice Cloning | Yes (~5 sec) | Yes (1-25 min) | No |
| Best For | Multilingual agents | Premium quality | Speed-focused |
Common Issues & Solutions
Connection Issues
Problem: WebSocket connection fails
Solutions:
- Verify your CARTESIA_API_KEY is set correctly
- Check network connectivity to api.cartesia.ai
- Ensure API key has sufficient credits
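The first two checks in that list can be scripted as a quick preflight. A minimal sketch; `api_key_present` and `can_reach_cartesia` are hypothetical helpers, and a successful TCP connection only shows the host is reachable, not that the key is valid or has credits (check those in your Cartesia account).

```python
import os
import socket
from typing import Mapping


def api_key_present(env: Mapping[str, str] = os.environ) -> bool:
    """True if CARTESIA_API_KEY is set to a non-blank value."""
    return bool(env.get("CARTESIA_API_KEY", "").strip())


def can_reach_cartesia(host: str = "api.cartesia.ai", port: int = 443,
                       timeout: float = 5.0) -> bool:
    """True if a TCP connection to the Cartesia API host can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False


# Usage:
# print("key set:", api_key_present(), "host reachable:", can_reach_cartesia())
```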
Voice Not Found
Problem: Selected voice doesn't work
Solutions:
- Verify you're using a valid voice ID (UUID format)
- If using a custom cloned voice, ensure it exists in your Cartesia account
- Try one of the preset voices to test connectivity
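A voice ID's format can be checked locally with Python's `uuid` module before you look further. This sketch (the helper name `is_valid_voice_id` is hypothetical) accepts only the canonical hyphenated form used in the voice table; it cannot tell you whether the voice actually exists in your Cartesia account.

```python
import uuid


def is_valid_voice_id(voice_id: str) -> bool:
    """Check for canonical UUID form (8-4-4-4-12 hex digits, hyphenated)."""
    try:
        # uuid.UUID also accepts braces, "urn:uuid:" prefixes, and bare hex,
        # so compare against the canonical string to reject those variants.
        return str(uuid.UUID(voice_id)) == voice_id.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(is_valid_voice_id("f786b574-daa5-4673-aa0c-cbe3e8534c02"))  # True
print(is_valid_voice_id("katie"))                                 # False
```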
Audio Quality Issues
Problem: Output sounds distorted or unclear
Solutions:
- Cartesia outputs 8kHz μ-law for Twilio compatibility; this is narrowband, telephone-grade audio by design
- For higher quality, ensure you're using the correct output format for your use case
- Check that your telephony provider supports the audio format
See Also
Need Speed?
Deepgram Aura - Ultra-low ~75ms latency for real-time calls
Premium Quality?
ElevenLabs - Industry-leading voice quality with 70+ languages
Additional Resources
Official Documentation: Cartesia Docs
Voice Cloning: Clone API Reference
WebSocket API: TTS WebSocket Guide
Ready to Use Cartesia?
Head back to your assistant configuration and set up Cartesia Sonic 3 for multilingual voice experiences!