
🔧 TTS Troubleshooting Guide

Solve common TTS issues quickly with provider-specific solutions and general troubleshooting strategies.

Quick Diagnosis

🩺 Identify Your Issue

Start here to quickly identify the type of problem you’re experiencing.
Symptoms: TTS request completes but no audio is produced
Quick Checks:
  • ✅ API key is valid and has TTS permissions
  • ✅ Voice ID exists and is spelled correctly
  • ✅ Audio format is supported by your system
  • ✅ Network connectivity is stable
Jump to: No Audio Output

No Audio Output

Common Causes & Solutions (ElevenLabs):
Voice ID Issues:
# Check if voice exists
curl -X GET "https://api.elevenlabs.io/v1/voices" \
  -H "xi-api-key: YOUR_API_KEY"
  • Verify voice ID is correct (case-sensitive)
  • Ensure voice is available on your plan
  • Try with default voice: 21m00Tcm4TlvDq8ikWAM (Rachel)
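The voice check can also be automated by searching the parsed response of the voices endpoint for your ID. The sketch below assumes the response carries a top-level `voices` array whose entries have `voice_id` and `name` fields; `find_voice` and the sample payload are illustrative and not part of any SDK.

```python
def find_voice(voices_response, voice_id):
    """Return the voice entry whose ID matches exactly (case-sensitive), or None."""
    # Assumption: the response looks like {"voices": [{"voice_id": ..., "name": ...}]}
    for voice in voices_response.get("voices", []):
        if voice.get("voice_id") == voice_id:
            return voice
    return None


# Hypothetical sample payload standing in for the parsed JSON response
sample = {"voices": [{"voice_id": "21m00Tcm4TlvDq8ikWAM", "name": "Rachel"}]}

print(find_voice(sample, "21m00Tcm4TlvDq8ikWAM"))  # the matching entry
print(find_voice(sample, "21M00TCM4TLVDQ8IKWAM"))  # None: wrong letter case
```

A `None` result usually means a typo or a voice that is not visible to the account that owns the API key.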
Model Compatibility:
API Key Issues:
  • Verify API key has TTS permissions
  • Check key isn’t expired or revoked
  • Test with a simple curl request first
Common Causes & Solutions (Deepgram):
WebSocket Connection:
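// Test WebSocket connection (Node.js with the "ws" package;
// the browser WebSocket API does not accept custom headers)
const WebSocket = require('ws');

const ws = new WebSocket(
  'wss://api.deepgram.com/v1/speak?model=aura-asteria-en',
  { headers: { 'Authorization': 'Token YOUR_API_KEY' } }
);

ws.on('open', () => {
  console.log('Connection established');
});

ws.on('error', (error) => {
  console.error('Connection failed:', error);
});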
Audio Format Issues:
  • Ensure your system supports the requested format
  • Try µ-law for phone systems: encoding=mulaw&sample_rate=8000
  • Use linear16 for web: encoding=linear16&sample_rate=24000
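To avoid hand-typing query strings, the two format combinations above can be kept as presets. This is a minimal sketch: the preset names `phone` and `web` and the helper `build_speak_url` are assumptions made for illustration, and the base URL is the one used in the WebSocket test in this section.

```python
from urllib.parse import urlencode

# Encoding/sample-rate pairs taken from the bullets above; preset names are illustrative
AUDIO_PRESETS = {
    "phone": {"encoding": "mulaw", "sample_rate": 8000},
    "web": {"encoding": "linear16", "sample_rate": 24000},
}


def build_speak_url(model, target):
    """Build a speak URL carrying the encoding and sample rate for the target."""
    params = {"model": model, **AUDIO_PRESETS[target]}
    return "wss://api.deepgram.com/v1/speak?" + urlencode(params)


print(build_speak_url("aura-asteria-en", "phone"))
print(build_speak_url("aura-asteria-en", "web"))
```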
Voice Model Issues:
  • Use correct voice format: aura-asteria-en not asteria
  • Verify model exists: aura-2 vs aura
  • Check Deepgram voice list
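A cheap local check can catch bare voice names before a request is sent. The pattern below is inferred only from the examples in this guide (`aura-asteria-en`, `aura-2-asteria-en`) and is not an official naming specification, so treat a failed match as a hint to consult the voice list rather than as proof of an error.

```python
import re

# Assumption: shape inferred from the examples in this guide, not an official spec
VOICE_MODEL_PATTERN = re.compile(r"^aura(-2)?-[a-z]+-[a-z]{2}$")


def looks_like_voice_model(name):
    """Return True if name has the shape aura[-2]-<voice>-<language>."""
    return bool(VOICE_MODEL_PATTERN.match(name))


for candidate in ("aura-asteria-en", "aura-2-asteria-en", "asteria"):
    print(candidate, looks_like_voice_model(candidate))
```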
Common Causes & Solutions (Inworld):
Bearer Token:
# Test authentication
curl -X GET "https://api.inworld.ai/v1/voices" \
  -H "Authorization: Bearer YOUR_TOKEN"
Language/Voice Compatibility:
  • Verify voice supports selected language
  • Check language code format: en not english
  • Use language matrix
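Language values can be normalized before they reach the API. The sketch below assumes the provider expects lowercase two-letter codes, as the `en` not `english` example suggests; the lookup table and the function name `normalize_language` are hypothetical and should be extended to match the languages you support.

```python
# Hypothetical lookup table; extend it with the languages you actually use
LANGUAGE_CODES = {"english": "en", "spanish": "es", "french": "fr", "german": "de"}


def normalize_language(value):
    """Convert a language name or code to a lowercase two-letter code."""
    cleaned = value.strip().lower()
    if cleaned in LANGUAGE_CODES:
        return LANGUAGE_CODES[cleaned]
    # Accept values that already look like a two-letter code
    if len(cleaned) == 2 and cleaned.isascii() and cleaned.isalpha():
        return cleaned
    raise ValueError(f"Unrecognized language: {value!r}")


print(normalize_language("English"))  # en
print(normalize_language(" ES "))     # es
```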
Model Selection:
  • Try inworld-tts-1 before inworld-tts-1-max
  • Ensure model supports your voice
  • Check model compatibility
Common Causes & Solutions (Resemble):
WebSocket Requirements:
  • Business plan required for WebSocket streaming
  • Check plan status in Resemble dashboard
  • Fallback to REST API if needed
UUID Format:
{
  "project_uuid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "voice_uuid": "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
}
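Malformed identifiers can be caught before the request is sent. This sketch uses Python's standard `uuid` module to confirm that a value such as `project_uuid` or `voice_uuid` is a canonical hyphenated UUID; the helper name `is_valid_uuid` is illustrative.

```python
import uuid


def is_valid_uuid(value):
    """Return True if value is a canonical hyphenated UUID string."""
    try:
        # uuid.UUID also accepts unhyphenated input, so compare the canonical form
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(is_valid_uuid("123e4567-e89b-12d3-a456-426614174000"))  # True
print(is_valid_uuid("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"))  # False: placeholder text
```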
Voice Training Status:
  • Ensure custom voice training is complete
  • Check voice status in Resemble dashboard
  • Wait for training completion before using

Audio Quality Issues

ElevenLabs Solutions:
{
  "stability": 0.5,           // Try 0.4-0.6 range
  "similarity_boost": 0.75,   // Optimal setting
  "style": 0.0,              // Keep low for natural sound
  "use_speaker_boost": true   // Always enable
}
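When these settings come from user input or experiments, it can help to clamp them before sending. The sketch below assumes that `stability`, `similarity_boost`, and `style` are all expected to lie between 0.0 and 1.0; `clamp_voice_settings` is a hypothetical helper, not an SDK function.

```python
def clamp_voice_settings(settings):
    """Return a copy of settings with the numeric fields clamped to [0.0, 1.0]."""
    clamped = dict(settings)
    for key in ("stability", "similarity_boost", "style"):
        if key in clamped:
            clamped[key] = min(1.0, max(0.0, float(clamped[key])))
    return clamped


print(clamp_voice_settings({"stability": 1.4, "style": -0.2, "use_speaker_boost": True}))
```

Non-numeric fields such as `use_speaker_boost` pass through unchanged.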
Inworld Solutions:
  • Reduce emotional markup intensity
  • Try different voice with your content
  • Switch from TTS-1-Max to TTS-1 for stability
Deepgram Solutions:
  • Use Aura-2 instead of original Aura
  • Ensure proper audio encoding for your system
  • Check sample rate matches playback system
General Solutions:
  • Test with shorter text samples
  • Remove special characters from input text
  • Verify network stability during generation
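One way to remove special characters without damaging ordinary text is to allow letters, digits, whitespace, and basic punctuation, and replace everything else with a space. The allowed set below is an assumption; adjust it if your content depends on symbols such as `#` or `%`.

```python
import re

# Map typographic quotes to ASCII so contractions survive the filter below
QUOTE_MAP = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def sanitize_text(text):
    """Replace characters outside letters, digits, whitespace, and basic punctuation."""
    text = text.translate(QUOTE_MAP)
    text = re.sub(r"[^\w\s.,!?;:'\"()-]", " ", text)
    # Collapse the gaps left behind by removed characters
    return re.sub(r"\s+", " ", text).strip()


print(sanitize_text("Don\u2019t panic \u2605 it\u2019s   fine #1"))
```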
Text Preprocessing:
import re

def fix_pronunciations(text):
    fixes = {
        "API": "A P I",
        "HTTP": "H T T P",
        "OAuth": "O Auth",
        "UUID": "U U I D",
        "AWS": "A W S",
        "URL": "U R L"
    }

    for term, pronunciation in fixes.items():
        # Match whole words only, so "CAPITAL" is not rewritten as "CA P ITAL"
        text = re.sub(rf"\b{re.escape(term)}\b", pronunciation, text)
    return text
Provider-Specific:
  • ElevenLabs: Use SSML for pronunciation control
  • Inworld: Leverage phonetic variations in training
  • Deepgram: English-optimized, fewer pronunciation issues
  • Resemble: Train custom voice with problematic words
Stability Optimization:
  • ElevenLabs: Increase stability to 0.6-0.7
  • Inworld: Use TTS-1 instead of TTS-1-Max
  • Resemble: Retrain voice with more consistent samples
Network Optimization:
  • Use WebSocket connections for streaming providers
  • Implement connection keepalive
  • Add retry logic for failed chunks
  • Monitor network latency and jitter
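Retrying at the chunk level means one dropped request does not force the whole utterance to be regenerated. The sketch below assumes a caller-supplied `synthesize` function that raises `ConnectionError` on network failure; real clients often raise their own exception types, so adjust the `except` clause accordingly.

```python
import time


def synthesize_chunks(chunks, synthesize, max_attempts=3, base_delay=0.5):
    """Synthesize chunks in order, retrying each failed chunk on its own."""
    audio_parts = []
    for chunk in chunks:
        for attempt in range(1, max_attempts + 1):
            try:
                audio_parts.append(synthesize(chunk))
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up on this chunk after the last attempt
                time.sleep(base_delay * attempt)
    return audio_parts


# Demo with a stub that drops the first attempt for every chunk
attempts = {}

def flaky_synthesize(chunk):
    attempts[chunk] = attempts.get(chunk, 0) + 1
    if attempts[chunk] == 1:
        raise ConnectionError("simulated network drop")
    return chunk.upper().encode("utf-8")

audio = synthesize_chunks(["hello", "world"], flaky_synthesize, base_delay=0)
print(audio)  # [b'HELLO', b'WORLD']
```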

Latency Problems

⚡ Speed Optimization

Optimize TTS response times across all providers.
Model Selection (ElevenLabs):
{
  "model": "eleven_flash_v2_5",  // Fastest model
  "latency": 1,                  // Optimize setting 
  "stability": 0.5,              // Don't go too low
  "use_speaker_boost": true      // Maintain quality
}
Best Practices:
  • Use Flash v2.5 for phone calls (~75ms)
  • Keep text chunks under 100 characters
  • Avoid complex punctuation and formatting
  • Use WebSocket streaming for real-time apps
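The character limit can be enforced with the standard library's `textwrap` module, which breaks on whitespace. In this sketch the default budget of 99 characters reflects the "under 100 characters" guidance above; `chunk_by_chars` is an illustrative name.

```python
import textwrap


def chunk_by_chars(text, max_chars=99):
    """Split text on whitespace into chunks of at most max_chars characters."""
    # Note: a single word longer than max_chars is broken across chunks
    return textwrap.wrap(text, width=max_chars)


sample = "Thanks for calling. " * 12
chunks = chunk_by_chars(sample)
print([len(chunk) for chunk in chunks])
```

Splitting on sentence boundaries first generally sounds more natural; this version only guarantees the length limit.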
Optimal Configuration (Deepgram):
{
  "model": "aura-2-asteria-en",
  "encoding": "mulaw",
  "sample_rate": 8000
}
Speed Tips:
  • Deepgram is already among the fastest providers (~75ms)
  • Use µ-law encoding for phone systems
  • Keep WebSocket connections alive
  • Send text in 20-50 word chunks
Text Optimization:
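import re

def split_into_chunks(text, max_words=30):
    # Group whitespace-separated words into chunks of at most max_words words
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def optimize_text_for_speed(text):
    # Collapse runs of periods ("...") into a single period
    text = re.sub(r'\.{2,}', '.', text)

    # Break into chunks of at most 30 words (extra whitespace is dropped by the split)
    chunks = split_into_chunks(text, max_words=30)

    return chunks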
Connection Optimization:
  • Reuse connections where possible
  • Implement connection pooling
  • Use regional endpoints when available
  • Monitor and retry failed requests quickly

API and Authentication Errors

Common Causes (authentication failures):
  • Expired or invalid API key
  • Incorrect authentication header format
  • Key doesn’t have required permissions
Solutions by Provider:
ElevenLabs:
# Correct header format
curl -H "xi-api-key: YOUR_API_KEY"
Deepgram:
# Correct header format  
curl -H "Authorization: Token YOUR_API_KEY"
Inworld:
# Correct header format
curl -H "Authorization: Bearer YOUR_TOKEN"
Resemble:
# Correct header format
curl -H "Authorization: Bearer YOUR_API_KEY"
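Keeping the four header formats in one place makes it harder to send the wrong scheme to the wrong provider. The sketch below encodes exactly the formats listed above; the function name `auth_headers` and the lowercase provider keys are illustrative.

```python
def auth_headers(provider, credential):
    """Return the authentication header for the given provider name."""
    formats = {
        "elevenlabs": ("xi-api-key", "{}"),
        "deepgram": ("Authorization", "Token {}"),
        "inworld": ("Authorization", "Bearer {}"),
        "resemble": ("Authorization", "Bearer {}"),
    }
    if provider not in formats:
        raise ValueError(f"Unknown provider: {provider!r}")
    header, template = formats[provider]
    return {header: template.format(credential)}


print(auth_headers("deepgram", "YOUR_API_KEY"))
print(auth_headers("elevenlabs", "YOUR_API_KEY"))
```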
Common Causes (permission and plan errors):
  • Plan limitations (voice access, features)
  • Usage quota exceeded
  • Geographic restrictions
Solutions:
  • Check plan features and upgrade if needed
  • Verify voice is available on your plan
  • Review usage dashboard for quota limits
  • Contact provider support for restrictions
Rate Limit Solutions:
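import time
import random

class RateLimitError(Exception):
    """Placeholder: replace with the rate-limit exception your provider client raises."""

def retry_with_backoff(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            # Out of retries: surface the error to the caller
            if attempt == max_retries - 1:
                raise

            # Exponential backoff with jitter
            delay = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)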
Prevention:
  • Implement proper rate limiting in your code
  • Use connection pooling and queuing
  • Distribute requests across time
  • Consider upgrading to higher tier plans
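Client-side rate limiting can be as simple as spacing requests a minimum interval apart. The class below is a minimal sketch under that assumption; it is not thread-safe, and the `clock` and `sleep` parameters exist only so the behavior can be tested without real waiting. Pick `max_per_second` from your plan's documented limit.

```python
import time


class MinIntervalLimiter:
    """Space calls so that at most max_per_second of them start each second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self):
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._next_allowed = now + self.interval
        return max(delay, 0.0)


limiter = MinIntervalLimiter(max_per_second=50)
for text in ("first", "second", "third"):
    limiter.wait()  # call your TTS request right after this returns
    print("sending", text)
```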

Provider-Specific Issues

Training Issues:
  • Upload 1-25 minutes of clear audio
  • Use consistent speaker and environment
  • Include diverse sentence types
  • Wait for full training completion
Usage Issues:
  • Use correct voice ID from dashboard
  • Ensure plan supports voice cloning
  • Try different similarity_boost values
  • Check voice model compatibility
Language Detection:
  • Explicitly set language parameter
  • Use models that support target language
  • Test with native speakers
  • Avoid mixing languages in single request
Model Compatibility:
{
  "model": "eleven_v3",        // Best multilingual
  "language": "es",            // Explicit language
  "text": "Hola mundo"
}

Emergency Troubleshooting

🚨 When Everything Breaks

Quick recovery strategies for critical TTS failures.

Fallback Strategy Implementation

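import logging

logger = logging.getLogger(__name__)

class TTSWithFallback:
    def __init__(self):
        self.providers = [
            {"name": "elevenlabs", "priority": 1},
            {"name": "deepgram", "priority": 2},
            {"name": "inworld", "priority": 3}
        ]

    async def try_provider(self, name, text):
        # Integration point: call the named provider's API and return audio
        raise NotImplementedError

    async def local_tts_fallback(self, text):
        # Integration point: call a local TTS engine and return audio
        raise NotImplementedError

    async def synthesize_with_fallback(self, text):
        # Try providers in priority order; the first success wins
        for provider in sorted(self.providers, key=lambda x: x["priority"]):
            try:
                return await self.try_provider(provider["name"], text)
            except Exception as e:
                logger.warning(f"{provider['name']} failed: {e}")

        # All providers failed - use local fallback
        return await self.local_tts_fallback(text)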
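A fallback chain only helps if a hung provider cannot stall it, so a per-provider timeout is worth adding. The standalone sketch below shows the same ordered-fallback idea with `asyncio.wait_for`; the stub providers and the `timeout` values are assumptions for demonstration and stand in for real API clients.

```python
import asyncio


async def synthesize_with_fallback(text, providers, timeout=2.0):
    """Try each (name, coroutine function) pair in order; return the first success."""
    errors = {}
    for name, synthesize in providers:
        try:
            # wait_for cancels the call if it runs longer than the timeout
            audio = await asyncio.wait_for(synthesize(text), timeout=timeout)
            return name, audio
        except Exception as exc:
            errors[name] = repr(exc)
    raise RuntimeError(f"All providers failed: {errors}")


# Stub providers standing in for real API clients
async def hung_provider(text):
    await asyncio.sleep(5)

async def broken_provider(text):
    raise ConnectionError("simulated outage")

async def working_provider(text):
    return text.encode("utf-8")


providers = [("hung", hung_provider), ("broken", broken_provider), ("working", working_provider)]
result = asyncio.run(synthesize_with_fallback("hello", providers, timeout=0.05))
print(result)  # ('working', b'hello')
```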

Health Check Implementation

import time

async def test_provider_connection(provider):
    # Integration point: send a small test request to the provider
    raise NotImplementedError

async def check_provider_health():
    """Check each TTS provider and report its status and latency"""
    health_status = {}

    for provider in ["elevenlabs", "deepgram", "inworld", "resemble"]:
        try:
            # perf_counter is a monotonic clock suited to measuring durations
            start_time = time.perf_counter()
            await test_provider_connection(provider)
            latency = time.perf_counter() - start_time

            health_status[provider] = {
                "status": "healthy",
                "latency": latency,
                "last_check": time.time()
            }
        except Exception as e:
            health_status[provider] = {
                "status": "unhealthy",
                "error": str(e),
                "last_check": time.time()
            }

    return health_status
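The health report becomes useful once something acts on it. The sketch below picks the healthy provider with the lowest latency from a dictionary shaped like the `health_status` result above; `pick_provider` and the sample values are hypothetical.

```python
def pick_provider(health_status, preferred_order):
    """Return the healthy provider with the lowest latency, or None if none is healthy."""
    healthy = [
        name for name in preferred_order
        if health_status.get(name, {}).get("status") == "healthy"
    ]
    # min() keeps the earliest entry on ties, so preferred_order breaks them
    return min(healthy, key=lambda name: health_status[name]["latency"], default=None)


# Hypothetical sample report
sample_status = {
    "elevenlabs": {"status": "unhealthy", "error": "timeout", "last_check": 0.0},
    "deepgram": {"status": "healthy", "latency": 0.12, "last_check": 0.0},
    "inworld": {"status": "healthy", "latency": 0.30, "last_check": 0.0},
}
print(pick_provider(sample_status, ["elevenlabs", "deepgram", "inworld"]))  # deepgram
```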

Getting Help

📞 Support Resources

When you need additional help beyond this troubleshooting guide.

Provider Support

Community Resources

Community Help:

💡 Still Having Issues?

If this guide didn’t solve your problem, check our Best Practices guide or reach out to our community for help!