Resemble AI TTS

🎯 Resemble AI: Custom Brand Voices

Create custom voices with your brand’s unique sound (the number of voices depends on your plan; see Pricing Structure). WebSocket streaming enables real-time responses for professional applications.

Quick Setup

Get API Credentials

Visit Resemble AI and create an account
Navigate to Settings → API Keys
Generate an API key with TTS permissions
Copy your API Key and Project UUID

Create Custom Voice

Go to Voices in your Resemble dashboard
Click Create Voice and upload voice samples
Wait for training completion (~30 minutes)
Copy the generated Voice UUID

Configure in Burki

Go to AI Configuration → TTS tab
Select Resemble AI as provider
Enter your API Key, Project UUID, and Voice UUID

Business Plan Required: WebSocket streaming (required for real-time TTS) is only available on the Business plan ($199/month) and higher.
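If you also call Resemble from your own scripts, it helps to keep the three values from the setup steps out of source code. The following is a minimal sketch, assuming the values are stored in environment variables; the variable names `RESEMBLE_API_KEY`, `RESEMBLE_PROJECT_UUID`, and `RESEMBLE_VOICE_UUID` and the `ResembleCredentials` class are hypothetical and not part of Burki or Resemble AI.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ResembleCredentials:
    api_key: str
    project_uuid: str
    voice_uuid: str


def load_credentials(env=None) -> ResembleCredentials:
    """Read the three setup values from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    names = {
        "api_key": "RESEMBLE_API_KEY",
        "project_uuid": "RESEMBLE_PROJECT_UUID",
        "voice_uuid": "RESEMBLE_VOICE_UUID",
    }
    # Treat empty or whitespace-only values as missing
    values = {field: env.get(var, "").strip() for field, var in names.items()}
    missing = [names[field] for field, value in values.items() if not value]
    if missing:
        raise ValueError("Missing Resemble settings: " + ", ".join(missing))
    return ResembleCredentials(**values)
```

Calling `load_credentials()` at startup fails early with the names of any missing settings, which is easier to diagnose than a rejected connection later.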

Voice Creation Process

🎙️ Build Your Brand Voice

Resemble AI specializes in creating custom voices that match your brand personality and requirements.

Voice Training Steps

Voice Samples

Upload Requirements:

Duration: 3-10 minutes of clean audio
Format: WAV or MP3, 22kHz+ sample rate
Content: Read diverse sentences for best results
Quality: Clear speech, minimal background noise
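The duration and sample-rate requirements above can be checked before uploading. This is a rough sketch using Python's standard `wave` module, so it covers WAV files only (MP3 would need a third-party library). It assumes "22kHz+" means at least 22,000 Hz and that the 3-10 minute range applies to the file being checked; the function names are illustrative.

```python
import wave

MIN_SECONDS = 3 * 60
MAX_SECONDS = 10 * 60
MIN_SAMPLE_RATE = 22_000  # assumption: "22kHz+" read as at least 22,000 Hz


def check_sample(duration_seconds: float, sample_rate: int) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append(f"duration {duration_seconds:.0f}s is outside 3-10 minutes")
    if sample_rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {sample_rate} Hz is below 22 kHz")
    return problems


def check_wav_file(path: str) -> list[str]:
    """Read duration and sample rate from a WAV header and check them."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate
    return check_sample(duration, sample_rate)
```

Background noise and speech clarity cannot be judged this way; those still need a listening check.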

Example training script:
"Hello, my name is [Speaker Name]. I work at [Company Name] 
as a customer service representative. Today I'll be reading 
various sentences to help train an AI voice model. 
The weather is beautiful today with clear blue skies. 
Our company provides excellent customer support services."

Available Models

🔧 Synthesis Models

Resemble AI focuses on custom voice synthesis rather than multiple models.

Default Synthesis Model

~300ms latency

High-quality neural synthesis optimized for custom voices

Features:

Custom voice support
WebSocket streaming
Phone-compatible formats
Twilio integration ready

Best for: Brand-specific applications, personalized experiences

Model Focus: Unlike other providers, Resemble specializes in voice quality and customization rather than offering multiple model options.

WebSocket Streaming

⚡ Real-Time Streaming

WebSocket streaming enables real-time TTS for live applications like phone calls and interactive experiences.

Streaming Setup

import asyncio
import websockets
import json
import base64

async def stream_resemble_tts():
    uri = "wss://websocket.cluster.resemble.ai/stream"
    
    headers = {
        "Authorization": "Bearer YOUR_RESEMBLE_API_KEY"
    }
    
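    # Note: releases of the websockets library that default to the new asyncio
    # client (14.0 and later) name this parameter additional_headers instead.
    async with websockets.connect(uri, extra_headers=headers) as websocket: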
        # Initialize streaming session
        init_message = {
            "type": "initialize",
            "project_uuid": "YOUR_PROJECT_UUID",
            "voice_uuid": "YOUR_VOICE_UUID",
            "sample_rate": 8000,
            "precision": "MULAW",
            "output_format": "wav"
        }
        
        await websocket.send(json.dumps(init_message))
        
        # Send text for synthesis
        text_message = {
            "type": "text",
            "text": "Hello from Resemble AI streaming!",
            "request_id": "unique_request_id_123"
        }
        
        await websocket.send(json.dumps(text_message))
        
        # Receive audio chunks
        async for message in websocket:
            data = json.loads(message)
            
            if data.get("type") == "audio":
                audio_content = data.get("audio_content")
                if audio_content:
                    audio_data = base64.b64decode(audio_content)
                    # Process audio chunk
                    yield audio_data
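The receive loop above mixes parsing with streaming. Pulling the parsing into a small function makes it possible to test without a live connection. The sketch below assumes the message shape used in the example above (`type` equal to `"audio"` and a base64 `audio_content` field); the helper name `decode_audio_message` is illustrative.

```python
import base64
import json
from typing import Optional


def decode_audio_message(message) -> Optional[bytes]:
    """Return raw audio bytes from one server message, or None if it carries no audio."""
    try:
        data = json.loads(message)
    except (TypeError, ValueError):
        return None  # not JSON, for example an unexpected binary frame
    if not isinstance(data, dict) or data.get("type") != "audio":
        return None
    audio_content = data.get("audio_content")
    if not audio_content:
        return None
    return base64.b64decode(audio_content)
```

Inside the loop, `chunk = decode_audio_message(message)` followed by `if chunk is not None: yield chunk` would replace the nested `if` statements.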

Audio Format Configuration

Phone Calls (Recommended)

{
  "sample_rate": 8000,
  "precision": "MULAW",
  "output_format": "wav"
}

Use Case: Twilio, phone systems, VoIP
Quality: Optimized for voice clarity over networks
Compatibility: Universal phone system support

High Quality

{
  "sample_rate": 22050,
  "precision": "PCM_16",
  "output_format": "wav"
}

Use Case: Web applications, content creation
Quality: High-fidelity audio for premium experiences
File Size: Larger files, better for non-real-time use

Balanced

{
  "sample_rate": 16000,
  "precision": "PCM_16",
  "output_format": "wav"
}

Use Case: General applications, chatbots
Quality: Good balance of quality and performance
Compatibility: Works well for most use cases
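To compare the three configurations, it can help to work out the raw audio data rate each one implies. The sketch below assumes mono audio, 8 bits per sample for μ-law, and 16 bits per sample for PCM_16, and it ignores WAV headers and base64 overhead. The preset names are illustrative.

```python
AUDIO_PRESETS = {
    "phone": {"sample_rate": 8000, "precision": "MULAW", "output_format": "wav"},
    "high_quality": {"sample_rate": 22050, "precision": "PCM_16", "output_format": "wav"},
    "balanced": {"sample_rate": 16000, "precision": "PCM_16", "output_format": "wav"},
}

# Bytes per sample for each precision (mono audio assumed)
BYTES_PER_SAMPLE = {"MULAW": 1, "PCM_16": 2}


def bytes_per_second(preset_name: str) -> int:
    """Raw audio bytes produced per second of speech for a preset."""
    preset = AUDIO_PRESETS[preset_name]
    return preset["sample_rate"] * BYTES_PER_SAMPLE[preset["precision"]]


for name in AUDIO_PRESETS:
    print(f"{name}: {bytes_per_second(name)} bytes/s")
# phone: 8000 bytes/s
# high_quality: 44100 bytes/s
# balanced: 32000 bytes/s
```

Under these assumptions the phone preset needs roughly a fifth of the bandwidth of the high-quality preset, which is one reason it suits real-time calls.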

Custom Voice Management

🎛️ Voice Library Management

Organize and manage your custom voices for different use cases and brand requirements.

Voice Categories

Brand Representative Voices

Use Case: Customer service, sales, brand communication

Characteristics:

Professional and approachable tone
Consistent with brand personality
Clear pronunciation and pacing
Suitable for extended conversations

Training Tips:

Use your actual customer service representatives
Record in professional setting
Include common business phrases and terminology
Test with actual customer scripts

Character Voices

Use Case: Gaming, entertainment, interactive media

Characteristics:

Distinctive personality traits
Appropriate for character backstory
Emotionally expressive range
Memorable and engaging

Training Tips:

Work with voice actors who understand the character
Include emotional range in training samples
Record character-appropriate content
Test with actual dialogue scripts

Narrator Voices

Use Case: E-learning, audiobooks, documentation

Characteristics:

Clear and educational tone
Good pacing for comprehension
Neutral but engaging delivery
Suitable for long-form content

Training Tips:

Use experienced narrators or educators
Include varied sentence structures
Practice with actual educational content
Focus on clarity and comprehension

Voice UUID Management

# Example voice management system
VOICE_LIBRARY = {
    "customer_service_female": "uuid-1234-5678-abcd",
    "customer_service_male": "uuid-2345-6789-bcde",
    "ceo_announcements": "uuid-3456-7890-cdef",
    "technical_support": "uuid-4567-8901-defa",
    "marketing_spokesperson": "uuid-5678-9012-efab"
}

def get_voice_for_context(context_type):
    """Select appropriate voice based on interaction context"""
    voice_mapping = {
        "support": VOICE_LIBRARY["customer_service_female"],
        "sales": VOICE_LIBRARY["marketing_spokesperson"],
        "technical": VOICE_LIBRARY["technical_support"],
        "announcements": VOICE_LIBRARY["ceo_announcements"]
    }
    
    return voice_mapping.get(context_type, VOICE_LIBRARY["customer_service_female"])

Integration Examples

Customer Service Bot

import asyncio

class CustomerServiceTTS:
    def __init__(self):
        self.voice_uuid = "customer-service-voice-uuid"
        self.project_uuid = "your-project-uuid"
        
    async def handle_customer_inquiry(self, customer_message, inquiry_type):
        # Select the response text based on inquiry type.
        # response_tone documents the intended delivery; it is not sent to the API here.
        if inquiry_type == "complaint":
            response_tone = "empathetic"
            text = "I understand your concern and I'm here to help resolve this issue for you."
        elif inquiry_type == "sales":
            response_tone = "enthusiastic"
            text = "I'd be happy to tell you more about that product!"
        else:
            response_tone = "professional"
            text = "Thank you for contacting us. How may I assist you today?"
        
        # Stream TTS response
        async for audio_chunk in self.stream_response(text):
            yield audio_chunk
            
    async def stream_response(self, text):
        # Placeholder: connect to the streaming endpoint (see Streaming Setup)
        # and yield audio chunks for `text` using self.voice_uuid.
        # The unreachable yield only marks this method as an async generator,
        # so the `async for` above works before the real implementation exists.
        return
        yield b""

Brand Spokesperson

class BrandSpokesperson:
    def __init__(self, brand_voice_uuid):
        self.brand_voice = brand_voice_uuid
        self.brand_phrases = {
            "greeting": "Welcome to [Brand Name], where innovation meets excellence.",
            "closing": "Thank you for choosing [Brand Name]. We look forward to serving you.",
            "value_prop": "At [Brand Name], we believe in delivering exceptional value to every customer."
        }
        
    async def deliver_message(self, message_type, custom_content=None):
        if message_type in self.brand_phrases:
            text = self.brand_phrases[message_type]
        elif custom_content is not None:
            text = custom_content
        else:
            raise ValueError(f"Unknown message type {message_type!r} and no custom_content given")
            
        # Ensure brand consistency
        text = self.apply_brand_tone(text)
        
        return await self.synthesize_with_brand_voice(text)
        
    def apply_brand_tone(self, text):
        # Add brand-specific modifications
        # (tone adjustments, terminology, etc.)
        return text

    async def synthesize_with_brand_voice(self, text):
        # Placeholder: synthesize `text` with self.brand_voice
        raise NotImplementedError

Multi-Voice Application

class MultiVoiceSystem:
    def __init__(self):
        self.voices = {
            "announcer": "announcer-voice-uuid",
            "customer_service": "cs-voice-uuid",
            "technical_expert": "tech-voice-uuid",
            "sales_rep": "sales-voice-uuid"
        }
        
    async def route_to_appropriate_voice(self, message, context):
        # Determine which voice to use based on context
        if context.get("department") == "technical":
            voice_uuid = self.voices["technical_expert"]
        elif context.get("intent") == "purchase":
            voice_uuid = self.voices["sales_rep"]
        elif context.get("type") == "announcement":
            voice_uuid = self.voices["announcer"]
        else:
            voice_uuid = self.voices["customer_service"]
            
        return await self.synthesize_with_voice(message, voice_uuid)

    async def synthesize_with_voice(self, message, voice_uuid):
        # Placeholder: synthesize `message` with the selected voice
        raise NotImplementedError

Pricing Structure

💰 Custom Voice Pricing

Resemble AI pricing is based on usage and plan features. WebSocket streaming requires Business plans or higher.

Plan	Monthly Cost	Characters Included	WebSocket Streaming	Custom Voices
Basic	$29	200,000	❌	3 voices
Pro	$89	800,000	❌	10 voices
Business	$199	2,000,000	✅	25 voices
Enterprise	Custom	Custom	✅	Unlimited

WebSocket Requirement: Real-time TTS for phone calls requires Business plan ($199/month) or higher due to WebSocket streaming dependency.
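The table can be turned into a quick lookup for the lowest-cost plan that covers a given monthly volume. This sketch uses only the figures from the table above. It ignores the custom-voice limits and any overage pricing, which the table does not state, and the function name is illustrative.

```python
# (name, monthly cost in USD, characters included, WebSocket streaming)
PLANS = [
    ("Basic", 29, 200_000, False),
    ("Pro", 89, 800_000, False),
    ("Business", 199, 2_000_000, True),
]


def cheapest_plan(monthly_characters: int, needs_streaming: bool) -> str:
    """Return the first plan (ordered by cost) that covers the usage."""
    for name, _cost, included, streaming in PLANS:
        if monthly_characters <= included and (streaming or not needs_streaming):
            return name
    # Anything beyond the Business allowance falls to custom pricing
    return "Enterprise"


print(cheapest_plan(150_000, needs_streaming=False))  # Basic
print(cheapest_plan(150_000, needs_streaming=True))   # Business
```

The second call shows the effect of the streaming requirement: a volume that fits the Basic plan still needs Business once real-time phone calls are involved.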

Cost Optimization Tips

Efficient Voice Usage

Voice Reuse: Create versatile voices that work across multiple use cases
Batch Processing: Use REST API for non-real-time applications to save costs
Smart Caching: Cache frequently used phrases to reduce API calls
Context-Aware Selection: Use different voices only when necessary for user experience
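The Smart Caching tip can be sketched as a small in-memory cache keyed by voice and text. This is a minimal illustration, not part of Resemble AI or Burki: `PhraseCache` is a hypothetical class, and the `synthesize` callable stands in for whatever function actually requests audio.

```python
import hashlib
from typing import Callable, Dict


class PhraseCache:
    """Cache synthesized audio so repeated phrases cost one API call."""

    def __init__(self, synthesize: Callable[[str, str], bytes]):
        self._synthesize = synthesize
        self._audio: Dict[str, bytes] = {}

    @staticmethod
    def _key(voice_uuid: str, text: str) -> str:
        # Collapse whitespace so trivially different strings share an entry;
        # case is preserved because it can affect pronunciation.
        normalized = " ".join(text.split())
        return hashlib.sha256(f"{voice_uuid}\n{normalized}".encode("utf-8")).hexdigest()

    def get_audio(self, voice_uuid: str, text: str) -> bytes:
        key = self._key(voice_uuid, text)
        if key not in self._audio:
            self._audio[key] = self._synthesize(voice_uuid, text)
        return self._audio[key]
```

Greetings, hold messages, and closing lines are typical candidates. A production version would likely need a size limit and persistent storage, which this sketch leaves out.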

Quality Assurance

🎯 Voice Quality Testing

Ensure your custom voices meet production standards with systematic testing approaches.

Testing Framework

Initial Voice Validation

Test basic voice quality with standard phrases

Domain-Specific Testing

Test with actual content from your application domain

Edge Case Testing

Test with numbers, abbreviations, and special cases

User Acceptance Testing

Get feedback from actual users or stakeholders

Production Monitoring

Monitor voice quality in real applications

Common Quality Issues

Pronunciation Problems

Issue: Custom voice mispronounces specific words

Solutions:

Include problematic words in training data
Use phonetic spelling in TTS requests
Create pronunciation guide for domain-specific terms
Retrain voice with additional samples if needed

Example Fix:

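import re

# Phonetic corrections for common issues
PRONUNCIATION_FIXES = {
    "API": "A P I",
    "HTTP": "H T T P",
    "OAuth": "O Auth",
    "UUID": "U U I D"
}

# Match whole words only, so a term inside a longer word
# (for example HTTP inside HTTPS) is not rewritten by accident
_FIX_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, PRONUNCIATION_FIXES)) + r")\b"
)

def apply_pronunciation_fixes(text):
    # Single pass: each matched term is replaced by its phonetic spelling
    return _FIX_PATTERN.sub(lambda match: PRONUNCIATION_FIXES[match.group(1)], text)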

Emotional Range Limitations

Issue: Voice sounds monotone or lacks expression

Solutions:

Include more emotional range in training samples
Use varied sentence types during training
Consider retraining with more expressive speaker
Test with TTS-specific emotional markup if available

Training Improvement:

Training script should include:
- Questions: "How can I help you today?"
- Excitement: "That's fantastic news!"
- Concern: "I'm sorry to hear about that."
- Professional: "Let me check that for you."

Troubleshooting

WebSocket Connection Issues

Problem: Cannot establish WebSocket connection

Solutions:

Verify Business plan subscription
Check API key permissions for streaming
Confirm project UUID is correct
Test connection with WebSocket debugging tools
Check firewall settings for WebSocket traffic
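Before digging into plan or API key problems, it can be worth ruling out the network. The sketch below uses only the standard library to check whether a TCP connection to the streaming host can be opened. It does not perform the TLS or WebSocket handshake and does not validate credentials, so a `True` result only means a firewall is not blocking the port. The function names are illustrative.

```python
import socket
from urllib.parse import urlsplit


def ws_endpoint(uri: str) -> tuple[str, int]:
    """Extract host and port from a ws:// or wss:// URI."""
    parts = urlsplit(uri)
    default_port = 443 if parts.scheme == "wss" else 80
    return parts.hostname, parts.port or default_port


def can_reach(uri: str, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to the endpoint succeeds."""
    host, port = ws_endpoint(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False
```

For example, `can_reach("wss://websocket.cluster.resemble.ai/stream")` returning `False` points to DNS, firewall, or proxy settings rather than to the account.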

Voice UUID Not Found

Problem: Custom voice UUID returns error

Solutions:

Verify voice training is completed
Check voice UUID spelling in configuration
Confirm voice is associated with correct project
Contact support if voice disappeared after training

Audio Quality Issues

Problem: Generated audio has artifacts or poor quality

Solutions:

Adjust audio format settings (sample rate, precision)
Test with different output formats
Check if voice training data was high quality
Consider retraining voice with better samples
Verify network stability for streaming

Migration Guide

From Standard TTS Providers

Migration Benefits:

Custom brand voice consistency
WebSocket streaming for real-time apps
Unlimited voice creation potential
Professional voice quality control

Migration Steps:

Voice Planning: Decide what custom voices you need
Training Data: Collect high-quality voice samples
Voice Creation: Train your custom voices
Testing: Validate voice quality and performance
Integration: Update API calls to use custom voice UUIDs
Monitoring: Implement quality monitoring

Voice Replacement Strategy

Generic Voice → Custom Voice Mapping:

# Migration mapping example
VOICE_MIGRATION = {
    # Old generic voices → New custom voices
    "rachel_elevenlabs": "custom_customer_service_female",
    "josh_elevenlabs": "custom_customer_service_male",
    "asteria_deepgram": "custom_phone_representative",
    "ashley_inworld": "custom_friendly_assistant"
}

def migrate_voice_selection(old_voice_id):
    return VOICE_MIGRATION.get(old_voice_id, "default_custom_voice")

🎯 Ready to Create Your Brand Voice?

Set up Resemble AI in your assistant configuration and start building custom voices that represent your brand perfectly!
