Coming Soon

OpenAI TTS integration is planned for a future release. This page previews the expected capabilities.

Overview

OpenAI provides high-quality text-to-speech through its API, with multiple models and natural-sounding voices.

Available Models

Standard Quality Model
  • Latency: ~400-600ms
  • Quality: Good for most applications
  • Cost: Lower cost per character
  • Best for: General-purpose TTS, cost-sensitive applications

Voice Options

Alloy

Balanced and clear. Neutral voice suitable for most applications.

Echo

Deep and resonant. Male voice with a rich, deep tone.

Fable

Warm and expressive. Engaging voice for storytelling.

Onyx

Strong and authoritative. Confident male voice for professional use.

Nova

Bright and energetic. Female voice with an upbeat personality.

Shimmer

Soft and gentle. Gentle female voice for calm interactions.

Planned Features

When integrated, OpenAI TTS will offer:
  • Multiple Models: TTS-1 for speed, TTS-1-HD for quality
  • 6 Built-in Voices: Alloy, Echo, Fable, Onyx, Nova, Shimmer
  • Speed Control: Adjust speaking rate (0.25x to 4.0x)
  • Multiple Formats: MP3, Opus, AAC, FLAC output options
  • Streaming Support: Real-time audio streaming for phone calls
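Because the integration is not yet released, the following is only a minimal sketch of how the planned options might be validated before a request is sent. The helper name `build_speech_request` is hypothetical and not part of any released SDK. The field names (`model`, `input`, `voice`, `response_format`, `speed`) are assumed to follow the request body of OpenAI's public speech endpoint, and the allowed values are taken from the list above.

```python
# Allowed values, taken from the planned-feature list above.
MODELS = ("tts-1", "tts-1-hd")
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
FORMATS = ("mp3", "opus", "aac", "flac")
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_request(text, model="tts-1", voice="alloy",
                         response_format="mp3", speed=1.0):
    """Hypothetical helper: validate options and return a request body dict."""
    if not text:
        raise ValueError("text must not be empty")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    # Field names are assumed to mirror OpenAI's speech endpoint.
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


if __name__ == "__main__":
    print(build_speech_request("Hello, thanks for calling.", voice="nova", speed=1.25))
```

Validating locally keeps out-of-range values, such as a speed of 5.0, from reaching the provider, so a misconfiguration surfaces as a clear error rather than a failed call.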

Comparison with Available Providers

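| Feature       | OpenAI TTS | ElevenLabs | Deepgram | Cartesia |
|---------------|------------|------------|----------|----------|
| Latency       | ~500ms     | ~250ms     | ~75ms    | ~150ms   |
| Quality       | High       | Premium    | Good     | High     |
| Voices        | 6 built-in | Custom     | 10+      | 20+      |
| Languages     | English    | 70+        | English  | 42       |
| Custom Voices | No         | Yes        | No       | Yes      |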
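The comparison above can be read as a simple selection rule. The sketch below encodes the table's approximate figures and picks the lowest-latency provider that meets a set of requirements. The `Provider` type and the `pick_provider` function are illustrative assumptions, not part of any SDK. "English" is encoded as one language and "70+" as 70.

```python
from typing import NamedTuple, Optional


class Provider(NamedTuple):
    name: str
    latency_ms: int      # approximate latency from the comparison table
    languages: int       # 1 means English only; "70+" is stored as 70
    custom_voices: bool


# Figures copied from the comparison table; they are approximations.
PROVIDERS = [
    Provider("OpenAI TTS", 500, 1, False),
    Provider("ElevenLabs", 250, 70, True),
    Provider("Deepgram", 75, 1, False),
    Provider("Cartesia", 150, 42, True),
]


def pick_provider(max_latency_ms: int,
                  need_custom_voices: bool = False,
                  min_languages: int = 1) -> Optional[Provider]:
    """Return the fastest provider meeting all requirements, or None."""
    candidates = [
        p for p in PROVIDERS
        if p.latency_ms <= max_latency_ms
        and (p.custom_voices or not need_custom_voices)
        and p.languages >= min_languages
    ]
    return min(candidates, key=lambda p: p.latency_ms, default=None)


if __name__ == "__main__":
    choice = pick_provider(max_latency_ms=300, need_custom_voices=True)
    print(choice.name if choice else "no provider matches")
```

Latency is used as the tie-breaker here because the table is framed around real-time use; a cost- or quality-first ranking would need data the table does not provide.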

Available Alternatives

While waiting for OpenAI TTS, these providers offer excellent alternatives:

ElevenLabs

Premium Quality: 70+ languages, voice cloning, advanced controls

Deepgram Aura

Ultra-Fast: ~75ms latency, well suited to real-time conversations

Cartesia Sonic 3

Multilingual: 42 languages, voice cloning, low latency

Azure Speech

Enterprise Scale: 500+ voices, 100+ languages, SSML support

Stay Updated

Follow our changelog for updates on OpenAI TTS integration and other new features.