Azure Speech STT

☁️ Azure Speech: Enterprise Scale

Microsoft’s neural STT service, covering 100+ languages and regional variants. It integrates with the rest of the Azure ecosystem, supports phrase lists for term boosting, and offers enterprise-grade reliability. A good fit for organizations already using Microsoft services or requiring broad language support.

Quick Setup

Create Azure Speech Resource

Go to Azure Portal
Create a new Speech resource
Select your subscription, resource group, and region
Note your Key and Region from the resource’s Keys and Endpoint page

Configure in Burki

Go to AI Configuration → STT tab
Select Azure as the provider
Enter your Subscription Key and Region (e.g., eastus, westus2)

Choose Model & Language

Select your preferred model and language from the dropdowns

Free Tier: Azure offers 5 hours of audio per month free for speech-to-text. See Azure Pricing for details.

Available Models

🎯 Standard

General Purpose
Balanced accuracy and performance for most use cases
Languages: 30+
Best for: General transcription

⚡ Enhanced

Improved Accuracy
Better recognition for major languages
Languages: 18
Best for: High-accuracy needs

🧠 Neural

Highest Quality
State-of-the-art neural recognition
Languages: 16
Best for: Premium applications

Recommendation: Start with Standard for broad language support, or use Neural for English-focused applications requiring the highest accuracy.

Language Support

Azure Speech STT supports 100+ languages and regional variants. Here are the most commonly used:

Full Language List

Language	Code	Standard	Enhanced	Neural
English (US)	`en-US`	✅	✅	✅
English (UK)	`en-GB`	✅	✅	✅
English (Australia)	`en-AU`	✅	✅	✅
English (Canada)	`en-CA`	✅	✅	✅
English (India)	`en-IN`	✅	✅	✅
Spanish (Spain)	`es-ES`	✅	✅	✅
Spanish (Mexico)	`es-MX`	✅	✅	✅
French (France)	`fr-FR`	✅	✅	✅
French (Canada)	`fr-CA`	✅	✅	✅
German	`de-DE`	✅	✅	✅
Italian	`it-IT`	✅	✅	✅
Portuguese (Brazil)	`pt-BR`	✅	✅	✅
Portuguese (Portugal)	`pt-PT`	✅	✅	✅
Japanese	`ja-JP`	✅	✅	✅
Korean	`ko-KR`	✅	✅	✅
Chinese (Mandarin)	`zh-CN`	✅	✅	✅
Chinese (Hong Kong)	`zh-HK`	✅	✅	–
Chinese (Taiwan)	`zh-TW`	✅	✅	–
Arabic (Saudi Arabia)	`ar-SA`	✅	–	–
Hindi	`hi-IN`	✅	–	–
Dutch	`nl-NL`	✅	–	–
Russian	`ru-RU`	✅	–	–
Swedish	`sv-SE`	✅	–	–
Danish	`da-DK`	✅	–	–
Norwegian	`no-NO`	✅	–	–
Finnish	`fi-FI`	✅	–	–
Polish	`pl-PL`	✅	–	–
Turkish	`tr-TR`	✅	–	–
Hebrew	`he-IL`	✅	–	–
Thai	`th-TH`	✅	–	–

100+ More Languages: Azure supports many additional languages and regional variants. Visit the Azure Language Support page for the complete list.
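The table above can be turned into a small lookup that picks the highest-tier model listed for a language. This is a minimal sketch: the function name `best_model` is hypothetical, the lowercase identifiers `"enhanced"` and `"neural"` are assumed by analogy with the `"standard"` value used in the configuration examples, and any language not in the table falls back to Standard.

```python
# Model availability transcribed from the language table on this page.
# The "enhanced" and "neural" identifiers are assumptions; only "standard"
# appears in the documented configuration examples.
ENHANCED = {
    "en-US", "en-GB", "en-AU", "en-CA", "en-IN", "es-ES", "es-MX",
    "fr-FR", "fr-CA", "de-DE", "it-IT", "pt-BR", "pt-PT",
    "ja-JP", "ko-KR", "zh-CN", "zh-HK", "zh-TW",
}
# Neural covers the same rows except the two marked "–" in the Neural column.
NEURAL = ENHANCED - {"zh-HK", "zh-TW"}


def best_model(language: str) -> str:
    """Return the highest-tier model the table lists for a language code."""
    if language in NEURAL:
        return "neural"
    if language in ENHANCED:
        return "enhanced"
    # Languages outside the table are assumed to be Standard-only.
    return "standard"
```

For example, `best_model("zh-HK")` selects Enhanced because the table lists no Neural support for that variant.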

Configuration Options

Basic Configuration

{
  "stt_settings": {
    "provider": "azure",
    "model": "standard",
    "language": "en-US"
  }
}

Full Configuration

{
  "stt_settings": {
    "provider": "azure",
    "model": "standard",
    "language": "en-US",
    "punctuate": true,
    "interim_results": true,
    "smart_format": true,
    "endpointing": 10,
    "utterance_end_ms": 1000,
    "vad_events": true,
    "keyterms": ["Burki", "AI assistant", "customer support"]
  }
}
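If you generate this payload in code, one option is to start from the defaults this page documents and layer overrides on top. The sketch below assumes a hypothetical helper named `build_stt_settings`; only the three defaults stated under Timing Controls are filled in, because the other keys have no documented default.

```python
import json

# Defaults stated in the Timing Controls section of this page.
# Keys without a documented default are left unset unless overridden.
DOCUMENTED_DEFAULTS = {
    "endpointing": 10,
    "utterance_end_ms": 1000,
    "vad_events": True,
}


def build_stt_settings(language: str, model: str = "standard", **overrides) -> dict:
    """Build an stt_settings payload: documented defaults plus overrides."""
    settings = {"provider": "azure", "model": model, "language": language}
    settings.update(DOCUMENTED_DEFAULTS)
    settings.update(overrides)  # caller-supplied values win over defaults
    return {"stt_settings": settings}


payload = build_stt_settings("en-US", punctuate=True, keyterms=["Burki"])
print(json.dumps(payload, indent=2))
```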

Per-Assistant Azure Credentials

You can configure Azure credentials per-assistant instead of using environment variables:

{
  "stt_settings": {
    "provider": "azure",
    "azure_config": {
      "subscription_key": "your_subscription_key",
      "region": "eastus"
    }
  }
}
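To avoid hardcoding the key when building this block programmatically, the values can be read from the environment at request time. This is a sketch only: `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` are hypothetical variable names, not names Burki is known to read, and `azure_config_from_env` is an illustrative helper.

```python
import os


def azure_config_from_env(env=os.environ) -> dict:
    """Build the per-assistant azure_config block from environment variables.

    AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are hypothetical names chosen
    for this example.
    """
    key = env.get("AZURE_SPEECH_KEY", "").strip()
    region = env.get("AZURE_SPEECH_REGION", "").strip().lower()
    if not key or not region:
        raise ValueError("AZURE_SPEECH_KEY and AZURE_SPEECH_REGION must both be set")
    return {
        "stt_settings": {
            "provider": "azure",
            "azure_config": {"subscription_key": key, "region": region},
        }
    }
```

Failing early on a missing value keeps an empty key from reaching Azure, where it would surface as a 401.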

Phrase Lists (Keyterms)

Phrase lists boost recognition of specific terms—perfect for company names, product names, and industry terminology.

Dashboard

In AI Configuration → STT → Keywords/Keyterms, enter terms separated by commas:

Burki, AI assistant, voice platform, customer success

API

{
  "stt_settings": {
    "provider": "azure",
    "keyterms": [
      "Burki",
      "AI assistant",
      "voice platform",
      "customer success"
    ]
  }
}

Best Practice: Add your company name, product names, and any domain-specific terminology to phrase lists for improved recognition accuracy.
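The dashboard takes a comma-separated string while the API takes a list, so a small converter is handy when moving terms between the two. The helper below is a sketch (`parse_keyterms` is a hypothetical name); it trims whitespace, drops empty entries, and removes case-insensitive duplicates while keeping the original order.

```python
def parse_keyterms(raw: str) -> list[str]:
    """Split a comma-separated string into a clean keyterms list."""
    seen = set()
    terms = []
    for part in raw.split(","):
        term = part.strip()
        key = term.casefold()  # compare case-insensitively
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms


keyterms = parse_keyterms("Burki, AI assistant, voice platform, customer success")
```

The resulting list can be placed directly under `"keyterms"` in `stt_settings`.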

Key Features

🎯 Phrase Lists

Term Boosting
Boost recognition of specific words and phrases for your domain

👥 Speaker Diarization

Speaker Identification
Distinguish between multiple speakers in a conversation

🔊 Multi-Channel

Stereo Support
Process audio with separate channels for each participant

⚡ Real-Time

Low Latency
Real-time transcription with interim results

Timing Controls

Azure Speech STT supports timing controls to optimize speech detection:

Endpointing (Silence Threshold)

What it does: How long to wait after detecting silence before considering speech has ended.
Default: 10ms (minimal endpointing)
Range: 10ms - 2000ms
When to Adjust:

Lower (10-100ms): For fast talkers or quick interactions
Higher (500-1000ms): For elderly callers or complex topics
Much higher (1500ms+): For people with speech difficulties

{
  "stt_settings": {
    "endpointing": 500
  }
}

Utterance End Timeout

What it does: Maximum time to wait for a complete utterance before triggering end-of-speech.
Default: 1000ms
Range: 500ms - 5000ms
When to Adjust:

Lower (500-800ms): For short, quick interactions
Higher (1500-3000ms): For detailed conversations

{
  "stt_settings": {
    "utterance_end_ms": 1500
  }
}
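Because both timing values have documented ranges, it can help to check them before saving a configuration. The following is a minimal sketch with a hypothetical `validate_timing` helper; it assumes the values must be whole numbers of milliseconds, which this page implies but does not state.

```python
# Ranges as documented on this page, in milliseconds.
TIMING_RANGES = {
    "endpointing": (10, 2000),
    "utterance_end_ms": (500, 5000),
}


def validate_timing(stt_settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the values are in range."""
    problems = []
    for key, (low, high) in TIMING_RANGES.items():
        if key not in stt_settings:
            continue  # unset keys fall back to the provider default
        value = stt_settings[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{key} must be an integer number of milliseconds")
        elif not low <= value <= high:
            problems.append(f"{key}={value} is outside {low}-{high} ms")
    return problems
```

Calling `validate_timing({"endpointing": 500, "utterance_end_ms": 1500})` returns an empty list, while an out-of-range value yields one message per offending key.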

VAD Events

What it does: Enables Voice Activity Detection for enhanced speech detection.
Default: Enabled
Benefits:

Better speech detection in noisy environments
Backup mechanism when normal detection fails
Essential for background noise scenarios

{
  "stt_settings": {
    "vad_events": true
  }
}

Provider Comparison

Feature	Azure Speech	Deepgram
Languages	100+	30+
Latency	~200ms	~100ms
Term Boosting	Phrase Lists	Keywords/Keyterms
Diarization	✅	✅
Custom Models	✅ (Custom Speech)	Limited
Real-Time	✅	✅
Multi-Channel	✅	✅
Best For	Enterprise, Multi-language	Speed, Phone calls

When to Choose Azure: Broad language support, Microsoft ecosystem integration, enterprise features, or custom speech models.

When to Choose Deepgram: Ultra-low latency, phone call optimization, or Nova-3 keyterms for English.

Regional Selection

Latency Optimization: Choose the Azure region closest to your deployment for optimal latency.

Region	Location	Best For
`eastus`	East US	North America (East)
`eastus2`	East US 2	North America (East) - Backup
`westus2`	West US 2	North America (West)
`westeurope`	Netherlands	Europe
`northeurope`	Ireland	Europe - Backup
`southeastasia`	Singapore	Asia-Pacific
`australiaeast`	Australia East	Australia/Oceania

Pricing Overview

Tier	Hours/Month	Price
Free	5 hours	$0
Standard	Pay-as-you-go	$1 per audio hour

Enterprise: Contact Azure for custom pricing on high-volume usage and reserved capacity. Custom Speech model training has separate pricing.
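For rough budgeting, the table's figures translate into simple arithmetic. The sketch below uses only the numbers listed above and assumes the two tiers are billed separately, so free hours are not deducted from Standard usage; this page does not say either way, so check the Azure pricing page before relying on it. Both function names are hypothetical.

```python
FREE_HOURS_PER_MONTH = 5        # Free tier allowance from the table above
STANDARD_USD_PER_HOUR = 1.00    # Standard pay-as-you-go rate from the table above


def fits_free_tier(audio_minutes: float) -> bool:
    """True if a month's audio stays within the free allowance."""
    return audio_minutes / 60 <= FREE_HOURS_PER_MONTH


def estimate_standard_cost(audio_minutes: float) -> float:
    """Estimated monthly Standard-tier cost in USD, rounded to cents."""
    return round(audio_minutes / 60 * STANDARD_USD_PER_HOUR, 2)
```

At the listed rate, 600 minutes of audio (10 hours) on the Standard tier comes to about $10.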

Common Issues & Solutions

Authentication Failed

Problem: API returns 401 Unauthorized
Solutions:

Verify your Azure Speech Key is correct in Settings → Provider Keys
Ensure the key is from your Speech resource (not another Azure service)
Check that the region matches your Speech resource’s region
Verify your Azure subscription is active
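The checks above can be exercised outside Burki by requesting a short-lived token from Azure with the same key and region. This sketch uses only the Python standard library and assumes the regional `issueToken` endpoint and the `Ocp-Apim-Subscription-Key` header described in the Azure Speech REST documentation; the function names are illustrative.

```python
import urllib.error
import urllib.request


def build_token_request(subscription_key: str, region: str) -> urllib.request.Request:
    """Build a POST to the regional token endpoint of the Azure Speech service."""
    url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return urllib.request.Request(url, data=b"", headers=headers, method="POST")


def check_credentials(subscription_key: str, region: str, timeout: float = 10.0) -> str:
    """Return 'ok', 'unauthorized', or a short description of another failure."""
    request = build_token_request(subscription_key, region)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "ok" if response.status == 200 else f"http {response.status}"
    except urllib.error.HTTPError as exc:
        # 401 means the key is wrong or does not belong to this region.
        return "unauthorized" if exc.code == 401 else f"http {exc.code}"
    except OSError as exc:
        # Covers DNS failures, refused connections, and timeouts.
        return f"connection failed: {exc}"
```

A result of `unauthorized` with a key you trust usually points at the region rather than the key.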

Region Mismatch

Problem: Connection fails or returns errors
Solutions:

Double-check the region code (e.g., eastus not east-us)
Ensure the region is available for Speech services
Try a different region if experiencing connectivity issues
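Region typos such as `east-us` or `East US` can be caught by normalizing the value before it is saved. The helper below is a sketch with a hypothetical name; `LISTED_REGIONS` contains only the codes from the Regional Selection table on this page, so a region missing from it is not necessarily invalid.

```python
import re

# Region codes listed in the Regional Selection table on this page.
LISTED_REGIONS = {
    "eastus", "eastus2", "westus2", "westeurope",
    "northeurope", "southeastasia", "australiaeast",
}


def normalize_region(value: str) -> str:
    """Lowercase and strip spaces, hyphens, and underscores."""
    return re.sub(r"[\s_-]+", "", value.strip().lower())
```

For instance, `normalize_region("east-us")` gives `eastus`, which can then be compared against `LISTED_REGIONS` or your resource's actual region.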

Language Detection Issues

Problem: Wrong language transcribed
Solutions:

Explicitly set the language code in configuration
Use the correct regional variant (e.g., es-ES vs es-MX)
Ensure your model supports the selected language

Poor Recognition Accuracy

Problem: Transcription quality is low
Solutions:

Add domain-specific terms to phrase lists
Try a different model (Enhanced or Neural for supported languages)
Ensure audio quality is good (minimal background noise)
Enable audio denoising in Burki settings

SDK Not Installed

Problem: Azure Speech SDK import fails
Solution:

pip install azure-cognitiveservices-speech
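To confirm the install took effect in the interpreter your service actually uses, you can check for the module without importing it. This is a small sketch; `speech_sdk_installed` is a hypothetical helper, and `azure.cognitiveservices.speech` is the import name the package above provides.

```python
import importlib.util


def speech_sdk_installed() -> bool:
    """Report whether the Azure Speech SDK module can be found."""
    try:
        return importlib.util.find_spec("azure.cognitiveservices.speech") is not None
    except ModuleNotFoundError:
        # A parent package (azure or azure.cognitiveservices) is missing.
        return False


print("Azure Speech SDK installed:", speech_sdk_installed())
```

If this prints `False` right after a successful `pip install`, the package most likely went into a different Python environment.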

Best Practices

See Also

⚡ Need Speed?

Deepgram - Ultra-low ~100ms latency, optimized for phone calls

🔧 Timing Controls

Advanced Settings - Fine-tune speech detection timing

📞 Call Management

Conversation Flow - Configure interruption and timeout settings

🔗 Additional Resources

Azure Portal: portal.azure.com
Language Support: Azure STT Language List
Documentation: Azure Speech Service Docs
Pricing: Azure Speech Pricing

🚀 Ready to Use Azure Speech STT?

Head back to your assistant configuration and set up Azure Speech for enterprise-grade speech-to-text!
