☁️ Azure Speech: Enterprise Scale

Microsoft's neural STT service with 100+ languages and regional variants. It integrates with the Azure ecosystem, supports phrase lists for term boosting, and offers enterprise-grade reliability. A good fit for organizations already using Microsoft services or requiring broad language support.

Quick Setup

Step 1: Create Azure Speech Resource

  1. Go to Azure Portal
  2. Create a new Speech resource
  3. Select your subscription, resource group, and region
  4. Note your Key and Region from the resource’s Keys and Endpoint page
Step 2: Configure in Burki

  1. Go to AI Configuration → STT tab
  2. Select Azure as the provider
  3. Enter your Subscription Key and Region (e.g., eastus, westus2)
Step 3: Choose Model & Language

Select your preferred model and language from the dropdowns.
Free Tier: Azure offers 5 hours of audio per month free for speech-to-text. See Azure Pricing for details.

Available Models

🎯 Standard

General Purpose
Balanced accuracy and performance for most use cases
Languages: 30+
Best for: General transcription

⚡ Enhanced

Improved Accuracy
Better recognition for major languages
Languages: 18
Best for: High-accuracy needs

🧠 Neural

Highest Quality
State-of-the-art neural recognition
Languages: 16
Best for: Premium applications
Recommendation: Start with Standard for broad language support, or use Neural for English-focused applications requiring the highest accuracy.

Language Support

Azure Speech STT supports 100+ languages and regional variants. Here are the most commonly used:
| Language | Code | Standard | Enhanced | Neural |
|---|---|---|---|---|
| English (US) | en-US | ✅ | ✅ | ✅ |
| English (UK) | en-GB | ✅ | ✅ | ✅ |
| English (Australia) | en-AU | ✅ | ✅ | ✅ |
| English (Canada) | en-CA | ✅ | ✅ | ✅ |
| English (India) | en-IN | ✅ | ✅ | ✅ |
| Spanish (Spain) | es-ES | ✅ | ✅ | ✅ |
| Spanish (Mexico) | es-MX | ✅ | ✅ | ✅ |
| French (France) | fr-FR | ✅ | ✅ | ✅ |
| French (Canada) | fr-CA | ✅ | ✅ | ✅ |
| German | de-DE | ✅ | ✅ | ✅ |
| Italian | it-IT | ✅ | ✅ | ✅ |
| Portuguese (Brazil) | pt-BR | ✅ | ✅ | ✅ |
| Portuguese (Portugal) | pt-PT | ✅ | ✅ | ✅ |
| Japanese | ja-JP | ✅ | ✅ | ✅ |
| Korean | ko-KR | ✅ | ✅ | ✅ |
| Chinese (Mandarin) | zh-CN | ✅ | ✅ | ✅ |
| Chinese (Hong Kong) | zh-HK | ✅ | ✅ | – |
| Chinese (Taiwan) | zh-TW | ✅ | ✅ | – |
| Arabic (Saudi Arabia) | ar-SA | ✅ | – | – |
| Hindi | hi-IN | ✅ | – | – |
| Dutch | nl-NL | ✅ | – | – |
| Russian | ru-RU | ✅ | – | – |
| Swedish | sv-SE | ✅ | – | – |
| Danish | da-DK | ✅ | – | – |
| Norwegian | no-NO | ✅ | – | – |
| Finnish | fi-FI | ✅ | – | – |
| Polish | pl-PL | ✅ | – | – |
| Turkish | tr-TR | ✅ | – | – |
| Hebrew | he-IL | ✅ | – | – |
| Thai | th-TH | ✅ | – | – |
100+ More Languages: Azure supports many additional languages and regional variants. Visit the Azure Language Support page for the complete list.
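The coverage table above can be turned into a small lookup for picking the highest tier a language supports. This is a minimal sketch, not part of Burki: the function name `best_model` is hypothetical, the lowercase identifiers "enhanced" and "neural" are assumed by analogy with the "standard" value shown in the configuration examples, and languages missing from the table fall back to Standard only as a guess that should be checked against the Azure language list.

```python
# Language coverage copied from the table above (commonly used languages only).
NEURAL = {
    "en-US", "en-GB", "en-AU", "en-CA", "en-IN", "es-ES", "es-MX", "fr-FR",
    "fr-CA", "de-DE", "it-IT", "pt-BR", "pt-PT", "ja-JP", "ko-KR", "zh-CN",
}
ENHANCED = NEURAL | {"zh-HK", "zh-TW"}
STANDARD = ENHANCED | {
    "ar-SA", "hi-IN", "nl-NL", "ru-RU", "sv-SE", "da-DK",
    "no-NO", "fi-FI", "pl-PL", "tr-TR", "he-IL", "th-TH",
}


def best_model(language: str) -> str:
    """Return the highest-quality model the table lists for a language code."""
    if language in NEURAL:
        return "neural"      # assumed identifier
    if language in ENHANCED:
        return "enhanced"    # assumed identifier
    # Listed Standard-only languages, plus an unverified fallback for the rest.
    return "standard"


if __name__ == "__main__":
    for code in ("en-US", "zh-TW", "hi-IN"):
        print(code, "->", best_model(code))
```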

Configuration Options

Basic Configuration

{
  "stt_settings": {
    "provider": "azure",
    "model": "standard",
    "language": "en-US"
  }
}

Full Configuration

{
  "stt_settings": {
    "provider": "azure",
    "model": "standard",
    "language": "en-US",
    "punctuate": true,
    "interim_results": true,
    "smart_format": true,
    "endpointing": 10,
    "utterance_end_ms": 1000,
    "vad_events": true,
    "keyterms": ["Burki", "AI assistant", "customer support"]
  }
}

Per-Assistant Azure Credentials

You can configure Azure credentials per-assistant instead of using environment variables:
{
  "stt_settings": {
    "provider": "azure",
    "azure_config": {
      "subscription_key": "your_subscription_key",
      "region": "eastus"
    }
  }
}
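If you generate assistant configurations from a script, the credentials block can be filled in from the environment rather than hard-coded. The sketch below assumes two hypothetical environment variable names, `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION`; they are illustrative and not names Burki is known to read.

```python
import json
import os


def azure_stt_settings(env=os.environ) -> dict:
    """Build a per-assistant stt_settings block from environment values.

    AZURE_SPEECH_KEY and AZURE_SPEECH_REGION are hypothetical variable names.
    """
    key = env.get("AZURE_SPEECH_KEY")
    region = env.get("AZURE_SPEECH_REGION")
    if not key or not region:
        raise RuntimeError("Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION first")
    return {
        "stt_settings": {
            "provider": "azure",
            "azure_config": {"subscription_key": key, "region": region},
        }
    }


if __name__ == "__main__":
    demo_env = {"AZURE_SPEECH_KEY": "your_subscription_key",
                "AZURE_SPEECH_REGION": "eastus"}
    print(json.dumps(azure_stt_settings(demo_env), indent=2))
```

Passing a plain dictionary as `env` keeps the helper easy to test without touching real credentials.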

Phrase Lists (Keyterms)

Phrase lists boost recognition of specific terms, which makes them useful for company names, product names, and industry terminology.
In AI Configuration → STT → Keywords/Keyterms, enter terms separated by commas:
Burki, AI assistant, voice platform, customer success
Best Practice: Add your company name, product names, and any domain-specific terminology to phrase lists for improved recognition accuracy.
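When the term list is maintained elsewhere (a spreadsheet export, for example), the comma-separated form can be converted into the `keyterms` array used in the Full Configuration. The helper below is a hypothetical sketch; trimming whitespace and dropping case-insensitive duplicates are choices made here, not documented Burki behavior.

```python
def parse_keyterms(raw: str) -> list[str]:
    """Split a comma-separated string into a clean list of key terms."""
    seen = set()
    terms = []
    for part in raw.split(","):
        term = part.strip()
        key = term.casefold()
        # Skip empty entries and case-insensitive repeats, keep first spelling.
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms


if __name__ == "__main__":
    raw = "Burki, AI assistant, voice platform, customer success"
    print({"stt_settings": {"keyterms": parse_keyterms(raw)}})
```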

Key Features

🎯 Phrase Lists

Term Boosting
Boost recognition of specific words and phrases for your domain

👥 Speaker Diarization

Speaker Identification
Distinguish between multiple speakers in a conversation

🔊 Multi-Channel

Stereo Support
Process audio with separate channels for each participant

⚡ Real-Time

Low Latency
Real-time transcription with interim results

Timing Controls

Azure Speech STT supports timing controls to optimize speech detection:
Endpointing (endpointing)
What it does: How long to wait after detecting silence before treating speech as ended.
Default: 10ms (minimal endpointing)
Range: 10ms - 2000ms
When to Adjust:
  • Lower (10-100ms): For fast talkers or quick interactions
  • Higher (500-1000ms): For elderly callers or complex topics
  • Much higher (1500ms+): For people with speech difficulties
{
  "stt_settings": {
    "endpointing": 500
  }
}
Utterance End (utterance_end_ms)
What it does: Maximum time to wait for a complete utterance before triggering end-of-speech.
Default: 1000ms
Range: 500ms - 5000ms
When to Adjust:
  • Lower (500-800ms): For short, quick interactions
  • Higher (1500-3000ms): For detailed conversations
{
  "stt_settings": {
    "utterance_end_ms": 1500
  }
}
VAD Events (vad_events)
What it does: Enables Voice Activity Detection for enhanced speech detection.
Default: Enabled
Benefits:
  • Better speech detection in noisy environments
  • Backup mechanism when normal detection fails
  • Essential for background noise scenarios
{
  "stt_settings": {
    "vad_events": true
  }
}
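The documented ranges for the timing controls can be checked before a configuration is saved. This is a minimal sketch with a hypothetical helper name, `timing_settings`; the defaults and limits are the ones stated in this section, and nothing here reflects how Burki itself validates input.

```python
# Ranges and defaults as documented in the Timing Controls section.
ENDPOINTING_RANGE_MS = (10, 2000)
UTTERANCE_END_RANGE_MS = (500, 5000)


def timing_settings(endpointing: int = 10,
                    utterance_end_ms: int = 1000,
                    vad_events: bool = True) -> dict:
    """Return an stt_settings fragment, rejecting out-of-range timings."""
    checks = (
        ("endpointing", endpointing, ENDPOINTING_RANGE_MS),
        ("utterance_end_ms", utterance_end_ms, UTTERANCE_END_RANGE_MS),
    )
    for name, value, (low, high) in checks:
        if not low <= value <= high:
            raise ValueError(
                f"{name}={value} is outside the documented range {low}-{high} ms"
            )
    return {
        "stt_settings": {
            "endpointing": endpointing,
            "utterance_end_ms": utterance_end_ms,
            "vad_events": vad_events,
        }
    }


if __name__ == "__main__":
    # Slower-paced callers: longer silence window and utterance timeout.
    print(timing_settings(endpointing=500, utterance_end_ms=1500))
```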

Provider Comparison

| Feature | Azure Speech | Deepgram |
|---|---|---|
| Languages | 100+ | 30+ |
| Latency | ~200ms | ~100ms |
| Term Boosting | Phrase Lists | Keywords/Keyterms |
| Diarization | ✅ | ✅ |
| Custom Models | ✅ (Custom Speech) | Limited |
| Real-Time | ✅ | ✅ |
| Multi-Channel | ✅ | ✅ |
| Best For | Enterprise, Multi-language | Speed, Phone calls |

When to Choose Azure: Broad language support, Microsoft ecosystem integration, enterprise features, or custom speech models.
When to Choose Deepgram: Ultra-low latency, phone call optimization, or Nova-3 keyterms for English.

Regional Selection

Latency Optimization: Choose the Azure region closest to your deployment for optimal latency.
| Region | Location | Best For |
|---|---|---|
| eastus | East US | North America (East) |
| eastus2 | East US 2 | North America (East) - Backup |
| westus2 | West US 2 | North America (West) |
| westeurope | Netherlands | Europe |
| northeurope | Ireland | Europe - Backup |
| southeastasia | Singapore | Asia-Pacific |
| australiaeast | Australia East | Australia/Oceania |
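Region codes are lowercase with no spaces or hyphens, so a small normalizer can catch inputs such as "east-us" before they reach the configuration. The sketch below uses hypothetical names (`REGIONS`, `normalize_region`) and only knows the regions listed in the table above; Azure offers others, so an unknown code is not necessarily invalid.

```python
# Region codes and locations copied from the table above (not exhaustive).
REGIONS = {
    "eastus": "East US",
    "eastus2": "East US 2",
    "westus2": "West US 2",
    "westeurope": "Netherlands",
    "northeurope": "Ireland",
    "southeastasia": "Singapore",
    "australiaeast": "Australia East",
}


def normalize_region(region: str) -> str:
    """Lowercase a region name and strip spaces and hyphens."""
    return region.strip().lower().replace("-", "").replace(" ", "")


if __name__ == "__main__":
    for raw in ("east-us", "West US 2", "uksouth"):
        code = normalize_region(raw)
        note = REGIONS.get(code, "not in the table above, check the Azure docs")
        print(f"{raw!r} -> {code} ({note})")
```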

Pricing Overview

| Tier | Hours/Month | Price |
|---|---|---|
| Free | 5 hours | $0 |
| Standard | Pay-as-you-go | $1 per audio hour |
Enterprise: Contact Azure for custom pricing on high-volume usage and reserved capacity. Custom Speech model training has separate pricing.
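For budgeting, the table translates into simple arithmetic. The sketch below treats the two tiers separately, because this page does not say whether the free hours offset usage on a Standard resource; the function names are hypothetical and the rates are the ones listed above, which may change.

```python
FREE_HOURS_PER_MONTH = 5          # Free tier allowance from the table above
PRICE_PER_AUDIO_HOUR_USD = 1.00   # Standard pay-as-you-go rate from the table


def fits_free_tier(audio_hours: float) -> bool:
    """True if a month's audio volume stays within the free allowance."""
    return audio_hours <= FREE_HOURS_PER_MONTH


def estimate_standard_cost(audio_hours: float) -> float:
    """Estimated monthly cost in USD on the Standard tier."""
    return round(audio_hours * PRICE_PER_AUDIO_HOUR_USD, 2)


if __name__ == "__main__":
    # Example: 2,400 calls per month averaging 3 minutes each = 120 audio hours.
    hours = 2400 * 3 / 60
    print(f"{hours:.0f} h, free tier ok: {fits_free_tier(hours)}, "
          f"Standard estimate: ${estimate_standard_cost(hours):.2f}")
```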

Common Issues & Solutions

Problem: API returns 401 Unauthorized
Solutions:
  • Verify your Azure Speech Key is correct in Settings → Provider Keys
  • Ensure the key is from your Speech resource (not another Azure service)
  • Check that the region matches your Speech resource’s region
  • Verify your Azure subscription is active
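To check a key and region pair outside Burki, one option is to request a short-lived token from the Azure Speech token endpoint, which rejects a wrong key or a key used against the wrong region. This is a sketch using only the standard library; the helper names are hypothetical, and treating both 401 and 403 as "invalid" is an assumption made here.

```python
import urllib.error
import urllib.request


def token_request(subscription_key: str, region: str) -> urllib.request.Request:
    """Build the POST request for the regional issueToken endpoint."""
    url = f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    return urllib.request.Request(
        url,
        data=b"",  # empty body, the key travels in the header
        method="POST",
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
    )


def key_is_valid(subscription_key: str, region: str, timeout: float = 10.0) -> bool:
    """Return True if Azure issues a token for this key and region."""
    try:
        with urllib.request.urlopen(token_request(subscription_key, region),
                                    timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return False
        raise  # other HTTP errors point at a different problem


if __name__ == "__main__":
    # Builds the request only; call key_is_valid() with real credentials to test.
    print(token_request("your_subscription_key", "eastus").full_url)
```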
Problem: Connection fails or returns errors
Solutions:
  • Double-check the region code (e.g., eastus not east-us)
  • Ensure the region is available for Speech services
  • Try a different region if experiencing connectivity issues
Problem: Wrong language transcribed
Solutions:
  • Explicitly set the language code in configuration
  • Use the correct regional variant (e.g., es-ES vs es-MX)
  • Ensure your model supports the selected language
Problem: Transcription quality is low
Solutions:
  • Add domain-specific terms to phrase lists
  • Try a different model (Enhanced or Neural for supported languages)
  • Ensure audio quality is good (minimal background noise)
  • Enable audio denoising in Burki settings
Problem: Azure Speech SDK import fails
Solution:
pip install azure-cognitiveservices-speech
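To confirm the package is visible to the interpreter that runs your deployment, a quick check like the following can help. It is a small sketch with a hypothetical function name; it only looks the module up and does not import the SDK or contact Azure.

```python
import importlib.util


def speech_sdk_installed() -> bool:
    """Report whether azure.cognitiveservices.speech can be located."""
    try:
        return importlib.util.find_spec("azure.cognitiveservices.speech") is not None
    except ImportError:
        # Raised when a parent package such as "azure" is missing entirely.
        return False


if __name__ == "__main__":
    if speech_sdk_installed():
        print("Azure Speech SDK found")
    else:
        print("Azure Speech SDK missing: pip install azure-cognitiveservices-speech")
```

If the check fails after installing, the package was probably installed into a different Python environment than the one running the check.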

Best Practices


See Also

⚡ Need Speed?

Deepgram - Ultra-low ~100ms latency, optimized for phone calls

🔧 Timing Controls

Advanced Settings - Fine-tune speech detection timing

📞 Call Management

Conversation Flow - Configure interruption and timeout settings

🔗 Additional Resources

Azure Portal: portal.azure.com
Language Support: Azure STT Language List
Documentation: Azure Speech Service Docs
Pricing: Azure Speech Pricing

🚀 Ready to Use Azure Speech STT?

Head back to your assistant configuration and set up Azure Speech for enterprise-grade speech-to-text!