Azure Speech: Enterprise Scale
Microsoft's neural speech-to-text (STT) service supports 100+ languages and regional variants. It integrates with the wider Azure ecosystem, offers phrase lists for term boosting, and provides enterprise-grade reliability. That makes it a strong fit for organizations that already use Microsoft services or need broad language support.
Quick Setup
Create Azure Speech Resource
- Go to the Azure Portal
- Create a new Speech resource
- Select your subscription, resource group, and region
- Note your Key and Region from the resource's Keys and Endpoint page
Configure in Burki
- Go to AI Configuration → STT tab
- Select Azure as the provider
- Enter your Subscription Key and Region (e.g., `eastus`, `westus2`)
Free Tier: Azure offers 5 hours of audio per month free for speech-to-text. See Azure Pricing for details.
Available Models
Standard
General Purpose: Balanced accuracy and performance for most use cases. Languages: 30+
Best for: General transcription
Enhanced
Improved Accuracy: Better recognition for major languages. Languages: 18
Best for: High-accuracy needs
Neural
Highest Quality: State-of-the-art neural recognition. Languages: 16
Best for: Premium applications
Recommendation: Start with Standard for broad language support, or use Neural for English-focused applications requiring the highest accuracy.
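The helper below is a rough illustration of that recommendation. It picks a starting model from the target language and your accuracy needs. The function name and the `"standard"`/`"neural"` strings are hypothetical labels for the options above, not Burki's documented identifiers.

```python
def recommend_model(language: str, needs_highest_accuracy: bool) -> str:
    """Suggest a starting model (hypothetical labels) per the recommendation above."""
    # Compare only the primary language subtag, e.g. "en" from "en-GB".
    is_english = language.split("-")[0].lower() == "en"
    if is_english and needs_highest_accuracy:
        return "neural"
    # Standard covers the broadest set of languages, so it is the safe default.
    return "standard"


print(recommend_model("en-US", needs_highest_accuracy=True))  # neural
print(recommend_model("ja-JP", needs_highest_accuracy=True))  # standard
```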
Language Support
Azure Speech STT supports 100+ languages and regional variants. Here are the most commonly used:
Full Language List
| Language | Code | Standard | Enhanced | Neural |
|---|---|---|---|---|
| English (US) | en-US | β | β | β |
| English (UK) | en-GB | β | β | β |
| English (Australia) | en-AU | β | β | β |
| English (Canada) | en-CA | β | β | β |
| English (India) | en-IN | β | β | β |
| Spanish (Spain) | es-ES | β | β | β |
| Spanish (Mexico) | es-MX | β | β | β |
| French (France) | fr-FR | β | β | β |
| French (Canada) | fr-CA | β | β | β |
| German | de-DE | β | β | β |
| Italian | it-IT | β | β | β |
| Portuguese (Brazil) | pt-BR | β | β | β |
| Portuguese (Portugal) | pt-PT | β | β | β |
| Japanese | ja-JP | β | β | β |
| Korean | ko-KR | β | β | β |
| Chinese (Mandarin) | zh-CN | β | β | β |
| Chinese (Hong Kong) | zh-HK | β | β | β |
| Chinese (Taiwan) | zh-TW | β | β | β |
| Arabic (Saudi Arabia) | ar-SA | β | β | β |
| Hindi | hi-IN | β | β | β |
| Dutch | nl-NL | β | β | β |
| Russian | ru-RU | β | β | β |
| Swedish | sv-SE | β | β | β |
| Danish | da-DK | β | β | β |
| Norwegian | no-NO | β | β | β |
| Finnish | fi-FI | β | β | β |
| Polish | pl-PL | β | β | β |
| Turkish | tr-TR | β | β | β |
| Hebrew | he-IL | β | β | β |
| Thai | th-TH | β | β | β |
100+ More Languages: Azure supports many additional languages and regional variants. Visit the Azure Language Support page for the complete list.
Configuration Options
Basic Configuration
Full Configuration
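The settings covered on this page can be gathered into a single structure, with per-assistant credentials taking precedence over environment variables. This is a minimal sketch. The field names and the `AZURE_SPEECH_KEY`/`AZURE_SPEECH_REGION` variable names are assumptions for illustration, not Burki's documented schema.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical schema mirroring the options on this page (not Burki's real format).
@dataclass
class AzureSTTConfig:
    subscription_key: str
    region: str                       # e.g. "eastus"
    language: str = "en-US"
    model: str = "standard"
    keyterms: list[str] = field(default_factory=list)
    endpointing_ms: int = 10          # page default: minimal endpointing
    utterance_end_ms: int = 1000      # page default
    vad_events: bool = True           # page default: enabled


def resolve_credentials(
    assistant_key: Optional[str],
    assistant_region: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> tuple[str, str]:
    """Prefer per-assistant credentials, else fall back to (assumed) env variables."""
    key = assistant_key or env.get("AZURE_SPEECH_KEY")
    region = assistant_region or env.get("AZURE_SPEECH_REGION")
    if not key or not region:
        raise ValueError("Azure Speech key and region must both be set")
    return key, region


# No per-assistant values here, so the environment fallback is used.
key, region = resolve_credentials(
    None, None, env={"AZURE_SPEECH_KEY": "example-key", "AZURE_SPEECH_REGION": "westus2"}
)
config = AzureSTTConfig(subscription_key=key, region=region, language="es-MX", keyterms=["Burki"])
print(config.region, config.endpointing_ms, config.utterance_end_ms)  # westus2 10 1000
```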
Per-Assistant Azure Credentials
You can configure Azure credentials per-assistant instead of using environment variables.
Phrase Lists (Keyterms)
Phrase lists boost recognition of specific terms, which makes them ideal for company names, product names, and industry terminology. You can set them from the dashboard or through the API.
In the dashboard, go to AI Configuration → STT → Keywords/Keyterms and enter the terms separated by commas.
Best Practice: Add your company name, product names, and any domain-specific terminology to phrase lists for improved recognition accuracy.
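The sketch below shows one way to turn that comma-separated field into individual phrases. The Azure Speech SDK adds phrases to a `PhraseListGrammar` one at a time, so a clean, deduplicated list is a convenient intermediate form. The helper name and its de-duplication rule are illustrative assumptions; Burki's own parsing may differ.

```python
def parse_phrase_list(raw: str) -> list[str]:
    """Split a comma-separated keyterm string into a clean phrase list.

    Trims whitespace, drops empty entries, and removes case-insensitive
    duplicates while keeping the first spelling seen.
    """
    phrases: list[str] = []
    seen: set[str] = set()
    for item in raw.split(","):
        phrase = item.strip()
        if phrase and phrase.casefold() not in seen:
            seen.add(phrase.casefold())
            phrases.append(phrase)
    return phrases


print(parse_phrase_list("Burki, Contoso Cloud,  burki , , HIPAA"))
# ['Burki', 'Contoso Cloud', 'HIPAA']
```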
Key Features
Phrase Lists
Term Boosting: Boost recognition of specific words and phrases for your domain
Speaker Diarization
Speaker Identification: Distinguish between multiple speakers in a conversation
Multi-Channel
Stereo Support: Process audio with separate channels for each participant
Real-Time
Low Latency: Real-time transcription with interim results
Timing Controls
Azure Speech STT supports timing controls to optimize speech detection:
Endpointing (Silence Threshold)
What it does: How long to wait after detecting silence before considering that speech has ended.
Default: 10ms (minimal endpointing)
Range: 10ms - 2000ms
When to Adjust:
- Lower (10-100ms): For fast talkers or quick interactions
- Higher (500-1000ms): For elderly callers or complex topics
- Much higher (1500ms+): For people with speech difficulties
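A small helper can keep endpointing within the documented 10-2000ms range while offering named starting points from the bands above. The profile names and the preset values (100, 750, and 1500ms) are hypothetical choices from inside those bands, not Burki settings.

```python
from typing import Optional

ENDPOINTING_MIN_MS = 10
ENDPOINTING_MAX_MS = 2000
ENDPOINTING_DEFAULT_MS = 10  # minimal endpointing

# Hypothetical presets chosen from inside the bands listed above.
ENDPOINTING_PRESETS_MS = {
    "fast": 100,                # fast talkers, quick interactions (10-100ms)
    "elderly": 750,             # elderly callers, complex topics (500-1000ms)
    "speech_difficulty": 1500,  # speech difficulties (1500ms+)
}


def endpointing_ms(profile: Optional[str] = None, override_ms: Optional[int] = None) -> int:
    """Pick an endpointing value and clamp it to the documented 10-2000ms range."""
    if override_ms is not None:
        value = override_ms
    elif profile is not None:
        value = ENDPOINTING_PRESETS_MS[profile]  # KeyError for unknown profiles
    else:
        value = ENDPOINTING_DEFAULT_MS
    return max(ENDPOINTING_MIN_MS, min(ENDPOINTING_MAX_MS, value))


print(endpointing_ms("elderly"))         # 750
print(endpointing_ms(override_ms=5000))  # 2000 (clamped)
```

Clamping keeps a slightly out-of-range value usable. If you would rather surface configuration mistakes, raise an error instead.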
Utterance End Timeout
What it does: Maximum time to wait for a complete utterance before triggering end-of-speech.
Default: 1000ms
Range: 500ms - 5000ms
When to Adjust:
- Lower (500-800ms): For short, quick interactions
- Higher (1500-3000ms): For detailed conversations
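Out-of-range values are easier to catch before a call starts. The sketch below rejects utterance end timeouts outside the documented 500-5000ms range. It also suggests hypothetical values for the two interaction styles above: 700ms and 2000ms, each chosen from inside its band.

```python
from typing import Optional

UTTERANCE_END_MIN_MS = 500
UTTERANCE_END_MAX_MS = 5000
UTTERANCE_END_DEFAULT_MS = 1000

# Hypothetical suggestions picked from inside each documented band.
UTTERANCE_END_SUGGESTIONS_MS = {"short": 700, "detailed": 2000}


def utterance_end_ms(style: Optional[str] = None, value: Optional[int] = None) -> int:
    """Return a validated utterance end timeout in milliseconds.

    An explicit value wins; otherwise a known style's suggestion is used, and a
    missing or unknown style falls back to the 1000ms default.
    """
    if value is None:
        value = UTTERANCE_END_SUGGESTIONS_MS.get(style, UTTERANCE_END_DEFAULT_MS)
    if not UTTERANCE_END_MIN_MS <= value <= UTTERANCE_END_MAX_MS:
        raise ValueError(
            f"utterance end timeout must be {UTTERANCE_END_MIN_MS}-"
            f"{UTTERANCE_END_MAX_MS}ms, got {value}"
        )
    return value


print(utterance_end_ms("detailed"))  # 2000
```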
VAD Events
What it does: Enables Voice Activity Detection (VAD) for enhanced speech detection.
Default: Enabled
Benefits:
- Better speech detection in noisy environments
- Backup mechanism when normal detection fails
- Essential for background noise scenarios
Provider Comparison
| Feature | Azure Speech | Deepgram |
|---|---|---|
| Languages | 100+ | 30+ |
| Latency | ~200ms | ~100ms |
| Term Boosting | Phrase Lists | Keywords/Keyterms |
| Diarization | ✓ | ✓ |
| Custom Models | ✓ (Custom Speech) | Limited |
| Real-Time | ✓ | ✓ |
| Multi-Channel | ✓ | ✓ |
| Best For | Enterprise, Multi-language | Speed, Phone calls |
When to Choose Azure: Broad language support, Microsoft ecosystem integration, enterprise features, or custom speech models.
When to Choose Deepgram: Ultra-low latency, phone call optimization, or Nova-3 keyterms for English.
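The guidance above can be expressed as a simple heuristic. This is an illustrative sketch with a hypothetical function and parameter names; a real decision still deserves testing on your own audio.

```python
def suggest_stt_provider(
    language: str,
    latency_critical: bool = False,
    needs_custom_model: bool = False,
) -> str:
    """Heuristic provider choice based on the comparison table above."""
    if needs_custom_model:
        return "azure"  # Custom Speech models
    is_english = language.split("-")[0].lower() == "en"
    if latency_critical and is_english:
        return "deepgram"  # lowest latency, Nova-3 keyterms for English
    # Otherwise broad language coverage and enterprise features favor Azure.
    return "azure"


print(suggest_stt_provider("en-US", latency_critical=True))  # deepgram
print(suggest_stt_provider("pt-BR", latency_critical=True))  # azure
```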
Regional Selection
Latency Optimization: Choose the Azure region closest to your deployment for optimal latency.
| Region | Location | Best For |
|---|---|---|
| eastus | East US | North America (East) |
| eastus2 | East US 2 | North America (East) - Backup |
| westus2 | West US 2 | North America (West) |
| westeurope | Netherlands | Europe |
| northeurope | Ireland | Europe - Backup |
| southeastasia | Singapore | Asia-Pacific |
| australiaeast | Australia East | Australia/Oceania |
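To use the table in deployment code, the mapping below returns the primary region for an area and falls back to the listed backup where one exists. The area keys and function name are hypothetical; the region codes and primary/backup roles come from the table.

```python
# Region codes from the table above; the first entry is primary, the second a backup.
REGIONS_BY_AREA = {
    "north-america-east": ("eastus", "eastus2"),
    "north-america-west": ("westus2",),
    "europe": ("westeurope", "northeurope"),
    "asia-pacific": ("southeastasia",),
    "australia": ("australiaeast",),
}


def pick_region(area: str, use_backup: bool = False) -> str:
    """Return the primary region for an area, or its backup when requested and listed.

    Raises KeyError for an area that is not in the mapping.
    """
    regions = REGIONS_BY_AREA[area]
    if use_backup and len(regions) > 1:
        return regions[1]
    return regions[0]


print(pick_region("europe"))                   # westeurope
print(pick_region("europe", use_backup=True))  # northeurope
```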
Pricing Overview
| Tier | Hours/Month | Price |
|---|---|---|
| Free | 5 hours | $0 |
| Standard | Pay-as-you-go | $1 per audio hour |
Enterprise: Contact Azure for custom pricing on high-volume usage and reserved capacity. Custom Speech model training has separate pricing.
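A quick estimate of monthly spend follows directly from the table. This sketch uses the listed rates and treats the tiers as separate options. It ignores billing granularity, taxes, negotiated enterprise pricing, and Custom Speech costs, so check current Azure pricing before relying on it.

```python
FREE_TIER_HOURS = 5.0
STANDARD_RATE_PER_HOUR = 1.0  # USD per audio hour, from the table above


def estimate_monthly_cost(audio_hours: float, tier: str = "standard") -> float:
    """Estimate monthly STT cost in USD for the given tier (a rough sketch)."""
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    if tier == "free":
        if audio_hours > FREE_TIER_HOURS:
            raise ValueError("the free tier covers at most 5 audio hours per month")
        return 0.0
    if tier == "standard":
        return audio_hours * STANDARD_RATE_PER_HOUR
    raise ValueError(f"unknown tier: {tier}")


# Example: 200 calls averaging 3 minutes each is 10 audio hours.
hours = 200 * 3 / 60
print(estimate_monthly_cost(hours))  # 10.0
```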
Common Issues & Solutions
Authentication Failed
Problem: API returns 401 Unauthorized
Solutions:
- Verify your Azure Speech Key is correct in Settings → Provider Keys
- Ensure the key is from your Speech resource (not another Azure service)
- Check that the region matches your Speech resource's region
- Verify your Azure subscription is active
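To tell a bad key apart from other failures, you can exchange the key for a token at the regional Speech token endpoint. The endpoint is `https://<region>.api.cognitive.microsoft.com/sts/v1.0/issueToken` and takes the key in the `Ocp-Apim-Subscription-Key` header. The sketch below uses only the Python standard library. It assumes a standard regional Speech resource; resources behind a custom domain or private endpoint use a different URL. Network-level failures such as DNS errors surface as `urllib.error.URLError` and are left to the caller.

```python
import urllib.error
import urllib.request


def token_endpoint(region: str) -> str:
    """Regional STS endpoint that exchanges a Speech key for an access token."""
    return f"https://{region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"


def check_speech_key(key: str, region: str, timeout: float = 10.0) -> int:
    """POST the key to the token endpoint and return the HTTP status code.

    200 means the key/region pair was accepted; 401 usually means a wrong key
    or a key that belongs to a different region or resource.
    """
    request = urllib.request.Request(
        token_endpoint(region),
        data=b"",  # empty body, sent with Content-Length: 0
        headers={"Ocp-Apim-Subscription-Key": key},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


# Usage (makes a network call):
# status = check_speech_key("<your-key>", "eastus")
```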
Region Mismatch
Problem: Connection fails or returns errors
Solutions:
- Double-check the region code (e.g., `eastus`, not `east-us`)
- Ensure the region is available for Speech services
- Try a different region if experiencing connectivity issues
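Region codes are lowercase letters and digits with no separators, so display names and hyphenated variants can be normalized mechanically. This hypothetical helper checks only the format. It cannot confirm that the region exists or offers Speech services.

```python
import re

# Azure region codes: lowercase letters and digits, starting with a letter.
_REGION_CODE = re.compile(r"[a-z][a-z0-9]*")


def normalize_region(raw: str) -> str:
    """Turn inputs like 'East US 2' or 'east-us' into a region code, or raise."""
    code = raw.strip().lower().replace(" ", "").replace("-", "").replace("_", "")
    if not _REGION_CODE.fullmatch(code):
        raise ValueError(f"not a valid Azure region code: {raw!r}")
    return code


print(normalize_region("east-us"))    # eastus
print(normalize_region("East US 2"))  # eastus2
```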
Language Detection Issues
Problem: Wrong language transcribed
Solutions:
- Explicitly set the language code in configuration
- Use the correct regional variant (e.g., `es-ES` vs `es-MX`)
- Ensure your model supports the selected language
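Many wrong-language problems come down to a malformed or missing regional variant. The hypothetical helper below normalizes inputs such as `es_mx` to the `es-MX` form used in the language table. It handles only the simple language-REGION shape shown there, not script subtags.

```python
import re

# language (2-3 letters), separator, REGION (2 letters), e.g. "es-MX".
_LOCALE = re.compile(r"([A-Za-z]{2,3})[-_]([A-Za-z]{2})")


def normalize_locale(raw: str) -> str:
    """Normalize 'es_mx' or 'EN-gb' style input to 'es-MX' / 'en-GB'."""
    match = _LOCALE.fullmatch(raw.strip())
    if not match:
        raise ValueError(f"expected a language-REGION code such as 'es-MX', got {raw!r}")
    language, region = match.groups()
    return f"{language.lower()}-{region.upper()}"


print(normalize_locale("es_mx"))  # es-MX
print(normalize_locale("EN-gb"))  # en-GB
```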
Poor Recognition Accuracy
Problem: Transcription quality is low
Solutions:
- Add domain-specific terms to phrase lists
- Try a different model (Enhanced or Neural for supported languages)
- Ensure audio quality is good (minimal background noise)
- Enable audio denoising in Burki settings
SDK Not Installed
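Problem: Azure Speech SDK import fails
Solution: Install the Azure Speech SDK for Python, `azure-cognitiveservices-speech`, in the environment that runs Burki (for example, `pip install azure-cognitiveservices-speech`).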
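To confirm the SDK is importable from the Python environment in question, a check like the one below can help. The function name is illustrative. The module path `azure.cognitiveservices.speech` is the one the `azure-cognitiveservices-speech` package provides.

```python
import importlib


def azure_speech_sdk_available() -> bool:
    """Return True if the Azure Speech SDK module can be imported."""
    try:
        importlib.import_module("azure.cognitiveservices.speech")
    except ImportError:  # includes ModuleNotFoundError
        return False
    return True


if not azure_speech_sdk_available():
    print("Install it with: pip install azure-cognitiveservices-speech")
```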
Best Practices
See Also
Need Speed?
Deepgram - Ultra-low ~100ms latency, optimized for phone calls
Timing Controls
Advanced Settings - Fine-tune speech detection timing
Call Management
Conversation Flow - Configure interruption and timeout settings
Additional Resources
Azure Portal: portal.azure.com
Language Support: Azure STT Language List
Documentation: Azure Speech Service Docs
Pricing: Azure Speech Pricing
Ready to Use Azure Speech STT?
Head back to your assistant configuration and set up Azure Speech for enterprise-grade speech-to-text!