Using the official SwiftTranscriptionSampleApp from WWDC 2025, speech transcription takes 14+ seconds from audio input to first result, making it unusable for real-time applications.
Environment
iOS: 26.0 Beta
Xcode: Beta 5
Device: iPhone 16 pro
Sample App: Official Apple SwiftTranscriptionSampleApp from WWDC 2025
Configuration Tested
Locale: en-US (properly allocated with AssetInventory.allocate(locale:)) and es-ES
Setup: All optimizations applied (preheating, high priority, model retention)
I started testing in my own app to replace SFSpeech API and include speech detection but after long fights with documentation (this part is quite terrible TBH) I tested the example (https://developer.apple.com/documentation/speech/bringing-advanced-speech-to-text-capabilities-to-your-app) and saw same results.
I added some logs to check the specific time:
🎙️ [20:30:41.532] ✅ Analyzer started successfully - ready to receive audio!
🎙️ [20:30:41.532] Listening for transcription results...
🎙️ [20:30:56.342] 🚀 FIRST TRANSCRIPTION RESULT after 14.810s: 'Hello' (isFinal: false)
Questions
Is this expected performance for iOS 26 Beta, because old SFSpeech is far faster?
Are there additional optimization steps for SpeechTranscriber?
Should we expect significant performance improvements in later betas?
Selecting any option will automatically load the page