Using the official SwiftTranscriptionSampleApp from WWDC 2025, speech transcription takes 14+ seconds from audio input to first result, making it unusable for real-time applications.
Environment
- iOS: 26.0 Beta
- Xcode: Beta 5
- Device: iPhone 16 Pro
- Sample App: Official Apple SwiftTranscriptionSampleApp from WWDC 2025
Configuration Tested
- Locale: en-US (properly allocated with AssetInventory.allocate(locale:)) and es-ES
- Setup: All optimizations applied (preheating, high priority, model retention)
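For context, this is roughly the setup path I'm using, modeled on the WWDC 2025 sample. It assumes the iOS 26 SpeechAnalyzer / SpeechTranscriber API as shown in the sample app; the function name and overall shape are mine, and exact signatures may differ across betas:

```swift
import Speech

// Sketch of the transcriber setup with the optimizations applied
// (asset install, locale allocation, preheating). Names outside the
// Speech framework (e.g. `setUpTranscriber`) are my own.
func setUpTranscriber() async throws -> (SpeechAnalyzer, SpeechTranscriber) {
    let locale = Locale(identifier: "en-US")

    let transcriber = SpeechTranscriber(
        locale: locale,
        transcriptionOptions: [],
        reportingOptions: [.volatileResults],   // partial results as they arrive
        attributeOptions: []
    )

    // Download/install the on-device model assets if needed,
    // then reserve the locale so it stays resident.
    if let request = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
        try await request.downloadAndInstall()
    }
    try await AssetInventory.allocate(locale: locale)

    let analyzer = SpeechAnalyzer(modules: [transcriber])
    return (analyzer, transcriber)
}
```

Even with this in place (plus preheating via the analyzer before streaming audio), the first result still arrives ~14 s after audio starts.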
I started testing in my own app, aiming to replace the old SFSpeechRecognizer API and add speech detection, but after long fights with the documentation (this part is quite terrible, TBH) I tested Apple's sample (https://developer.apple.com/documentation/speech/bringing-advanced-speech-to-text-capabilities-to-your-app) and saw the same results.
I added some logs to check the specific time:
🎙️ [20:30:41.532] ✅ Analyzer started successfully - ready to receive audio!
🎙️ [20:30:41.532] Listening for transcription results...
🎙️ [20:30:56.342] 🚀 FIRST TRANSCRIPTION RESULT after 14.810s: 'Hello' (isFinal: false)
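The 14.810 s figure is simple wall-clock timing around the results loop; the instrumentation looks roughly like this (a sketch — the timestamp capture and emoji formatting are my own, and `transcriber` is assumed to be an already-started SpeechTranscriber):

```swift
import Foundation

// Sketch of the timing instrumentation behind the log lines above.
// Assumes the analyzer has already been started on the audio input.
func logFirstResultLatency(from transcriber: SpeechTranscriber) async throws {
    let start = Date()
    print("🎙️ ✅ Analyzer started successfully - ready to receive audio!")
    print("🎙️ Listening for transcription results...")

    for try await result in transcriber.results {
        let elapsed = Date().timeIntervalSince(start)
        print(String(format: "🎙️ 🚀 FIRST TRANSCRIPTION RESULT after %.3fs: '%@' (isFinal: %@)",
                     elapsed, String(result.text.characters), String(describing: result.isFinal)))
        break   // only measuring time-to-first-result
    }
}
```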
Questions
- Is this expected performance for the iOS 26 beta? The old SFSpeechRecognizer delivers its first result far faster.
- Are there additional optimization steps for SpeechTranscriber?
- Should we expect significant performance improvements in later betas?