SpeechTranscriber extremely slow (14+ seconds) despite proper locale allocation and optimization

With the official SwiftTranscriptionSampleApp from WWDC 2025, speech transcription takes more than 14 seconds from audio input to the first result, which makes it unusable for real-time applications.

Environment

  • iOS: 26.0 Beta
  • Xcode: 26 Beta 5
  • Device: iPhone 16 Pro
  • Sample App: Official Apple SwiftTranscriptionSampleApp from WWDC 2025

Configuration Tested

  • Locales: en-US (allocated with AssetInventory.allocate(locale:)) and es-ES
  • Setup: all optimizations applied (preheating, high priority, model retention)

I started testing in my own app, with the goal of replacing the SFSpeech API and adding speech detection. After long fights with the documentation (this part is quite terrible, TBH), I tested the official example (https://developer.apple.com/documentation/speech/bringing-advanced-speech-to-text-capabilities-to-your-app) and saw the same results.

I added some logs to measure the exact timing:

🎙️ [20:30:41.532] ✅ Analyzer started successfully - ready to receive audio!
🎙️ [20:30:41.532] Listening for transcription results...
🎙️ [20:30:56.342] 🚀 FIRST TRANSCRIPTION RESULT after 14.810s: 'Hello' (isFinal: false)
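A minimal sketch of how a time-to-first-result measurement like the one in these logs could be taken in plain Swift, assuming the transcriber's results are delivered as an AsyncSequence. `timeToFirstElement(of:)` is a hypothetical helper, and the `AsyncStream` in the usage part only simulates a transcriber; it is not the Speech framework API.

```swift
// Hypothetical helper: measures how long an async sequence takes to
// produce its first element, using the monotonic ContinuousClock.
func timeToFirstElement<S: AsyncSequence>(
    of results: S
) async throws -> (element: S.Element, elapsed: Duration)? {
    let clock = ContinuousClock()
    let start = clock.now
    for try await element in results {
        // Stop at the first element and report the elapsed time.
        return (element, clock.now - start)
    }
    // The sequence finished without producing anything.
    return nil
}

// Simulated result stream (stand-in for a real transcriber):
// yields one string after a short delay, then finishes.
let simulated = AsyncStream<String> { continuation in
    Task {
        try? await Task.sleep(for: .milliseconds(200))
        continuation.yield("Hello")
        continuation.finish()
    }
}

if let first = try await timeToFirstElement(of: simulated) {
    print("First result after \(first.elapsed): '\(first.element)'")
}
```

Starting the clock just before iterating (rather than when audio capture begins) isolates the delay inside the results pipeline, which is the interval the logs above report.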

Questions

  1. Is this the expected performance for the iOS 26 beta? The old SFSpeech API is far faster.
  2. Are there additional optimization steps for SpeechTranscriber?
  3. Should we expect significant performance improvements in later betas?

Answer

This may be an interaction with Swift's new approachable concurrency feature, which wasn't in play when the sample app was developed.

If approachable concurrency is enabled, the app may not be processing and displaying results in parallel as expected.

Instead, it may be doing both on the main actor, and therefore waiting for the entire audio file to finish processing before displaying any results.

If so, you can resolve the issue either by turning off approachable concurrency or by marking certain methods as @concurrent.
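A minimal sketch of the @concurrent option, assuming Swift 6.2 (Xcode 26), where the attribute is available. `PartialResult`, `TranscriptModel`, and `consume(_:into:)` are hypothetical names, and the `AsyncStream` stands in for the transcriber's result sequence. The sketch only illustrates the isolation split: the results loop runs off the main actor, and each result hops back to the main actor just for the UI-facing update.

```swift
// Hypothetical stand-in for one transcription result.
struct PartialResult: Sendable {
    let text: String
    let isFinal: Bool
}

// Hypothetical UI-facing model, isolated to the main actor.
@MainActor
final class TranscriptModel {
    private(set) var lines: [String] = []

    func append(_ text: String) { lines.append(text) }
}

// @concurrent runs this async function on the global concurrent executor
// instead of the caller's actor, so iterating the stream does not occupy
// the main actor even when the project's defaults would keep it there.
@concurrent
func consume(_ results: AsyncStream<PartialResult>, into model: TranscriptModel) async {
    for await result in results {
        // Hop to the main actor only to publish each result.
        await model.append(result.text)
    }
}

// Usage (top-level code in main.swift runs on the main actor).
let (stream, continuation) = AsyncStream.makeStream(of: PartialResult.self)
let model = TranscriptModel()
let consumer = Task { await consume(stream, into: model) }

continuation.yield(PartialResult(text: "Hello", isFinal: false))
continuation.yield(PartialResult(text: "Hello world", isFinal: true))
continuation.finish()

await consumer.value
print(model.lines)
```

The other option from the reply, turning off approachable concurrency, is a project build setting rather than a code change, so it affects every async function in the target instead of only the ones you mark.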
