This is actually possible, though it requires a different approach than the typical single-AVAudioEngine setup.
The key insight is that iOS allows multiple AVCaptureSession instances to coexist under certain conditions. You can configure two separate audio routes:
- Use AVCaptureSession with the AirPods as the input device for your speech recognition pipeline. Set the audio session category to .playAndRecord with the .allowBluetooth option.
- For video recording with the built-in mic, use a second AVCaptureSession (or the camera API you are already using). The built-in mic can be selected explicitly as the audio input for this session.
The catch is that you have to manage the shared audio session carefully. The .mixWithOthers option is essential here: without it, activating one session will interrupt the other.
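As a rough sketch of that configuration (function names here are illustrative, and you should verify the routing behavior on a real device):

```swift
import AVFoundation

// Sketch only: a shared AVAudioSession configured so the two capture
// paths can coexist. .mixWithOthers keeps one session from interrupting
// the other; .allowBluetooth enables the AirPods input route.
func configureSharedAudioSession() throws {
    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(.playAndRecord,
                                 mode: .default,
                                 options: [.allowBluetooth, .mixWithOthers])
    try audioSession.setActive(true)
}

// Sketch only: pin the video session's audio input to the built-in mic
// so it does not follow the system route over to the AirPods.
func addBuiltInMicInput(to session: AVCaptureSession) throws {
    guard let mic = AVCaptureDevice.default(.builtInMicrophone,
                                            for: .audio,
                                            position: .unspecified) else { return }
    let input = try AVCaptureDeviceInput(device: mic)
    if session.canAddInput(input) {
        session.addInput(input)
    }
}
```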
Another approach that avoids the dual-session complexity: use a single AVCaptureSession that captures from the built-in mic for video, and run SFSpeechRecognizer (or the new SpeechAnalyzer on macOS 26 / iOS 26) on the same audio buffer. Speech recognition does not need a dedicated audio route — it can process any audio buffer you feed it, including one that is simultaneously being written to a video file.
So the architecture becomes:
- One AVCaptureSession capturing video + built-in mic audio
- Fork the audio buffers in the captureOutput delegate callback: one copy goes to the video writer, the other feeds SFSpeechRecognizer
- Voice commands ("CAPTURE", "STOP") are detected from the speech recognition results
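A minimal sketch of that fork, assuming the AVAssetWriter and capture session are configured elsewhere (the CaptureCoordinator class and its command-handling closures are illustrative, not part of the original answer):

```swift
import AVFoundation
import Speech

// Sketch only: receives audio sample buffers from a single
// AVCaptureAudioDataOutput and forwards each one to two consumers.
final class CaptureCoordinator: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    private let writerInput: AVAssetWriterInput   // audio track of the video writer
    private let request = SFSpeechAudioBufferRecognitionRequest()
    private let recognizer = SFSpeechRecognizer()
    private var task: SFSpeechRecognitionTask?

    init(writerInput: AVAssetWriterInput,
         onCommand: @escaping (String) -> Void) {
        self.writerInput = writerInput
        super.init()
        request.shouldReportPartialResults = true
        // Watch the running transcript for the voice commands.
        task = recognizer?.recognitionTask(with: request) { result, _ in
            guard let text = result?.bestTranscription.formattedString.uppercased()
            else { return }
            if text.hasSuffix("CAPTURE") { onCommand("CAPTURE") }
            if text.hasSuffix("STOP")    { onCommand("STOP") }
        }
    }

    // The fork point: every audio buffer goes to both consumers.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        if writerInput.isReadyForMoreMediaData {
            writerInput.append(sampleBuffer)           // copy 1: video file
        }
        request.appendAudioSampleBuffer(sampleBuffer)  // copy 2: speech recognition
    }
}
```

Note that SFSpeechAudioBufferRecognitionRequest accepts CMSampleBuffer directly via appendAudioSampleBuffer, so no format conversion is needed between the writer path and the recognition path.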
This avoids the Bluetooth routing problem entirely and is much more reliable in practice.