I ran into exactly this problem when building an audio pipeline that mixes system audio (via ScreenCaptureKit) with microphone input for real-time speech processing.
The core issue is that mainMixerNode is connected to outputNode by default, which routes everything to the speakers. You have two approaches:
Option 1: manual rendering mode. In this mode AVAudioEngine does not play back to hardware; you pull rendered buffers on your own schedule. Enable manual rendering, attach a player node for your SCK audio, connect it to the main mixer, then call renderOffline(_:to:) to pull mixed audio on demand.
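A minimal sketch of that offline flow. The 48 kHz stereo format and 4096-frame render size are illustrative assumptions, not requirements:

```swift
import AVFoundation

let engine = AVAudioEngine()
let sckPlayer = AVAudioPlayerNode()
let format = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)!

engine.attach(sckPlayer)
engine.connect(sckPlayer, to: engine.mainMixerNode, format: format)

// Put the engine into offline manual rendering mode before starting it.
try engine.enableManualRenderingMode(.offline, format: format, maximumFrameCount: 4096)
try engine.start()
sckPlayer.play()

// Schedule SCK buffers as they arrive from your SCStreamOutput callback:
// sckPlayer.scheduleBuffer(sckBuffer, completionHandler: nil)

// Pull mixed audio on your own schedule.
let outBuffer = AVAudioPCMBuffer(pcmFormat: engine.manualRenderingFormat,
                                 frameCapacity: engine.manualRenderingMaximumFrameCount)!
let status = try engine.renderOffline(engine.manualRenderingMaximumFrameCount, to: outBuffer)
if status == .success {
    // Hand outBuffer to your speech-processing stage here.
}
```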
The catch: inputNode does not work in offline mode on macOS. The workaround is to capture mic samples separately (via AVCaptureSession or a tap on a separate realtime engine), then schedule those buffers into a second AVAudioPlayerNode attached to the offline engine.
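Continuing the sketch above, the tap-on-a-separate-engine variant of that workaround looks roughly like this (the 1024-frame tap size is arbitrary, and the mic buffers arrive in realtime, so you still decide how fast to call renderOffline):

```swift
import AVFoundation

// A second, realtime engine taps the mic; each buffer is forwarded into a
// second player node on the offline (manual-rendering) engine above.
// In practice, attach and connect micPlayer before calling engine.start().
let micEngine = AVAudioEngine()
let micFormat = micEngine.inputNode.outputFormat(forBus: 0)   // e.g. 44.1 kHz

let micPlayer = AVAudioPlayerNode()
engine.attach(micPlayer)
engine.connect(micPlayer, to: engine.mainMixerNode, format: micFormat)
micPlayer.play()

micEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: micFormat) { buffer, _ in
    // Forward realtime mic buffers into the offline mix as they arrive.
    micPlayer.scheduleBuffer(buffer, completionHandler: nil)
}
try micEngine.start()
```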
Option 2: stay in realtime mode. Keep the engine running normally but prevent playback by setting mainMixerNode.outputVolume = 0, then install a tap on mainMixerNode to capture the mixed audio without it ever reaching the speakers.
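A rough sketch of that realtime approach; the 48 kHz SCK format and the 4096-frame tap size are assumptions and should match whatever you configured on your SCStream:

```swift
import AVFoundation

let engine = AVAudioEngine()
let sckPlayer = AVAudioPlayerNode()
let sckFormat = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)!

engine.attach(sckPlayer)
engine.connect(sckPlayer, to: engine.mainMixerNode, format: sckFormat)
engine.connect(engine.inputNode, to: engine.mainMixerNode,
               format: engine.inputNode.outputFormat(forBus: 0))

// Mute playback; the output node keeps pulling, so the tap still fires.
engine.mainMixerNode.outputVolume = 0

let mixFormat = engine.mainMixerNode.outputFormat(forBus: 0)
engine.mainMixerNode.installTap(onBus: 0, bufferSize: 4096, format: mixFormat) { buffer, _ in
    // Mixed system + mic audio; feed this to the speech-processing stage.
}

try engine.start()
sckPlayer.play()
// Schedule SCK buffers from your SCStreamOutput callback:
// sckPlayer.scheduleBuffer(buffer, completionHandler: nil)
```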
I tried disconnecting mainMixerNode from outputNode entirely, but on some macOS versions (13.x specifically) this causes the engine to stop pulling audio from its inputs. Setting volume to 0 is more reliable across macOS 13–15.
For the sample rate mismatch between SCK output (typically 48kHz) and mic input (sometimes 44.1kHz), let the mixer handle the conversion — connect each source in its native format and set the mixer output format to your target rate.
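For the realtime variant, that format wiring might look something like this, reusing the engine and sckPlayer names from the sketches above; the 48 kHz target is illustrative:

```swift
import AVFoundation

let sckNative = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)!   // SCK stream format (assumed)
let micNative = engine.inputNode.outputFormat(forBus: 0)                             // often 44.1 kHz
let target    = AVAudioFormat(standardFormatWithSampleRate: 48_000, channels: 2)!

// Each source connects in its native format; the mixer does the resampling.
engine.connect(sckPlayer, to: engine.mainMixerNode, format: sckNative)
engine.connect(engine.inputNode, to: engine.mainMixerNode, format: micNative)

// The mixer-to-output connection pins the mixer's output format, so a tap on the
// mixer sees a single 48 kHz stream. (In manual rendering mode, the format passed
// to enableManualRenderingMode plays this role instead.)
engine.connect(engine.mainMixerNode, to: engine.outputNode, format: target)
```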