This post is from the WWDC26 Audio Q&A.
We are using acoustic echo cancellation (AEC) in our voice app, and it mostly works. However, when the experience begins we play a greeting through the speaker, and the first few hundred milliseconds of that greeting are captured by the inputNode. This is throwing off our ASR/TTS.
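One way to put a number on "the first few hundred milliseconds" is to log the level of each captured input buffer from the moment the greeting starts, in a quiet room with nobody speaking. The sketch below is plain Swift. The helper names (`rmsLevelDBFS`, `leakageSpan`) and the threshold are hypothetical. They are not part of the original code or of any Apple API.

```swift
import Foundation

/// Hypothetical helper: RMS level of a mono buffer in dBFS (0 dBFS = full scale).
/// Returns -infinity for an empty or all-zero buffer.
func rmsLevelDBFS(_ samples: [Float]) -> Double {
    guard !samples.isEmpty else { return -Double.infinity }
    let sumOfSquares = samples.reduce(0.0) { $0 + Double($1) * Double($1) }
    let meanSquare = sumOfSquares / Double(samples.count)
    // 10 * log10(mean square) == 20 * log10(RMS); log10(0) is -infinity.
    return 10 * log10(meanSquare)
}

/// Hypothetical helper: given per-buffer input levels recorded from playback start,
/// returns the time (seconds) from playback start to the end of the last buffer
/// whose level is at or above `thresholdDBFS`. Returns 0 if no buffer reaches it.
func leakageSpan(levelsDBFS: [Double], bufferDuration: Double, thresholdDBFS: Double) -> Double {
    guard let lastLoud = levelsDBFS.lastIndex(where: { $0 >= thresholdDBFS }) else { return 0 }
    return Double(lastLoud + 1) * bufferDuration
}
```

In the app, the samples could come from a tap on the input node (`installTap(onBus:bufferSize:format:block:)`), reading channel 0 of `buffer.floatChannelData`. Each buffer's duration is `Double(buffer.frameLength) / buffer.format.sampleRate`. Any threshold, such as -50 dBFS, is only a starting guess. The right value depends on the device and the room.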
For now, we've disabled audio capture while audio is playing, but we would prefer to capture all audio with echo cancellation working.
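The current workaround, which drops capture while the app's own audio plays, can be sketched as a small thread-safe gate. `CaptureGate` and its members are hypothetical names and are not part of AVFoundation. The class is plain Swift. A lock is used on the assumption that the input tap and the playback completion handler run on different threads.

```swift
import Foundation

/// Hypothetical helper: suppresses forwarding of microphone buffers while the
/// app's own playback (e.g. the greeting) is active.
final class CaptureGate {
    private let lock = NSLock()
    // Counts overlapping playbacks so the gate reopens only after all of them finish.
    private var activePlaybacks = 0

    /// Call just before scheduling playback.
    func playbackWillStart() {
        lock.lock(); defer { lock.unlock() }
        activePlaybacks += 1
    }

    /// Call from the playback completion handler.
    func playbackDidFinish() {
        lock.lock(); defer { lock.unlock() }
        activePlaybacks = max(0, activePlaybacks - 1)
    }

    /// True when captured audio should be passed on to ASR.
    var shouldForwardCapture: Bool {
        lock.lock(); defer { lock.unlock() }
        return activePlaybacks == 0
    }
}
```

One way to wire this up, assuming `audioPlayer` in the snippet below is an `AVAudioPlayerNode`, has three steps. First, call `playbackWillStart()` before scheduling the greeting. Next, pass `.dataPlayedBack` as the callback type to `scheduleBuffer(_:completionCallbackType:completionHandler:)` and call `playbackDidFinish()` from that handler. Finally, check `shouldForwardCapture` in the input tap before handing a buffer to ASR. The trade-off is the one described above: anything the user says during the greeting is discarded, which is why working AEC is preferable.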
Below are some relevant code snippets. Do you have any suggestions for getting AEC to take effect sooner? I've tried a few things, such as enabling voice processing before activating the audio session.

    import AVFoundation

    public init() {
        recorderNode = engine.inputNode
        speakerNode = engine.outputNode
        mainMixerNode = engine.mainMixerNode

        // Route the player (greeting / TTS output) through the main mixer.
        engine.attach(audioPlayer)
        engine.connect(
            audioPlayer,
            to: mainMixerNode,
            format: nil
        )
        playbackFormat = mainMixerNode.outputFormat(forBus: 0)
    }

    public func setupAudioSession() async throws(AudioError) {
        do {
            let audioSession = AVAudioSession.sharedInstance()
            try audioSession.setCategory(
                .playAndRecord,
                mode: .voiceChat,
                policy: .default,
                options: [
                    .defaultToSpeaker,
                    .allowBluetoothHFP,
                ]
            )
            try audioSession.setActive(true)
        } catch {
            throw .audioSessionSetupFailed(error)
        }

        // Enable voice processing (AEC) on the engine's I/O nodes
        // after the session has been activated.
        do {
            try recorderNode.setVoiceProcessingEnabled(true)
            try speakerNode.setVoiceProcessingEnabled(true)
        } catch {
            throw .enableVoiceProcessingFailed(error)
        }
    }