Voice Isolation Suggestions

We are working on a voice app that uses ASR/TTS on the backend, and we are running into difficulties in noisy environments. We have compiled the DeepFilterNet3 library into an XCFramework and are using it in the app, but it is sometimes overly aggressive at removing noise and cuts out some of the voice as well.
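A common way to soften over-aggressive suppression, whichever denoiser is used, is to mix a fraction of the original signal back into the denoised output (a wet/dry blend). The following is a minimal sketch. It assumes the raw and DeepFilterNet-processed audio are available as time-aligned `Float` sample arrays of equal length, with any processing delay already compensated. `blendDenoised` is a hypothetical helper and is not part of DeepFilterNet.

```swift
/// Hypothetical helper: blend denoised ("wet") samples with the original ("dry") samples.
/// wetFraction = 1.0 keeps only the denoised signal; 0.0 keeps only the original input.
/// Assumes both arrays are time-aligned and the same length.
func blendDenoised(dry: [Float], wet: [Float], wetFraction: Float) -> [Float] {
    precondition(dry.count == wet.count, "buffers must be time-aligned and equal length")
    // Clamp so callers cannot accidentally amplify or invert either signal.
    let w = min(max(wetFraction, 0), 1)
    return zip(dry, wet).map { d, s in w * s + (1 - w) * d }
}
```

Tuning `wetFraction` trades residual noise against preserved speech, so it is worth evaluating against ASR accuracy rather than by ear alone.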

Is there any way to make use of Apple's on-device Voice Isolation mic mode? I see that we can detect the user's mic mode, but we cannot set it programmatically. We could prompt the user to enable it manually, but that adds friction to the user experience.

Do you have any suggestions for enabling voice isolation, or for performing denoising in general?

Answered by Engineer in 890569022

Mic mode is, by design, entirely under the user's control. There is no API to directly set it, but as you mentioned, there is an API to present a prompt that guides the user to enable it manually. While this does introduce some friction, it keeps the user informed and in control of their privacy settings.
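The following is a minimal sketch for iOS 15 / macOS 12 or later. It reads the active mic mode with `AVCaptureDevice.activeMicrophoneMode` and opens the system picker with `AVCaptureDevice.showSystemUserInterface(.microphoneModes)`. The `MicModeAdvisor` class and the once-per-session policy in `shouldOfferMicModePicker` are hypothetical app-side choices meant to limit the friction of prompting. They are not part of Apple's API.

```swift
import Foundation
#if os(iOS) || os(macOS)
import AVFoundation
#endif

/// Hypothetical app policy: offer the system mic-mode picker at most once per session,
/// and only when Voice Isolation is not already active.
func shouldOfferMicModePicker(voiceIsolationActive: Bool, alreadyOffered: Bool) -> Bool {
    return !voiceIsolationActive && !alreadyOffered
}

#if os(iOS) || os(macOS)
@available(iOS 15.0, macOS 12.0, *)
final class MicModeAdvisor {  // hypothetical helper, not an Apple type
    private var alreadyOffered = false

    /// The mic mode the system is currently applying; apps can read it but not set it.
    var isVoiceIsolationActive: Bool {
        AVCaptureDevice.activeMicrophoneMode == .voiceIsolation
    }

    /// Shows the system UI where the user chooses the mic mode themselves.
    func offerVoiceIsolationIfNeeded() {
        guard shouldOfferMicModePicker(voiceIsolationActive: isVoiceIsolationActive,
                                       alreadyOffered: alreadyOffered) else { return }
        alreadyOffered = true
        AVCaptureDevice.showSystemUserInterface(.microphoneModes)
    }
}
#endif
```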

Another option worth considering is AUSoundIsolation, a public Audio Unit that performs on-device voice isolation as part of your audio signal processing pipeline. This approach operates independently of the system mic mode, so it gives you more control over the denoising behavior within your app without requiring any user intervention.
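The following is a rough sketch of placing AUSoundIsolation in an `AVAudioEngine` graph (mic input, then sound isolation, then a muted main mixer), with the processed audio delivered through a tap for the ASR path. It assumes iOS 16 / macOS 13 or later. The helper names `makeIsolatedMicEngine` and `clampedWetDryPercent` are hypothetical. Using `kAUSoundIsolationParam_WetDryMixPercent` (assumed to be a 0 to 100 percentage) to blend some unprocessed signal back in is an assumption based on the AudioToolbox headers rather than documented guidance. The unit's exact input-format requirements are not covered here.

```swift
import Foundation
#if os(iOS) || os(macOS)
import AVFoundation
import AudioToolbox
#endif

/// Hypothetical helper: clamp a requested wet/dry mix to the assumed 0...100 percent range.
func clampedWetDryPercent(_ requested: Float) -> Float {
    return min(max(requested, 0), 100)
}

#if os(iOS) || os(macOS)
/// Hypothetical helper: mic -> AUSoundIsolation -> muted main mixer, with a tap on the effect output.
@available(iOS 16.0, macOS 13.0, *)
func makeIsolatedMicEngine(wetDryPercent: Float,
                           onBuffer: @escaping (AVAudioPCMBuffer) -> Void) throws -> AVAudioEngine {
    let engine = AVAudioEngine()

    // Look up Apple's sound-isolation effect by its component description.
    let description = AudioComponentDescription(
        componentType: kAudioUnitType_Effect,
        componentSubType: kAudioUnitSubType_AUSoundIsolation,
        componentManufacturer: kAudioUnitManufacturer_Apple,
        componentFlags: 0,
        componentFlagsMask: 0)
    let isolation = AVAudioUnitEffect(audioComponentDescription: description)
    engine.attach(isolation)

    // Assumption: values below 100 mix some of the unprocessed input back in.
    let status = AudioUnitSetParameter(isolation.audioUnit,
                                       kAUSoundIsolationParam_WetDryMixPercent,
                                       kAudioUnitScope_Global, 0,
                                       clampedWetDryPercent(wetDryPercent), 0)
    if status != noErr {
        throw NSError(domain: NSOSStatusErrorDomain, code: Int(status))
    }

    let format = engine.inputNode.outputFormat(forBus: 0)
    engine.connect(engine.inputNode, to: isolation, format: format)
    engine.connect(isolation, to: engine.mainMixerNode, format: format)
    // Keep the graph pulling audio without playing the mic back through the speaker.
    engine.mainMixerNode.outputVolume = 0

    // Processed buffers arrive here, e.g. to forward to the ASR pipeline.
    isolation.installTap(onBus: 0, bufferSize: 4096, format: nil) { buffer, _ in
        onBuffer(buffer)
    }
    return engine
}
#endif
```

On iOS you would still configure `AVAudioSession` (for example with the `.playAndRecord` category) and call `engine.start()` before any buffers arrive.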


I've been experimenting with similar tools for a personal project recently, so here is some recent experience to share:

Have you looked at the sound isolation Audio Unit? I've occasionally used it as an Audio Unit plug-in on the Mac, and its high-quality voice mode works quite well, so I wonder whether it could be integrated into your app.

https://developer.apple.com/documentation/audiotoolbox/kaudiounitsubtype_ausoundisolation

I actually found that the DeepFilterNet2 library worked more reliably for voice separation in heavy noise, having run into some of the same issues you mentioned, so you may want to try that instead.
