I've been experimenting with similar tools for a personal project recently, so here is some recent experience to share:
Have you looked at the sound isolation audio unit? I've used it occasionally as an Audio Unit plug-in on the Mac, and its high-quality voice mode works quite well, so I wonder whether it could be integrated into your app.
https://developer.apple.com/documentation/audiotoolbox/kaudiounitsubtype_ausoundisolation
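In case it helps, here is a rough sketch of how that audio unit might be hosted with AVAudioEngine. I haven't tested this in a shipping app, so treat it as a starting point only. The function names `soundIsolationDescription` and `makeIsolationEngine` are my own placeholders, and routing the microphone straight to the main mixer is just an assumption for illustration (use headphones, or send the output to a file or tap instead, to avoid feedback).

```swift
import AVFoundation
import AudioToolbox

// Describes Apple's sound isolation effect so the system can locate it.
// (Hypothetical helper name; the constants come from AudioToolbox.)
@available(macOS 13.0, iOS 16.0, *)
func soundIsolationDescription() -> AudioComponentDescription {
    return AudioComponentDescription(
        componentType: kAudioUnitType_Effect,
        componentSubType: kAudioUnitSubType_AUSoundIsolation,
        componentManufacturer: kAudioUnitManufacturer_Apple,
        componentFlags: 0,
        componentFlagsMask: 0
    )
}

// Builds an engine that routes mic input -> sound isolation -> main mixer.
// Assumption: live monitoring is what you want; adapt the routing otherwise.
@available(macOS 13.0, iOS 16.0, *)
func makeIsolationEngine() -> (engine: AVAudioEngine, effect: AVAudioUnitEffect) {
    let engine = AVAudioEngine()
    let effect = AVAudioUnitEffect(audioComponentDescription: soundIsolationDescription())

    // Nodes must be attached before they can be connected.
    engine.attach(effect)

    // Use the input node's own format for both connections.
    let format = engine.inputNode.outputFormat(forBus: 0)
    engine.connect(engine.inputNode, to: effect, format: format)
    engine.connect(effect, to: engine.mainMixerNode, format: format)

    return (engine, effect)
}
```

You would then call `try engine.start()` once microphone permission has been granted. The sound type to isolate and the wet/dry mix are exposed as parameters on the audio unit; check the AudioToolbox documentation for the exact identifiers available on your OS version.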
In heavy noise, though, I actually found that the DeepFilterNet2 library separated voice more reliably. I ran into some of the same issues you mentioned, so you may want to try that instead.