Co-channel interference resolution

Hello there,

We are looking at resolving audio co-channel interference. Google's Gemini is saying:

Data Volume: A production-grade model capable of cleanly untangling mixed broadcast channels requires between 5,000 and 10,000 hours of verified audio data to map out diverse vocal characteristics and varying signal strengths.

Synthetic Data Generation: Rather than recording manual interference loops, the dataset can be entirely synthetic. Clean speech profiles are digitally mixed using Python pipelines that emulate a GroupTalk channel environment. This includes applying standard codecs, simulating varying packet loss concealment (PLC) artifacts, and injecting real-world environmental noise (like bridge wind or engine room hum).
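For readers unfamiliar with this kind of pipeline, the mixing step described above could be sketched roughly as follows. This is a minimal illustration, not the pipeline Gemini refers to: the function names (`mix_at_sir`, `drop_frames`, `add_white_noise`, `tone`) are hypothetical, sine tones stand in for clean speech clips, zeroed frames stand in for packet loss (no codec or PLC is modeled), and white Gaussian noise stands in for recorded wind or engine noise.

```python
import math
import random


def power(signal):
    """Mean power of a sequence of float samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_sir(target, interferer, sir_db):
    """Overlay interferer on target at a chosen signal-to-interference ratio (dB)."""
    # Gain chosen so that power(target) / power(gain * interferer) == 10^(sir_db / 10).
    gain = math.sqrt(power(target) / (power(interferer) * 10 ** (sir_db / 10)))
    return [t + gain * i for t, i in zip(target, interferer)]


def drop_frames(signal, frame_len, loss_rate, rng):
    """Zero out whole frames at random: a crude stand-in for packet loss."""
    out = list(signal)
    for start in range(0, len(out), frame_len):
        if rng.random() < loss_rate:
            end = min(start + frame_len, len(out))
            out[start:end] = [0.0] * (end - start)
    return out


def add_white_noise(signal, snr_db, rng):
    """Add Gaussian noise at a chosen signal-to-noise ratio (dB)."""
    sigma = math.sqrt(power(signal) / 10 ** (snr_db / 10))
    return [s + rng.gauss(0.0, sigma) for s in signal]


def tone(freq_hz, seconds, rate=8000):
    """Sine tone used here as a placeholder for a clean speech clip."""
    return [math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(int(seconds * rate))]


# Build one synthetic training example: degraded mixture plus clean references.
rng = random.Random(0)  # fixed seed so the example is reproducible
speaker_a = tone(220.0, 1.0)
speaker_b = tone(330.0, 1.0)
mixture = mix_at_sir(speaker_a, speaker_b, sir_db=6.0)
lossy = drop_frames(mixture, frame_len=160, loss_rate=0.1, rng=rng)
degraded = add_white_noise(lossy, snr_db=20.0, rng=rng)
training_pair = (degraded, (speaker_a, speaker_b))
```

A real dataset would repeat this over many speaker pairs, interference ratios, loss rates, and noise recordings. Whether synthetic mixtures alone generalize to live channel conditions is exactly the open question being asked here.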

Do you believe that?

Warmest regards,

Ken

Hello @KenZakreski,

Is your question related to the LiveCommunicationKit and Push-To-Talk features announced at WWDC26?

For questions about other frameworks, you might want to make a dedicated post in that topic.

For tips on creating a forum post see Tips on Creating a Forum Post!

If we are mistaken, please rephrase your question.

Thank you for your post!

Travis

Thank you for your reply, let me rephrase the question...

Subject: Inquiry Regarding Architectural Overhead and Buffer Access in the Push to Talk Framework for Real-Time Core ML Blind Source Separation

Dear Apple Engineering Team,

We are currently developing an Apple-native communication platform that utilizes the Push to Talk framework alongside Core ML to handle real-time, on-device audio processing. We are working to resolve the issue of single-channel, co-channel interference (overlapping voice streams) directly on the edge.

Our current challenge lies in the pipeline latency and background lifecycle constraints when intercepting incoming audio buffers. To cleanly separate overlapping voices before they hit the audio output mixer, we need to process the raw PCM data immediately upon arrival. Could you please provide guidance on the following architectural questions:

Low-Latency Buffer Interception: What is the recommended design pattern within the PTChannelManagerDelegate flow to pass raw incoming audio buffers directly to a Core ML model running on the Apple Neural Engine (ANE) before the system routes them to AVAudioEngine for playback?

Background Thread Management: Given the strict background execution boundaries enforced by the Push to Talk framework, how can we best optimize thread scheduling to ensure our speech separation model completes its execution without triggering an OS background processing timeout or process termination?

Dynamic UI Manifestation: Once a combined audio stream is separated into two clean, distinct voice vectors on-device, what is the best approach for registering multiple PTParticipant states simultaneously so that the native system UI (like the Dynamic Island) accurately reflects both speakers?

Thank you for your time, insights, and continued support of developer innovation within the iOS and iPadOS ecosystems.

Best regards,

Ken Zakreski
Founder, Marine Link Pro

Please see our previous response to your question.

https://developer.apple.com/forums/thread/830458?answerId=890688022#890688022

If you have any follow up questions, feel free to continue the conversation there for conciseness.

If you have questions tying this into the LiveCommunicationKit and Push-To-Talk Q&A features announced at WWDC26, we have the relevant engineering team here to answer questions!

Thank you!

Travis

Thank you.

Apple’s Push to Talk (PTT) framework is designed to provide a highly power-efficient, secure, and system-integrated foundation for walkie-talkie style applications.

We want to provide a superior app experience. Can we implement language translation (for example Dutch (or Frisian) to English on edge) over these frameworks?

Warmest Regards,

Ken

Can we implement language translation (for example Dutch (or Frisian) to English on edge) over these frameworks?

Your app has full control over what's recorded and played back, so how you process that audio is entirely up to you. For example, I believe there's at least one PTT app that actually sends text (not audio) and uses Text to Speech for playback.

__
Kevin Elliott
DTS Engineer, CoreOS/Hardware
