Hybrid assistant architecture (on-device model + server tools)

We run a conversational assistant whose answers depend on live API data, not just static knowledge. What split does Apple recommend between on-device Foundation Models (intent, routing, summarization, privacy-sensitive context) and server-side tool execution? Is there an official pattern for a local planner with a remote executor?
