That's what I'm not understanding, and maybe this will change with the upcoming model replacement, but my understanding (which may be wrong) of how it's supposed to work is: try on-device first; if the local model can't handle the task, move to Private Cloud Compute (PCC); and if PCC can't handle it either, hand off to the third-party extension (currently ChatGPT by default).
It's not really a "fallback" so much as a routing decision. I would expect the behavior to be something like: "Is the prompt larger than 4096 tokens? If yes, route it to PCC."
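To make the idea concrete, here's a minimal sketch of the routing policy I'm imagining. Everything here is an assumption: the 4096-token threshold, the tier names, and the capability check are my guesses at how such a router might look, not Apple's actual implementation.

```python
from enum import Enum

class Tier(Enum):
    ON_DEVICE = "on-device model"
    PCC = "Private Cloud Compute"
    THIRD_PARTY = "third-party extension (e.g. ChatGPT)"

# Assumed context limit for the local model -- purely hypothetical.
ON_DEVICE_TOKEN_LIMIT = 4096

def route(prompt_tokens: int, pcc_can_handle: bool = True) -> Tier:
    """Local first; escalate to PCC on size; escalate again if PCC can't help."""
    if prompt_tokens <= ON_DEVICE_TOKEN_LIMIT:
        return Tier.ON_DEVICE
    if pcc_can_handle:
        return Tier.PCC
    return Tier.THIRD_PARTY
```

Under that sketch, a short prompt stays on-device, an oversized one goes to PCC, and only a task PCC declines would ever reach the extension.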
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence