This is the gap everyone building on PCC needs answered. Based on what was discussed at the WWDC26 ML labs, here's the practical reality:
Apple hasn't published a paid fallback program for what happens once you cross the free PCC threshold, which means you can't rely on PCC scaling automatically for a commercial app at volume. The safe architecture today is to build the fallback in before you ship, not after you hit the limit.
The pattern that holds up: abstract your inference behind a protocol so the model tier is swappable. Route to on-device first where the task allows it (free, no threshold), use PCC for the cases that genuinely need it, and have a third-party provider (via the Language Model protocol) configured as the overflow path for when you exceed PCC eligibility. The new Language Model protocol makes this swap close to a one-line change.
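To make the tiering concrete, here's a minimal sketch of the routing idea. Everything in it is hypothetical and mine: `InferenceTier`, `InferenceRequest`, `StubTier`, and `InferenceRouter` are illustration names, not Apple's Language Model protocol or any real SDK type, and the stub tiers just echo the prompt. In a real app each tier would wrap the actual on-device, PCC, or third-party call.

```swift
import Foundation

// Hypothetical types for illustration only; none of these are Apple APIs.

enum TierError: Error {
    case unavailable      // e.g. quota or eligibility exhausted for this tier
    case allTiersFailed   // no configured tier could serve the request
}

struct InferenceRequest {
    let prompt: String
    let needsLargeModel: Bool   // assumption: the caller can classify the task
}

protocol InferenceTier {
    var name: String { get }
    /// Whether this tier is suitable for the request at all.
    func canHandle(_ request: InferenceRequest) -> Bool
    /// Runs the request, throwing if the tier cannot serve it right now.
    func respond(to request: InferenceRequest) async throws -> String
}

/// Stand-in tier that echoes the prompt; a real one would call a model.
struct StubTier: InferenceTier {
    let name: String
    let handlesLargeModel: Bool
    let isAvailable: Bool

    func canHandle(_ request: InferenceRequest) -> Bool {
        handlesLargeModel || !request.needsLargeModel
    }

    func respond(to request: InferenceRequest) async throws -> String {
        guard isAvailable else { throw TierError.unavailable }
        return "[\(name)] \(request.prompt)"
    }
}

/// Tries tiers in preference order and falls through on failure.
struct InferenceRouter {
    let tiers: [any InferenceTier]   // ordered: cheapest or most preferred first

    func respond(to request: InferenceRequest) async throws -> String {
        for tier in tiers where tier.canHandle(request) {
            do {
                return try await tier.respond(to: request)
            } catch {
                // Simplification: any error moves on to the next tier.
                // A production router would likely treat quota errors
                // differently from transient network failures.
                continue
            }
        }
        throw TierError.allTiersFailed
    }
}
```

Configured with on-device, PCC, and third-party tiers in that order, a small request is served by the first tier, while a request that needs the larger model skips on-device and lands on the third-party tier if the PCC tier throws. Swapping or reordering tiers only touches the array passed to the router, which is the property you want from the abstraction.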
The architectural principle: don't let your business depend on PCC as a single point of failure with no defined cost ceiling. Treat it as one tier in a routing strategy, not the foundation.
I've actually been building exactly this routing layer as an open-source package — happy to share patterns if useful.
— Divya Ravi, Senior iOS Engineer