Using Past Versions of Foundation Models As They Progress

Has Apple made any commitment to versioning the Foundation Models on device? What if you build a feature that works great on 26.0, but the model or guardrails change in 26.1 and break it? Is your only recourse filing a Feedback or pulling the feature from the app? Will there be a way to specify a model version, like in all of the server-based LLM provider APIs? If not, this sounds risky to build on.

Currently, no, sorry.

The Foundation Models framework currently does not offer any API to determine which version of the base model is running on a person's device. Please file a Feedback if such an API is something you'd like to see, thanks!

Having no reasonable assurance that a feature I build against these models won't stop working outside of my control does freak me out quite a bit, especially since we're already seeing massive changes in what's allowed in the current seeds.

FB18924722

Thanks for filing that enhancement request @Hunter!

In addition to what my colleague pointed out, keep in mind that it's important not to overtune your prompts to a specific version, as the models will indeed be updated over time via OS updates.

One strategy that could help is to constrain output using guided generation to yield more consistent results.
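
As a rough illustration of that approach, here is a minimal sketch. The `TagSuggestions` type, its property, the prompts, and `suggestTags` are all made up for this example, not part of the framework:

```swift
import FoundationModels

// Illustrative output type: guided generation constrains the model to
// produce this structure instead of free-form text.
@Generable
struct TagSuggestions {
    @Guide(description: "Three short, lowercase tags describing the note")
    var tags: [String]
}

func suggestTags(for note: String) async throws -> [String] {
    let session = LanguageModelSession(
        instructions: "You suggest concise tags for the user's notes."
    )
    // Requesting the @Generable type yields structured, more consistent
    // output than parsing free-form prose.
    let response = try await session.respond(
        to: "Suggest tags for this note: \(note)",
        generating: TagSuggestions.self
    )
    return response.content.tags
}
```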

Furthermore, as we suggest in this session from WWDC25, developers should evaluate and test responses over time, both as they update their prompts and as we update the models. This helps ensure quality and safety going forward.
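
One lightweight way to do that is to keep a small set of prompts with programmatic checks and re-run them on each new seed. This is just a sketch, not Apple tooling; `EvalCase`, `runEvals`, and the pass criteria are invented for illustration:

```swift
import FoundationModels

// Hypothetical smoke-test case: a prompt plus a programmatic pass check.
struct EvalCase {
    let prompt: String
    let passes: (String) -> Bool
}

// Runs every case against a fresh session and returns the pass rate.
func runEvals(_ cases: [EvalCase]) async -> Double {
    var passed = 0
    for evalCase in cases {
        let session = LanguageModelSession()
        do {
            let response = try await session.respond(to: evalCase.prompt)
            if evalCase.passes(response.content) { passed += 1 }
        } catch {
            // Guardrail or other generation errors count as failures here.
        }
    }
    return Double(passed) / Double(max(cases.count, 1))
}
```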

Best,

-J

Hi J,

Thanks for the reply. A little more info:

  1. In my specific case, I am indeed using guided generation already (which helped a lot in seeds 1 and 2).

  2. I thought the eval session was great. I've been working with LLMs for a bit now and it's awesome to have a whole WWDC session to talk about evals and expose them to devs who may not have seen them before.

This is coming up for me now because something changed in seed 3 where my feature went from a ~95% success rate to a 0% success rate, all failing with guardrails errors that did not trigger in the first two seeds. There are a bunch of other threads here on the topic and I've filed several feedbacks on the specifics already.

Maybe that's a bug/unexpected outcome and we'll see a future seed restore the behavior. I hope so, I'd like to ship this feature. But if not, at least I won't have sent it to customers.

My real concern is that this will pop up again in a future point release, when the feedback window is compressed and, IMO, it's very hard to get specific issues in front of engineers via the Feedback system with enough time for a fix to be made and tested. Since the models are non-deterministic, unless you're running my evals there's a good chance the team may not even know about what is, for me, a serious regression until it ships.

I really want to use and love the framework because it's so promising, but I'm a little freaked out now given what's happening in the current seeds. That's the background for my concern, and I don't think there's any way to mitigate it at the moment (right?)

@Hunter Good to know, thanks for that background information. Keep in mind these are indeed beta releases, but I understand your concern.

In addition to the enhancement request feedback you filed, I'd suggest filing a bug report feedback for the guardrails regression you're observing. If possible, attach a focused code sample that reproduces the failure so that our engineers can take a closer look.
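
Something along these lines is usually enough. This is only a sketch: the prompt is a placeholder for whatever input regressed, and the `guardrailViolation` error case reflects my understanding of the current API, so double-check it against the seed you're testing:

```swift
import FoundationModels

// A focused repro: one prompt, one call, and logging that distinguishes
// guardrail failures from other generation errors.
func reproduceGuardrailFailure() async {
    let session = LanguageModelSession()
    let prompt = "..." // the exact prompt that regressed in the latest seed
    do {
        let response = try await session.respond(to: prompt)
        print("Succeeded: \(response.content)")
    } catch let error as LanguageModelSession.GenerationError {
        switch error {
        case .guardrailViolation:
            print("Guardrail violation for prompt: \(prompt)")
        default:
            print("Other generation error: \(error)")
        }
    } catch {
        print("Unexpected error: \(error)")
    }
}
```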

Please provide the FB number here as well so I can keep track of it.

-J

Sure, here are a couple, including one that is reproduced with WWDC sample code from Apple.

FB18787534, FB18712543

Also, in case you have not seen this thread, there is additional discussion: https://developer.apple.com/forums/thread/792022

Question: do you know if adapter training could help with guardrails issues? I was assuming that the guardrails either run as a separate process/mechanism, or are embedded deeply enough in the training process that an adapter wouldn't matter. But if I'm wrong about that and training a custom adapter could help, that's something I would definitely consider (I have a lot of good data, perfect for this sort of training).
