Memory Attribution for Foundation Models in iOS 26

Hi,

I’m developing an app targeting iOS 26, using the new FoundationModels framework to perform on-device LLM inference. I’m currently testing memory usage.

Does the memory used by FoundationModels—including model weights, KV cache, and any inference-related buffers—count toward my app’s Jetsam memory limit, or is any of it managed separately by the system?

I may need to run two concurrent inferences, each with a 4096-token context window. Is this supported by FoundationModels on iOS 26, and would it significantly increase the risk of memory-based termination?

Thanks in advance for any clarification.

Answered by Frameworks Engineer in 851307022

The on-device foundation model and its inference resources are managed centrally by the operating system and shared by all Apple Intelligence system features, so the increase to your app's memory footprint will be minimal.
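If you want to sanity-check the attribution from your own app, one rough approach is to compare the process's remaining memory headroom before and after a request. This is just a sketch: the prompt is a placeholder, error handling is simplified, and it assumes os_proc_available_memory() applies to your process (it returns 0 when no limit does).

import FoundationModels
import os

// Sketch: compare the app's remaining memory headroom before and after
// an inference call. os_proc_available_memory() returns the number of
// bytes the app can still allocate before hitting its memory limit.
func checkInferenceFootprint() async throws {
    let before = os_proc_available_memory()

    let session = LanguageModelSession()
    _ = try await session.respond(to: "Hello")  // placeholder prompt

    let after = os_proc_available_memory()
    print("Headroom: \(before) bytes before, \(after) bytes after")
}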

The Foundation Models API fully supports creating multiple sessions and making requests in parallel, but the actual execution is managed by the OS. Because running inference is compute-intensive and consumes power, your parallel requests (e.g., calls to respond(to:) from parallel tasks) will actually end up running serially on the Apple Neural Engine.
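For reference, a minimal sketch of issuing two requests in parallel from separate sessions (the prompts are placeholders; since a single session handles one request at a time, each request gets its own session):

import FoundationModels

// Sketch: two independent sessions with requests issued concurrently.
// The calls are accepted in parallel, but the OS serializes the actual
// inference on the Neural Engine as described above.
func runTwoInferences() async throws -> (String, String) {
    let sessionA = LanguageModelSession()
    let sessionB = LanguageModelSession()

    async let first = sessionA.respond(to: "Summarize today's notes.")
    async let second = sessionB.respond(to: "Draft a short changelog entry.")

    let (a, b) = try await (first, second)
    return (a.content, b.content)
}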
