Rate limit exceeded when using the Foundation Models framework

When I use the Foundation Models framework to generate long text, it always fails with this error:

"Passing along Client rate limit exceeded, try again later in response to ExecuteRequest"

Generation then stops.

For example, with the prompt "Write a long story", it almost always hits that error after 17 seconds of generation.

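import FoundationModels

do {
    let session = LanguageModelSession()
    let prompt = "Write a long story"
    let response = try await session.respond(to: prompt)
    print(response.content)
} catch {
    // Log the failure instead of silently discarding it.
    print("Generation failed: \(error)")
}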

If possible, I'd like to know how to prevent this error, or at least how to handle it.

I'm encountering the same issue, along with some strange haptics errors.

I tried your prompt on my iPhone 16 Plus running iOS 26 Beta 3 quite a few times and couldn't trigger the error...

As my colleague said here, "rate limiting applies when your device is on battery AND when your process is running in the background." Does your code run in the background? If so, that may explain the error. Otherwise, I’d suggest that you file a feedback report with LanguageModelFeedbackAttachment for the team to take a look.

Also, you mentioned "it will almost certainly hit that error after 17 seconds of generation." Does that mean the model actually worked and finished the generation, so the error was just noise?
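
On the "how to handle it" part of the question, one possible approach is to catch the failure and retry after a delay. The following is only a minimal sketch: `withRetry`, its parameters, and the `shouldRetry` hook are hypothetical names introduced here for illustration and are not part of the Foundation Models framework. Nothing in this thread confirms that retrying clears this particular rate limit, so treat it as a starting point rather than a fix.

```swift
import Foundation

/// Retries an async operation with exponential backoff.
/// Hypothetical helper for illustration; not a framework API.
/// `shouldRetry` lets the caller decide which errors are worth retrying.
func withRetry<T>(
    maxAttempts: Int = 3,
    initialDelay: Duration = .seconds(2),
    shouldRetry: (Error) -> Bool = { _ in true },
    operation: () async throws -> T
) async throws -> T {
    precondition(maxAttempts >= 1, "maxAttempts must be at least 1")
    var delay = initialDelay
    var attempt = 1
    while true {
        do {
            return try await operation()
        } catch {
            // Give up on the last attempt, or when the error is not retryable.
            if attempt >= maxAttempts || !shouldRetry(error) { throw error }
            // Wait, then double the delay before the next attempt.
            try await Task.sleep(for: delay)
            delay = delay * 2
            attempt += 1
        }
    }
}

// Possible usage with the session from the question (assumes the failure
// surfaces as a thrown error from `respond(to:)`):
//
//     let response = try await withRetry {
//         try await session.respond(to: "Write a long story")
//     }
```

Since the rate limit reportedly applies when the process is in the background, it may be more effective to defer the retry until the app returns to the foreground than to rely on a fixed delay.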

Best,
——
Ziqiao Chen
Worldwide Developer Relations
