Thank you for the response @Frameworks Engineer!
Is there a chance that the rate limits will be relaxed for background tasks, like Safari extensions? As they stand, they make Foundation Models effectively impossible to use in Safari extensions. I can't just tell my users to plug in their device in order to use my extension.
I don't quite understand how such strict, minute-long cooldowns make sense to begin with, since they make it impossible to provide a reliable user experience. At that point you might as well completely forbid the API in background processes with a static check.
As for your suggestions:
I'm hitting the rate limit I described while on power too (see the report for the sysdiag).
I didn't see much difference (if any) in the limits when using the respond API as opposed to streamResponse.
We are pretty much forced to use streamResponse because we randomly hit guardrail violations on even the most innocuous prompts (I think I saw a couple of reports about this already, but that's a separate issue). With the respond API it is all or nothing; with streaming, at least I get some of the response before it taps out. Besides, streaming is a much better UX, so I wouldn't want to give up on it even if respond weren't as rate limited (which it currently is). There has to be another fix.
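To make the "at least I get some of the response" point concrete, here is a minimal sketch of the pattern I mean. It deliberately does not call Foundation Models: `collectPartial`, `PartialResult`, `SimulatedGuardrailError`, and `simulatedStream` are all hypothetical names, and the stream is a stand-in that I assume yields the full text so far on each element before failing.

```swift
// Hypothetical result type: whatever text arrived, plus the error (if any) that ended the stream.
struct PartialResult {
    var text: String
    var error: Error?
}

// Consumes a throwing stream of cumulative snapshots and keeps the latest one,
// even when the stream fails partway through.
func collectPartial<S: AsyncSequence>(_ stream: S) async -> PartialResult
where S.Element == String {
    var latest = ""
    do {
        for try await snapshot in stream {
            latest = snapshot  // assumption: each element is the full text so far
        }
        return PartialResult(text: latest, error: nil)
    } catch {
        // The stream failed mid-response; hand back what we already have.
        return PartialResult(text: latest, error: error)
    }
}

// Stand-in error and stream, used only to simulate a failure after two snapshots.
struct SimulatedGuardrailError: Error {}

func simulatedStream() -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        continuation.yield("The quick")
        continuation.yield("The quick brown fox")
        continuation.finish(throwing: SimulatedGuardrailError())
    }
}
```

With a one-shot call there would be nothing to show the user in the failure case, whereas here the caller still gets the last snapshot alongside the error. In the real extension the stream would come from streamResponse; I've left the adapter out because the element type may differ between SDK versions.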
Crossing my fingers this rate limiting decision gets reversed (or the cooldown gets reduced to seconds as opposed to minutes), because it will break a good number of perfectly valid use cases, like mine.