No matter what, the LanguageModelSession always returns very lengthy, verbose responses. I've set the maximumResponseTokens option to various small numbers, but it doesn't appear to have any effect. I've even used instructions telling it to keep responses between 3 and 8 words, but it returns multiple paragraphs. Is there a way to manage LLM response length? Thanks.
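For reference, here is a minimal sketch of the setup I described (the instructions wording and the prompt are illustrative, not my exact text):

```swift
import FoundationModels

// Hypothetical reproduction of the setup described above:
// both the instructions and the generation options ask for brevity.
let session = LanguageModelSession(
    instructions: "Respond in 3 to 8 words. Do not elaborate."
)

let response = try await session.respond(
    to: "Summarize today's weather in Cupertino.",
    options: GenerationOptions(maximumResponseTokens: 50)
)
// Despite both constraints, the output I observe runs to multiple paragraphs.
```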
I've tried the same instructions and prompt you provided with different text, and the issue doesn't happen for me: every time I tried, the model generated a response of no more than 8 words.
I built my code with Xcode 26.0 beta (17A5241e) on macOS 15.5 (24F74), and ran it on my iPhone 16 Plus (build 23A5260h). The code is pretty straightforward, so I am wondering whether the test environment makes any difference...
Best,
——
Ziqiao Chen
Worldwide Developer Relations.
Hi, I wanted to follow up on this. I'm now on beta 4 of Xcode and macOS, but it's still the same issue. Note that I am trying a RAG approach via tool calling, as mentioned here:
// https://developer.apple.com/videos/play/wwdc2025/301/?time=124
// https://developer.apple.com/documentation/foundationmodels/expanding-generation-with-tool-calling
let session = LanguageModelSession(
    tools: [RetrievalTool(retrieval)],
    instructions: instructions
)

// https://developer.apple.com/documentation/foundationmodels/generationoptions
let response = try await session.respond(
    to: prompt,
    options: GenerationOptions(maximumResponseTokens: 500)
)
The tools can return a somewhat lengthy document relevant to the prompt, but even though the instructions and maximumResponseTokens both specify a brief response, the response ends up being around the same length as the tool output.
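In the meantime, the only reliable way I've found to guarantee a short reply is to clamp the text client-side after the model returns. A minimal sketch in plain Swift (clampResponse is a hypothetical helper of my own, not a FoundationModels API):

```swift
/// Hypothetical helper: clamp a model response to at most `maxWords` words.
/// This is a client-side stopgap, not part of FoundationModels.
func clampResponse(_ text: String, maxWords: Int) -> String {
    // Split on any whitespace (spaces, newlines) to count words.
    let words = text.split(whereSeparator: { $0.isWhitespace })
    guard words.count > maxWords else { return text }
    // Keep only the first `maxWords` words and rejoin them.
    return words.prefix(maxWords).joined(separator: " ")
}
```

This obviously truncates mid-thought rather than making the model concise, so it's a workaround, not a fix.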