LanguageModelSession always returns very lengthy responses

No matter what, the LanguageModelSession always returns very lengthy, verbose responses. I set the maximumResponseTokens option to various small numbers, but it doesn't appear to have any effect. I've even used instructions asking it to keep responses between 3 and 8 words, but it still returns multiple paragraphs. Is there a way to manage the length of the LLM's responses? Thanks.
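For reference, a stripped-down version of what I'm doing looks roughly like this (the instruction wording, prompt, and token limit below are placeholders, not the exact values from my app):

    import FoundationModels

    // Simplified sketch; assumes this runs inside an async context.
    let session = LanguageModelSession(
        instructions: "Answer in 3 to 8 words. Do not elaborate."
    )

    let response = try await session.respond(
        to: "How is the weather today?",
        options: GenerationOptions(maximumResponseTokens: 50)
    )
    print(response.content)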

I tried the same instructions and prompt you provided with different text, and the issue doesn't happen for me: every time I tried, the model generated a response of no more than 8 words.

I built my code with Xcode 26.0 beta (17A5241e) on macOS 15.5 (24F74), and ran it on my iPhone 16 Plus running build 23A5260h. The code is pretty straightforward, so I am wondering whether there is any difference in your test environment...

Best,
——
Ziqiao Chen
Worldwide Developer Relations

Hi, I wanted to follow up on this. I'm now on beta 4 for both Xcode and macOS, but the issue is still the same. Note that I am trying a RAG approach via tool calling, as mentioned here:

    // https://developer.apple.com/videos/play/wwdc2025/301/?time=124
    // https://developer.apple.com/documentation/foundationmodels/expanding-generation-with-tool-calling
    let session = LanguageModelSession(
        tools: [RetrievalTool(retrieval)],
        instructions: instructions
    )

    // https://developer.apple.com/documentation/foundationmodels/generationoptions
    let response = try await session.respond(
        to: prompt,
        options: GenerationOptions(maximumResponseTokens: 500)
    )

The tool can return a somewhat lengthy document relevant to the prompt, but even though the instructions and maximumResponseTokens specify a brief response, the response ends up being around the same length as the tool's output.
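For context, my RetrievalTool is shaped roughly like this (a simplified sketch against the current beta API; the tool name, argument, and retrieval closure are placeholders, and the real retrieval logic is omitted):

    import FoundationModels

    struct RetrievalTool: Tool {
        let name = "retrieveDocument"
        let description = "Retrieve a document relevant to the user's query."

        @Generable
        struct Arguments {
            @Guide(description: "The search query")
            let query: String
        }

        // Placeholder for the actual retrieval backend.
        let retrieval: (String) async throws -> String

        init(_ retrieval: @escaping (String) async throws -> String) {
            self.retrieval = retrieval
        }

        func call(arguments: Arguments) async throws -> ToolOutput {
            // The retrieved document can be fairly long, and the model's
            // response ends up roughly matching its length despite the
            // instructions and token limit.
            let document = try await retrieval(arguments.query)
            return ToolOutput(document)
        }
    }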
