I'm implementing an App Intent for my iOS app that helps users plan trip activities. It works when run as a shortcut, but not when invoked by voice through Siri. There are two issues:
The ShortcutsTripEntity parameter accepts voice input for one specific trip only, not for the others.
Calling requestDisambiguation() on the activity day @Parameter property throws an error.
How do I fix these issues?
This is blocking me from completing a critical feature that lets users quickly plan activities through Siri and Shortcuts.
Expected behavior for trip input: Siri should accept any of the offered trips when spoken.
Actual behavior for trip input: Siri only ever accepts the same one trip when spoken, but accepts any trip when selected by click or touch.
Expected behavior for day input: Siri should accept the spoken option.
Actual behavior for day input: Siri accepts input only by click or touch, and even then throws an error at runtime.
I'm happy to provide more code, but here's the relevant part:
struct PlanActivityTestIntent: AppIntent {
@Parameter(
title: "Trip",
description: "The trip to plan an activity for",
default: ShortcutsTripEntity(id: UUID().uuidString, title: "Untitled trip"),
requestValueDialog: "Which trip would you like to add an activity to?"
)
var tripEntity: ShortcutsTripEntity
@Parameter(title: "Activity Title", description: "The title of the activity", requestValueDialog: "What do you want to do or see?")
var title: String
@Parameter(title: "Activity Day", description: "Activity Day", default: ShortcutsItineraryDayEntity(itineraryDay: .init(itineraryId: UUID(), date: .now), timeZoneIdentifier: "UTC"))
var activityDay: ShortcutsItineraryDayEntity
func perform() async throws -> some ProvidesDialog {
// ...other code...
let tripsStore = TripsStore()
// load trips and map them to entities
try? await tripsStore.getTrips()
let tripsAsEntities = tripsStore.trips.map { trip in
let id = trip.id ?? UUID()
let title = trip.title
return ShortcutsTripEntity(id: id.uuidString, title: title, trip: trip)
}
// Ask the user to select a trip. This line doesn't accept a voice answer. Why?
let selectedTrip = try await $tripEntity.requestDisambiguation(
among: tripsAsEntities,
dialog: .init(
full: "Which of the \(tripsAsEntities.count) trips would you like to add an activity to?",
supporting: "Select a trip",
systemImageName: "safari.fill"
)
)
// This line throws an error
let selectedDay = try await $activityDay.requestDisambiguation(
among: daysAsEntities,
dialog: "Which day would you like to plan an activity for?"
)
}
}
Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.
We are really excited to have introduced the Foundation Models framework in WWDC25. When using the framework, you might have feedback about how it can better fit your use cases.
Starting in macOS/iOS 26 Beta 4, the best way to provide feedback is to use #Playground in Xcode. To do so:
In Xcode, create a playground using #Playground. For more information, see Running code snippets using the playground macro.
Reproduce the issue by setting up a session and generating a response with your prompt.
In the canvas on the right, click the thumbs-up icon to the right of the response.
Follow the instructions on the pop-up window and submit your feedback by clicking Share with Apple.
Another way to provide your feedback is to file a feedback report with relevant details. Specific to the Foundation Models framework, it’s super important to add the following information in your report:
Language model feedback
This feedback contains the session transcript, including the instructions, the prompts, the responses, etc. Without that, we can’t reason about the model’s behavior, and hence can hardly take any action.
Use logFeedbackAttachment(sentiment:issues:desiredOutput:) to retrieve the feedback data of your current model session, as shown in the usage example, write the data into a file, and then attach the file to your feedback report.
If you believe what you’d report is related to the system configuration, please capture a sysdiagnose and attach it to your feedback report as well.
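The "write the data into a file" step could look roughly like the sketch below. It uses only Foundation. The `feedbackData` bytes are a placeholder standing in for the `Data` obtained from the model session, and the helper name `writeFeedbackAttachment` and the file name are assumptions made for illustration, not part of the framework.

```swift
import Foundation

/// Writes feedback bytes to a file in the temporary directory and returns its URL.
/// The default file name is arbitrary (an assumption for this sketch).
func writeFeedbackAttachment(_ data: Data,
                             fileName: String = "foundation-models-feedback.bin") throws -> URL {
    let url = FileManager.default.temporaryDirectory.appendingPathComponent(fileName)
    // Atomic write so a partially written file is never left behind.
    try data.write(to: url, options: .atomic)
    return url
}

// Placeholder bytes; in an app this would be the Data retrieved from the session.
let feedbackData = Data("placeholder feedback bytes".utf8)
let feedbackURL = try writeFeedbackAttachment(feedbackData)
print("Attach this file to the feedback report: \(feedbackURL.path)")
```

The printed path is the file to attach to the report.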
The framework is still new. Your actionable feedback helps us evolve the framework quickly, and we appreciate that.
Thanks,
The Foundation Models framework team
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
My app uses App Intents. When the user says "Prüfung der Bluetooth Funktion", the screen shows the whole phrase, but my app only receives "Bluetooth Funktion". This happens only in the German version; in the English version everything works well.
Can anyone help? Why does Siri cut off my words in German?
In this WWDC25 session, it is explicitly mentioned that apps should support AttributedString for text parameters to their App Intents.
However, I have not gotten this to work. Whenever I pass rich text (either generated by the new "Use Model" intent or generated manually for example using "Make Rich Text from Markdown"), my Intent gets an AttributedString with the correct characters, but with all attributes stripped (so in effect just plain text).
struct TestIntent: AppIntent {
static var title = LocalizedStringResource(stringLiteral: "Test Intent")
static var description = IntentDescription("Tests Attributed Strings in Intent Parameters.")
@Parameter
var text: AttributedString
func perform() async throws -> some IntentResult & ReturnsValue<AttributedString> {
return .result(value: text)
}
}
Is there anything else I am missing?
Hello, I’m seeking clarification on whether Apple provides any framework or API that enables deep integration between Siri and advanced AI assistants (such as ChatGPT), including system-level functions like voice interaction, navigation, cross-platform syncing, and operational access similar to Siri’s own capabilities. If no such option exists today, I would appreciate guidance on the recommended path or approved third-party solutions for building a unified, voice-first experience across Apple’s ecosystem. Thank you for your time and insight.
Hi all,
I'm trying to find out if/when we can expect mxfp8/mxfp4 support on Apple Silicon. I've noticed that mlx now has casting data types, but all computation is still done in bf16. Would be great to reduce power consumption with support for these lower precision data types since edge inference is already typically done at a lower precision!
Thanks in advance.
Topic:
Machine Learning & AI
SubTopic:
Core ML
The documentation for the Create ML tool ("Building an object detector data source") mentions that there are options for using normalized values instead of pixels and also different anchor point origins ("MLBoundingBoxCoordinatesOrigin") instead of always using "center". However, the JSON format for these does not appear in any examples. Does anyone know the format for these options?
Topic:
Machine Learning & AI
SubTopic:
Create ML
Hello, I have to create a Swift app that scans an NFC identity card, extracts its data, and converts it to human-readable form. I do it with the code below:
import CoreNFC
class NFCIdentityCardReader: NSObject , NFCTagReaderSessionDelegate {
func tagReaderSessionDidBecomeActive(_ session: NFCTagReaderSession) {
print("\(session.description)")
}
func tagReaderSession(_ session: NFCTagReaderSession, didInvalidateWithError error: any Error) {
print("NFC Error: \(error.localizedDescription)")
}
var session: NFCTagReaderSession?
func beginScanning() {
guard NFCTagReaderSession.readingAvailable else {
print("NFC is not supported on this device")
return
}
session = NFCTagReaderSession(pollingOption: .iso14443, delegate: self, queue: nil)
session?.alertMessage = "Hold your NFC identity card near the device."
session?.begin()
}
func tagReaderSession(_ session: NFCTagReaderSession, didDetect tags: [NFCTag]) {
guard let tag = tags.first else {
session.invalidate(errorMessage: "No tag detected")
return
}
session.connect(to: tag) { (error) in
if let error = error {
session.invalidate(errorMessage: "Connection error: \(error.localizedDescription)")
return
}
switch tag {
case .miFare(let miFareTag):
self.readMiFareTag(miFareTag, session: session)
case .iso7816(let iso7816Tag):
self.readISO7816Tag(iso7816Tag, session: session)
case .iso15693, .feliCa:
session.invalidate(errorMessage: "Unsupported tag type")
@unknown default:
session.invalidate(errorMessage: "Unknown tag type")
}
}
}
private func readMiFareTag(_ tag: NFCMiFareTag, session: NFCTagReaderSession) {
// Read from MiFare card, assuming it's formatted as an identity card
let command: [UInt8] = [0x30, 0x04] // Example: Read command for block 4
let requestData = Data(command)
tag.sendMiFareCommand(commandPacket: requestData) { (response, error) in
if let error = error {
session.invalidate(errorMessage: "Error reading MiFare: \(error.localizedDescription)")
return
}
let readableData = String(data: response, encoding: .utf8) ?? response.map { String(format: "%02X", $0) }.joined()
session.alertMessage = "ID Card Data: \(readableData)"
session.invalidate()
}
}
private func readISO7816Tag(_ tag: NFCISO7816Tag, session: NFCTagReaderSession) {
let selectAppCommand = NFCISO7816APDU(instructionClass: 0x00, instructionCode: 0xA4, p1Parameter: 0x04, p2Parameter: 0x00, data: Data([0xA0, 0x00, 0x00, 0x02, 0x47, 0x10, 0x01]), expectedResponseLength: -1)
tag.sendCommand(apdu: selectAppCommand) { (response, sw1, sw2, error) in
if let error = error {
session.invalidate(errorMessage: "Error reading ISO7816: \(error.localizedDescription)")
return
}
let readableData = response.map { String(format: "%02X", $0) }.joined()
session.alertMessage = "ID Card Data: \(readableData)"
session.invalidate()
}
}
}
But I get null. I think the data is encrypted. How can I convert it to readable data without the MRZ? Is that possible?
I need to get personal information from the identity card via Core NFC.
Thanks in advance.
Best regards
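One detail in the code above: the ISO 7816 branch ignores the `sw1` and `sw2` status bytes, so a failed SELECT looks the same as an empty response. A minimal sketch of decoding them follows. The helper names `hexString` and `describeStatus` are made up for illustration, and only a few well-known ISO 7816-4 status words are covered. Which one this particular card returns is not something the post tells us.

```swift
import Foundation

/// Formats bytes as an uppercase hex string, for example [0x6A, 0x82] becomes "6A82".
func hexString<S: Sequence>(_ bytes: S) -> String where S.Element == UInt8 {
    bytes.map { String(format: "%02X", $0) }.joined()
}

/// Describes a handful of common ISO 7816-4 status words (SW1, SW2).
func describeStatus(sw1: UInt8, sw2: UInt8) -> String {
    switch (sw1, sw2) {
    case (0x90, 0x00): return "9000: success"
    case (0x61, _):    return "61xx: success, \(sw2) more bytes available"
    case (0x69, 0x82): return "6982: security status not satisfied"
    case (0x6A, 0x82): return "6A82: file or application not found"
    default:           return "\(hexString([sw1, sw2])): other status"
    }
}

// Example values only; in the reader these come from the sendCommand completion handler.
print(describeStatus(sw1: 0x69, sw2: 0x82))
print(hexString(Data([0xA0, 0x00, 0x00, 0x02, 0x47])))
```

Logging this next to the response bytes would at least show whether the card rejected the command or simply returned no data.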
I am writing a custom package wrapping Foundation Models which provides a chain-of-thought with intermittent self-evaluation among other things. At first I was designing this package with the command line in mind, but after seeing how well it augments the models and makes them more intelligent I wanted to try and build a SwiftUI wrapper around the package.
When I started I was using synchronous generation rather than streaming, but to give the best user experience (as I've seen in the WWDC sessions) it is necessary to provide constant feedback to the user that something is happening.
I have created a super simplified example of my setup so it's easier to understand.
First, there is the Reasoning conversation item, which can be converted to an XML representation that is then fed back into the model (I've found XML works best for structured input):
public typealias ConversationContext = XMLDocument
extension ConversationContext {
public func toPlainText() -> String {
return xmlString(options: [.nodePrettyPrint])
}
}
/// Represents a reasoning item in a conversation, which includes a title and reasoning content.
/// Reasoning items are used to provide detailed explanations or justifications for certain decisions or responses within a conversation.
@Generable(description: "A reasoning item in a conversation, containing content and a title.")
struct ConversationReasoningItem: ConversationItem {
@Guide(description: "The content of the reasoning item, which is your thinking process or explanation")
public var reasoningContent: String
@Guide(description: "A short summary of the reasoning content, digestible in an interface.")
public var title: String
@Guide(description: "Indicates whether reasoning is complete")
public var done: Bool
}
extension ConversationReasoningItem: ConversationContextProvider {
public func toContext() -> ConversationContext {
// <ReasoningItem title="${title}">
// ${reasoningContent}
// </ReasoningItem>
let root = XMLElement(name: "ReasoningItem")
root.addAttribute(XMLNode.attribute(withName: "title", stringValue: title) as! XMLNode)
root.stringValue = reasoningContent
return ConversationContext(rootElement: root)
}
}
Then there is the generator, which creates a reasoning item from a user query and previously generated items:
struct ReasoningItemGenerator {
var instructions: String {
"""
<omitted for brevity>
"""
}
func generate(from input: (String, [ConversationReasoningItem])) async throws -> sending LanguageModelSession.ResponseStream<ConversationReasoningItem> {
let session = LanguageModelSession(instructions: instructions)
// build the context for the reasoning item out of the user's query and the previous reasoning items
let userQuery = "User's query: \(input.0)"
let reasoningItemsText = input.1.map { $0.toContext().toPlainText() }.joined(separator: "\n")
let context = userQuery + "\n" + reasoningItemsText
let reasoningItemResponse = try await session.streamResponse(
to: context, generating: ConversationReasoningItem.self)
return reasoningItemResponse
}
}
I'm not sure whether returning LanguageModelSession.ResponseStream<ConversationReasoningItem> is the right move; I am just trying to imitate what session.streamResponse returns.
Then there is the orchestrator, which I can't figure out. It receives the streamed ConversationReasoningItems from the generator and is responsible for streaming them to SwiftUI later, and also for evaluating each reasoning item once it is complete to see whether it needs to be regenerated (to keep the model on track). I want users of the orchestrator to receive partially generated reasoning items as the generator produces them. Later, when an item finishes, it is kept if the evaluation passes; if it fails, the reasoning item should be removed from the stream before a new one is generated. So in-flight reasoning items should be emitted aggressively.
I'm really having trouble figuring this out, so if someone with more knowledge of asynchronous Swift or, even better, someone who has worked on the Foundation Models framework could point me in the right direction, that would be awesome!
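One possible shape for such an orchestrator, sketched with plain Swift concurrency and no Foundation Models types: forward every in-flight snapshot as an event, then emit either an "accepted" or a "retracted" event once the item completes. Everything here (`PartialItem`, `ReasoningEvent`, `orchestrate`, the `generate` and `evaluate` closures) is a hypothetical stand-in; in the real package the generator would wrap the session's response stream.

```swift
import Foundation

// Hypothetical stand-in for a partially generated reasoning item.
struct PartialItem: Equatable, Sendable {
    var title: String
    var done: Bool
}

// Events a UI layer could consume.
enum ReasoningEvent: Equatable, Sendable {
    case partial(attempt: Int, PartialItem)   // in-flight snapshot, shown immediately
    case accepted(attempt: Int, PartialItem)  // evaluation passed, keep the item
    case retracted(attempt: Int)              // evaluation failed, remove the item
}

func orchestrate(
    maxAttempts: Int,
    generate: @escaping @Sendable (Int) -> AsyncStream<PartialItem>,
    evaluate: @escaping @Sendable (PartialItem) async -> Bool
) -> AsyncStream<ReasoningEvent> {
    AsyncStream { continuation in
        let task = Task {
            for attempt in 1...max(1, maxAttempts) {
                var last: PartialItem?
                // Forward every snapshot as soon as the generator produces it.
                for await snapshot in generate(attempt) {
                    last = snapshot
                    continuation.yield(.partial(attempt: attempt, snapshot))
                }
                // A generator that produced nothing just moves on to the next attempt.
                guard let finished = last else { continue }
                if await evaluate(finished) {
                    continuation.yield(.accepted(attempt: attempt, finished))
                    break
                }
                // Tell consumers to drop this attempt before the next one starts.
                continuation.yield(.retracted(attempt: attempt))
            }
            continuation.finish()
        }
        // Stop generating if the consumer goes away.
        continuation.onTermination = { _ in task.cancel() }
    }
}

// Fake generator for the demo: two snapshots per attempt.
@Sendable func fakeGenerate(_ attempt: Int) -> AsyncStream<PartialItem> {
    AsyncStream { continuation in
        continuation.yield(PartialItem(title: "draft \(attempt)", done: false))
        continuation.yield(PartialItem(title: "final \(attempt)", done: true))
        continuation.finish()
    }
}

// Demo: the evaluator rejects the first attempt and accepts the second.
var events: [ReasoningEvent] = []
for await event in orchestrate(maxAttempts: 3,
                               generate: fakeGenerate,
                               evaluate: { $0.title == "final 2" }) {
    events.append(event)
}
print(events)
```

A SwiftUI view model could apply these events to an array keyed by attempt number: replace the entry on `.partial`, mark it final on `.accepted`, and delete it on `.retracted`.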
I got 3203.23 GFLOPS (FP16) on the M3 MacBook Pro and only 2833.24 GFLOPS (FP16) on the M4 MacBook Air for 4096x4096 matrix multiplications in a PyTorch MPS FP16 benchmark. Wasn't performance supposed to be twice as high on the M4 as on the M3, even with thermal throttling on the MacBook Air? What went wrong?
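For reference, the arithmetic behind figures like these can be sketched as follows, assuming the benchmark uses the usual 2·n³ operation count for an n×n matrix multiply (the post does not say how its GFLOPS were computed). The function names are made up for this sketch.

```swift
import Foundation

/// Floating-point operations for an n x n by n x n multiply:
/// n^3 multiply-add pairs, counted as 2 operations each.
func matmulFLOPs(n: Int) -> Double {
    2.0 * pow(Double(n), 3)
}

/// Seconds per multiply implied by a reported GFLOPS figure.
func secondsPerMatmul(n: Int, reportedGFLOPS: Double) -> Double {
    matmulFLOPs(n: n) / (reportedGFLOPS * 1e9)
}

let m3Seconds = secondsPerMatmul(n: 4096, reportedGFLOPS: 3203.23)
let m4Seconds = secondsPerMatmul(n: 4096, reportedGFLOPS: 2833.24)
print(String(format: "M3: %.1f ms, M4: %.1f ms per multiply, M4/M3 throughput: %.2f",
             m3Seconds * 1000, m4Seconds * 1000, m3Seconds / m4Seconds))
```

At this size each multiply takes only tens of milliseconds, so warm-up, synchronization, and how many iterations are averaged can noticeably shift the reported number.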
Does anyone know if ExecuTorch is officially supported or has been successfully used on visionOS? If so, are there any specific build instructions, example projects, or potential issues (like sandboxing or memory limitations) to be aware of when integrating it into an Xcode project for the Vision Pro?
While ExecuTorch has support for iOS, I can't find any official documentation or community examples specifically mentioning visionOS.
Thanks.
When calling NLTagger.requestAssets with some languages, it hangs indefinitely, both in the simulator and on a device. This happens consistently for some languages, such as Greek. An example call is NLTagger.requestAssets(for: .greek, tagScheme: .lemma). Other languages, such as French, return immediately. I captured some logs from Console and found what look like repeated attempts to download the asset. I would expect the call to terminate eventually, either by loading the asset or by failing with an error.
I want to get a depth map when the camera zooms in or out, or switches to the telephoto lens.
I already get a depth map from ARKit, which fuses the color RGB image from the wide-angle camera with the depth readings from the LiDAR scanner.
Now I want to switch the camera to telephoto and get a new depth map.
Topic:
Machine Learning & AI
SubTopic:
Apple Intelligence
Greetings,
I've been experimenting with the new Apple Intelligence chat. I want to use my custom LLM, and I got that working (I can chat back and forth with my server from the left panel), but I can't figure out how to change the editor contents the way ChatGPT does.
ChatGPT can change the current editor and, it seems, all files in the pbx. I tried to capture the call with Charles, with no success.
The OpenAI platform docs don't mention anything that could change the code shown.
Does anyone know how to achieve this? Is the Apple Intelligence documentation missing these features, and will it be completed soon? Will these features even be open to developers?
Is Foundation Models mature enough to take input from the Apple Vision framework and generate responses? Something similar to what Google's Gemini does, although on a much smaller scale and for a very specific niche.
Hello,
I was successfully able to compile TKDKid1000/TinyLlama-1.1B-Chat-v0.3-CoreML using Core ML, and it's working well. However, I’m now trying to compile the same model using Swift Transformers.
With the limited documentation available on the swift-chat and Hugging Face repositories, I’m finding it difficult to understand the correct process for compiling a model via Swift Transformers. I attempted the following approach, but I’m fairly certain it’s not the recommended or correct method.
Could someone guide me on the proper way to compile and use models like TinyLlama with Swift Transformers? Any official workflow, example, or best practice would be very helpful.
Thanks in advance!
This is the approach I have used:
import Foundation
import CoreML
import Tokenizers
@main
struct HopeApp {
static func main() async {
print(" Running custom decoder loop...")
do {
let tokenizer = try await AutoTokenizer.from(pretrained: "PY007/TinyLlama-1.1B-Chat-v0.3")
var inputIds = tokenizer("this is the test of the prompt")
print("🧠 Prompt token IDs:", inputIds)
let model = try float16_model(configuration: .init())
let maxTokens = 30
for _ in 0..<maxTokens {
let input = try MLMultiArray(shape: [1, 128], dataType: .int32)
let mask = try MLMultiArray(shape: [1, 128], dataType: .int32)
for i in 0..<inputIds.count {
input[i] = NSNumber(value: inputIds[i])
mask[i] = 1
}
for i in inputIds.count..<128 {
input[i] = 0
mask[i] = 0
}
let output = try model.prediction(input_ids: input, attention_mask: mask)
let logits = output.logits // shape: [1, seqLen, vocabSize]
let lastIndex = inputIds.count - 1
let lastLogitsStart = lastIndex * 32003 // vocab size = 32003
var nextToken = 0
var maxLogit: Float32 = -Float.greatestFiniteMagnitude
for i in 0..<32003 {
let logit = logits[lastLogitsStart + i].floatValue
if logit > maxLogit {
maxLogit = logit
nextToken = i
}
}
inputIds.append(nextToken)
if nextToken == 32002 { break }
let partialText = try await tokenizer.decode(tokens:inputIds)
print(partialText)
}
} catch {
print("❌ Error: \(error)")
}
}
}
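The greedy step in the loop above (pick the largest logit at the last prompt position) can be isolated into a small pure-Swift helper, which makes the indexing easier to check without a model. This is only a sketch: it assumes the logits have been copied into a flat `[Float]` laid out as [1, seqLen, vocabSize], and the name `greedyToken` is invented here.

```swift
/// Returns the index of the largest logit at `position` in a flattened
/// [1, seqLen, vocabSize] buffer, or nil if that position is out of range.
func greedyToken(logits: [Float], position: Int, vocabSize: Int) -> Int? {
    let start = position * vocabSize
    guard position >= 0, vocabSize > 0, start + vocabSize <= logits.count else {
        return nil
    }
    let slice = logits[start..<(start + vocabSize)]
    // Slice indices are offsets into the original array, so subtract `start`.
    guard let best = slice.indices.max(by: { slice[$0] < slice[$1] }) else {
        return nil
    }
    return best - start
}

// Toy example: seqLen = 2, vocabSize = 3.
let toyLogits: [Float] = [0.1, 0.9, 0.2,
                          0.7, 0.3, 0.5]
print(greedyToken(logits: toyLogits, position: 1, vocabSize: 3) as Any)
```

Returning nil for an out-of-range position also covers the case where the prompt grows past the fixed input length of 128.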
Topic:
Machine Learning & AI
SubTopic:
Core ML
Hello, I am thinking of buying the 14" MacBook Pro with M4 Pro, mostly for ML/AI/NLP tasks. Since I have only used Windows before, I am wondering whether it is compatible with libraries like PyTorch and TensorFlow, or whether people have run into problems installing them. Thank you!
Topic:
Machine Learning & AI
SubTopic:
General
Hi everyone,
I'm developing an iOS app using Foundation Models and I've hit a critical limitation that I believe affects many developers and millions of users.
The Issue
Foundation Models requires the device system language to be one of the supported languages. If a user has their device set to an unsupported language (Catalan, Dutch, Swedish, Polish, Danish, Norwegian, Finnish, Czech, Hungarian, Greek, Romanian, and many others), SystemLanguageModel.isSupported returns false and the framework is completely unavailable.
Why This Is Problematic
Scenario: A Catalan user has their iPhone in Catalan (native language). They want to use an AI chat app in Spanish or English (languages they speak fluently).
Current situation:
❌ Foundation Models: Completely unavailable
✅ OpenAI GPT-4: Works perfectly
✅ Anthropic Claude: Works perfectly
✅ Any cloud-based AI: Works perfectly
The user must choose between:
Keep device in Catalan → Cannot use Foundation Models at all
Change entire device to Spanish → Can use Foundation Models but terrible UX
Impact
This affects:
Millions of users in regions where unsupported languages are official
Multilingual users who prefer their device in their native language but can comfortably interact with AI in English/Spanish
Developers who cannot deploy Foundation Models-based apps in these markets
Privacy-conscious users who are ironically forced to use cloud AI instead of on-device AI
What We Need
One of these solutions would solve the problem:
Option 1: Per-app language override (preferred)
// Proposed API
let session = try await LanguageModelSession(preferredLanguage: "es-ES")
Option 2: Faster rollout of additional languages (particularly EU languages)
Option 3: Allow fallback to user-selected supported language when system language is unsupported
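To make Option 3 concrete, the selection logic it implies could be sketched as below. This is plain Swift and does not touch Foundation Models or make the framework available; the supported set is a made-up example, and `firstSupportedLanguage` is a hypothetical helper. In an app, the preference list could come from `Locale.preferredLanguages`.

```swift
/// Returns the first preferred language identifier whose base language code
/// appears in `supported`, or nil if none matches.
func firstSupportedLanguage(preferred: [String], supported: Set<String>) -> String? {
    preferred.first { identifier in
        // "es-ES" and "es_ES" both reduce to the base code "es".
        let base = identifier
            .split(whereSeparator: { $0 == "-" || $0 == "_" })
            .first
            .map(String.init) ?? identifier
        return supported.contains(base.lowercased())
    }
}

// Example: device in Catalan, with Spanish and English as secondary languages.
let userPreferences = ["ca-ES", "es-ES", "en-GB"]
let exampleSupported: Set<String> = ["en", "es"]   // illustrative only
print(firstSupportedLanguage(preferred: userPreferences, supported: exampleSupported) ?? "none")
```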
Technical Details
Current behavior:
// Device in Catalan
let isAvailable = SystemLanguageModel.isSupported
// Returns false
// No way to override or specify alternative language
Why This Matters
Apple Intelligence and Foundation Models are amazing for privacy and performance. But this language restriction makes the most privacy-focused AI solution less accessible than cloud alternatives. This seems contrary to Apple's values of accessibility and user choice.
Questions for the Community
Has anyone else encountered this limitation?
Are there any workarounds I'm missing?
Has anyone successfully filed feedback about this? (Please share the FB number so we can reference it.)
Are there any sessions or labs where this has been discussed?
Thanks for reading. I'd love to hear if others are facing this and how you're handling it.
I installed Xcode 26.0 beta and downloaded the generative models sample from here:
https://developer.apple.com/documentation/foundationmodels/adding-intelligent-app-features-with-generative-models
But when I run it in the iOS 26.0 simulator, I get the error shown here. What's going wrong?
Topic:
Machine Learning & AI
SubTopic:
Foundation Models
Hi everyone,
I'm working on an iOS app that uses VisionKit and I'm exploring the .visualLookUp feature. Specifically, I want to extract the detailed information that Visual Look Up provides after identifying an object in an image (e.g., if the object is a flower, retrieve its name; if it’s a clothing tag, get the tag's content).