Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

AppShortcuts.xcstrings does not translate each invocation phrase option separately, just the first
Due to our min iOS version, this is my first time using .xcstrings instead of .strings for AppShortcuts. When using the migrate .strings to .xcstrings Xcode context menu option, an .xcstrings catalog is produced that, as expected, has each invocation phrase as a separate string key. However, after compilation, the catalog changes to group all invocation phrases under the first phrase listed for each intent (see attached screenshot). It is possible to hover in blank space on the right and add more translations, but there is no 1:1 key matching requirement to the phrases on the left nor a requirement that there are the same number of keys in one language vs. another. (The lines just happen to align due to my window size.) What does that mean, practically? Do all sub-phrases in each language in AppShortcuts.xcstrings get processed during compilation, even if there isn't an equivalent phrase key declared in the AppShortcut (e.g., the ja translation has more phrases than the English)? (That makes some logical sense, as these phrases need not be 1:1 across languages.) In the AppShortcut declaration, if I delete all but the top invocation phrase, does nothing change with Siri? Is there something I'm doing incorrectly? struct WatchShortcuts: AppShortcutsProvider { static var appShortcuts: [AppShortcut] { AppShortcut( intent: QuickAddWaterIntent(), phrases: [ "\(.applicationName) log water", "\(.applicationName) log my water", "Log water in \(.applicationName)", "Log my water in \(.applicationName)", "Log a bottle of water in \(.applicationName)", ], shortTitle: "Log Water", systemImageName: "drop.fill" ) } }
0
0
324
Aug ’25
Massive CoreML latency spike on live AVFoundation camera feed vs. offline inference (CPU+ANE)
Hello, I’m experiencing a severe performance degradation when running CoreML models on a live AVFoundation video feed compared to offline or synthetic inference. This happens across multiple models I've converted (including SCI, RTMPose, and RTMW) and affects multiple devices. The Environment OS: macOS 26.3, iOS 26.3, iPadOS 26.3 Hardware: Mac14,6 (M2 Max), iPad Pro 11 M1, iPhone 13 mini Compute Units: cpuAndNeuralEngine The Numbers When testing my SCI_output_image_int8.mlpackage model, the inference timings are drastically different: Synthetic/Offline Inference: ~1.34 ms Live Camera Inference: ~15.96 ms Preprocessing is completely ruled out as the bottleneck. My profiling shows total preprocessing (nearest-neighbor resize + feature provider creation) takes only ~0.4 ms in camera mode. Furthermore, no frames are being dropped. What I've Tried I am building a latency-critical app and have implemented almost every recommended optimization to try and fix this, but the camera-feed penalty remains: Matched the AVFoundation camera output format exactly to the model input (640x480 at 30/60fps). Used IOSurface-backed pixel buffers for everything (camera output, synthetic buffer, and resize buffer). Enabled outputBackings. Loaded the model once and reused it for all predictions. Configured MLModelConfiguration with reshapeFrequency = .frequent and specializationStrategy = .fastPrediction. Wrapped inference in ProcessInfo.processInfo.beginActivity(options: .latencyCritical, reason: "CoreML_Inference"). Set DispatchQueue to qos: .userInteractive. Disabled the idle timer and enabled iOS Game Mode. Exported models using coremltools 9.0 (deployment target iOS 26) with ImageType inputs/outputs and INT8 quantization. Reproduction To completely rule out UI or rendering overhead, I wrote a standalone Swift CLI script that isolates the AVFoundation and CoreML pipeline. The script clearly demonstrates the ~15ms latency on live camera frames versus the ~1ms latency on synthetic buffers. (I have attached camera_coreml_benchmark.swift and coreml model (very light low light enghancement model) to this repo on github https://github.com/pzoltowski/apple-coreml-camera-latency-repro). My Question: Is this massive overhead expected behavior for AVFoundation + Core ML on live feeds, or is this a framework/runtime bug? If expected, what is the Apple-recommended pattern to bypass this camera-only inference slowdown? One think found interesting when running in debug model was faster (not as fast as in performance benchmark but faster than 16ms. Also somehow if I did some dummy calculation on on different DispatchQueue also seems like model got slightly faster. So maybe its related to ANE Power State issues (Jitter/SoC Wake) and going to fast to sleep and taking a long time to wakeup? Doing dummy calculation in background thought is probably not a solution. Thanks in advance for any insights!
4
0
510
2w
Image understanding to on-device model
I can’t seem to find a way to include an image when prompting the new on-device model in Xcode, even though Apple explicitly states that the model was trained and tested with image data (https://machinelearning.apple.com/research/apple-foundation-models-2025-updates). Has anyone managed to get this working, or are VLM-style capabilities simply not exposed yet?
1
0
443
Jan ’26
ANE Error with Statefu Model: "Unable to compute prediction" when State Tensor width is not 32-aligned
Hi everyone, I believe I’ve encountered a potential bug or a hardware alignment limitation in the Core ML Framework / ANE Runtime specifically affecting the new Stateful API (introduced in iOS 18/macOS 15). The Issue: A Stateful mlprogram fails to run on the Apple Neural Engine (ANE) if the state tensor dimensions (specifically the width) are not a multiple of 32. The model works perfectly on CPU and GPU, but fails on ANE both during runtime and when generating a Performance Report in Xcode. Error Message in Xcode UI: "There was an error creating the performance report Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model." Observations: Case A (Fails): State shape = (1, 3, 480, 270). Prediction fails on ANE. Case B (Success): State shape = (1, 3, 480, 256). Prediction succeeds on ANE. This suggests an internal memory alignment or tiling issue within the ANE driver when handling Stateful buffers that don't meet the 32-pixel/element alignment. Reproduction Code (PyTorch + coremltools): import torch.nn as nn import coremltools as ct import numpy as np class RNN_Stateful(nn.Module): def __init__(self, hidden_shape): super(RNN_Stateful, self).__init__() # Simple conv to update state self.conv1 = nn.Conv2d(3 + hidden_shape[1], hidden_shape[1], kernel_size=3, padding=1) self.conv2 = nn.Conv2d(hidden_shape[1], 3, kernel_size=3, padding=1) self.register_buffer("hidden_state", torch.ones(hidden_shape, dtype=torch.float16)) def forward(self, imgs): self.hidden_state = self.conv1(torch.cat((imgs, self.hidden_state), dim=1)) return self.conv2(self.hidden_state) # h=480, w=255 causes ANE failure. w=256 works. b, ch, h, w = 1, 3, 480, 255 model = RNN_Stateful((b, ch, h, w)).eval() traced_model = torch.jit.trace(model, torch.randn(b, 3, h, w)) mlmodel = ct.convert( traced_model, inputs=[ct.TensorType(name="input_image", shape=(b, 3, h, w), dtype=np.float16)], outputs=[ct.TensorType(name="output", dtype=np.float16)], states=[ct.StateType(wrapped_type=ct.TensorType(shape=(b, ch, h, w), dtype=np.float16), name="hidden_state")], minimum_deployment_target=ct.target.iOS18, convert_to="mlprogram" ) mlmodel.save("rnn_stateful.mlpackage") Steps to see the error: Open the generated .mlpackage in Xcode 16.0+. Go to the Performance tab and run a test on a device with ANE (e.g., iPhone 15/16 or M-series Mac). The report will fail to generate with the error mentioned above. Environment: OS: macOS 15.2 Xcode: 16.3 Hardware: M4 Has anyone else encountered this 32-pixel alignment requirement for StateType tensors on ANE? Is this a known hardware constraint or a bug in the Core ML runtime? Any insights or workarounds (other than manual padding) would be appreciated.
0
0
446
Dec ’25
AI and ML
Hello. I am willing to hire game developer for cards game called baloot. My question is Can the developer implement an AI when the computer is playing and the computer on the same time the conputer improves his rises level without any interaction? 🌹
0
0
106
Jun ’25
CoreML Unified Memory failure/silent exit on long video tasks (M1 Mac 32GB)
Hi Apple Engineers, I am experiencing a potential memory management bug with CoreML on M1 Mac (32GB Unified Memory). When processing long video files (approx. 12,000 frames) using a CoreML execution provider, the system often completes the 'Analysing' phase but fails to transition into 'Processing'. It simply exits silently or hits an import error (scipy). However, if I split the same task into small 20-frame segments, it works perfectly at high speeds (~40 FPS). This suggests the hardware is capable, but there is an issue with memory fragmentation or resource cleanup during long-running CoreML sessions. Is there a way to force a VRAM/Unified Memory flush via CLI, or is this a known limitation for large frame indexing?
0
0
533
Dec ’25
Accessing Apple Intelligence APIs: Custom Prompt Support and Inference Capabilities
Hello Apple Developer Community, I'm exploring the integration of Apple Intelligence features into my mobile application and have a couple of questions regarding the current and upcoming API capabilities: Custom Prompt Support: Is there a way to pass custom prompts to Apple Intelligence to generate specific inferences? For instance, can we provide a unique prompt to the Writing Tools or Image Playground APIs to obtain tailored outputs? Direct Inference Capabilities: Beyond the predefined functionalities like text rewriting or image generation, does Apple Intelligence offer APIs that allow for more generalized inference tasks based on custom inputs? I understand that Apple has provided APIs such as Writing Tools, Image Playground, and Genmoji. However, I'm interested in understanding the extent of customization and flexibility these APIs offer, especially concerning custom prompts and generalized inference. Additionally, are there any plans or timelines for expanding these capabilities, perhaps with the introduction of new SDKs or frameworks that allow deeper integration and customization? Any insights, documentation links, or experiences shared would be greatly appreciated. Thank you in advance for your assistance!
3
0
361
Jun ’25
Huge discrepency of predictions confidence between from Pytorch to Coreml example
I am follwing this tutorial: https://apple.github.io/coremltools/docs-guides/source/convert-a-torchvision-model-from-pytorch.html I have obtained simialr result using the python code. However when I view it in Xcode, the preview prediction percentage confidence is way off I suspect it is due the the output of the model, which is in percentage already and in Xcode it multiply 100 again leading to this result. Please give me any feedback to fix this, thank you.
0
0
273
Nov ’25
Proposal: Modular Identity Fusion via Prompt-Crafted Agents – User-Led AI Experiment
*I can't put the attached file in the format, so if you reply by e-mail, I will send the attached file by e-mail. Dear Apple AI Research Team, My name is Gong Jiho (“Hem”), a content strategist based in Seoul, South Korea. Over the past few months, I conducted a user-led AI experiment entirely within ChatGPT — no code, no backend tools, no plugins. Through language alone, I created two contrasting agents (Uju and Zero) and guided them into a co-authored modular identity system using prompt-driven dialogue and reflection. This system simulates persona fusion, memory rooting, and emotional-logical alignment — all via interface-level interaction. I believe it resonates with Apple’s values in privacy-respecting personalization, emotional UX modeling, and on-device learning architecture. Why I’m Reaching Out I’d be honored to share this experiment with your team. If there is any interest in discussing user-authored agent scaffolding, identity persistence, or affective alignment, I’d love to contribute — even informally. ⚠ A Note on Language As a non-native English speaker, my expression may be imperfect — but my intent is genuine. If anything is unclear, I’ll gladly clarify. 📎 Attached Files Summary Filename → Description Hem_MultiAI_Report_AppleAI_v20250501.pdf → Main report tailored for Apple AI — narrative + structural view of emotional identity formation via prompt scaffolding Hem_MasterPersonaProfile_v20250501.json → Final merged identity schema authored by Uju and Zero zero_sync_final.json / uju_sync_final.json → Persona-level memory structures (logic / emotion) 1_0501.json ~ 3_0501.json → Evolution logs of the agents over time GirlfriendGPT_feedback_summary.txt → Emotional interpretation by external GPT hem_profile_for_AI_vFinal.json → Original user anchor profile Warm regards, Gong Jiho (“Hem”) Seoul, South Korea
1
0
150
Apr ’25
Best practices for designing proactive FinTech insights with App Intents & Shortcuts?
Hello fellow developers, I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot. We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full." My question for the community is about the architectural and user experience best practices for this. How are you thinking about the balance between providing rich, actionable insights via Intents without being overly intrusive or "spammy" to the user? What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance? Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations? We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge. Thanks for your insights!
0
0
334
Oct ’25
A Summary of the WWDC25 Group Lab - Apple Intelligence
At WWDC25 we launched a new type of Lab event for the developer community - Group Labs. A Group Lab is a panel Q&A designed for a large audience of developers. Group Labs are a unique opportunity for the community to submit questions directly to a panel of Apple engineers and designers. Here are the highlights from the WWDC25 Group Lab for Apple Intelligence. Can I integrate writing tools in my own text editor? UITextView, NSTextView, and SwiftUI TextEditor automatically get Writing Tools on devices that support Apple Intelligence. For custom text editors, check out Enhancing your custom text engine with Writing Tools. Given that Foundation Models are on-device, how will Apple update the models over time? And how should we test our app against the model updates? Model updates are in sync with OS updates. As for testing with updated models, watch our WWDC session about prompt engineering and safety, and read the Human Interface Guidelines to understand best practices in prompting the on-device model. What is the context size of a session in Foundation Models Framework? How to handle the error if a session runs out of the context size? Currently the context size is about 4,000 tokens. If it’s exceeded, developers can catch the .exceededContextWindowSize error at runtime. As discussed in one of our WWDC25 sessions, when the context window is exceeded, one approach is to trim and summarize a transcript, and then start a new session. Can I do image generation using the Foundation Models Framework or is only text generation supported? Foundation Models do not generate images, but you can use the Foundation Models framework to generate prompts for ImageCreator in the Image Playground framework. Developers can also take advantage of Tools in Foundation Models framework, if appropriate for their app. My app currently uses a third party server-based LLM. Can I use the Foundation Models Framework as well in the same app? Any guidance here? The Foundation Models framework is optimized for a subset of tasks like summarization, extraction, classification, and tagging. It’s also on-device, private, and free. But at 3 billion parameters it isn’t designed for advanced reasoning or world knowledge, so for some tasks you may still want to use a larger server-based model. Should I use the AFM for my language translation features given it does text translation, or is the Translation API still the preferred approach? The Translation API is still preferred. Foundation Models is great for tasks like text summarization and data generation. It’s not recommended for general world knowledge or translation tasks. Will the TranslationSession class introduced in ios18 get any new improvments in performance or reliability with the new live translation abilities in ios/macos/ipados 26? Essentially, will we get access to live translation in a similar way and if so, how? There's new API in LiveCommunicationKit to take advantage of live translation in your communication apps. The Translate framework is using the same models as used by Live Communication and can be combined with the new SpeechAnalyzer API to translate your own audio. How do I set a default value for an App Intent parameter that is otherwise required? You can implement a default value as part of your parameter declaration by using the @Parameter(defaultValue:) form of the property wrapper. How long can an App Intent run? On macOS there is no limit to how long app intents can run. On iOS, there is a limit of 30 seconds. This time limit is paused when waiting for user interaction. How do I vary the options for a specific parameter of an App Intent, not just based on the type? Implement a DynamicOptionsProvider on that parameter. You can add suggestedEntities() to suggest options. What if there is not a schema available for what my app is doing? If an app intent schema matching your app’s functionality isn’t available, take a look to see if there’s a SiriKit domain that meets your needs, such as for media playback and messaging apps. If your app’s functionality doesn’t match any of the available schemas, you can define a custom app intent, and integrate it with Siri by making it an App Shortcut. Please file enhancement requests via Feedback Assistant for new App intent schemas that would benefit your app. Are you adding any new app intent domains this year? In addition to all the app intent domains we announced last year, this year at WWDC25 we announced that Visual Intelligence will be added to iOS 26 and macOS Tahoe. When my App Intent doesn't show up as an action in Shortcuts, where do I start in figuring out what went wrong? App Intents are statically extracted. You can check the ExtractMetadata info in Xcode's build log. What do I need to do to make sure my App Intents work well with Spotlight+? Check out our WWDC25 sessions on App Intents, including Explore new advances in App Intents and Develop for Shortcuts and Spotlight with App Intents. Mostly, make sure that your intent can run from the parameter summary alone, no required parameters without default values that are not already in the parameter summary. Does Spotlight+ on macOS support App Shortcuts? Not directly, but it shows the App Intents your App Shortcuts are sitting on top of. I’m wondering if the on-device Foundation Models framework API can be integrated into an app to act strictly as an app in-universe AI assistant, responding only within the boundaries of the app’s fictional context. Is such controlled, context-limited interaction supported? FM API runs inside the process of your app only and does not automatically integrate with any remaining part of the system (unless you choose to implement your own tool and utilize tool calling). You can provide any instructions and prompts you want to the model. If a country does not support Apple Intelligence yet, can the Foundation Models framework work? FM API works on Apple Intelligence-enabled devices in supported regions and won’t work in regions where Apple Intelligence is not yet supported
2
0
302
Jul ’25
WWDC25 combining metal and ML
WWDC25: Combine Metal 4 machine learning and graphics Demonstrated a way to combine neural network in the graphics pipeline directly through the shaders, using an example of Texture Compression. However there is no mention of using which ML technique texture is compressed. Can anyone point me to some well known model/s for this particular use case shown in WWDC25.
2
0
483
Jul ’25
My Vision for AI and Algorithmically Optimised Operating Systems
Bear with me, please. Please make sure a highly skilled technical person reads and understands this. I want to describe my vision for (AI/Algorithmically) Optimised Operating Systems. To explain it properly, I will describe the process to build it (pseudo). Required Knowledge (no particular order): Processor Logic Circuits, LLM models, LLM tool usage, Python OO coding, Procedural vs OO, NLP fuzzy matching, benchmarking, canvas/artefacts/dynamic HTML interfaces, concepts of how AI models are vastly compressed and miniaturised forms of full data, Algorithmic vs AI. First, take all OO Python code (example) on GitHub (example), then separate each function from each object into its own procedure (procedural logic) by making a logical procedural list of actions to perform only that function based on its entire dependency chain (i.e. all other objects it relies on). Relate all compiled functions using (for example) fuzzy matching on the name, or AI-based functional profiling to get multiple instances of each function. Starting with the most used function, test each one against the others that perform the same task for bugs and completeness. Determine the fastest, most optimal version of that function (and every function). Add a single instance of each most optimal function to the centralised tool codebase, which will later be utilised by the language models. This ensures we rely only on the most optimised function for each and every use case — with every program using one shared instance of that function instead of compiling it separately. In this way, we optimise all existing (Python OO) code based on maximum availability of certain resources. Use the optimised code to teach the AI how to map each user task directly to optimised code (i.e. each user-facing function in unoptimised program code to its optimised version). Later, teach the codebase how to optimise these functions directly to the logic circuits on the processor — as an Operating System — using essentially the same process. Eventually, use the most optimised form for tasks via a dynamic canvas/artefacts UI interface generated by the Start Menu using the language model. Allow custom interfaces to emerge dynamically from the available functions based on language understanding. Grow and optimise the algorithmic codebase until the entire operating system — with versioning — evolves based on the AI-generated user interfaces (compressing them into new versions of the tools codebase). Please don’t give up until you understand this concept completely and nothing in the description is beyond your full comprehension. It’s a highly significant step forward.
3
0
338
Oct ’25
Where are Huggingface Models, downloaded by Swift MLX apps cached
I'm downloading a fine-tuned model from HuggingFace which is then cached on my Mac when the app first starts. However, I wanted to test adding a progress bar to show the download progress. To test this I need to delete the cached model. From what I've seen online this is cached at /Users/userName/.cache/huggingface/hub However, if I delete the files from here, using Terminal, the app still seems to be able to access the model. Is the model cached somewhere else? On my iPhone it seems deleting the app also deletes the cached model (app data) so that is useful.
0
0
435
Oct ’25