Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Created

Genmoji API — are local edits and non-text usage (e.g., widgets) allowed?
Hi, I'm integrating the Genmoji API (NSAdaptiveImageGlyph) into my app and would like to confirm two things: Local editing of generated Genmoji. After the user creates a Genmoji, can the app apply edits to the resulting image (e.g., pixelation)? The edited image would be stored only on-device within the app and never shared externally. Use outside text contexts. Can a generated Genmoji be used in other parts of the app, such as a home screen widget? Apple's documentation and the WWDC24 session focus on inline text, stickers, and Tapbacks, but I couldn't find explicit guidance on widget or other UI usage. I checked the Human Interface Guidelines, WWDC24 session "Bring expression to your app with Genmoji," and the App Store Review Guidelines, but couldn't find clear answers. Any guidance or pointers would be appreciated. Thanks!
0
0
473
3d
Does the new BNNSGraph API support quantization?
Hello, I spent some time going through the documentation and videos. I did not see how to implement quantized arithmetic for my neural network using BNNSGraph. Could someone please help me?
0
0
486
3d
MPS backend reports ~40 GiB 'other allocations' on 48 GB M5 Pro under macOS 26.4.1, blocking large tensor operations (PyTorch)
Product macOS Version macOS 26.4.1 (public release) Hardware Apple M5 Pro, 48 GB unified memory Summary On macOS 26.4.1, the MPS backend consistently reports approximately 40 GiB of “other allocations” on a 48 GB M5 Pro machine, even on a freshly rebooted system with minimal user applications running. This leaves insufficient memory for large GPU tensor operations that previously succeeded on earlier macOS versions. The failure manifests as: RuntimeError: MPS backend out of memory (MPS allocated: 17.60 GiB, other allocations: 40.17 GiB, max allowed: 63.65 GiB). Tried to allocate 7.63 GiB on private pool. The “other allocations: 40.17 GiB” value is consistent across reboots and does not change materially when user applications are quit. This suggests macOS 26.4.1 has increased its baseline GPU/unified memory consumption compared to prior releases in a way that is visible to the MPS allocator. Steps to Reproduce Fresh reboot of M5 Pro, 48 GB, macOS 26.4.1 Launch a PyTorch 2.11.0 application using MPS as the compute device Load a large model into MPS memory (~17 GiB, e.g. a VAE encoder in bfloat16) Attempt to allocate an additional ~7.6 GiB workspace tensor for a matrix multiplication operation (torch.bmm) Result: RuntimeError: MPS backend out of memory, with “other allocations” reported at ~40 GiB despite no large user processes holding GPU memory. Expected: The operation should succeed. 17.60 + 7.63 = 25.23 GiB, which is well within the 48 GiB physical memory of the machine. Additional Observations • vm_stat on a clean boot shows ~24 GB of free system RAM before the PyTorch application launches, consistent with normal OS usage. The 40 GiB figure reported by the MPS allocator as “other allocations” does not correspond to identifiable user processes. • The max allowed: 63.65 GiB ceiling reported by MPS exceeds the physical 48 GiB of the machine, suggesting MPS is using a memory limit calculation that does not account for actual physical constraints on unified memory architectures. • macOS 26.4 introduced a related regression (deterministic RuntimeError: MPSGraph does not support tensor dims larger than INT_MAX) in the same MPS buffer stride arithmetic path. That specific error was resolved in 26.4.1, but the OOM regression described here persists. • This operation succeeded on the same hardware under earlier macOS releases. The increased “other allocations” baseline appears to be specific to macOS 26.x. Impact Machine learning workloads that previously ran successfully on 48 GB Apple Silicon machines are failing on macOS 26.4.1 due to this increased baseline GPU memory consumption. Applications using PyTorch MPS, Core ML, and potentially Metal Performance Shaders directly may be affected. Workaround None identified. Reducing application model size or splitting operations into smaller chunks does not resolve the issue because the constraint is in the “other allocations” baseline, not in the application’s own allocations.
1
0
668
3d
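For cross-checking the reported ceiling from the Apple side, below is a minimal Swift sketch (not the author's PyTorch repro) that prints the Metal working-set limit the MPS allocator appears to derive "max allowed" from, next to physical RAM:

```swift
import Foundation
import Metal

// Print the Metal working-set ceiling and current allocation for the
// default GPU alongside physical RAM. On unified-memory Macs,
// recommendedMaxWorkingSetSize can exceed physical RAM, which would
// match the "max allowed: 63.65 GiB" figure on a 48 GB machine.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

let gib = 1024.0 * 1024.0 * 1024.0
print(String(format: "workingSetCeiling: %.2f GiB", Double(device.recommendedMaxWorkingSetSize) / gib))
print(String(format: "currentAllocated:  %.2f GiB", Double(device.currentAllocatedSize) / gib))
print(String(format: "physicalMemory:    %.2f GiB", Double(ProcessInfo.processInfo.physicalMemory) / gib))
```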
VNRecognizeTextRequest .accurate model failing to load
When I try to use VNRecognizeTextRequest in a simple program on Apple silicon, .accurate works, but when I add the same code to a helper process in a larger project, .accurate doesn't return any results and only .fast works. This happens on Apple silicon machines but not older Intel ones. When I call VNRecognizeTextRequest I see the error [Espresso::handle_ex_plan] exception= in the logs, along with (TextRecognition) Error loading network 0, -1. And when I catch the exception in lldb and print it, I see Null bundleID. In the code, [NSBundle mainBundle] returns null, even though plutil -p shows an embedded plist on both the helper process binary and the process that spawns it.
0
0
360
4d
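A minimal sketch of the request described in the post above; running the same function from both the simple program and the helper process should confirm whether the failure tracks the host process rather than the code:

```swift
import Vision
import CoreGraphics

// Minimal .accurate text-recognition request, useful for isolating
// whether the failure follows the recognition level or the process.
func recognizeText(in cgImage: CGImage) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate   // swap to .fast to compare

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```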
backDeploy SystemLanguageModel.tokenCount
SystemLanguageModel.contextSize is back-deployed, but SystemLanguageModel.tokenCount is not. The custom adapter toolkit ships with a ~2.7MB tokenizer with a ~150,000-entry vocabulary, but the LICENSE.rtf exclusively permits its use for training LoRAs. Is it possible to back-deploy tokenCount, or for Apple to permit the use of the tokenizer.model for counting tokens? This is important for avoiding context overflow errors.
0
1
539
1w
Apple managed asset pack for FoundationModels adapter on Testflight does not download (statusUpdates silent)
Hi, I'm stuck distributing a custom FoundationModels adapter as an Apple-hosted managed asset pack via TestFlight. Everything looks correctly configured end to end, but the download just never starts and the statusUpdates sequence is silent. Here's my configuration: App Info.plist: <key>BAHasManagedAssetPacks</key><true/> <key>BAUsesAppleHosting</key><true/> <key>BAAppGroupID</key><string>group.com.fiuto.shared</string> Entitlement com.apple.developer.foundation-model-adapter on both the app and the asset downloader extension. The asset downloader extension uses StoreDownloaderExtension, returning SystemLanguageModel.Adapter.isCompatible(assetPack) from shouldDownload, and the app group on the app and the asset download extension is the same. I have exported the adapter with toolkit 26.0.0, obtaining: adapterIdentifier = fmadapter-FiutoAdapter-1234567 I have packaged the asset pack using xcrun ba-package and uploaded it to App Store Connect via Transporter, I get the "ready for internal and external testing" state on App Store Connect, and I uploaded my app build to TestFlight after the asset pack was marked as ready. I used this code: let adapter = try SystemLanguageModel.Adapter(name: "FiutoAdapter") let ids = SystemLanguageModel.Adapter.compatibleAdapterIdentifiers(name: "FiutoAdapter") // ids == ["fmadapter-FiutoAdapter-1234567"] for await status in AssetPackManager.shared.statusUpdates(forAssetPackWithID: ids.first!) { } I expect the download to start and the stream to yield .began first, then .downloading(progress) and .finished. In practice, compatibleAdapterIdentifiers returns the correct ID and the stream is acquired correctly, but I get zero events, so no .began/.downloading/.failed/.finished. Important details: I don't get any errors in Console either; I tested this as an internal tester on TestFlight; tested on iPhone 16 Pro running iOS 26.3.1 with more than 50GB of free space; Apple Intelligence is enabled and set to Italian; background downloads are enabled. I've already checked that the adapter identifier matches the regex fmadapter-\w+-\w+, and I tried reinstalling the build, rebooting the device, and re-uploading the asset pack, and also checked that the foundation models adapter entitlement is present on both targets. Is there a known way to diagnose why statusUpdates is silent (no log subsystem seems to show why) in this exact configuration? Is there maybe a delay between asset pack approval on App Store Connect and availability to TestFlight internal testers that I don't know of? I've checked other threads for applicable solutions and found that this is similar to the symptom reported in this thread: https://developer.apple.com/forums/thread/805140 (FB20865802); also, I'm an internal tester on stable iOS 26.3.1, so the limitations from this thread shouldn't apply: https://developer.apple.com/forums/thread/793565 Thanks
2
0
314
2w
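One diagnostic worth trying, sketched below: statusUpdates only observes state, so if nothing ever requests the pack, the stream can legitimately stay silent. This sketch assumes AssetPackManager exposes assetPack(withID:) and ensureLocalAvailability(of:) as described in the Background Assets documentation; verify both names against the SDK before relying on them:

```swift
import BackgroundAssets
import FoundationModels

// Hypothetical diagnostic: explicitly request the pack instead of only
// observing statusUpdates. assetPack(withID:) and
// ensureLocalAvailability(of:) are assumptions taken from the
// Background Assets documentation -- verify both against your SDK.
func requestAdapterDownload() async throws {
    let ids = SystemLanguageModel.Adapter.compatibleAdapterIdentifiers(name: "FiutoAdapter")
    guard let id = ids.first else { return }

    let pack = try await AssetPackManager.shared.assetPack(withID: id)
    try await AssetPackManager.shared.ensureLocalAvailability(of: pack) // should trigger .began on the status stream
}
```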
SystemLanguageModel.Adapter leaks ~100MB of irrecoverable APFS disk space per call
FoundationModels framework, macOS Tahoe 26.4.1, MacBook Air M4. Loading a LoRA adapter via SystemLanguageModel.Adapter(fileURL:) leaks ~100MB of APFS disk space per invocation. The space is permanently consumed at the APFS block level with no corresponding file. Calls without an adapter show zero space loss. Running ~300 adapter calls in a benchmark loop leaked ~30GB and nearly filled a 500GB drive. The total unrecoverable phantom space is now ~239GB (461GB allocated on Data volume, 222GB visible to du). Reproduction: Build a CLI tool that loads a .fmadapter and runs one generation Measure before/after with df and du: Before: df free = 9.1 GB, du -smx /System/Volumes/Data = 227,519 MB After: df free = 9.0 GB, du -smx /System/Volumes/Data = 227,529 MB df delta: ~100 MB consumed du delta: +10 MB (background system activity) Phantom: ~90 MB -- no corresponding file anywhere on disk Without --adapter (same code, same model): zero space change du was run with sudo -x. Files modified during the call were checked with sudo find -mmin -10 -- only Spotlight DBs, diagnostics logs, and a 7MB InferenceProviderService vocab cache. Nothing accounts for the ~90MB loss. fs_usage shows TGOnDeviceInferenceProviderService writing hundreds of APFS metadata blocks (RdMeta on /dev/disk3) per adapter call. Recovery Mode diagnostics: fsck_apfs -o -y -s: no overallocations, bitmap consistent (118.6M blocks counted = spaceman allocated) fsck_apfs -o -y -T -s: B-tree repair found nothing fsck_apfs -o -y -T -F -s: "error: container keybag (39003576+1): failed to get keybag data: Inappropriate file type or format. Encryption key structures are invalid." No fsck_apfs flag combination reclaims the space. The leaked blocks are validly allocated in the APFS bitmap and referenced in the extent tree, but not associated with any file visible to du, find, stat, or lsof. Has anyone else observed space loss when using SystemLanguageModel.Adapter? If I am missing something obvious, I would love to know.
6
1
523
2w
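A minimal sketch of the CLI described in the reproduction above, using the adapter and session APIs named in the post (verify the exact initializer signatures against the FoundationModels SDK); pair it with df before and after each run:

```swift
import Foundation
import FoundationModels

// Minimal CLI body matching the reproduction: load a .fmadapter, run
// one generation, exit. The adapter path is a placeholder.
@main
struct AdapterLeakRepro {
    static func main() async throws {
        let url = URL(fileURLWithPath: "/path/to/MyAdapter.fmadapter")
        let adapter = try SystemLanguageModel.Adapter(fileURL: url)
        let model = SystemLanguageModel(adapter: adapter)
        let session = LanguageModelSession(model: model)
        let response = try await session.respond(to: "Say hello in one word.")
        print(response.content)
    }
}
```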
Will the upcoming MacBook Pro M6 Max have at least 256GB RAM?
Hi guys, I want to use the newest MacBook Pro M6 (Max or Ultra) with at least 256GB RAM for AI development. Might my wish come true? What do you think? One of Apple's biggest advantages here is unified memory, and with the privacy-first approach I want to run local models and show them to my customers right on the MacBook. That has much more magic than first plugging in the power supply for a SPARC, connecting a network cable, and fiddling around. The perfect match would be a MacBook Pro, M6 Ultra, 512GB. But I guess this is just a dream :-(. Please let me know what you think about that. Thanks
1
0
603
2w
How useful is AI
I want to introduce how useful AI is.
0
0
111
3w
Does using Vision API offline to label a custom dataset for Core ML training violate DPLA?
Hello everyone, I am currently developing a smart camera app for iOS that recommends optimal zoom and exposure values on-device using a custom Core ML model. I am still waiting for an official response from Apple Support, but I wanted to ask the community if anyone has experience with a similar workflow regarding App Review and the DPLA. Here is my training methodology: I gathered my own proprietary dataset of original landscape photos. I generated multiple variants of these photos with different zoom and exposure settings offline on my Mac. I used the CalculateImageAestheticsScoresRequest (Vision framework) via a local macOS command-line tool to evaluate and score each variant. Based on those scores, I labeled the "best" zoom and exposure parameters for each original photo. I used this labeled dataset to train my own independent neural network using PyTorch, and then converted it to a Core ML model to ship inside my app. Since the app uses my own custom model on-device and does not send any user data to a server, the privacy aspect is clear. However, I am curious if using the output of Apple's Vision API strictly offline to label my own dataset could be interpreted as "reverse engineering" or a violation of the Developer Program License Agreement (DPLA). Has anyone successfully shipped an app using a similar knowledge distillation or automated dataset labeling approach with Apple's APIs? Did you face any pushback during App Review? Any insights or shared experiences would be greatly appreciated!
1
0
275
3w
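A sketch of the offline scoring step described in the post above, assuming the Vision Swift API surface (CalculateImageAestheticsScoresRequest.perform(on:) returning an observation with an overallScore):

```swift
import Foundation
import Vision

// Score one image variant with the Vision aesthetics request
// (macOS 15+ Swift API), as used in the labeling pipeline above.
func aestheticsScore(for imageURL: URL) async throws -> Float {
    let request = CalculateImageAestheticsScoresRequest()
    let observation = try await request.perform(on: imageURL)
    return observation.overallScore   // higher = more aesthetically pleasing
}
```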
Sharing a Swift port of Gemma 4 for mlx-swift-lm — feedback welcome
Hi all, I've been working on a pure-Swift port of Google's Gemma 4 text decoder that plugs into mlx-swift-lm as a sidecar model registration. Sharing it here in case anyone else hit the same wall I did, and to get feedback from the MLX team and the community before I propose anything upstream. Repo: https://github.com/yejingyang8963-byte/Swift-gemma4-core Why As of mlx-swift-lm 2.31.x, Gemma 4 isn't supported out of the box. The obvious workaround — reusing the Gemma 3 text implementation with a patched config — fails at weight load because Gemma 4 differs from Gemma 3 in several structural places. The chat-template path through swift-jinja 1.x also silently corrupts the prompt, so the model loads but generates incoherent text. What's in the package A from-scratch Swift implementation of the Gemma 4 decoder (Configuration, Layers, Attention, MLP, RoPE, DecoderLayer) Per-Layer Embedding (PLE) support — the shared embedding table that feeds every decoder layer through a gated MLP as a third residual KV sharing across the back half of the decoder, threaded through the forward pass via a donor table with a single global rope offset A custom Gemma4ProportionalRoPE class for the partial-rotation rope type that initializeRope doesn't currently recognize A chat-template bypass that builds the prompt as a literal string with the correct turn markers and encodes via tokenizer.encode(text:), matching Python mlx-lm's apply_chat_template byte-for-byte Measured on iPhone (A-series, 7.4 GB RAM) Model: mlx-community/gemma-4-e2b-it-4bit Warm load: ~6 s Memory after load: 341–392 MB Time to first token (end-to-end, 333-token system prompt): 2.82 s Generation throughput: 12–14 tok/s What I'd love feedback on Is the sidecar registration pattern the right way to extend mlx-swift-lm with new model families, or is there a more idiomatic path I missed? The chat-template bypass works but feels like a workaround. Is the right long-term fix in swift-jinja, in the tokenizer, or somewhere else entirely? Anyone running into the same PLE / KV-sharing issues on other Gemma-family checkpoints? I'd like to make sure the implementation generalizes beyond E2B before tagging a 0.2.0. Happy to open a PR against mlx-swift-lm if the maintainers think any of this belongs upstream. Thanks for reading.
1
0
258
3w
Apple Swift Replacing Python
This YouTube video is very interesting, discussing Swift's power and its potential to replace Python. Here is the link. https://youtu.be/6ZGlseSqar0?si=pzZVq9FKsveca4kA
0
1
213
3w
26.4 Foundation Model rejects most topics
I have an iOS app, "Spatial Agents", which ran great on 26.3. It creates dashboards around a topic. It can also decompose a topic into sub-topics and explore those, all based on web articles and web article headlines. In iOS 26.4 almost every topic, even "MIT Innovation", is rejected with an apology of "I apologize, I can not fulfill this request". I've tried softening all my prompts, and I can get only really benign, very simple topics to respond, but not anything of any significance. It ran great on lots of topics in 26.3. My published app is now useless, and all my users are unhappy. HELP!
3
0
407
4w
After loading my custom model - unsupportedTokenizer error
In October 2025, using mlx_lm.lora, I created an adapter and a fused model uploaded to Hugging Face. I was able to incorporate this model into my SwiftUI app using the mlx package, MLX-libraries 2.25.8. My base LLM was mlx-community/Mistral-7B-Instruct-v0.3-4bit. Looking at LLMModelFactory.swift in the current version 2.29.1, the only changes are the addition of a few models. The earlier model was called pharmpk/pk-mistral-7b-v0.3-4bit. The new model is called pharmpk/pk-mistral-2026-03-29. The base model (mlx-community/Mistral-7B-Instruct-v0.3-4bit) must still be available. Could the error 'unsupportedTokenizer' be related to changes in the mlx package? I noticed mention of splitting the package into two parts but don't see anything on GitHub. Feeling rather lost. Does anyone have any thoughts and/or suggestions? Thanks, David
3
0
411
Mar ’26
CoreML MLE5ProgramLibrary AOT recompilation hangs/crashes on iOS 26.4 — C++ exception in espresso IR compiler bypasses Swift error handling
Area: CoreML / Machine Learning Describe the issue: On iOS 26.4, calling MLModel(contentsOf:configuration:) to load an .mlpackage model hangs indefinitely and eventually kills the app via watchdog. The same model loads and runs inference successfully in under 1 second on iOS 26.3.1. The hang occurs inside eort_eo_compiler_compile_from_ir_program (espresso) during on-device AOT recompilation triggered by MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:. A C++ exception (__cxa_throw) is thrown inside libBNNS.dylib during the exception unwind, which then hangs inside __cxxabiv1::dyn_cast_slow and __class_type_info::search_below_dst. Swift's try/catch does not catch this — the exception originates in C++ and the process hangs rather than terminating cleanly. Setting config.computeUnits = .cpuOnly does not resolve the issue. MLE5ProgramLibrary initialises as shared infrastructure regardless of compute units. Steps to reproduce: Create an app with an .mlpackage CoreML model using the MLE5/espresso backend Call MLModel(contentsOf: modelURL, configuration: config) at runtime Run on a device on iOS 26.3.1 — loads successfully in <1 second Update device to iOS 26.4 — hangs indefinitely, app killed by watchdog after 60–745 seconds Expected behaviour: Model loads successfully, or throws a catchable Swift error on failure. Actual behaviour: Process hangs in MLE5ProgramLibrary.lazyInitQueue. App killed by watchdog. No Swift error thrown. Full stack trace at point of hang: Thread 1 Queue: com.apple.coreml.MLE5ProgramLibrary.lazyInitQueue (serial) frame 0: __cxxabiv1::__class_type_info::search_below_dst libc++abi.dylib frame 1: __cxxabiv1::(anonymous namespace)::dyn_cast_slow libc++abi.dylib frame 2: ___lldb_unnamed_symbol_23ab44dd4 libBNNS.dylib frame 23: eort_eo_compiler_compile_from_ir_program espresso frame 24: -[MLE5ProgramLibraryOnDeviceAOTCompilationImpl createProgramLibraryHandleWithRespecialization:error:] CoreML frame 25: -[MLE5ProgramLibrary _programLibraryHandleWithForceRespecialization:error:] CoreML frame 26: __44-[MLE5ProgramLibrary prepareAndReturnError:]_block_invoke CoreML frame 27: _dispatch_client_callout libdispatch.dylib frame 28: _dispatch_lane_barrier_sync_invoke_and_complete libdispatch.dylib frame 29: -[MLE5ProgramLibrary prepareAndReturnError:] CoreML frame 30: -[MLE5Engine initWithContainer:configuration:error:] CoreML frame 31: +[MLE5Engine loadModelFromCompiledArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML frame 32: +[MLLoader _loadModelWithClass:fromArchive:modelVersionInfo:compilerVersionInfo:configuration:error:] CoreML frame 45: +[MLModel modelWithContentsOfURL:configuration:error:] CoreML frame 46: @nonobjc MLModel.__allocating_init(contentsOf:configuration:) GKPersonalV2 frame 47: MDNA_GaitEncoder_v1_3.__allocating_init(contentsOf:configuration:) frame 48: MDNA_GaitEncoder_v1_3.__allocating_init(configuration:) frame 50: GaitModelInference.loadModel() frame 51: GaitModelInference.init() iOS version: Reproduced on iOS 26.4. Works correctly on iOS 26.3.1. Xcode version: 26.2 Device: iPhone (model used in testing) Model format: .mlpackage
4
0
670
Mar ’26
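A minimal sketch of the load path from the report above; per the report the call never returns on iOS 26.4, so no catch block runs, but the async loader at least keeps the hang off the main thread while the regression is investigated:

```swift
import CoreML

// Minimal load path matching the reproduction steps. On iOS 26.3.1 this
// returns in under a second; on iOS 26.4 the same call reportedly hangs
// in MLE5ProgramLibrary.lazyInitQueue and never throws.
func loadGaitModel(at modelURL: URL) async throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuOnly   // reportedly does not avoid the hang
    return try await MLModel.load(contentsOf: modelURL, configuration: config)
}
```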
Unable to use FoundationModels in older app?
Hi, I'm trying to add FoundationModels to an older project but always get the following error: "Unable to resolve 'dependency' 'FoundationModels' import FoundationModels". The error comes and goes while it's compiling, and then the app doesn't run. I have my target set to 26.0 (and can't go any higher) and am using Xcode 26 (17E192). Is anyone else having this issue? Thanks, Dan Uff
1
0
317
Mar ’26
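A sketch of guarding the import and call sites so a target with older platform support still builds; this addresses availability, not the unresolved-dependency error itself, which looks like a project-configuration issue:

```swift
// Conditional import keeps the file compiling on SDKs or platforms
// where the framework cannot be resolved.
#if canImport(FoundationModels)
import FoundationModels
#endif

func modelAvailability() -> String {
    #if canImport(FoundationModels)
    if #available(iOS 26.0, *) {
        return String(describing: SystemLanguageModel.default.availability)
    }
    #endif
    return "FoundationModels unavailable"
}
```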
iOS 26.4: Regressions in Foundation Models
After installing iOS 26.4, the Foundation Models instruction-following and tool-calling capabilities have degraded significantly. The model is not usable anymore. Examples: This works: "Is the car plugged in?" This does not work: "Tell me if the car is plugged in". Anything with the word "frunk" (front trunk) triggers a Guardrail Violation. Phrases like "Lock Pride" also trigger a Guardrail Violation (Pride is the name of the car). Tool calling only works half the time, even for really obvious things.
3
1
588
Mar ’26
Plenty of LanguageModelSession.GenerationError.refusal errors after 26.4 update
Hello! After the 26.4 update I get a huge number of LanguageModelSession.GenerationError.refusal errors for inexplicable reasons when using guided generation Generables. Such errors also occur if I generate a Boolean response by using 'generating: Bool.self'. The response attached to the error always looks like this: Response(userPrompt: "", duration: 0.230917542, promptTokenCount: Optional(66), responseTokenCount: Optional(11), feedbackAttachment: nil, content: "I apologize, but I cannot fulfill this request.", rawContent: "I apologize, but I cannot fulfill this request.", transcriptEntries: ArraySlice([])) All the prompts and Generables I use are definitely not profane. Before 26.4, such errors never occurred on the same prompts and Generables. The 26.4 update rendered those features unusable for me. Is this a known bug, or what am I doing wrong?
3
0
572
Mar ’26
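A sketch of catching the refusal and guardrail cases explicitly, so a 26.4-style rejection degrades gracefully instead of surfacing as a generic failure; the case names are taken from the post above and the FoundationModels GenerationError as documented, and are worth verifying against the SDK:

```swift
import FoundationModels

// Ask a yes/no question via guided generation, mapping safety
// rejections (refusal / guardrail violation) to nil rather than
// propagating them as errors.
func askYesNo(_ session: LanguageModelSession, _ prompt: String) async -> Bool? {
    do {
        let response = try await session.respond(to: prompt, generating: Bool.self)
        return response.content
    } catch let error as LanguageModelSession.GenerationError {
        switch error {
        case .guardrailViolation, .refusal:
            return nil   // treat a safety rejection as "unknown"
        default:
            return nil
        }
    } catch {
        return nil
    }
}
```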
Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X
I am a professional user of the 2019 Mac Pro (7,1) with dual AMD Radeon Pro W6900X MPX modules (32GB VRAM each). This hardware is designed for high-performance compute, but it is currently crippled for modern local LLM/AI workloads under Linux due to Apple's EFI/PCIe routing restrictions. Core Issue: rocminfo reports "No HIP GPUs available" when attempting to use ROCm/amdgpu on Linux Apple's custom EFI firmware blocks full initialization of professional GPU compute assets The dual W6900X GPUs have 64GB combined VRAM and high-bandwidth Infinity Fabric Link, but cannot be fully utilized for local AI inference/training My Specific Request: Apple should provide an official, one-click deployable application that enables full utilization of dual W6900X GPUs for local large language model (LLM) inference and training under Linux. This application must: Fully initialize both W6900X GPUs via HIP/ROCm, establishing valid compute contexts Bypass artificial EFI/PCIe routing restrictions that block access to professional GPU resources Provide a stable, user-friendly one-click deployment experience (similar to NVIDIA's AI Enterprise or AMD's ROCm Hub) Why This Matters: The 2019 Mac Pro is Apple's flagship professional workstation, marketed for compute-intensive workloads. Its high-cost W6900X GPUs should not be locked down for modern AI/LLM use cases. An official one-click deployment solution would demonstrate Apple's commitment to professional AI and unlock significant value for professional users. I look forward to Apple's response and a clear roadmap for enabling this critical capability. #MacPro #Linux #ROCm #LocalLLM #W6900X #CoreML
3
0
1.1k
Mar ’26
Request: Official One-Click Local LLM Deployment for 2019 Mac Pro (7,1) Dual W6900X
I am a professional user of the 2019 Mac Pro (7,1) with dual AMD Radeon Pro W6900X MPX modules (32GB VRAM each). This hardware is designed for high-performance compute, but it is currently crippled for modern local LLM/AI workloads under Linux due to Apple's EFI/PCIe routing restrictions. Core Issue: rocminfo reports "No HIP GPUs available" when attempting to use ROCm/amdgpu on Linux Apple's custom EFI firmware blocks full initialization of professional GPU compute assets The dual W6900X GPUs have 64GB combined VRAM and high-bandwidth Infinity Fabric Link, but cannot be fully utilized for local AI inference/training My Specific Request: Apple should provide an official, one-click deployable application that enables full utilization of dual W6900X GPUs for local large language model (LLM) inference and training under Linux. This application must: Fully initialize both W6900X GPUs via HIP/ROCm, establishing valid compute contexts Bypass artificial EFI/PCIe routing restrictions that block access to professional GPU resources Provide a stable, user-friendly one-click deployment experience (similar to NVIDIA's AI Enterprise or AMD's ROCm Hub) Why This Matters: The 2019 Mac Pro is Apple's flagship professional workstation, marketed for compute-intensive workloads. Its high-cost W6900X GPUs should not be locked down for modern AI/LLM use cases. An official one-click deployment solution would demonstrate Apple's commitment to professional AI and unlock significant value for professional users. I look forward to Apple's response and a clear roadmap for enabling this critical capability. #MacPro #Linux #ROCm #LocalLLM #W6900X #CoreML
0
0
172
Mar ’26