Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Created

How does ARKit achieve low-latency and stable head tracking using only RGB camera ?
Hi, I’m working on a real-time head/face tracking pipeline using a standard 2D RGB camera, and I’m trying to better understand how ARKit achieves such stable and responsive results in comparable conditions. To clarify upfront: I’m specifically interested in RGB-only tracking and the underlying vision/ML pipeline. I’m not using TrueDepth or any depth/IR-based sensors, and I’d like to understand how similar stability and responsiveness can be achieved under those constraints. In my current setup, I estimate head pose from RGB frames (facial landmarks + PnP) and apply temporal filtering (e.g., exponential smoothing and Kalman filtering). This significantly reduces jitter, but introduces noticeable latency, especially during faster head movements. What stands out in ARKit is that it appears to maintain both: Very low jitter Very low perceived latency even when operating with camera input alone. I’m trying to understand what techniques might contribute to this behavior. In particular: Does ARKit use predictive tracking (e.g., velocity or acceleration-based pose extrapolation) to compensate for camera and processing delays in RGB-only scenarios? Are there recommended strategies for balancing temporal smoothing and responsiveness without introducing visible lag in camera-based pose estimation pipelines? Is the tracking pipeline internally decoupled from rendering (e.g., asynchronous processing with prediction applied at render time)? Are there general best practices for minimizing end-to-end latency in vision-based head tracking systems beyond standard filtering approaches? I understand that implementation details may not be public, but any high-level insights or pointers would be greatly appreciated. Thanks!
0
0
230
Mar ’26
AI framework usage without user session
We are evaluating various AI frameworks to use within our code, and are hoping to use some of the build-in frameworks in macOS including CoreML and Vision. However, we need to use these frameworks in a background process (system extension) that has no user session attached to it. (To be pedantic, we'll be using an XPC service that is spawned by the system extension, but neither would have an associated user session). Saying the daemon-safe frameworks list has not been updated in a while is an understatement, but it's all we have to go on. CoreGraphics isn't even listed--back then it part of ApplicationServices (I think?) and ApplicationServices is a no go. Vision does use CoreGraphics symbols and data types so I have doubts. We do have a POC that uses both frameworks and they seem to function fine but obviously having something official is better. Any Apple engineers that can comment on this?
6
0
1.1k
Mar ’26
MPS SDPA Attention Kernel Regression on A14-class (M1) in macOS 26.3.1 — Works on A15+ (M2+)
Summary Since macOS 26, our Core ML / MPS inference pipeline produces incorrect results on Mac mini M1 (Macmini9,1, A14-class SoC). The same model and code runs correctly on M2 and newer (A15-class and up). The regression appears to be in the Scaled Dot-Product Attention (SDPA) kernel path in the MPS backend. Environment Affected Mac mini M1 — Macmini9,1 (A14-class) Not affected M2 and newer (A15-class and up) Last known good macOS Sequoia First broken macOS 26 (Tahoe) ? Confirmed broken on macOS 26.3.1 Framework Core ML + MPS backend Language C++ (via CoreML C++ API) Description We ship an audio processing application (VoiceAssist by NoiseWorks) that runs a deep learning model (based on Demucs architecture) via Core ML with the MPS compute unit. On macOS Sequoia this works correctly on all Apple Silicon Macs including M1. After updating to macOS 26 (Tahoe), inference on M1 Macs fails — either producing garbage output or crashing. The same binary, same .mlpackage, same inputs work correctly on M2+. Our Apple contact has suggested the root cause is a regression in the A14-specific MPS SDPA attention kernel, which may have broken when the Metal/MPS stack was updated in macOS 26. The model makes heavy use of attention layers, and the failure correlates precisely with the SDPA path being exercised on A14 hardware. Steps to Reproduce Load a Core ML model that uses Scaled Dot-Product Attention (e.g. a transformer or attention-based audio model) Run inference with MLComputeUnits::cpuAndGPU (MPS active) Run on Mac mini M1 (Macmini9,1) with macOS 26.3.1 Compare output to the same model running on M2 / macOS Sequoia Expected: Correct inference output, consistent with M2+ and macOS Sequoia behavior Actual: Incorrect / corrupted output (or crash), only on A14-class hardware running macOS 26+ Workaround Forcing MLComputeUnits::cpuOnly bypasses MPS entirely and produces correct output on M1, confirming the issue is in the MPS compute path. This is not acceptable as a shipping workaround due to performance impact. Additional Notes The failure is hardware-specific (A14 only) and OS-specific (macOS 26+), pointing to a kernel-level regression rather than a model or app bug We first became aware of this through a customer report Happy to provide a symbolicated crash log if helpful this text was summarized by AI and human verified
2
0
313
Mar ’26
Powermetrics GPU power vs system DC power discrepancy on M4 Max
While analyzing system power on an M4 Max under GPU-heavy compute workloads, I noticed that the the GPU power reported by powermetrics does not come anywhere close to total system DC power reported by the SMC counter PDTR (as used by utilities like mactop). For example, in a heavy GPU workload, powermetrics would report a 65W idle-load delta on the GPU, but at the same time system DC power would rise by 179W, leaving 114W or nearly 2/3 of total system DC power on a Mac Studio M4 Max unexplained. From measurements, the difference appears to correlate with the amount of on-chip data movement (for example, varying bytes-per-FLOP in the workload changes the observed gap). Using SMC and IOReport, I was able to reverse engineer an energy model for the GPU that explains almost all of the energy flow with less than 2% error on the workload I studied. The result is a simple two-term energy roofline model: P_GPU (GPU_combined term in the plot) ≈ a * bytes + b * FLOPs with: ~5 pJ/byte for SRAM movement ~2.7 pJ/FLOP for compute. Has anyone observed similar behavior, or is there guidance on how GPU power reported by IOReport/powermetrics should be interpreted relative to total system power? In particular, I’m interested in whether certain classes of GPU activity may not be attributed to the GPU component in IOReport. Full details with the methodology and results are available here: https://youtu.be/HKxIGgyeISM
0
0
153
Mar ’26
Programmatic image creation using ImageCreator
Hello, Could you please provide details for maximum string length of the prompt and the title when using ImageCreator and the method extracted(from:title:)? static func extracted( from text: String, title: String? = nil ) -> ImagePlaygroundConcept Any additional details or example of prompt and title would help. Additionally, are ImagePlaygroundStyle.animation, ImagePlaygroundStyle.illustration and ImagePlaygroundStyle.sketch all available when using extracted(from:title:)? I am trying to generate images programmatically and would appreciate your guidance. Thank you.
1
0
462
Mar ’26
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
0
0
489
Mar ’26
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
0
0
535
Mar ’26
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
0
0
355
Mar ’26
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
0
0
696
Mar ’26
Provide spoken voice search string
Hello, My goal is to enable users to perform a freeform search request for any product I sell using a spoken phrase, for example, "Hey Siri, search GAMING CONSOLES on MyCatalogApp". The result would launch MyCatalogApp and navigate to a search results page displaying gaming consoles. I have defined a SearchIntent (using the .system.search schema) and a Shortcut to accomplish this. However, Siri doesn't seem to be able to correctly parse the spoken phrase, extract the search string, and provide it as the critiria term within SearchIntent. What am I doing wrong? Here is the SearchIntent. Note the print() statement outputs the search string--which in the scenario above would be "GAMING CONSOLES"--but it doesn't work. import AppIntents @available(iOS 17.2, *) @AppIntent(schema: .system.search) struct SearchIntent: ShowInAppSearchResultsIntent { static var searchScopes: [StringSearchScope] = [.general] @Parameter(title: "Criteria") var criteria: StringSearchCriteria static var title: LocalizedStringResource = "Search with MyCatalogApp" @MainActor func perform() async throws -> some IntentResult { let searchString = criteria.term print("**** Search String: \(searchString) ****") // tmp debugging try await MyCatalogSearchHelper.search(for: searchString) // fetch results via my async fetch API return .result() } } Here's the Shortcuts definition: import AppIntents @available(iOS 17.2, *) struct Shortcuts: AppShortcutsProvider { @AppShortcutsBuilder static var appShortcuts: [AppShortcut] { AppShortcut( intent: SearchIntent(), phrases: ["Search for \(\.$criteria) on \(.applicationName)."], shortTitle: "Search", systemImageName: "magnifyingglass" ) } } Thanks for any help!
1
0
765
Mar ’26
New project with new AppIntent throws build error
I opened a new project, iOS app, in XCode and then tabbed into the system_search snippet and built the project and got a build error. I can't imagine this was intended, at least not for new developers to the ecosystem like me. I solved it by tweaking a configuration I don't really understand advised here: https://github.com/apple/swift-openapi-generator/issues/796, hopefully that's a valid workaround
2
0
531
Mar ’26
AppIntent search schema opens app as only option
I am trying to use @AppIntent(schema: .system.search) to search in my app via a Siri voice command, but I want to be able to return a .result that does not open the app, yet still get the model training benefits from the schema. Very new to this, this is my first app, so I would appreciate some guidance. I haven't gotten to the voice part, I tested on Shortcuts. Do I need to do AppIntents without the schema and wait until there is a search schema that does not open the app, or should I be using a different schema? What am I missing?
2
0
698
Mar ’26
reinforcement learning from Apple?
I don't know if these forums are any good for rumors or plans, but does anybody know whether or not Apple plans to release a library for training reinforcement learning? It would be handy, implementing games in Swift, for example, to be able to train the computer players on the same code.
0
0
473
Mar ’26
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3 To: Metal Developer Relations Hello, I am reporting a repeatable numerical saturation point encountered during sustained recursive high-order differential workloads on the Apple M3 (16 GB unified memory) using the JAX Metal backend. Workload Characteristics: Large-scale vector projections across multi-dimensional industrial datasets Repeated high-order finite-difference calculations Heavy use of jax.grad and lax.cond inside long-running loops Observation: Under these conditions, the Metal/MPS backend consistently enters a terminal quantization lock where outputs saturate at a fixed scalar value (2.0000), followed by system-wide NaN propagation. This appears to be a precision-limited boundary in the JAX-Metal bridge when handling high-order operations with cubic time-scale denominators. have identified the specific threshold where recursive high-order tensor derivatives exceed the numerical resolution of 32-bit consumer architectures, necessitating a migration to a dedicated 64-bit industrial stack. I have prepared a minimal synthetic test script (randomized vectors only, no proprietary logic) that reliably reproduces the allocator fragmentation and saturation behavior. Let me know if your team would like the telemetry for XLA/MPS optimization purposes. Best regards, Alex Severson Architect, QuantumPulse AI
0
0
352
Mar ’26
Two errors in debug: com.apple.modelcatalog.catalog sync and nw_protocol_instance_set_output_handler
We get two error message in Xcode debug. apple.model.catalog we get 1 time at startup, and the nw_protocol_instance_set_output_handler Not calling remove_input_handler on 0x152ac3c00:udp we get on sartup and some time during running of the app. I have tested cutoff repos WS eg. But nothing helpss, thats for the nw_protocol. We have a fondationmodel in a repo but we check if it is available if not we do not touch it. Please help me? nw_protocol_instance_set_output_handler Not calling remove_input_handler on 0x152ac3c00:udp com.apple.modelcatalog.catalog sync: connection error during call: Error Domain=NSCocoaErrorDomain Code=4099 "The connection to service named com.apple.modelcatalog.catalog was invalidated: Connection init failed at lookup with error 159 - Sandbox restriction." UserInfo={NSDebugDescription=The connection to service named com.apple.modelcatalog.catalog was invalidated: Connection init failed at lookup with error 159 - Sandbox restriction.} reached max num connection attempts: 1 The function we have in the repo is this: public actor FoundationRepo: JobDescriptionChecker, SubskillSuggester { private var session: LanguageModelSession? private let isEnabled: Bool private let shouldUseLocalFoundation: Bool private let baseURLString = "https://xx.xx.xxx/xx" private let http: HTTPPac public init(http: HTTPPac, isEnabled: Bool = true) { self.http = http self.isEnabled = isEnabled self.session = nil guard isEnabled else { self.shouldUseLocalFoundation = false return } let model = SystemLanguageModel.default guard model.supportsLocale() else { self.shouldUseLocalFoundation = false return } switch model.availability { case .available: self.shouldUseLocalFoundation = true case .unavailable(.deviceNotEligible), .unavailable(.appleIntelligenceNotEnabled), .unavailable(.modelNotReady): self.shouldUseLocalFoundation = false @unknown default: self.shouldUseLocalFoundation = false } } So here we decide if we are going to use iPhone ML or my backend-remote?
2
0
549
Mar ’26
tensorflow-metal ReLU activation fails to clip negative values on M4 Apple Silicon
Environment: Hardware: Mac M4 OS: macOS Sequoia 15.7.4 TensorFlow-macOS Version: 2.16.2 TensorFlow-metal Version: 1.2.0 Description: When using the tensorflow-metal plug-in for GPU acceleration on M4, the ReLU activation function (both as a layer and as an activation argument) fails to correctly clip negative values to zero. The same code works correctly when forced to run on the CPU. Reproduction Script: import os import numpy as np import tensorflow as tf # weights and biases = -1 weights = [np.ones((10, 5)) * -1, np.ones(5) * -1] # input = 1 data = np.ones((1, 10)) # comment this line => GPU => get negative values # uncomment this line => CPU => no negative values # tf.config.set_visible_devices([], 'GPU') # create model model = tf.keras.Sequential([ tf.keras.layers.Input(shape=(10,)), tf.keras.layers.Dense(5, activation='relu') ]) # set weights model.layers[0].set_weights(weights) # get output output = model.predict(data) # check if negative is present print(f"min value: {output.min()}") print(f"is negative present? {np.any(output < 0)}")
2
0
498
Mar ’26
Apple Intelligence Naughty Naughty
When doing some exploratory research into using Apple Intelligence in our aviation-focused application, I noticed that there were several times that key phases would be marked as inappropriate. I tried to stifle these using prompts and rules but couldn't get it to take hold. I was encouraged by an Apple employee to go ahead and post this so that the AI team can use the feedback. There were several terms that triggered this warning, but the two that were most prominent were: 'Tailwind' 'JFK' or 'KJFK' (NY airport ICAO/IATA codes)
2
0
696
Mar ’26
Handling exceedingContextWindowSizeError
Reading all the docs(1) I was under the impression that handling this error is well managed... Until I hit it and found out that the recommended handling options hide a crucial fact: in the catch block you can not do anything?! It's too late - everything is lost, no way to recover... All the docs mislead me that I can apply the Transcript trick in the catch block until I realised, that there is nothing there !!! This article here(2) enlightened me on the handling of this problem, but I must say (and the author as well) - this is a hack! So my questions: is there really no way to handle this exception properly? if not, can we have the most important information - the count of the context exposed through the official API (at least the known ones)? https://developer.apple.com/documentation/Technotes/tn3193-managing-the-on-device-foundation-model-s-context-window#Handle-the-exceeding-context-window-size-error-elegantly https://zats.io/blog/making-the-most-of-apple-foundation-models-context-window/
1
0
259
Mar ’26
How does ARKit achieve low-latency and stable head tracking using only RGB camera ?
Hi, I’m working on a real-time head/face tracking pipeline using a standard 2D RGB camera, and I’m trying to better understand how ARKit achieves such stable and responsive results in comparable conditions. To clarify upfront: I’m specifically interested in RGB-only tracking and the underlying vision/ML pipeline. I’m not using TrueDepth or any depth/IR-based sensors, and I’d like to understand how similar stability and responsiveness can be achieved under those constraints. In my current setup, I estimate head pose from RGB frames (facial landmarks + PnP) and apply temporal filtering (e.g., exponential smoothing and Kalman filtering). This significantly reduces jitter, but introduces noticeable latency, especially during faster head movements. What stands out in ARKit is that it appears to maintain both: Very low jitter Very low perceived latency even when operating with camera input alone. I’m trying to understand what techniques might contribute to this behavior. In particular: Does ARKit use predictive tracking (e.g., velocity or acceleration-based pose extrapolation) to compensate for camera and processing delays in RGB-only scenarios? Are there recommended strategies for balancing temporal smoothing and responsiveness without introducing visible lag in camera-based pose estimation pipelines? Is the tracking pipeline internally decoupled from rendering (e.g., asynchronous processing with prediction applied at render time)? Are there general best practices for minimizing end-to-end latency in vision-based head tracking systems beyond standard filtering approaches? I understand that implementation details may not be public, but any high-level insights or pointers would be greatly appreciated. Thanks!
Replies
0
Boosts
0
Views
230
Activity
Mar ’26
AI framework usage without user session
We are evaluating various AI frameworks to use within our code, and are hoping to use some of the build-in frameworks in macOS including CoreML and Vision. However, we need to use these frameworks in a background process (system extension) that has no user session attached to it. (To be pedantic, we'll be using an XPC service that is spawned by the system extension, but neither would have an associated user session). Saying the daemon-safe frameworks list has not been updated in a while is an understatement, but it's all we have to go on. CoreGraphics isn't even listed--back then it part of ApplicationServices (I think?) and ApplicationServices is a no go. Vision does use CoreGraphics symbols and data types so I have doubts. We do have a POC that uses both frameworks and they seem to function fine but obviously having something official is better. Any Apple engineers that can comment on this?
Replies
6
Boosts
0
Views
1.1k
Activity
Mar ’26
MPS SDPA Attention Kernel Regression on A14-class (M1) in macOS 26.3.1 — Works on A15+ (M2+)
Summary Since macOS 26, our Core ML / MPS inference pipeline produces incorrect results on Mac mini M1 (Macmini9,1, A14-class SoC). The same model and code runs correctly on M2 and newer (A15-class and up). The regression appears to be in the Scaled Dot-Product Attention (SDPA) kernel path in the MPS backend. Environment Affected Mac mini M1 — Macmini9,1 (A14-class) Not affected M2 and newer (A15-class and up) Last known good macOS Sequoia First broken macOS 26 (Tahoe) ? Confirmed broken on macOS 26.3.1 Framework Core ML + MPS backend Language C++ (via CoreML C++ API) Description We ship an audio processing application (VoiceAssist by NoiseWorks) that runs a deep learning model (based on Demucs architecture) via Core ML with the MPS compute unit. On macOS Sequoia this works correctly on all Apple Silicon Macs including M1. After updating to macOS 26 (Tahoe), inference on M1 Macs fails — either producing garbage output or crashing. The same binary, same .mlpackage, same inputs work correctly on M2+. Our Apple contact has suggested the root cause is a regression in the A14-specific MPS SDPA attention kernel, which may have broken when the Metal/MPS stack was updated in macOS 26. The model makes heavy use of attention layers, and the failure correlates precisely with the SDPA path being exercised on A14 hardware. Steps to Reproduce Load a Core ML model that uses Scaled Dot-Product Attention (e.g. a transformer or attention-based audio model) Run inference with MLComputeUnits::cpuAndGPU (MPS active) Run on Mac mini M1 (Macmini9,1) with macOS 26.3.1 Compare output to the same model running on M2 / macOS Sequoia Expected: Correct inference output, consistent with M2+ and macOS Sequoia behavior Actual: Incorrect / corrupted output (or crash), only on A14-class hardware running macOS 26+ Workaround Forcing MLComputeUnits::cpuOnly bypasses MPS entirely and produces correct output on M1, confirming the issue is in the MPS compute path. This is not acceptable as a shipping workaround due to performance impact. Additional Notes The failure is hardware-specific (A14 only) and OS-specific (macOS 26+), pointing to a kernel-level regression rather than a model or app bug We first became aware of this through a customer report Happy to provide a symbolicated crash log if helpful this text was summarized by AI and human verified
Replies
2
Boosts
0
Views
313
Activity
Mar ’26
Powermetrics GPU power vs system DC power discrepancy on M4 Max
While analyzing system power on an M4 Max under GPU-heavy compute workloads, I noticed that the the GPU power reported by powermetrics does not come anywhere close to total system DC power reported by the SMC counter PDTR (as used by utilities like mactop). For example, in a heavy GPU workload, powermetrics would report a 65W idle-load delta on the GPU, but at the same time system DC power would rise by 179W, leaving 114W or nearly 2/3 of total system DC power on a Mac Studio M4 Max unexplained. From measurements, the difference appears to correlate with the amount of on-chip data movement (for example, varying bytes-per-FLOP in the workload changes the observed gap). Using SMC and IOReport, I was able to reverse engineer an energy model for the GPU that explains almost all of the energy flow with less than 2% error on the workload I studied. The result is a simple two-term energy roofline model: P_GPU (GPU_combined term in the plot) ≈ a * bytes + b * FLOPs with: ~5 pJ/byte for SRAM movement ~2.7 pJ/FLOP for compute. Has anyone observed similar behavior, or is there guidance on how GPU power reported by IOReport/powermetrics should be interpreted relative to total system power? In particular, I’m interested in whether certain classes of GPU activity may not be attributed to the GPU component in IOReport. Full details with the methodology and results are available here: https://youtu.be/HKxIGgyeISM
Replies
0
Boosts
0
Views
153
Activity
Mar ’26
Programmatic image creation using ImageCreator
Hello, Could you please provide details for maximum string length of the prompt and the title when using ImageCreator and the method extracted(from:title:)? static func extracted( from text: String, title: String? = nil ) -> ImagePlaygroundConcept Any additional details or example of prompt and title would help. Additionally, are ImagePlaygroundStyle.animation, ImagePlaygroundStyle.illustration and ImagePlaygroundStyle.sketch all available when using extracted(from:title:)? I am trying to generate images programmatically and would appreciate your guidance. Thank you.
Replies
1
Boosts
0
Views
462
Activity
Mar ’26
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
Replies
0
Boosts
0
Views
489
Activity
Mar ’26
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
Replies
0
Boosts
0
Views
535
Activity
Mar ’26
Ideal and Largest RDMA Burst Width
In macOS Tahoe 26.2 an RDMA capability was added for Thunderbolt-5 interfaces. This has been demonstrated to significantly decrease the latency and maintain bandwidth for "clustered" Apple Silicon devices with TB5. What is the ideal and the maximum RDMA burst width for transfers over RDMA-enabled Thunderbolt-5 interfaces?
Replies
4
Boosts
0
Views
503
Activity
Mar ’26
Building a 4-agent autonomous coding pipeline on Apple Silicon — MLX backend questions
Hi, I'm building ANF (Autonomous Native Forge) — a cloud-free, 4-agent autonomous software production pipeline running on local hardware with local LLM inference. No middleware, pure Node.js native. Currently running on NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B. Now porting to Apple Silicon. Three technical questions: How production-ready is mlx-lm's OpenAI-compatible API server for long context generation (32K tokens)? What's the recommended approach for KV Cache management with Unified Memory architecture — any specific flags or configurations for M4 Ultra? MLX vs GGUF (llama.cpp) for a multi-agent pipeline where 4 agents call the inference endpoint concurrently — which handles parallel requests better on Apple Silicon? GitHub: github.com/trgysvc/AutonomousNativeForge Any guidance appreciated.
Replies
0
Boosts
0
Views
355
Activity
Mar ’26
Best approach for animating a speaking avatar in a macOS/iOS SwiftUI application
I am developing a macOS application using SwiftUI (with an iOS version as well). One feature we are exploring is displaying an avatar that reads or speaks dynamically generated text produced by an AI service. The basic flow would be: Text generated by an AI service Text converted to speech using a TTS engine An avatar (2D or 3D) rendered in the app that animates lip movement synchronized with the speech Ideally the avatar would render locally on the device. Questions: What Apple frameworks would be most appropriate for implementing a speaking avatar? SceneKit RealityKit SpriteKit (for 2D avatars) Is there any recommended way to drive lip-sync animation from speech audio using Apple frameworks? Does AVSpeechSynthesizer expose phoneme or viseme timing information that could be used for avatar animation? If such timing information is not available, what is the recommended approach for synchronizing character mouth animation with speech audio on macOS/iOS? Are there examples of real-time character animation synchronized with speech on macOS/iOS? Any architectural guidance or references would be greatly appreciated.
Replies
0
Boosts
0
Views
696
Activity
Mar ’26
Provide spoken voice search string
Hello, My goal is to enable users to perform a freeform search request for any product I sell using a spoken phrase, for example, "Hey Siri, search GAMING CONSOLES on MyCatalogApp". The result would launch MyCatalogApp and navigate to a search results page displaying gaming consoles. I have defined a SearchIntent (using the .system.search schema) and a Shortcut to accomplish this. However, Siri doesn't seem to be able to correctly parse the spoken phrase, extract the search string, and provide it as the critiria term within SearchIntent. What am I doing wrong? Here is the SearchIntent. Note the print() statement outputs the search string--which in the scenario above would be "GAMING CONSOLES"--but it doesn't work. import AppIntents @available(iOS 17.2, *) @AppIntent(schema: .system.search) struct SearchIntent: ShowInAppSearchResultsIntent { static var searchScopes: [StringSearchScope] = [.general] @Parameter(title: "Criteria") var criteria: StringSearchCriteria static var title: LocalizedStringResource = "Search with MyCatalogApp" @MainActor func perform() async throws -> some IntentResult { let searchString = criteria.term print("**** Search String: \(searchString) ****") // tmp debugging try await MyCatalogSearchHelper.search(for: searchString) // fetch results via my async fetch API return .result() } } Here's the Shortcuts definition: import AppIntents @available(iOS 17.2, *) struct Shortcuts: AppShortcutsProvider { @AppShortcutsBuilder static var appShortcuts: [AppShortcut] { AppShortcut( intent: SearchIntent(), phrases: ["Search for \(\.$criteria) on \(.applicationName)."], shortTitle: "Search", systemImageName: "magnifyingglass" ) } } Thanks for any help!
Replies
1
Boosts
0
Views
765
Activity
Mar ’26
New project with new AppIntent throws build error
I opened a new project, iOS app, in XCode and then tabbed into the system_search snippet and built the project and got a build error. I can't imagine this was intended, at least not for new developers to the ecosystem like me. I solved it by tweaking a configuration I don't really understand advised here: https://github.com/apple/swift-openapi-generator/issues/796, hopefully that's a valid workaround
Replies
2
Boosts
0
Views
531
Activity
Mar ’26
AppIntent search schema opens app as only option
I am trying to use @AppIntent(schema: .system.search) to search in my app via a Siri voice command, but I want to be able to return a .result that does not open the app, yet still get the model training benefits from the schema. Very new to this, this is my first app, so I would appreciate some guidance. I haven't gotten to the voice part, I tested on Shortcuts. Do I need to do AppIntents without the schema and wait until there is a search schema that does not open the app, or should I be using a different schema? What am I missing?
Replies
2
Boosts
0
Views
698
Activity
Mar ’26
reinforcement learning from Apple?
I don't know if these forums are any good for rumors or plans, but does anybody know whether or not Apple plans to release a library for training reinforcement learning? It would be handy, implementing games in Swift, for example, to be able to train the computer players on the same code.
Replies
0
Boosts
0
Views
473
Activity
Mar ’26
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3 To: Metal Developer Relations Hello, I am reporting a repeatable numerical saturation point encountered during sustained recursive high-order differential workloads on the Apple M3 (16 GB unified memory) using the JAX Metal backend. Workload Characteristics: Large-scale vector projections across multi-dimensional industrial datasets Repeated high-order finite-difference calculations Heavy use of jax.grad and lax.cond inside long-running loops Observation: Under these conditions, the Metal/MPS backend consistently enters a terminal quantization lock where outputs saturate at a fixed scalar value (2.0000), followed by system-wide NaN propagation. This appears to be a precision-limited boundary in the JAX-Metal bridge when handling high-order operations with cubic time-scale denominators. have identified the specific threshold where recursive high-order tensor derivatives exceed the numerical resolution of 32-bit consumer architectures, necessitating a migration to a dedicated 64-bit industrial stack. I have prepared a minimal synthetic test script (randomized vectors only, no proprietary logic) that reliably reproduces the allocator fragmentation and saturation behavior. Let me know if your team would like the telemetry for XLA/MPS optimization purposes. Best regards, Alex Severson Architect, QuantumPulse AI
Replies
0
Boosts
0
Views
352
Activity
Mar ’26
Two errors in debug: com.apple.modelcatalog.catalog sync and nw_protocol_instance_set_output_handler
We get two error message in Xcode debug. apple.model.catalog we get 1 time at startup, and the nw_protocol_instance_set_output_handler Not calling remove_input_handler on 0x152ac3c00:udp we get on sartup and some time during running of the app. I have tested cutoff repos WS eg. But nothing helpss, thats for the nw_protocol. We have a fondationmodel in a repo but we check if it is available if not we do not touch it. Please help me? nw_protocol_instance_set_output_handler Not calling remove_input_handler on 0x152ac3c00:udp com.apple.modelcatalog.catalog sync: connection error during call: Error Domain=NSCocoaErrorDomain Code=4099 "The connection to service named com.apple.modelcatalog.catalog was invalidated: Connection init failed at lookup with error 159 - Sandbox restriction." UserInfo={NSDebugDescription=The connection to service named com.apple.modelcatalog.catalog was invalidated: Connection init failed at lookup with error 159 - Sandbox restriction.} reached max num connection attempts: 1 The function we have in the repo is this: public actor FoundationRepo: JobDescriptionChecker, SubskillSuggester { private var session: LanguageModelSession? private let isEnabled: Bool private let shouldUseLocalFoundation: Bool private let baseURLString = "https://xx.xx.xxx/xx" private let http: HTTPPac public init(http: HTTPPac, isEnabled: Bool = true) { self.http = http self.isEnabled = isEnabled self.session = nil guard isEnabled else { self.shouldUseLocalFoundation = false return } let model = SystemLanguageModel.default guard model.supportsLocale() else { self.shouldUseLocalFoundation = false return } switch model.availability { case .available: self.shouldUseLocalFoundation = true case .unavailable(.deviceNotEligible), .unavailable(.appleIntelligenceNotEnabled), .unavailable(.modelNotReady): self.shouldUseLocalFoundation = false @unknown default: self.shouldUseLocalFoundation = false } } So here we decide if we are going to use iPhone ML or my backend-remote?
Replies
2
Boosts
0
Views
549
Activity
Mar ’26
tensorflow-metal ReLU activation fails to clip negative values on M4 Apple Silicon
Environment: Hardware: Mac M4 OS: macOS Sequoia 15.7.4 TensorFlow-macOS Version: 2.16.2 TensorFlow-metal Version: 1.2.0 Description: When using the tensorflow-metal plug-in for GPU acceleration on M4, the ReLU activation function (both as a layer and as an activation argument) fails to correctly clip negative values to zero. The same code works correctly when forced to run on the CPU. Reproduction Script: import os import numpy as np import tensorflow as tf # weights and biases = -1 weights = [np.ones((10, 5)) * -1, np.ones(5) * -1] # input = 1 data = np.ones((1, 10)) # comment this line => GPU => get negative values # uncomment this line => CPU => no negative values # tf.config.set_visible_devices([], 'GPU') # create model model = tf.keras.Sequential([ tf.keras.layers.Input(shape=(10,)), tf.keras.layers.Dense(5, activation='relu') ]) # set weights model.layers[0].set_weights(weights) # get output output = model.predict(data) # check if negative is present print(f"min value: {output.min()}") print(f"is negative present? {np.any(output < 0)}")
Replies
2
Boosts
0
Views
498
Activity
Mar ’26
Error Domain=NSOSStatusErrorDomain Code=-1 "kCFStreamErrorHTTPParseFailure / kCFSocketError / kCFStreamErrorDomainCustom / kCSIdentityUnknownAuthorityErr / qErr / telGenericError / dsNoExtsMacsBug / kMovieLoadStateError / cdevGenErr: Could not parse
Can't able to run the Create ML for training and I upgraded to MacOS 26.3 beta and I have tried older and newer
Replies
0
Boosts
0
Views
292
Activity
Mar ’26
Apple Intelligence Naughty Naughty
When doing some exploratory research into using Apple Intelligence in our aviation-focused application, I noticed that there were several times that key phases would be marked as inappropriate. I tried to stifle these using prompts and rules but couldn't get it to take hold. I was encouraged by an Apple employee to go ahead and post this so that the AI team can use the feedback. There were several terms that triggered this warning, but the two that were most prominent were: 'Tailwind' 'JFK' or 'KJFK' (NY airport ICAO/IATA codes)
Replies
2
Boosts
0
Views
696
Activity
Mar ’26
Handling exceedingContextWindowSizeError
Reading all the docs(1) I was under the impression that handling this error is well managed... Until I hit it and found out that the recommended handling options hide a crucial fact: in the catch block you can not do anything?! It's too late - everything is lost, no way to recover... All the docs mislead me that I can apply the Transcript trick in the catch block until I realised, that there is nothing there !!! This article here(2) enlightened me on the handling of this problem, but I must say (and the author as well) - this is a hack! So my questions: is there really no way to handle this exception properly? if not, can we have the most important information - the count of the context exposed through the official API (at least the known ones)? https://developer.apple.com/documentation/Technotes/tn3193-managing-the-on-device-foundation-model-s-context-window#Handle-the-exceeding-context-window-size-error-elegantly https://zats.io/blog/making-the-most-of-apple-foundation-models-context-window/
Replies
1
Boosts
0
Views
259
Activity
Mar ’26