Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Post

Replies

Boosts

Views

Activity

Best practices for designing proactive FinTech insights with App Intents & Shortcuts?
Hello fellow developers, I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot. We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full." My question for the community is about the architectural and user experience best practices for this. How are you thinking about the balance between providing rich, actionable insights via Intents without being overly intrusive or "spammy" to the user? What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance? Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations? We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge. Thanks for your insights!
0
0
370
Oct ’25
Updated DetectHandPoseRequest revision from WWDC25 doesn't exist
I watched this year WWDC25 "Read Documents using the Vision framework". At the end of video there is mention of new DetectHandPoseRequest model for hand pose detection in Vision API. I looked Apple documentation and I don't see new revision. Moreover probably typo in video because there is only DetectHumanPoseRequst (swift based) and VNDetectHumanHandPoseRequest (obj-c based) (notice lack of Human prefix in WWDC video) First one have revision only added in iOS 18+: https://developer.apple.com/documentation/vision/detecthumanhandposerequest/revision-swift.enum/revision1 Second one have revision only added in iOS14+: https://developer.apple.com/documentation/vision/vndetecthumanhandposerequestrevision1 I don't see any new revision targeting iOS26+
0
0
169
Oct ’25
VNDetectTextRectanglesRequest not detecting text rectangles (includes image)
Hi everyone, I'm trying to use VNDetectTextRectanglesRequest to detect text rectangles in an image. Here's my current code: guard let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else { return } let textDetectionRequest = VNDetectTextRectanglesRequest { request, error in if let error = error { print("Text detection error: \(error)") return } guard let observations = request.results as? [VNTextObservation] else { print("No text rectangles detected.") return } print("Detected \(observations.count) text rectangles.") for observation in observations { print(observation.boundingBox) } } textDetectionRequest.revision = VNDetectTextRectanglesRequestRevision1 textDetectionRequest.reportCharacterBoxes = true let handler = VNImageRequestHandler(cgImage: cgImage, orientation: .up, options: [:]) do { try handler.perform([textDetectionRequest]) } catch { print("Vision request error: \(error)") } The request completes without error, but no text rectangles are detected — the observations array is empty (count = 0). Here's a sample image I'm testing with: I expected VNTextObservation results, but I'm not getting any. Is there something I'm missing in how this API works? Or could it be a limitation of this request or revision? Thanks for any help!
2
0
172
May ’25
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
0
0
494
Mar ’26
reinforcement learning from Apple?
I don't know if these forums are any good for rumors or plans, but does anybody know whether or not Apple plans to release a library for training reinforcement learning? It would be handy, implementing games in Swift, for example, to be able to train the computer players on the same code.
0
0
477
Mar ’26
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3 To: Metal Developer Relations Hello, I am reporting a repeatable numerical saturation point encountered during sustained recursive high-order differential workloads on the Apple M3 (16 GB unified memory) using the JAX Metal backend. Workload Characteristics: Large-scale vector projections across multi-dimensional industrial datasets Repeated high-order finite-difference calculations Heavy use of jax.grad and lax.cond inside long-running loops Observation: Under these conditions, the Metal/MPS backend consistently enters a terminal quantization lock where outputs saturate at a fixed scalar value (2.0000), followed by system-wide NaN propagation. This appears to be a precision-limited boundary in the JAX-Metal bridge when handling high-order operations with cubic time-scale denominators. have identified the specific threshold where recursive high-order tensor derivatives exceed the numerical resolution of 32-bit consumer architectures, necessitating a migration to a dedicated 64-bit industrial stack. I have prepared a minimal synthetic test script (randomized vectors only, no proprietary logic) that reliably reproduces the allocator fragmentation and saturation behavior. Let me know if your team would like the telemetry for XLA/MPS optimization purposes. Best regards, Alex Severson Architect, QuantumPulse AI
0
0
355
Mar ’26
Provide spoken voice search string
Hello, My goal is to enable users to perform a freeform search request for any product I sell using a spoken phrase, for example, "Hey Siri, search GAMING CONSOLES on MyCatalogApp". The result would launch MyCatalogApp and navigate to a search results page displaying gaming consoles. I have defined a SearchIntent (using the .system.search schema) and a Shortcut to accomplish this. However, Siri doesn't seem to be able to correctly parse the spoken phrase, extract the search string, and provide it as the critiria term within SearchIntent. What am I doing wrong? Here is the SearchIntent. Note the print() statement outputs the search string--which in the scenario above would be "GAMING CONSOLES"--but it doesn't work. import AppIntents @available(iOS 17.2, *) @AppIntent(schema: .system.search) struct SearchIntent: ShowInAppSearchResultsIntent { static var searchScopes: [StringSearchScope] = [.general] @Parameter(title: "Criteria") var criteria: StringSearchCriteria static var title: LocalizedStringResource = "Search with MyCatalogApp" @MainActor func perform() async throws -> some IntentResult { let searchString = criteria.term print("**** Search String: \(searchString) ****") // tmp debugging try await MyCatalogSearchHelper.search(for: searchString) // fetch results via my async fetch API return .result() } } Here's the Shortcuts definition: import AppIntents @available(iOS 17.2, *) struct Shortcuts: AppShortcutsProvider { @AppShortcutsBuilder static var appShortcuts: [AppShortcut] { AppShortcut( intent: SearchIntent(), phrases: ["Search for \(\.$criteria) on \(.applicationName)."], shortTitle: "Search", systemImageName: "magnifyingglass" ) } } Thanks for any help!
1
0
773
Mar ’26
Apple OCR framework seems to be holding on to allocations every time it is called.
Environment: macOS 26.2 (Tahoe) Xcode 16.3 Apple Silicon (M4) Sandboxed Mac App Store app Description: Repeated use of VNRecognizeTextRequest causes permanent memory growth in the host process. The physical footprint increases by approximately 3-15 MB per OCR call and never returns to baseline, even after all references to the request, handler, observations, and image are released. ` private func selectAndProcessImage() { let panel = NSOpenPanel() panel.allowedContentTypes = [.image] panel.allowsMultipleSelection = false panel.canChooseDirectories = false panel.message = "Select an image for OCR processing" guard panel.runModal() == .OK, let url = panel.url else { return } selectedImageURL = url isProcessing = true recognizedText = "Processing..." // Run OCR on a background thread to keep UI responsive let workItem = DispatchWorkItem { let result = performOCR(on: url) DispatchQueue.main.async { recognizedText = result isProcessing = false } } DispatchQueue.global(qos: .userInitiated).async(execute: workItem) } private func performOCR(on url: URL) -> String { // Wrap EVERYTHING in autoreleasepool so all ObjC objects are drained immediately let resultText: String = autoreleasepool { // Load image and convert to CVPixelBuffer for explicit memory control guard let imageData = try? Data(contentsOf: url) else { return "Error: Could not read image file." } guard let nsImage = NSImage(data: imageData) else { return "Error: Could not create image from file data." } guard let cgImage = nsImage.cgImage(forProposedRect: nil, context: nil, hints: nil) else { return "Error: Could not create CGImage." } let width = cgImage.width let height = cgImage.height // Create a CVPixelBuffer from the CGImage var pixelBuffer: CVPixelBuffer? let attrs: [String: Any] = [ kCVPixelBufferCGImageCompatibilityKey as String: true, kCVPixelBufferCGBitmapContextCompatibilityKey as String: true ] let status = CVPixelBufferCreate( kCFAllocatorDefault, width, height, kCVPixelFormatType_32ARGB, attrs as CFDictionary, &pixelBuffer ) guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return "Error: Could not create CVPixelBuffer (status: \(status))." } // Draw the CGImage into the pixel buffer CVPixelBufferLockBaseAddress(buffer, []) guard let context = CGContext( data: CVPixelBufferGetBaseAddress(buffer), width: width, height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(buffer), space: CGColorSpaceCreateDeviceRGB(), bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue ) else { CVPixelBufferUnlockBaseAddress(buffer, []) return "Error: Could not create CGContext for pixel buffer." } context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height)) CVPixelBufferUnlockBaseAddress(buffer, []) // Run OCR let requestHandler = VNImageRequestHandler(cvPixelBuffer: buffer, options: [:]) let request = VNRecognizeTextRequest() request.recognitionLevel = .accurate request.usesLanguageCorrection = true do { try requestHandler.perform([request]) } catch { return "Error during OCR: \(error.localizedDescription)" } guard let observations = request.results, !observations.isEmpty else { return "No text found in image." } let lines = observations.compactMap { observation in observation.topCandidates(1).first?.string } // Explicitly nil out the pixel buffer before the pool drains pixelBuffer = nil return lines.joined(separator: "\n") } // Everything — Data, NSImage, CGImage, CVPixelBuffer, VN objects — released here return resultText } `
0
0
189
Feb ’26
App stuck “In Review” for several days after AI-policy rejection — need clarification
Hello everyone, I’m looking for guidance regarding my app review timeline, as things seem unusually delayed compared to previous submissions. My iOS app was rejected on November 19th due to AI-related policy questions. I immediately responded to the reviewer with detailed explanations covering: Model used (Gemini Flash 2.0 / 2.5 Lite) How the AI only generates neutral, non-directive reflective questions How the system prevents any diagnosis, therapy-like behavior or recommendations Crisis-handling limitations Safety safeguards at generation and UI level Internal red-team testing and results Data retention, privacy, and non-use of data for model training After sending the requested information, I resubmitted the build on November 19th at 14:40. Since then: November 20th (7:30) → Status changed to In Review. November 21st, 22nd, 23rd, 24th, 25th → No movement, still In Review. My open case on App Store Connect is still pending without updates. Because of the previous rejection, I expected a short delay, but this is now 5 days total and 3 business days with no progress, which feels longer than usual for my past submissions. I’m not sure whether: My app is in a secondary review queue due to the AI-related rejection, The reviewer is waiting for internal clarification, Or if something is stuck and needs to be escalated. I don’t want to resubmit a new build unless necessary, since that would restart the queue. Could someone from the community (or Apple, if possible) confirm whether this waiting time is normal after an AI-policy rejection? And is there anything I should do besides waiting — for example, contacting Developer Support again or requesting a follow-up? Thank you very much for your help. I appreciate any insight from others who have experienced similar delays.
0
0
756
Nov ’25
VNDetectFaceRectanglesRequest does not use the Neural Engine?
I'm on Tahoe 26.1 / M3 Macbook Air. I'm using VNDetectFaceRectanglesRequest as properly as possible, as in the minimal command line program attached below. For some reason, I always get: MLE5Engine is disabled through the configuration printed. I couldn't find any notes on developer docs saying that VNDetectFaceRectanglesRequest can not use the Apple Neural Engine. I'm assuming there is something wrong with my code however I wasn't able to find any remarks from documentation where it might be. I wasn't able to find the above error message online either. I would appreciate your help a lot and thank you in advance. The code below accesses the video from AVCaptureDevice.DeviceType.builtInWideAngleCamera. Currently it directly chooses the 0th format which has the largest resolution (Full HD on my M3 MBA) and "4:2:0" color "v" reduced color component spectrum encoding ("420v"). After accessing video, it performs a VNDetectFaceRectanglesRequest. It prints "VNDetectFaceRectanglesRequest completion Handler called" many times, then prints the error message above, then continues printing "VNDetectFaceRectanglesRequest completion Handler called" until the user quits it. To run it in Xcode, File > New project > Mac command line tool. Pasting the code below, then click on the root file > Targets > Signing & Capabilities > Hardened Runtime > Resource Access > Camera. A possible explanation could be that either Apple's internal CoreML code for this function works on GPU/CPU only or it doesn't accept 420v as supplied by the Macbook Air camera import AVKit import Vision var videoDataOutput: AVCaptureVideoDataOutput = AVCaptureVideoDataOutput() var detectionRequests: [VNDetectFaceRectanglesRequest]? var videoDataOutputQueue: DispatchQueue = DispatchQueue(label: "queue") class XYZ: /*NSViewController or NSObject*/NSObject, AVCaptureVideoDataOutputSampleBufferDelegate { func viewDidLoad() { //super.viewDidLoad() let session = AVCaptureSession() let inputDevice = try! self.configureFrontCamera(for: session) self.configureVideoDataOutput(for: inputDevice.device, resolution: inputDevice.resolution, captureSession: session) self.prepareVisionRequest() session.startRunning() } fileprivate func highestResolution420Format(for device: AVCaptureDevice) -> (format: AVCaptureDevice.Format, resolution: CGSize)? { let deviceFormat = device.formats[0] print(deviceFormat) let dims = CMVideoFormatDescriptionGetDimensions(deviceFormat.formatDescription) let resolution = CGSize(width: CGFloat(dims.width), height: CGFloat(dims.height)) return (deviceFormat, resolution) } fileprivate func configureFrontCamera(for captureSession: AVCaptureSession) throws -> (device: AVCaptureDevice, resolution: CGSize) { let deviceDiscoverySession = AVCaptureDevice.DiscoverySession(deviceTypes: [AVCaptureDevice.DeviceType.builtInWideAngleCamera], mediaType: .video, position: AVCaptureDevice.Position.unspecified) let device = deviceDiscoverySession.devices.first! let deviceInput = try! AVCaptureDeviceInput(device: device) captureSession.addInput(deviceInput) let highestResolution = self.highestResolution420Format(for: device)! try! device.lockForConfiguration() device.activeFormat = highestResolution.format device.unlockForConfiguration() return (device, highestResolution.resolution) } fileprivate func configureVideoDataOutput(for inputDevice: AVCaptureDevice, resolution: CGSize, captureSession: AVCaptureSession) { videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue) captureSession.addOutput(videoDataOutput) } fileprivate func prepareVisionRequest() { let faceDetectionRequest: VNDetectFaceRectanglesRequest = VNDetectFaceRectanglesRequest(completionHandler: { (request, error) in print("VNDetectFaceRectanglesRequest completion Handler called") }) // Start with detection detectionRequests = [faceDetectionRequest] } // MARK: AVCaptureVideoDataOutputSampleBufferDelegate // Handle delegate method callback on receiving a sample buffer. public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) { var requestHandlerOptions: [VNImageOption: AnyObject] = [:] let cameraIntrinsicData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil) if cameraIntrinsicData != nil { requestHandlerOptions[VNImageOption.cameraIntrinsics] = cameraIntrinsicData } let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)! // No tracking object detected, so perform initial detection let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: CGImagePropertyOrientation.up, options: requestHandlerOptions) try! imageRequestHandler.perform(detectionRequests!) } } let X = XYZ() X.viewDidLoad() sleep(9999999)
0
0
535
Nov ’25
Inquiry Regarding Siri–AI Integration Capabilities
: Hello, I’m seeking clarification on whether Apple provides any framework or API that enables deep integration between Siri and advanced AI assistants (such as ChatGPT), including system-level functions like voice interaction, navigation, cross-platform syncing, and operational access similar to Siri’s own capabilities. If no such option exists today, I would appreciate guidance on the recommended path or approved third-party solutions for building a unified, voice-first experience across Apple’s ecosystem. Thank you for your time and insight.
0
0
173
Nov ’25
Is Jax for Apple Silicon is still supported
Hi From https://developer.apple.com/metal/jax/ I checked all active workflows on https://github.com/jax-ml/jax and any open issues with tags Metal and seems in DEC 2025 the Jax maintainers have closed all issues citing No active development on Jax-metal and the project seems dead. We need to know how can we leverage Apple silicon for accelerated projects using popular academia library and tools . Is the JAX project still going to be supported or Apple has plans to bring something of tis own that might be platform agnostic . Thanks
0
0
213
Feb ’26
Assert error breaking previews
A foundation models bug I keep running into when in the preview phase of the testing. The error never seems to occur or break the app when I am testing on the simulator or on a device but sometimes I am running into this error when in a longer session while being in preview. The error breaks the preview and crashes it and the waring on it is labeled as : "Assert in LanguageModelFeedback.swift" This is something I keep running into, where I have been using foundation models for my project
2
0
445
Feb ’26
ANE Error with Statefu Model: "Unable to compute prediction" when State Tensor width is not 32-aligned
Hi everyone, I believe I’ve encountered a potential bug or a hardware alignment limitation in the Core ML Framework / ANE Runtime specifically affecting the new Stateful API (introduced in iOS 18/macOS 15). The Issue: A Stateful mlprogram fails to run on the Apple Neural Engine (ANE) if the state tensor dimensions (specifically the width) are not a multiple of 32. The model works perfectly on CPU and GPU, but fails on ANE both during runtime and when generating a Performance Report in Xcode. Error Message in Xcode UI: "There was an error creating the performance report Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model." Observations: Case A (Fails): State shape = (1, 3, 480, 270). Prediction fails on ANE. Case B (Success): State shape = (1, 3, 480, 256). Prediction succeeds on ANE. This suggests an internal memory alignment or tiling issue within the ANE driver when handling Stateful buffers that don't meet the 32-pixel/element alignment. Reproduction Code (PyTorch + coremltools): import torch.nn as nn import coremltools as ct import numpy as np class RNN_Stateful(nn.Module): def __init__(self, hidden_shape): super(RNN_Stateful, self).__init__() # Simple conv to update state self.conv1 = nn.Conv2d(3 + hidden_shape[1], hidden_shape[1], kernel_size=3, padding=1) self.conv2 = nn.Conv2d(hidden_shape[1], 3, kernel_size=3, padding=1) self.register_buffer("hidden_state", torch.ones(hidden_shape, dtype=torch.float16)) def forward(self, imgs): self.hidden_state = self.conv1(torch.cat((imgs, self.hidden_state), dim=1)) return self.conv2(self.hidden_state) # h=480, w=255 causes ANE failure. w=256 works. b, ch, h, w = 1, 3, 480, 255 model = RNN_Stateful((b, ch, h, w)).eval() traced_model = torch.jit.trace(model, torch.randn(b, 3, h, w)) mlmodel = ct.convert( traced_model, inputs=[ct.TensorType(name="input_image", shape=(b, 3, h, w), dtype=np.float16)], outputs=[ct.TensorType(name="output", dtype=np.float16)], states=[ct.StateType(wrapped_type=ct.TensorType(shape=(b, ch, h, w), dtype=np.float16), name="hidden_state")], minimum_deployment_target=ct.target.iOS18, convert_to="mlprogram" ) mlmodel.save("rnn_stateful.mlpackage") Steps to see the error: Open the generated .mlpackage in Xcode 16.0+. Go to the Performance tab and run a test on a device with ANE (e.g., iPhone 15/16 or M-series Mac). The report will fail to generate with the error mentioned above. Environment: OS: macOS 15.2 Xcode: 16.3 Hardware: M4 Has anyone else encountered this 32-pixel alignment requirement for StateType tensors on ANE? Is this a known hardware constraint or a bug in the Core ML runtime? Any insights or workarounds (other than manual padding) would be appreciated.
0
0
508
Dec ’25
LanguageModelSession always returns very lengthy responses
No matter what, the LanguageModelSession always returns very lengthy / verbose responses. I set the maximumResponseTokens option to various small numbers but it doesn't appear to have any effect. I've even used this instructions format to keep responses between 3-8 words but it returns multiple paragraphs. Is there a way to manage LLM response length? Thanks.
3
0
390
Sep ’25
Best practices for designing proactive FinTech insights with App Intents & Shortcuts?
Hello fellow developers, I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot. We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full." My question for the community is about the architectural and user experience best practices for this. How are you thinking about the balance between providing rich, actionable insights via Intents without being overly intrusive or "spammy" to the user? What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance? Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations? We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge. Thanks for your insights!
Replies
0
Boosts
0
Views
370
Activity
Oct ’25
How Is useful AI
I want to introduce how is usefully AI
Replies
0
Boosts
0
Views
115
Activity
3w
Updated DetectHandPoseRequest revision from WWDC25 doesn't exist
I watched this year WWDC25 "Read Documents using the Vision framework". At the end of video there is mention of new DetectHandPoseRequest model for hand pose detection in Vision API. I looked Apple documentation and I don't see new revision. Moreover probably typo in video because there is only DetectHumanPoseRequst (swift based) and VNDetectHumanHandPoseRequest (obj-c based) (notice lack of Human prefix in WWDC video) First one have revision only added in iOS 18+: https://developer.apple.com/documentation/vision/detecthumanhandposerequest/revision-swift.enum/revision1 Second one have revision only added in iOS14+: https://developer.apple.com/documentation/vision/vndetecthumanhandposerequestrevision1 I don't see any new revision targeting iOS26+
Replies
0
Boosts
0
Views
169
Activity
Oct ’25
VNDetectTextRectanglesRequest not detecting text rectangles (includes image)
Hi everyone, I'm trying to use VNDetectTextRectanglesRequest to detect text rectangles in an image. Here's my current code: guard let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else { return } let textDetectionRequest = VNDetectTextRectanglesRequest { request, error in if let error = error { print("Text detection error: \(error)") return } guard let observations = request.results as? [VNTextObservation] else { print("No text rectangles detected.") return } print("Detected \(observations.count) text rectangles.") for observation in observations { print(observation.boundingBox) } } textDetectionRequest.revision = VNDetectTextRectanglesRequestRevision1 textDetectionRequest.reportCharacterBoxes = true let handler = VNImageRequestHandler(cgImage: cgImage, orientation: .up, options: [:]) do { try handler.perform([textDetectionRequest]) } catch { print("Vision request error: \(error)") } The request completes without error, but no text rectangles are detected — the observations array is empty (count = 0). Here's a sample image I'm testing with: I expected VNTextObservation results, but I'm not getting any. Is there something I'm missing in how this API works? Or could it be a limitation of this request or revision? Thanks for any help!
Replies
2
Boosts
0
Views
172
Activity
May ’25
FoundationModels tool calling doesn't get triggered
In the play ground I'm trying to bias my LanguageModel to use a tool I registered, but I don't see it actually calling the tool. I'm following the developer video on landmarks itinerary generation tutorial almost verbatim. Is this a prompt engineering thing I'm missing? Or is it possible that I'm injecting my tool wrong?
Replies
1
Boosts
0
Views
301
Activity
Jul ’25
Building Real-Time Voice Input on macOS 26 with SpeechAnalyzer + ScreenCaptureKit
We built an open-source macOS menu bar app that turns speech into text and pastes it into the active app — using SpeechAnalyzer for on-device transcription, ScreenCaptureKit + Vision for screen-aware context, and FluidAudio for speaker diarization in meeting mode. Here's what we learned shipping it on macOS 26. GitHub: github.com/Marvinngg/ambient-voice Architecture The app has two modes: hotkey dictation (press to talk, release to inject) and meeting recording (continuous transcription with a floating panel). Dictation Mode Audio capture uses AVCaptureSession (more on why below). The captured audio feeds into SpeechAnalyzer via an AsyncStream: let transcriber = SpeechTranscriber( locale: locale, transcriptionOptions: [], reportingOptions: [.volatileResults, .alternativeTranscriptions], attributeOptions: [.audioTimeRange, .transcriptionConfidence] ) let analyzer = SpeechAnalyzer(modules: [transcriber]) let (inputSequence, inputBuilder) = AsyncStream.makeStream() try await analyzer.start(inputSequence: inputSequence) While recording, we capture a screenshot of the focused window using ScreenCaptureKit, run Vision OCR (VNRecognizeTextRequest), extract keywords, and inject them into SpeechAnalyzer as contextual bias: let context = AnalysisContext() context.contextualStrings[.general] = ocrKeywords try await analyzer.setContext(context) This improves accuracy for technical terms and proper nouns visible on screen. If your screen shows "SpeechAnalyzer", saying it out loud is more likely to be transcribed correctly. After transcription, an optional L2 step sends the text through a local LLM (ollama) for spoken-to-written cleanup, then CGEvent simulates Cmd+V to paste into the active app. Meeting Mode Meeting mode forks the same audio stream to two consumers: SpeechAnalyzer — real-time streaming transcription, displayed in a floating NSPanel FluidAudio buffer — accumulates 16kHz Float32 mono samples for batch speaker diarization after recording stops When the user ends the meeting, FluidAudio's performCompleteDiarization() runs on the accumulated audio. We align transcription segments with speaker segments using audioTimeRange overlap matching — each transcription segment gets assigned the speaker ID with the most time overlap. Results export to Markdown. Pitfalls We Hit on macOS 26 1. AVAudioEngine installTap doesn't fire with Bluetooth devices We started with AVAudioEngine.inputNode.installTap() for audio capture. It worked fine with built-in mics but the tap callback never fired with Bluetooth devices (tested with vivo TWS 4 Hi-Fi). Fix: switched to AVCaptureSession. The delegate callback captureOutput(_:didOutput:from:) fires reliably regardless of audio device. The tradeoff is you get CMSampleBuffer instead of AVAudioPCMBuffer, so you need a conversion step. 2. NSEvent addGlobalMonitorForEvents crashes Our global hotkey listener used NSEvent.addGlobalMonitorForEvents. On macOS 26, this crashes with a Bus error inside GlobalObserverHandler — appears to be a Swift actor runtime issue. Fix: switched to CGEventTap. Works reliably, but the callback runs on a CFRunLoop context, which Swift doesn't recognize as MainActor. 3. CGEventTap callbacks aren't on MainActor If your CGEventTap callback touches any @MainActor state, you'll get concurrency violations. The callback runs on whatever thread owns the CFRunLoop. Fix: bridge with DispatchQueue.main.async {} inside the tap callback before touching any MainActor state. 4. CGPreflightScreenCaptureAccess doesn't request permission We used CGPreflightScreenCaptureAccess() as a guard before calling ScreenCaptureKit. If it returned false, we'd bail out. The problem: this function only checks — it never triggers macOS to add your app to the Screen Recording permission list. Chicken-and-egg: you can't get permission because you never ask for it. Fix: call CGRequestScreenCaptureAccess() at app startup. This adds your app to System Settings → Screen Recording. Then let ScreenCaptureKit calls proceed without the preflight guard — SCShareableContent will also trigger the permission prompt on first use. 5. Ad-hoc signing breaks TCC permissions on every rebuild During development, codesign --sign - (ad-hoc) generates a different code directory hash on every build. macOS TCC tracks permissions by this hash, so every rebuild = new app identity = all permissions reset. Fix: sign with a stable certificate. If you have an Apple Development certificate, use that. The TeamIdentifier stays constant across rebuilds, so TCC permissions persist. We also discovered that launching via open WE.app (LaunchServices) instead of directly executing the binary is required — otherwise macOS attributes TCC permissions to Terminal, not your app. Benchmarks We ran end-to-end benchmarks on public datasets (Mac Mini M4 16GB, macOS 26): Transcription (SpeechAnalyzer, AliMeeting Chinese): • Near-field CER 34% (excluding outliers ~25%) • Far-field CER 40% (single channel, no beamforming, >30% overlap) • Processing speed 74-89x real-time Speaker diarization (FluidAudio offline): • AMI English 16 meetings: avg DER 23.2% (collar=0.25s, ignoreOverlap=True) • AliMeeting Chinese 8 meetings: DER 48.5% (including overlap regions) • Memory: RSS ~500MB, peak 730-930MB Full evaluation methodology, scripts, and raw results are in the repo. Open Source The project is MIT licensed: github.com/Marvinngg/ambient-voice It includes the macOS client (Swift 6.2, SPM), server-side distillation/training scripts (Python), and a complete evaluation framework with reproducible benchmarks. Feedback and contributions welcome.
Replies
0
Boosts
0
Views
494
Activity
Mar ’26
reinforcement learning from Apple?
I don't know if these forums are any good for rumors or plans, but does anybody know whether or not Apple plans to release a library for training reinforcement learning? It would be handy, implementing games in Swift, for example, to be able to train the computer players on the same code.
Replies
0
Boosts
0
Views
477
Activity
Mar ’26
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3
Subject: Technical Report: Float32 Precision Ceiling & Memory Fragmentation in JAX/Metal Workloads on M3 To: Metal Developer Relations Hello, I am reporting a repeatable numerical saturation point encountered during sustained recursive high-order differential workloads on the Apple M3 (16 GB unified memory) using the JAX Metal backend. Workload Characteristics: Large-scale vector projections across multi-dimensional industrial datasets Repeated high-order finite-difference calculations Heavy use of jax.grad and lax.cond inside long-running loops Observation: Under these conditions, the Metal/MPS backend consistently enters a terminal quantization lock where outputs saturate at a fixed scalar value (2.0000), followed by system-wide NaN propagation. This appears to be a precision-limited boundary in the JAX-Metal bridge when handling high-order operations with cubic time-scale denominators. have identified the specific threshold where recursive high-order tensor derivatives exceed the numerical resolution of 32-bit consumer architectures, necessitating a migration to a dedicated 64-bit industrial stack. I have prepared a minimal synthetic test script (randomized vectors only, no proprietary logic) that reliably reproduces the allocator fragmentation and saturation behavior. Let me know if your team would like the telemetry for XLA/MPS optimization purposes. Best regards, Alex Severson Architect, QuantumPulse AI
Replies
0
Boosts
0
Views
355
Activity
Mar ’26
Any Recommandation for a Image Enhance and Denoise Model
I'm really not familiar with ML, but I need a model that can enhance and denoise 4k video stream at 30fps. I have tried to search latest papers but they all have very complex structure, and I don't think I can convert them to mlmodel. So can anyone give me any recommandation for such models? If there is an existing mlmodel, that would be great!
Replies
0
Boosts
0
Views
294
Activity
Oct ’25
Provide spoken voice search string
Hello, My goal is to enable users to perform a freeform search request for any product I sell using a spoken phrase, for example, "Hey Siri, search GAMING CONSOLES on MyCatalogApp". The result would launch MyCatalogApp and navigate to a search results page displaying gaming consoles. I have defined a SearchIntent (using the .system.search schema) and a Shortcut to accomplish this. However, Siri doesn't seem to be able to correctly parse the spoken phrase, extract the search string, and provide it as the critiria term within SearchIntent. What am I doing wrong? Here is the SearchIntent. Note the print() statement outputs the search string--which in the scenario above would be "GAMING CONSOLES"--but it doesn't work. import AppIntents @available(iOS 17.2, *) @AppIntent(schema: .system.search) struct SearchIntent: ShowInAppSearchResultsIntent { static var searchScopes: [StringSearchScope] = [.general] @Parameter(title: "Criteria") var criteria: StringSearchCriteria static var title: LocalizedStringResource = "Search with MyCatalogApp" @MainActor func perform() async throws -> some IntentResult { let searchString = criteria.term print("**** Search String: \(searchString) ****") // tmp debugging try await MyCatalogSearchHelper.search(for: searchString) // fetch results via my async fetch API return .result() } } Here's the Shortcuts definition: import AppIntents @available(iOS 17.2, *) struct Shortcuts: AppShortcutsProvider { @AppShortcutsBuilder static var appShortcuts: [AppShortcut] { AppShortcut( intent: SearchIntent(), phrases: ["Search for \(\.$criteria) on \(.applicationName)."], shortTitle: "Search", systemImageName: "magnifyingglass" ) } } Thanks for any help!
Replies
1
Boosts
0
Views
773
Activity
Mar ’26
Apple OCR framework seems to be holding on to allocations every time it is called.
Environment: macOS 26.2 (Tahoe) Xcode 16.3 Apple Silicon (M4) Sandboxed Mac App Store app Description: Repeated use of VNRecognizeTextRequest causes permanent memory growth in the host process. The physical footprint increases by approximately 3-15 MB per OCR call and never returns to baseline, even after all references to the request, handler, observations, and image are released. ` private func selectAndProcessImage() { let panel = NSOpenPanel() panel.allowedContentTypes = [.image] panel.allowsMultipleSelection = false panel.canChooseDirectories = false panel.message = "Select an image for OCR processing" guard panel.runModal() == .OK, let url = panel.url else { return } selectedImageURL = url isProcessing = true recognizedText = "Processing..." // Run OCR on a background thread to keep UI responsive let workItem = DispatchWorkItem { let result = performOCR(on: url) DispatchQueue.main.async { recognizedText = result isProcessing = false } } DispatchQueue.global(qos: .userInitiated).async(execute: workItem) } private func performOCR(on url: URL) -> String { // Wrap EVERYTHING in autoreleasepool so all ObjC objects are drained immediately let resultText: String = autoreleasepool { // Load image and convert to CVPixelBuffer for explicit memory control guard let imageData = try? Data(contentsOf: url) else { return "Error: Could not read image file." } guard let nsImage = NSImage(data: imageData) else { return "Error: Could not create image from file data." } guard let cgImage = nsImage.cgImage(forProposedRect: nil, context: nil, hints: nil) else { return "Error: Could not create CGImage." } let width = cgImage.width let height = cgImage.height // Create a CVPixelBuffer from the CGImage var pixelBuffer: CVPixelBuffer? let attrs: [String: Any] = [ kCVPixelBufferCGImageCompatibilityKey as String: true, kCVPixelBufferCGBitmapContextCompatibilityKey as String: true ] let status = CVPixelBufferCreate( kCFAllocatorDefault, width, height, kCVPixelFormatType_32ARGB, attrs as CFDictionary, &pixelBuffer ) guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return "Error: Could not create CVPixelBuffer (status: \(status))." } // Draw the CGImage into the pixel buffer CVPixelBufferLockBaseAddress(buffer, []) guard let context = CGContext( data: CVPixelBufferGetBaseAddress(buffer), width: width, height: height, bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(buffer), space: CGColorSpaceCreateDeviceRGB(), bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue ) else { CVPixelBufferUnlockBaseAddress(buffer, []) return "Error: Could not create CGContext for pixel buffer." } context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height)) CVPixelBufferUnlockBaseAddress(buffer, []) // Run OCR let requestHandler = VNImageRequestHandler(cvPixelBuffer: buffer, options: [:]) let request = VNRecognizeTextRequest() request.recognitionLevel = .accurate request.usesLanguageCorrection = true do { try requestHandler.perform([request]) } catch { return "Error during OCR: \(error.localizedDescription)" } guard let observations = request.results, !observations.isEmpty else { return "No text found in image." } let lines = observations.compactMap { observation in observation.topCandidates(1).first?.string } // Explicitly nil out the pixel buffer before the pool drains pixelBuffer = nil return lines.joined(separator: "\n") } // Everything — Data, NSImage, CGImage, CVPixelBuffer, VN objects — released here return resultText } `
Replies
0
Boosts
0
Views
189
Activity
Feb ’26
App stuck “In Review” for several days after AI-policy rejection — need clarification
Hello everyone, I’m looking for guidance regarding my app review timeline, as things seem unusually delayed compared to previous submissions. My iOS app was rejected on November 19th due to AI-related policy questions. I immediately responded to the reviewer with detailed explanations covering: Model used (Gemini Flash 2.0 / 2.5 Lite) How the AI only generates neutral, non-directive reflective questions How the system prevents any diagnosis, therapy-like behavior or recommendations Crisis-handling limitations Safety safeguards at generation and UI level Internal red-team testing and results Data retention, privacy, and non-use of data for model training After sending the requested information, I resubmitted the build on November 19th at 14:40. Since then: November 20th (7:30) → Status changed to In Review. November 21st, 22nd, 23rd, 24th, 25th → No movement, still In Review. My open case on App Store Connect is still pending without updates. Because of the previous rejection, I expected a short delay, but this is now 5 days total and 3 business days with no progress, which feels longer than usual for my past submissions. I’m not sure whether: My app is in a secondary review queue due to the AI-related rejection, The reviewer is waiting for internal clarification, Or if something is stuck and needs to be escalated. I don’t want to resubmit a new build unless necessary, since that would restart the queue. Could someone from the community (or Apple, if possible) confirm whether this waiting time is normal after an AI-policy rejection? And is there anything I should do besides waiting — for example, contacting Developer Support again or requesting a follow-up? Thank you very much for your help. I appreciate any insight from others who have experienced similar delays.
Replies
0
Boosts
0
Views
756
Activity
Nov ’25
VNDetectFaceRectanglesRequest does not use the Neural Engine?
I'm on Tahoe 26.1 / M3 Macbook Air. I'm using VNDetectFaceRectanglesRequest as properly as possible, as in the minimal command line program attached below. For some reason, I always get: MLE5Engine is disabled through the configuration printed. I couldn't find any notes on developer docs saying that VNDetectFaceRectanglesRequest can not use the Apple Neural Engine. I'm assuming there is something wrong with my code however I wasn't able to find any remarks from documentation where it might be. I wasn't able to find the above error message online either. I would appreciate your help a lot and thank you in advance. The code below accesses the video from AVCaptureDevice.DeviceType.builtInWideAngleCamera. Currently it directly chooses the 0th format which has the largest resolution (Full HD on my M3 MBA) and "4:2:0" color "v" reduced color component spectrum encoding ("420v"). After accessing video, it performs a VNDetectFaceRectanglesRequest. It prints "VNDetectFaceRectanglesRequest completion Handler called" many times, then prints the error message above, then continues printing "VNDetectFaceRectanglesRequest completion Handler called" until the user quits it. To run it in Xcode, File > New project > Mac command line tool. Pasting the code below, then click on the root file > Targets > Signing & Capabilities > Hardened Runtime > Resource Access > Camera. A possible explanation could be that either Apple's internal CoreML code for this function works on GPU/CPU only or it doesn't accept 420v as supplied by the Macbook Air camera import AVKit import Vision var videoDataOutput: AVCaptureVideoDataOutput = AVCaptureVideoDataOutput() var detectionRequests: [VNDetectFaceRectanglesRequest]? var videoDataOutputQueue: DispatchQueue = DispatchQueue(label: "queue") class XYZ: /*NSViewController or NSObject*/NSObject, AVCaptureVideoDataOutputSampleBufferDelegate { func viewDidLoad() { //super.viewDidLoad() let session = AVCaptureSession() let inputDevice = try! self.configureFrontCamera(for: session) self.configureVideoDataOutput(for: inputDevice.device, resolution: inputDevice.resolution, captureSession: session) self.prepareVisionRequest() session.startRunning() } fileprivate func highestResolution420Format(for device: AVCaptureDevice) -> (format: AVCaptureDevice.Format, resolution: CGSize)? { let deviceFormat = device.formats[0] print(deviceFormat) let dims = CMVideoFormatDescriptionGetDimensions(deviceFormat.formatDescription) let resolution = CGSize(width: CGFloat(dims.width), height: CGFloat(dims.height)) return (deviceFormat, resolution) } fileprivate func configureFrontCamera(for captureSession: AVCaptureSession) throws -> (device: AVCaptureDevice, resolution: CGSize) { let deviceDiscoverySession = AVCaptureDevice.DiscoverySession(deviceTypes: [AVCaptureDevice.DeviceType.builtInWideAngleCamera], mediaType: .video, position: AVCaptureDevice.Position.unspecified) let device = deviceDiscoverySession.devices.first! let deviceInput = try! AVCaptureDeviceInput(device: device) captureSession.addInput(deviceInput) let highestResolution = self.highestResolution420Format(for: device)! try! device.lockForConfiguration() device.activeFormat = highestResolution.format device.unlockForConfiguration() return (device, highestResolution.resolution) } fileprivate func configureVideoDataOutput(for inputDevice: AVCaptureDevice, resolution: CGSize, captureSession: AVCaptureSession) { videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue) captureSession.addOutput(videoDataOutput) } fileprivate func prepareVisionRequest() { let faceDetectionRequest: VNDetectFaceRectanglesRequest = VNDetectFaceRectanglesRequest(completionHandler: { (request, error) in print("VNDetectFaceRectanglesRequest completion Handler called") }) // Start with detection detectionRequests = [faceDetectionRequest] } // MARK: AVCaptureVideoDataOutputSampleBufferDelegate // Handle delegate method callback on receiving a sample buffer. public func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) { var requestHandlerOptions: [VNImageOption: AnyObject] = [:] let cameraIntrinsicData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil) if cameraIntrinsicData != nil { requestHandlerOptions[VNImageOption.cameraIntrinsics] = cameraIntrinsicData } let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)! // No tracking object detected, so perform initial detection let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: CGImagePropertyOrientation.up, options: requestHandlerOptions) try! imageRequestHandler.perform(detectionRequests!) } } let X = XYZ() X.viewDidLoad() sleep(9999999)
Replies
0
Boosts
0
Views
535
Activity
Nov ’25
Inquiry Regarding Siri–AI Integration Capabilities
: Hello, I’m seeking clarification on whether Apple provides any framework or API that enables deep integration between Siri and advanced AI assistants (such as ChatGPT), including system-level functions like voice interaction, navigation, cross-platform syncing, and operational access similar to Siri’s own capabilities. If no such option exists today, I would appreciate guidance on the recommended path or approved third-party solutions for building a unified, voice-first experience across Apple’s ecosystem. Thank you for your time and insight.
Replies
0
Boosts
0
Views
173
Activity
Nov ’25
Is Jax for Apple Silicon is still supported
Hi From https://developer.apple.com/metal/jax/ I checked all active workflows on https://github.com/jax-ml/jax and any open issues with tags Metal and seems in DEC 2025 the Jax maintainers have closed all issues citing No active development on Jax-metal and the project seems dead. We need to know how can we leverage Apple silicon for accelerated projects using popular academia library and tools . Is the JAX project still going to be supported or Apple has plans to bring something of tis own that might be platform agnostic . Thanks
Replies
0
Boosts
0
Views
213
Activity
Feb ’26
Assert error breaking previews
A foundation models bug I keep running into when in the preview phase of the testing. The error never seems to occur or break the app when I am testing on the simulator or on a device but sometimes I am running into this error when in a longer session while being in preview. The error breaks the preview and crashes it and the waring on it is labeled as : "Assert in LanguageModelFeedback.swift" This is something I keep running into, where I have been using foundation models for my project
Replies
2
Boosts
0
Views
445
Activity
Feb ’26
ANE Error with Statefu Model: "Unable to compute prediction" when State Tensor width is not 32-aligned
Hi everyone, I believe I’ve encountered a potential bug or a hardware alignment limitation in the Core ML Framework / ANE Runtime specifically affecting the new Stateful API (introduced in iOS 18/macOS 15). The Issue: A Stateful mlprogram fails to run on the Apple Neural Engine (ANE) if the state tensor dimensions (specifically the width) are not a multiple of 32. The model works perfectly on CPU and GPU, but fails on ANE both during runtime and when generating a Performance Report in Xcode. Error Message in Xcode UI: "There was an error creating the performance report Unable to compute the prediction using ML Program. It can be an invalid input data or broken/unsupported model." Observations: Case A (Fails): State shape = (1, 3, 480, 270). Prediction fails on ANE. Case B (Success): State shape = (1, 3, 480, 256). Prediction succeeds on ANE. This suggests an internal memory alignment or tiling issue within the ANE driver when handling Stateful buffers that don't meet the 32-pixel/element alignment. Reproduction Code (PyTorch + coremltools): import torch.nn as nn import coremltools as ct import numpy as np class RNN_Stateful(nn.Module): def __init__(self, hidden_shape): super(RNN_Stateful, self).__init__() # Simple conv to update state self.conv1 = nn.Conv2d(3 + hidden_shape[1], hidden_shape[1], kernel_size=3, padding=1) self.conv2 = nn.Conv2d(hidden_shape[1], 3, kernel_size=3, padding=1) self.register_buffer("hidden_state", torch.ones(hidden_shape, dtype=torch.float16)) def forward(self, imgs): self.hidden_state = self.conv1(torch.cat((imgs, self.hidden_state), dim=1)) return self.conv2(self.hidden_state) # h=480, w=255 causes ANE failure. w=256 works. b, ch, h, w = 1, 3, 480, 255 model = RNN_Stateful((b, ch, h, w)).eval() traced_model = torch.jit.trace(model, torch.randn(b, 3, h, w)) mlmodel = ct.convert( traced_model, inputs=[ct.TensorType(name="input_image", shape=(b, 3, h, w), dtype=np.float16)], outputs=[ct.TensorType(name="output", dtype=np.float16)], states=[ct.StateType(wrapped_type=ct.TensorType(shape=(b, ch, h, w), dtype=np.float16), name="hidden_state")], minimum_deployment_target=ct.target.iOS18, convert_to="mlprogram" ) mlmodel.save("rnn_stateful.mlpackage") Steps to see the error: Open the generated .mlpackage in Xcode 16.0+. Go to the Performance tab and run a test on a device with ANE (e.g., iPhone 15/16 or M-series Mac). The report will fail to generate with the error mentioned above. Environment: OS: macOS 15.2 Xcode: 16.3 Hardware: M4 Has anyone else encountered this 32-pixel alignment requirement for StateType tensors on ANE? Is this a known hardware constraint or a bug in the Core ML runtime? Any insights or workarounds (other than manual padding) would be appreciated.
Replies
0
Boosts
0
Views
508
Activity
Dec ’25
MLX C++ API for neural networks
It seems to be that Swift has more APIs implemented than the C++ interface (especially APIs found in the MLXNN and MLXOptimize folders). Is there any intention to implement more APIs for neural networks and training them in the future?
Replies
0
Boosts
0
Views
544
Activity
Dec ’25
LanguageModelSession always returns very lengthy responses
No matter what, the LanguageModelSession always returns very lengthy / verbose responses. I set the maximumResponseTokens option to various small numbers but it doesn't appear to have any effect. I've even used this instructions format to keep responses between 3-8 words but it returns multiple paragraphs. Is there a way to manage LLM response length? Thanks.
Replies
3
Boosts
0
Views
390
Activity
Sep ’25
face and body detection in the Vision framework a local model or a cloud model?
Is the face and body detection service in the Vision framework a local model or a cloud model? Is there a performance report? https://developer.apple.com/documentation/vision
Replies
1
Boosts
0
Views
512
Activity
Sep ’25