Explore the power of machine learning within apps. Discuss integrating machine learning features, share best practices, and explore the possibilities for your app.

Posts under General subtopic

Post

Replies

Boosts

Views

Activity

MPSGraph fused scaledDotProductAttention seems to be buggy
While building an app that runs large language model inference on device, I got gibberish output. After carefully examining every step, I found the cause: the fused scaledDotProductAttention operation. Switching back to the discrete operations solved the problem. To reproduce the bug, please check https://github.com/zhoudan111/MPSGraph_SDPA_bug
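For anyone hitting the same issue, the discrete fallback looks roughly like this (a sketch, not the exact code from the repo; the [batch, heads, seqLen, headDim] layout, float32 dtype, and tensor names are assumptions, and for LLM decoding you would add the causal mask before the softmax):

import MetalPerformanceShadersGraph

// Discrete scaled dot-product attention, replacing the fused
// graph.scaledDotProductAttention(...) call that produced gibberish.
func discreteSDPA(graph: MPSGraph,
                  query: MPSGraphTensor,   // [batch, heads, seqLen, headDim]
                  key: MPSGraphTensor,
                  value: MPSGraphTensor,
                  headDim: Double) -> MPSGraphTensor {
    // scores = (Q × Kᵀ) · 1/√headDim
    let keyT = graph.transposeTensor(key, dimension: 2, withDimension: 3, name: "keyT")
    let scores = graph.matrixMultiplication(primary: query, secondary: keyT, name: "scores")
    let scale = graph.constant(1.0 / headDim.squareRoot(), dataType: .float32) // match your dtype
    let scaled = graph.multiplication(scores, scale, name: "scaledScores")
    // (causal mask addition omitted for brevity)
    let weights = graph.softMax(with: scaled, axis: 3, name: "weights") // softmax over keys
    // output = weights × V
    return graph.matrixMultiplication(primary: weights, secondary: value, name: "attention")
}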
1
0
515
Mar ’25
Selecting GPU for TensorFlow-Metal on Mac Pro (2013) with v0.8.0
Hi everyone,

I'm a Mac enthusiast experimenting with tensorflow-metal on my Mac Pro (2013). My question is about GPU selection in tensorflow-metal (v0.8.0), which still supports Intel-based Macs, including my machine.

I've noticed that when running TensorFlow with Metal, it automatically selects a GPU, regardless of what I specify using device indices like "gpu:0", "gpu:1", or "gpu:2". I'm wondering if there's a way to manually specify which GPU should be used, via an environment variable or another method. For reference, I've tried the example from TensorFlow's guide on using a single GPU on a multi-GPU system: https://www.tensorflow.org/guide/gpu#using_a_single_gpu_on_a_multi-gpu_system

My goal is to explore performance optimizations by using MirroredStrategy in TensorFlow to leverage multiple GPUs: https://www.tensorflow.org/guide/distributed_training#mirroredstrategy

Interestingly, I discovered that the metalcompute Python library (https://pypi.org/project/metalcompute/) lets me select GPUs manually on my system, allowing proper multi-GPU computations. This makes me wonder:

Is there a hidden environment variable or setting that allows manual GPU selection in tensorflow-metal?
Has anyone successfully used MirroredStrategy on multiple GPUs with tensorflow-metal?
Would a bridge between metalcompute and tensorflow-metal be necessary for this use case, or is there a more direct approach?

I'd love to hear if anyone else has experimented with this or has insights on getting finer control over GPU selection. Any thoughts or suggestions would be greatly appreciated! Thanks!
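For reference, here's what device enumeration looks like from the Metal side in Swift (a sketch of what metalcompute effectively exposes; as far as I can tell, tensorflow-metal has no equivalent override):

import Metal

// List every GPU Metal can see on this Mac and pick one explicitly.
let devices = MTLCopyAllDevices() // macOS only
for (index, device) in devices.enumerated() {
    print("gpu:\(index)", device.name,
          "low power:", device.isLowPower,
          "headless:", device.isHeadless)
}
// e.g. explicitly choose the second FirePro for a hand-rolled compute kernel
let chosen: MTLDevice? = devices.count > 1 ? devices[1] : MTLCreateSystemDefaultDevice()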
3
0
235
Mar ’25
VNRecognizeTextRequest: .automatic vs specific language: different results?
Hi,

One can configure the languages of a (VN)RecognizeTextRequest with either:

.automatic: the language is detected automatically
a specific language, say Spanish

If a request configured with .automatic successfully detects Spanish, will the results be exactly equivalent to those of a request configured with Spanish as the language? I could not find any information about this, and it is very important for the core architecture of my app. Thanks!
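In the meantime, I'm considering an empirical check: run both configurations over the same test images and diff the output (a sketch with VNRecognizeTextRequest; the image plumbing is assumed):

import Vision

// Recognize text with either automatic language detection or a pinned language.
func recognize(cgImage: CGImage, languages: [String]?) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    if let languages {
        request.recognitionLanguages = languages // e.g. ["es-ES"]
    } else {
        request.automaticallyDetectsLanguage = true
    }
    try VNImageRequestHandler(cgImage: cgImage).perform([request])
    return (request.results ?? []).compactMap { $0.topCandidates(1).first?.string }
}

// let auto   = try recognize(cgImage: image, languages: nil)
// let pinned = try recognize(cgImage: image, languages: ["es-ES"])
// print(auto == pinned)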
2
0
115
Apr ’25
DataScannerViewController doesn't recognize currency amounts less than 1.00
Hi,

DataScannerViewController doesn't recognize currency amounts less than 1.00 (e.g. 0.59 USD, 0.99 EUR, etc.). Why? This limitation is not described in Apple's documentation. Is there a solution?

This is my code:

func makeUIViewController(context: Context) -> DataScannerViewController {
    let dataScanner = DataScannerViewController(recognizedDataTypes: [
        .text(textContentType: .currency)
    ])
    return dataScanner
}
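A possible workaround until this is answered: scan generic text instead of the .currency content type and post-filter the transcripts (a sketch; the regex and the coordinator wiring are assumptions, adjust per locale):

import VisionKit

func makeUIViewController(context: Context) -> DataScannerViewController {
    // No textContentType filter: recognize all text, filter prices ourselves.
    let dataScanner = DataScannerViewController(recognizedDataTypes: [.text()])
    dataScanner.delegate = context.coordinator // hypothetical coordinator
    return dataScanner
}

// Matches e.g. "0.59 USD", "€0.99", "$12.30", including sub-1.00 amounts.
let priceRegex = #/(?:[$€£]\s?)?\d+[.,]\d{2}(?:\s?(?:USD|EUR|GBP))?/#

func prices(in items: [RecognizedItem]) -> [String] {
    items.compactMap { item in
        guard case .text(let text) = item else { return nil }
        return text.transcript.firstMatch(of: priceRegex).map { String($0.output) }
    }
}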
4
0
148
Apr ’25
Looking for a prebuilt TensorFlow Lite C++ library (libtensorflowlite) for macOS M1/M2
Hi everyone! 👋

I'm working on a C++ project using TensorFlow Lite and was wondering if anyone has a prebuilt TensorFlow Lite C++ library (libtensorflowlite) for macOS (Apple Silicon M1/M2) that they'd be willing to share. I'm looking specifically for the TensorFlow Lite C++ API — something that lets me use tflite::Interpreter, tflite::FlatBufferModel, etc. Building it from source using Bazel on macOS has been quite challenging and time-consuming, so a ready-to-use .dylib or .a build along with the required headers would be incredibly helpful.

TensorFlow Lite version: v2.18.0 preferred
Target: macOS arm64 (Apple Silicon)
What I need:
libtensorflowlite.dylib or .a
Corresponding headers (ideally organized in a clean include/ folder)

If you have one available or know where I can find a reliable prebuilt version, I'd be super grateful. Thanks in advance! 🙏
2
0
167
Apr ’25
Why doesn't tensorflow-metal use AMD GPU memory?
From a tensorflow-metal example:

Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )

I know that Apple silicon uses UMA, and that explicit memory copies are typical of CUDA, but on a discrete GPU wouldn't using the dedicated GPU memory still be faster overall? I have an iMac Pro with a Radeon Pro Vega 64 16 GB GPU and an Intel iMac with a Radeon Pro 5700 8 GB GPU. Using tensorflow-metal is still WAY faster than using the CPUs, thanks for that. I am surprised the 5700 is twice as fast as the Vega, though.
1
0
216
Apr ’25
NLTagger.requestAssets hangs indefinitely
When calling NLTagger.requestAssets with some languages, it hangs indefinitely, both in the Simulator and on a device. This happens consistently for some languages, like Greek. An example call is NLTagger.requestAssets(for: .greek, tagScheme: .lemma). Other languages, like French, return immediately. I captured some logs from Console and found what look like repeated attempts to download the asset. I would expect the call to eventually terminate, either loading the asset or failing with an error.
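Until that's fixed, a defensive wrapper at least keeps the app responsive by racing the request against a deadline (a sketch; the 30-second budget is arbitrary, and the hung request itself is not cancelled, only the caller is unblocked):

import NaturalLanguage

enum AssetRequestError: Error { case timedOut }

func requestAssetsWithTimeout(for language: NLLanguage,
                              tagScheme: NLTagScheme,
                              seconds: UInt64 = 30) async throws -> NLTagger.AssetsResult {
    try await withThrowingTaskGroup(of: NLTagger.AssetsResult.self) { group in
        group.addTask {
            await withCheckedContinuation { continuation in
                NLTagger.requestAssets(for: language, tagScheme: tagScheme) { result in
                    continuation.resume(returning: result)
                }
            }
        }
        group.addTask {
            try await Task.sleep(nanoseconds: seconds * 1_000_000_000)
            throw AssetRequestError.timedOut
        }
        // Whichever child finishes first wins; cancel the other.
        guard let result = try await group.next() else { throw AssetRequestError.timedOut }
        group.cancelAll()
        return result
    }
}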
1
0
163
May ’25
iOS 18's new RecognizeTextRequest DEADLOCKS if more than 2 are run in parallel
Following the recommendations in the WWDC24 video "Discover Swift enhancements in the Vision framework" (cf. the video at 10:41), I used the following code to perform multiple of the new iOS 18 RecognizeTextRequest operations in parallel.

Problem: if more than 2 requests run in parallel, the requests hang, leaving the app in a state where no more requests can be started. A deadlock. I tried other ways to run the requests, but no matter the method employed or the device used, no more than 2 requests can ever run in parallel.

func triggerDeadlock() async throws {
    try await withThrowingTaskGroup(of: Void.self) { group in
        // See: WWDC 2024 "Discover Swift enhancements in the Vision framework" at 10:41

        // ############## THIS IS KEY
        let maxOCRTasks = 5 // On a real device, if more than 2 RecognizeTextRequests are launched in parallel using tasks, the requests hang
        // ############## THIS IS KEY

        for idx in 0..<maxOCRTasks {
            let url = ... // URL to some image
            group.addTask {
                // Perform OCR
                let _ = try await performOCRRequest(on: url)
            }
        }

        var nextIndex = maxOCRTasks
        for try await _ in group { // Wait for the result of the next child task that finished
            if nextIndex < pageCount {
                group.addTask {
                    let url = ... // URL to some image
                    // Perform OCR
                    let _ = try await performOCRRequest(on: url)
                }
                nextIndex += 1
            }
        }
    }
}

// MARK: - ASYNC/AWAIT version with iOS 18

@available(iOS 18, *)
func performOCRRequest(on url: URL) async throws -> [RecognizedText] {
    // Create request
    var request = RecognizeTextRequest() // Single request: no need for ImageRequestHandler

    // Configure request
    request.recognitionLevel = .accurate
    request.automaticallyDetectsLanguage = true
    request.usesLanguageCorrection = true
    request.minimumTextHeightFraction = 0.016

    // Perform request
    let textObservations: [RecognizedTextObservation] = try await request.perform(on: url)

    // Convert [RecognizedTextObservation] to [RecognizedText]
    return textObservations.compactMap { observation in
        observation.topCandidates(1).first
    }
}

I also found a Swift forums post mentioning something very similar. I also opened a feedback: FB17240843
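Until the underlying bug is fixed, the pragmatic workaround is to cap in-flight requests at 2. Swift has no built-in async semaphore, so here is a minimal actor-based gate I'm using as a stopgap (a sketch):

// Caps concurrent OCR requests at `limit` without blocking threads.
actor OCRGate {
    private var available: Int
    private var waiters: [CheckedContinuation<Void, Never>] = []
    init(limit: Int) { self.available = limit }

    func acquire() async {
        if available > 0 { available -= 1; return }
        await withCheckedContinuation { waiters.append($0) }
    }

    func release() {
        if waiters.isEmpty { available += 1 }
        else { waiters.removeFirst().resume() } // hand the slot to a waiter
    }
}

// Usage around each request (release must be awaited, so no defer):
// let gate = OCRGate(limit: 2)
// await gate.acquire()
// let result = try? await performOCRRequest(on: url)
// await gate.release()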
7
0
241
Aug ’25
Vision Framework - Testing RecognizeDocumentsRequest
How do I test the new RecognizeDocumentsRequest API? Reference: https://www.youtube.com/watch?v=H-GCNsXdKzM I am running the Xcode beta; however, I only have one primary device, and I cannot install beta software on it. Please suggest a testing strategy: will the Simulator work? This new capability is critical to my application, just what I need for structuring document scans and extraction. Thank you.
1
0
195
Jun ’25
BNNS random number generator for Double value types
I generate an array of random floats using the code shown below. However, I would like to do this with Double instead of Float. Are there any BNNS random number generators for double values, something like BNNSRandomFillUniformDouble? If not, is there a way to convert a BNNSNDArrayDescriptor from float to double?

import Accelerate

let n = 100_000_000

let result = Array<Float>(unsafeUninitializedCapacity: n) { buffer, initCount in
    var descriptor = BNNSNDArrayDescriptor(data: buffer, shape: .vector(n))!
    let randomGenerator = BNNSCreateRandomGenerator(BNNSRandomGeneratorMethodAES_CTR, nil)
    BNNSRandomFillUniformFloat(randomGenerator, &descriptor, 0, 1)
    initCount = n
}
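In case no double-precision fill exists, one workaround is to fill floats and widen them with vDSP (a sketch; note the values then carry only Float precision, which may or may not be acceptable for your use case):

import Accelerate

let n = 100_000_000

// 1. Fill a Float buffer with BNNS, exactly as above.
let floats = Array<Float>(unsafeUninitializedCapacity: n) { buffer, initCount in
    var descriptor = BNNSNDArrayDescriptor(data: buffer, shape: .vector(n))!
    let randomGenerator = BNNSCreateRandomGenerator(BNNSRandomGeneratorMethodAES_CTR, nil)
    BNNSRandomFillUniformFloat(randomGenerator, &descriptor, 0, 1)
    BNNSDestroyRandomGenerator(randomGenerator)
    initCount = n
}

// 2. Widen Float -> Double with vDSP_vspdp (single- to double-precision).
let doubles = Array<Double>(unsafeUninitializedCapacity: n) { buffer, initCount in
    vDSP_vspdp(floats, 1, buffer.baseAddress!, 1, vDSP_Length(n))
    initCount = n
}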
3
0
110
Jun ’25
AI-Powered Feed Customization via User-Defined Algorithm
Hey guys 👋 I’ve been thinking about a feature idea for iOS that could totally change the way we interact with apps like Twitter/X. Imagine if we could define our own recommendation algorithm, and have an AI on the iPhone that replaces the suggested tweets in the feed with ones that match our personal interests — based on public tweets, and without hacking anything. Kinda like a personalized "AI skin" over the app that curates content you actually care about. Feels like this would make content way more relevant and less algorithmically manipulative. Would love to know what you all think — and if Apple could pull this off 🔥
1
0
69
Jun ’25
Request for Agentic AI Mode (MCP Protocol) Support in Future Versions of iOS or Xcode
Hello Apple Team, Thank you for the recent Group Lab and for your continued work on advancing Xcode and developer tools. I’d like to submit a feature request: Are there any plans to introduce support for Agentic AI Mode (MCP protocol) in future versions of iOS or Xcode? As developer tools evolve toward more intelligent and context-aware environments, the integration of agentic AI capabilities could significantly enhance productivity and unlock new creative workflows. Looking forward to your consideration, and thank you again for the excellent session. Best regards
3
0
170
Jun ’25
Real-time text detection using the iOS 18 RecognizeTextRequest from a video buffer returns gibberish
Hey Devs,

I'm trying to create my own real-time text detection like this Apple project: https://developer.apple.com/documentation/vision/extracting-phone-numbers-from-text-in-images

I want to use the new iOS 18 RecognizeTextRequest instead of the old VNRecognizeTextRequest in my SwiftUI project. This is my delegate code with the camera setup. I removed the region of interest for debugging, but I'm trying to scan English words in books; the idea is to eventually get one word in the ROI. But I can't even get proper words, so I'm testing without the ROI in case my math is wrong.

@Observable
class CameraManager: NSObject, AVCapturePhotoCaptureDelegate ... {

    override init() {
        super.init()
        setUpVisionRequest()
    }

    private func setUpVisionRequest() {
        textRequest = RecognizeTextRequest(.revision3)
    }

    ...

    func setup() -> Bool {
        captureSession.beginConfiguration()

        guard let captureDevice = AVCaptureDevice.default(
            .builtInWideAngleCamera, for: .video, position: .back)
        else { return false }
        self.captureDevice = captureDevice

        guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice)
        else { return false }

        /// Check whether the session can add input.
        guard captureSession.canAddInput(deviceInput) else {
            print("Unable to add device input to the capture session.")
            return false
        }

        /// Add the input and output to session
        captureSession.addInput(deviceInput)

        /// Configure the video data output
        videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
        if captureSession.canAddOutput(videoDataOutput) {
            captureSession.addOutput(videoDataOutput)
            videoDataOutput.connection(with: .video)?
                .preferredVideoStabilizationMode = .off
        } else {
            return false
        }

        // Set zoom and autofocus to help focus on very small text
        do {
            try captureDevice.lockForConfiguration()
            captureDevice.videoZoomFactor = 2
            captureDevice.autoFocusRangeRestriction = .near
            captureDevice.unlockForConfiguration()
        } catch {
            print("Could not set zoom level due to error: \(error)")
            return false
        }

        captureSession.commitConfiguration()

        // Potential issue with background vs. DispatchQueue??
        Task(priority: .background) {
            captureSession.startRunning()
        }
        return true
    }
}

// Issue here ???
extension CameraManager: AVCaptureVideoDataOutputSampleBufferDelegate {

    func captureOutput(
        _ output: AVCaptureOutput,
        didOutput sampleBuffer: CMSampleBuffer,
        from connection: AVCaptureConnection
    ) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        Task {
            textRequest.recognitionLevel = .fast
            textRequest.recognitionLanguages = [Locale.Language(identifier: "en-US")]
            do {
                let observations = try await textRequest.perform(on: pixelBuffer)
                for observation in observations {
                    let recognizedText = observation.topCandidates(1).first
                    print("recognized text \(recognizedText)")
                }
            } catch {
                print("Recognition error: \(error.localizedDescription)")
            }
        }
    }
}

The results I get look like this (a full page of English from any book):

recognized text Optional(RecognizedText(string: e bnUI W4, confidence: 0.5))
recognized text Optional(RecognizedText(string: ?'U, confidence: 0.3))
recognized text Optional(RecognizedText(string: traQt4, confidence: 0.3))
recognized text Optional(RecognizedText(string: li, confidence: 0.3))
recognized text Optional(RecognizedText(string: 15,1,#, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllÈ, confidence: 0.3))
recognized text Optional(RecognizedText(string: vtrll, confidence: 0.3))
recognized text Optional(RecognizedText(string: 5,1,: 11, confidence: 0.5))
recognized text Optional(RecognizedText(string: 1141, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllll ljiiilij41, confidence: 0.3))
recognized text Optional(RecognizedText(string: 2f4, confidence: 0.3))
recognized text Optional(RecognizedText(string: ktril, confidence: 0.3))
recognized text Optional(RecognizedText(string: ¥LLI, confidence: 0.3))
recognized text Optional(RecognizedText(string: 11[Itl,, confidence: 0.3))
recognized text Optional(RecognizedText(string: 'rtlÈ131, confidence: 0.3))

Even with the ROI set to a specific rectangle (normalized for Vision), I get the same results, with single characters returning gibberish.

Any help would be amazing, thank you. Am I using the buffer right? Am I using the new perform(on: CVPixelBuffer) right? Maybe I didn't set up my camera properly? I can provide more code.
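Two things I'd rule out first (assumptions on my part, not confirmed fixes): camera buffers arrive in the sensor's landscape orientation, and OCR on text that is rotated 90° inside the buffer produces exactly this kind of gibberish; and stale frames can pile up behind slow requests. Both are one-liners to try in setup():

// Rotate buffers to match a portrait UI so the text is upright (iOS 17+ API).
if let connection = videoDataOutput.connection(with: .video),
   connection.isVideoRotationAngleSupported(90) {
    connection.videoRotationAngle = 90 // adjust per interface orientation
}

// Drop late frames instead of queueing them behind in-flight requests.
videoDataOutput.alwaysDiscardsLateVideoFrames = true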
1
0
333
Jul ’25
LLM size for fine-tuning using MLX on a MacBook
Hi, I recently tried to fine-tune the Gemma-2-2b MLX model on my MacBook (24 GB UMA). The code started running; after a few seconds I saw the swap size reach 50 GB and RAM around 23 GB, and then it stopped. I ran Gemma-2-2b (CUDA) on Colab; it occupied 27 GB on an A100 GPU and worked fine, with no swap issue. My question: if my UMA had been larger than 27 GB, would I also have avoided the swap issue? Thanks.
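For context, a rough back-of-the-envelope for full fine-tuning (my assumptions: ~2.6 B parameters for Gemma-2-2b, bf16 weights and gradients, fp32 Adam moments):

weights:   2.6 B × 2 bytes ≈ 5 GB
gradients: 2.6 B × 2 bytes ≈ 5 GB
Adam m, v: 2.6 B × 8 bytes ≈ 21 GB

That is roughly 31 GB before activations, in the same ballpark as the 27 GB observed on the A100. So on 24 GB of UMA, which macOS and the display also share, heavy swapping is expected; a UMA size comfortably above ~30 GB would most likely have avoided it. Alternatively, LoRA fine-tuning in mlx-lm trains only small adapter matrices, which shrinks the gradient and optimizer state to a small fraction of the above.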
1
0
323
Oct ’25
Core ML Model performance far lower on iOS 17 vs iOS 16 (iOS 17 not using Neural Engine)
Hello, I posted an issue on the coremltools GitHub about my Core ML models not performing as well on iOS 17 vs iOS 16, but I'm posting it here just in case.

TL;DR The same model on the same device/chip performs far slower (doesn't use the Neural Engine) on iOS 17 compared to iOS 16.

Longer description The following screenshots show the performance of the same model (a PyTorch computer vision model) on an iPhone SE 3rd gen and an iPhone 13 Pro (both use the A15 Bionic).

iOS 16 - iPhone SE 3rd Gen (A15 Bionic): iOS 16 uses the ANE, resulting in fast prediction, load and compilation times.

iOS 17 - iPhone 13 Pro (A15 Bionic): iOS 17 doesn't seem to use the ANE, so the prediction, load and compilation times are all slower.

Code To Reproduce The following is the code I'm using to export my PyTorch vision model (using coremltools). I've used the same code for the past few months with sensational results on iOS 16.

# Convert to Core ML using the Unified Conversion API
coreml_model = ct.convert(
    model=traced_model,
    inputs=[image_input],
    outputs=[ct.TensorType(name="output")],
    classifier_config=ct.ClassifierConfig(class_names),
    convert_to="neuralnetwork",
    # compute_precision=ct.precision.FLOAT16,
    compute_units=ct.ComputeUnit.ALL
)

System environment:
Xcode version: 15.0
coremltools version: 7.0.0
OS (e.g. MacOS version or Linux type): Linux Ubuntu 20.04 (for exporting), macOS 13.6 (for testing in Xcode)
Any other relevant version information (e.g. PyTorch or TensorFlow version): PyTorch 2.0

Additional context This happens across both "neuralnetwork" and "mlprogram" type models; neither uses the ANE on iOS 17, but both use the ANE on iOS 16.

If anyone has a similar experience, I'd love to hear more. Otherwise, if I'm doing something wrong when exporting models for iOS 17+, please let me know. Thank you!
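One diagnostic worth running on iOS 17 (a sketch, not a fix): load the same compiled model with compute units restricted to CPU + Neural Engine. If prediction is still slow under that configuration, it points at op placement on the ANE itself rather than at the ComputeUnit.ALL scheduling heuristics:

import CoreML

let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine // iOS 16+: disallow the GPU entirely

// Hypothetical path to the compiled model.
let modelURL = URL(fileURLWithPath: "/path/to/Model.mlmodelc")
let model = try MLModel(contentsOf: modelURL, configuration: config)
// Then time model.prediction(from:) and compare against .all on iOS 16 and 17.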
1
1
1.8k
Mar ’25
NLModel won't initialize in MessageFilterExtension
I'm trying to create an NLModel within a MessageFilterExtension handler. The code works fine in the main app, but when I try to use it in the extension, it fails to initialize. Even this single line fails with the error below (SMS_Classifier is the class Xcode generated for my model; the same line works fine in the main app):

let mlModel = try SMS_Classifier(configuration: MLModelConfiguration()).model

Error:

Unable to locate Asset for contextual word embedding model for local en.
MLModelAsset: load failed with error Error Domain=com.apple.CoreML Code=0 "initialization of text classifier model with model data failed" UserInfo={NSLocalizedDescription=initialization of text classifier model with model data failed}

Any ideas?
3
1
1k
Jan ’25
Is it possible to read Japanese tategaki with the Vision framework?
We are building an app that reads text. It reads English and normal (horizontal) Japanese text successfully. But in some cases we need to read Japanese tategaki (vertically aligned text), and there the same code gives no output. So, do we need to change any configuration to read Japanese tategaki? Or is it actually possible to read Japanese tategaki using the Vision framework?

lazy var detectTextRequest = VNRecognizeTextRequest { request, error in
    self.resStr = "\n"
    self.words = [:]

    // Get OCR result
    guard let res = request.results as? [VNRecognizedTextObservation] else { return }

    // Separate the words by spaces
    let text = res.compactMap({ $0.topCandidates(1).first?.string }).joined(separator: " ")

    var n = 0
    self.wordArr = [[]]
    self.xs = 1
    self.ys = 1
    var hs = 0.0 // To compare the heights of the words

    // To get the original axis (topmost word's axis), only once
    for r in res {
        var word = r.topCandidates(1).first?.string
        self.words[word ?? ""] = [r.topLeft.x, r.topLeft.y]
        if self.cartLabelType == 1 {
            if word?.components(separatedBy: CharacterSet(charactersIn: "//")).count ?? 0 > 2 {
                self.xs = r.topLeft.x
                self.ys = r.topLeft.y
            }
        }
    }
}
2
1
611
Jan ’25