Explore the power of machine learning and Apple Intelligence within apps. Discuss integrating features, share best practices, and explore the possibilities for your app here.

All subtopics
Posts under Machine Learning & AI topic

Each post below lists its replies, boosts, views, and latest activity.
A Summary of the WWDC25 Group Lab - Machine Learning and AI Frameworks
At WWDC25 we launched a new type of Lab event for the developer community - Group Labs. A Group Lab is a panel Q&A designed for a large audience of developers. Group Labs are a unique opportunity for the community to submit questions directly to a panel of Apple engineers and designers. Here are the highlights from the WWDC25 Group Lab for Machine Learning and AI Frameworks.

What are you most excited about in the Foundation Models framework?
The Foundation Models framework provides access to an on-device Large Language Model (LLM), enabling entirely on-device processing for intelligent features. This allows you to build features such as personalized search suggestions and dynamic NPC generation in games. The combination of guided generation and streaming capabilities is particularly exciting for creating delightful animations and features with reliable output. The seamless integration with SwiftUI and the new design material Liquid Glass is also a major advantage.

When should I still bring my own LLM via Core ML?
It's generally recommended to first explore Apple's built-in system models and APIs, including the Foundation Models framework, as they are highly optimized for Apple devices and cover a wide range of use cases. However, Core ML is still valuable if you need more control or choice over the specific model being deployed, such as customizing existing system models or augmenting prompts. Core ML provides the tools to get these models on-device, but you are responsible for model distribution and updates.

Should I migrate PyTorch code to MLX?
MLX is an open-source, general-purpose machine learning framework designed for Apple Silicon from the ground up. It offers a familiar API, similar to PyTorch, and supports C, C++, Python, and Swift. MLX emphasizes unified memory, a key feature of Apple Silicon hardware, which can improve performance. It's recommended to try MLX and see if its programming model and features better suit your application's needs. MLX shines when working with state-of-the-art, larger models.

Can I test Foundation Models in the Xcode simulator or on device?
Yes, you can use the Xcode simulator to test Foundation Models use cases. However, your Mac must be running macOS Tahoe. You can test on a physical iPhone running iOS 26 by connecting it to your Mac and running Playgrounds or live previews directly on the device.

Which on-device models will be supported? Any open-source models?
The Foundation Models framework currently supports Apple's first-party models only. This allows for platform-wide optimizations, improving battery life and reducing latency. While Core ML can be used to integrate open-source models, it's generally recommended to first explore the built-in system models and APIs provided by Apple, including those in the Vision, Natural Language, and Speech frameworks, as they are highly optimized for Apple devices. For frontier models, MLX can run very large models.

How often will the Foundation Model be updated? How do we test for stability when the model is updated?
The Foundation Model will be updated in sync with operating system updates. You can test your app against new model versions during the beta period by downloading the beta OS and running your app. It is highly recommended to create an "eval set" of golden prompts and responses to evaluate the performance of your features as the model changes or as you tweak your prompts. Report any unsatisfactory or satisfactory cases using Feedback Assistant.

Which on-device model/API can I use to extract text data from images such as nutrition labels, ingredient lists, and cashier receipts?
The Vision framework offers RecognizeDocumentRequest, which is specifically designed for these use cases. It not only recognizes text in images but also provides the structure of the document, such as rows in a receipt or the layout of a nutrition label. It can also identify data like phone numbers, addresses, and prices.

What is the context window for the model? What are max tokens in and max tokens out?
The context window for the Foundation Model is 4,096 tokens. The split between input and output tokens is flexible. For example, if you input 4,000 tokens, you'll have 96 tokens remaining for the output. The API takes in text, converting it to tokens under the hood. When estimating token count, a good rule of thumb is 3-4 characters per token for languages like English, and 1 character per token for languages like Japanese or Chinese. Handle potential errors gracefully by asking for shorter prompts or starting a new session if the token limit is exceeded.

Is there a rate limit for the Foundation Models API that depends on power or thermal conditions on the iPhone?
Yes, there are rate limits, particularly when your app is in the background. A budget is allocated for background app usage, but exceeding it will result in rate-limiting errors. In the foreground, there is no rate limit unless the device is under heavy load (e.g., camera open, game mode). The system dynamically balances performance, battery life, and thermal conditions, which can affect the token throughput. Use appropriate quality-of-service settings for your tasks (e.g., background priority for background work) to help the system manage resources effectively.

Do the Foundation Models support languages other than English?
Yes, the on-device Foundation Model is multilingual and supports all languages supported by Apple Intelligence. To get the model to output in a specific language, prompt it with instructions indicating the user's preferred language using the locale API (e.g., "The user's preferred language is en-US"). Writing the instructions in English but putting the user prompt in the desired output language is a recommended practice.

Are larger server-based models available through Foundation Models?
No, the Foundation Models API currently only provides access to the on-device Large Language Model at the core of Apple Intelligence. It does not support server-side models. On-device models are preferred for privacy and performance reasons.

Is it possible to run Retrieval-Augmented Generation (RAG) using the Foundation Models framework?
Yes, it is possible to run RAG on-device, but the Foundation Models framework does not include a built-in embedding model. You'll need to use a separate database to store vectors and implement nearest-neighbor or cosine-distance searches. The Natural Language framework offers simple word and sentence embeddings that can be used. Consider using a combination of Foundation Models and Core ML, using Core ML for your embedding model.
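For illustration, here is a minimal sketch of the locale and token-limit guidance above, assuming the Foundation Models API surface shown at WWDC25 (LanguageModelSession and its async respond(to:) method); exact names and availability may differ between OS seeds.

```swift
import FoundationModels

// Minimal sketch (assumes the WWDC25 Foundation Models API: LanguageModelSession,
// respond(to:)). Instructions are written in English but state the user's
// preferred language, as recommended in the lab.
func suggestTagline(for product: String) async -> String? {
    let preferredLanguage = Locale.current.identifier // e.g. "en_US"
    let session = LanguageModelSession(
        instructions: "You write short, friendly product taglines. The user's preferred language is \(preferredLanguage)."
    )
    do {
        let response = try await session.respond(to: "Write a tagline for \(product).")
        return response.content
    } catch {
        // The 4,096-token context window is shared between input and output; if it is
        // exceeded, shorten the prompt or start a new session and try again.
        print("Generation failed: \(error)")
        return nil
    }
}
```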
Replies: 1 · Boosts: 0 · Views: 1.3k · Activity: Jun ’25
Run Time Issues with Swift/Core ML
Hello! I have a Swift program that tracks the location of a ball (through the back camera). It seems to be working fine, but the only issue is the run time, particularly my concatenate, normalize, and argmax functions, which are meant to be a 1:1 copy of the PyTorch argmax function and the following Python lines:

    imgs = np.concatenate((img, img_prev, img_preprev), axis=2)
    imgs = imgs.astype(np.float32)/255.0
    imgs = np.rollaxis(imgs, 2, 0)
    inp = np.expand_dims(imgs, axis=0)  # used to pass into model

However, I need my program to run in real time and, in an ideal world, I want it to run well under real time. Below is a rundown of the run times that result from my code:

    Starting model inference
    Setup took: 0.0 seconds
    Resize took: 0.03741896152496338 seconds
    Concatenation took: 0.3359949588775635 seconds
    Normalization took: 0.9906361103057861 seconds
    Model prediction took: 0.3425499200820923 seconds
    Argmax took: 28.17007803916931 seconds
    Postprocess took: 0.054128050804138184 seconds
    Model inference took 29.934185028076172 seconds

Here are the concatenateBuffers, normalizeBuffer, and argmax functions that I use:

    func concatenateBuffers(_ buffers: [CVPixelBuffer?]) -> CVPixelBuffer? {
        guard buffers.count == 3, let first = buffers[0] else { return nil }
        let width = CVPixelBufferGetWidth(first)
        let height = CVPixelBufferGetHeight(first)
        let targetChannels = 9
        var concatenated: CVPixelBuffer?
        let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue] as CFDictionary
        CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_32BGRA, attrs, &concatenated)
        guard let output = concatenated else { return nil }
        CVPixelBufferLockBaseAddress(output, [])
        defer { CVPixelBufferUnlockBaseAddress(output, []) }
        guard let outputData = CVPixelBufferGetBaseAddress(output) else { return nil }
        let outputPtr = UnsafeMutablePointer<UInt8>(OpaquePointer(outputData))

        // Lock all input buffers at once
        buffers.forEach { buffer in
            guard let buffer = buffer else { return }
            CVPixelBufferLockBaseAddress(buffer, .readOnly)
        }
        defer { buffers.forEach { CVPixelBufferUnlockBaseAddress($0!, .readOnly) } }

        // Process each input buffer
        for (frameIdx, buffer) in buffers.enumerated() {
            guard let buffer = buffer, let inputData = CVPixelBufferGetBaseAddress(buffer) else { continue }
            let inputPtr = UnsafePointer<UInt8>(OpaquePointer(inputData))
            let bytesPerRow = CVPixelBufferGetBytesPerRow(buffer)
            let totalPixels = width * height
            // Process all pixels in one go for this frame
            for i in 0..<totalPixels {
                let y = i / width
                let x = i % width
                let inputOffset = y * bytesPerRow + x * 4
                let outputOffset = i * targetChannels + frameIdx * 3
                // BGR order to match numpy
                outputPtr[outputOffset] = inputPtr[inputOffset + 2]     // B
                outputPtr[outputOffset + 1] = inputPtr[inputOffset + 1] // G
                outputPtr[outputOffset + 2] = inputPtr[inputOffset]     // R
            }
        }
        return output
    }

    func normalizeBuffer(_ buffer: CVPixelBuffer?) -> MLMultiArray? {
        guard let input = buffer else { return nil }
        let width = CVPixelBufferGetWidth(input)
        let height = CVPixelBufferGetHeight(input)
        let channels = 9
        CVPixelBufferLockBaseAddress(input, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(input, .readOnly) }
        guard let inputData = CVPixelBufferGetBaseAddress(input) else { return nil }
        let shape = [1, NSNumber(value: channels), NSNumber(value: height), NSNumber(value: width)]
        guard let output = try? MLMultiArray(shape: shape, dataType: .float32) else { return nil }
        let inputPtr = inputData.assumingMemoryBound(to: UInt8.self)
        let bytesPerRow = CVPixelBufferGetBytesPerRow(input)
        let ptr = UnsafeMutablePointer<Float>(OpaquePointer(output.dataPointer))
        let totalSize = width * height
        for c in 0..<channels {
            for idx in 0..<totalSize {
                let h = idx / width
                let w = idx % width
                let inputIdx = h * bytesPerRow + w * channels + c
                ptr[c * totalSize + idx] = Float(inputPtr[inputIdx]) / 255.0
            }
        }
        return output
    }

    func argmax(_ array: MLMultiArray) -> MLMultiArray? {
        let shape = array.shape.map { $0.intValue }
        guard shape.count == 3, shape[0] == 1, shape[1] == 256, shape[2] == 230400 else { return nil }
        guard let output = try? MLMultiArray(shape: [1, NSNumber(value: 230400)], dataType: .int32) else { return nil }
        let ptr = UnsafePointer<Float>(OpaquePointer(array.dataPointer))
        let outputPtr = UnsafeMutablePointer<Int32>(OpaquePointer(output.dataPointer))
        let channelSize = 230400
        for pos in 0..<230400 {
            var maxValue = -Float.infinity
            var maxIndex: Int32 = 0
            for channel in 0..<256 {
                let value = ptr[channel * channelSize + pos]
                if value > maxValue {
                    maxValue = value
                    maxIndex = Int32(channel)
                }
            }
            outputPtr[pos] = maxIndex
        }
        return output
    }

Are there any glaring areas of inefficiency that can be reduced to allow for under-real-time processing whilst following exactly the same logic as the Python code? Would using Obj-C speed things up for some reason? Are there any tools I can use so I don't have to write these functions myself? Additionally, in the class's init function, I tried to check the compute units being used, since I feel 0.34 seconds for a single model prediction is also far too long, but no print statements are showing for some reason:

    init() {
        guard let loadedModel = try? BallTrackerModel() else {
            fatalError("Could not load model")
        }
        let config = MLModelConfiguration()
        config.computeUnits = .all
        guard let configuredModel = try? BallTrackerModel(configuration: config) else {
            fatalError("Could not configure model")
        }
        self.model = configuredModel
        print("model loaded with compute units \(config.computeUnits.rawValue)")
    }

Thanks!
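Not a drop-in fix, but one direction worth trying: the per-element Swift loops above can usually be replaced with Accelerate/vDSP calls, which vectorize the UInt8-to-Float conversion and the strided argmax. The sketch below assumes tightly packed, channel-major float data (i.e., that the BGRA row padding and interleaving have already been handled); everything other than the vDSP routines themselves is illustrative. It is also worth confirming that the timings come from a Release build, since unoptimized Debug builds can slow tight Swift pointer loops dramatically.

```swift
import Accelerate

// Sketch: vectorized UInt8 -> Float conversion and /255 normalization with vDSP.
// Assumes `src` and `dst` point at `count` tightly packed elements (no row padding).
func normalize(_ src: UnsafePointer<UInt8>, into dst: UnsafeMutablePointer<Float>, count: Int) {
    vDSP_vfltu8(src, 1, dst, 1, vDSP_Length(count))           // UInt8 -> Float
    var divisor: Float = 255
    vDSP_vsdiv(dst, 1, &divisor, dst, 1, vDSP_Length(count))   // divide in place by 255
}

// Sketch: per-position argmax over the channel axis for a channel-major
// [channels x positions] float layout (ptr[channel * positions + pos]), using
// vDSP_maxvi with a stride instead of a hand-written inner loop.
func argmaxPerPosition(_ logits: UnsafePointer<Float>,
                       channels: Int,
                       positions: Int,
                       into out: UnsafeMutablePointer<Int32>) {
    for pos in 0..<positions {
        var maxValue: Float = 0
        var maxIndex: vDSP_Length = 0
        vDSP_maxvi(logits + pos, vDSP_Stride(positions), &maxValue, &maxIndex, vDSP_Length(channels))
        // vDSP reports the index into the strided view (channel * stride), so divide
        // by the stride to recover the channel index; verify this against your data.
        out[pos] = Int32(Int(maxIndex) / positions)
    }
}
```

Another option, if you can modify the exported model, is to fold the /255 normalization and the argmax into the Core ML model itself at conversion time, so that Core ML can schedule them on the GPU or Neural Engine instead of running them in Swift.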
Replies: 3 · Boosts: 0 · Views: 793 · Activity: Feb ’25
How does the extract method from ImagePlaygroundConcept work?
I’m building an app that generates images based on text input from a specific text field. However, I’m encountering a problem: for short prompts like "a cat and a dog", the entire string is sent to Image Playground, even when I use the extracted(from:) method. For longer inputs, the behavior is inconsistent: sometimes it extracts keywords correctly, but other times it doesn’t extract anything at all. Since my app relies on generating images based on the extracted keywords, this inconsistency negatively impacts the user experience. How can I make sure that keywords are always extracted from the input string?

    Button("Generate", systemImage: "apple.intelligence") {
        isPresented = true
    }
    .imagePlaygroundSheet(isPresented: $isPresented,
                          concepts: [ImagePlaygroundConcept.extracted(from: text, title: textTitle)]) { url in
        imageURL = url
    }
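A possible workaround (a sketch, not documented Image Playground behavior): pre-extract keywords yourself with NLTagger and pass them to the sheet as explicit concepts, so short prompts like "a cat and a dog" always contribute per-keyword concepts. This assumes ImagePlaygroundConcept.text(_:) is available alongside .extracted(from:title:).

```swift
import NaturalLanguage
import ImagePlayground

// Sketch: extract nouns from the user's text with NLTagger and wrap each one
// as an explicit concept, falling back to the whole string if nothing is found.
// ImagePlaygroundConcept.text(_:) is assumed here; adjust to the API you target.
func keywordConcepts(from text: String) -> [ImagePlaygroundConcept] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    var keywords: [String] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, range in
        if tag == .noun { keywords.append(String(text[range])) }
        return true
    }
    return keywords.isEmpty ? [.text(text)] : keywords.map { .text($0) }
}
```

You could then pass `keywordConcepts(from: text)` to `.imagePlaygroundSheet(isPresented:concepts:)` in place of the single extracted concept.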
Replies: 1 · Boosts: 0 · Views: 563 · Activity: Feb ’25
Easy way to implement facial recognition
Hi everyone😊, I want to implement facial recognition in my app. I was planning to use Create ML's image classification, but there seems to be a lot of hassle involved in implementing it (the JSON file, etc.). Are there other easy-to-implement options that don't involve advanced coding? Thanks, Oliver
Replies: 2 · Boosts: 0 · Views: 612 · Activity: Feb ’25
tensorflow-metal for Python 3.12 and TensorFlow 2.17.x
Hi, the most recent version of tensorflow-metal is only available for macOS 12.0 and Python up to version 3.11. Is there any chance it could be updated with wheels for macOS 15 and Python 3.12 (which is the default version supported for TensorFlow 2.17+)? I'd note that even downgrading to Python 3.11 would not be sufficient, as the wheels only work for macOS 12. Thanks.
Replies: 5 · Boosts: 8 · Views: 2.4k · Activity: Feb ’25
Inquiry About GS1 DataBar Stacked Support in Vision Framework
Hello, I am currently developing an application that requires barcode scanning using Apple’s Vision framework (VNBarcodeSymbology). I noticed that the framework supports several GS1 DataBar symbologies, such as:

VNBarcodeSymbology.gs1DataBar
VNBarcodeSymbology.gs1DataBarExpanded
VNBarcodeSymbology.gs1DataBarLimited

However, I could not find any explicit reference to support for GS1 DataBar Stacked (both regular and expanded variants). Could you confirm whether GS1 DataBar Stacked is currently supported in VisionKit's DataScannerViewController or VNBarcodeObservation? If not, are there any plans to include support for this symbology in a future iOS update? This functionality is critical for my use case, as GS1 DataBar Stacked barcodes are widely used in retail, pharmaceuticals, and logistics, where space constraints prevent the use of standard GS1 DataBar formats. I appreciate any clarification on this matter and would be happy to provide additional details if needed.
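One way to check this empirically on a given OS version is to list what Vision reports at runtime; a small sketch using the supportedSymbologies() method that VNDetectBarcodesRequest has offered since iOS 15 (the name filter is just illustrative):

```swift
import Vision

// Sketch: log the barcode symbologies Vision supports on the current OS and
// filter for GS1 DataBar variants by raw value.
func logGS1DataBarSupport() {
    let request = VNDetectBarcodesRequest()
    do {
        let symbologies = try request.supportedSymbologies()
        for symbology in symbologies where symbology.rawValue.lowercased().contains("databar") {
            print("Supported:", symbology.rawValue)
        }
    } catch {
        print("Could not query supported symbologies: \(error)")
    }
}
```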
Replies: 0 · Boosts: 0 · Views: 423 · Activity: Feb ’25
Core ML inference on iOS hardware uses only the CPU for a coremltools-imported PyTorch model
I have exported a PyTorch model to a Core ML mlpackage file and imported the model file into my iOS project. The model is a music source separation model - it runs prediction on audio-spectrogram blocks and returns separated audio source spectrograms. The model produces correct results vs. desktop+GPU+Python, but inference on an iPhone 15 Pro Max is really, really slow. Using the Xcode model Performance tool, I can see that inference isn't automatically distributed across compute units - all of it runs on the CPU. The Performance tool notation hints that all ops should be supported by both the GPU and the Neural Engine. One thing to note: when initializing the model with the MLModelConfiguration option .cpuAndGPU or .cpuAndNeuralEngine, there is an error in the Xcode console:

    Error(s) occurred compiling MIL to BNNS graph:
    [CreateBnnsGraphProgramFromMIL]: Failed to determine convolution kernel at location at
    /private/var/containers/Bundle/Application/2E3C4AFF-1FA4-4C95-AAE4-ECEBC0FB0BF9/mymss.app/mymss.mlmodelc/model.mil:2453:12
    @ CreateBnnsGraphProgramFromMIL

Before going back to hammering the model in Python, are there any tips/strategies I could try in the coremltools export phase or in configuring the model for prediction on iOS? My export toolchain is currently Linux with coremltools v8.1, export target iOS 16.
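For reference, a minimal sketch for comparing compute-unit preferences from Swift; computeUnits is a request rather than a guarantee, so ops that fail to compile for a backend (as in the BNNS error above) will still fall back to the CPU. The model URL and feature provider below are placeholders for your own model.

```swift
import Foundation
import CoreML

// Sketch: load the same compiled model (.mlmodelc) under different compute-unit
// preferences and time one prediction with each. `modelURL` could be the generated
// class's urlOfModelInThisBundle; `input` is your own MLFeatureProvider.
func timePredictions(modelURL: URL, input: MLFeatureProvider) throws {
    for units in [MLComputeUnits.cpuOnly, .cpuAndGPU, .cpuAndNeuralEngine, .all] {
        let config = MLModelConfiguration()
        config.computeUnits = units
        let model = try MLModel(contentsOf: modelURL, configuration: config)

        let start = CFAbsoluteTimeGetCurrent()
        _ = try model.prediction(from: input)
        print("computeUnits \(units.rawValue): \(CFAbsoluteTimeGetCurrent() - start) s")
    }
}
```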
Replies: 2 · Boosts: 0 · Views: 782 · Activity: Feb ’25
Training Images for Vision Classifier Model - Swift Student Challenge
I'm working on my Swift Student Challenge submission and developing a Vision framework-based image classifier. I want to ensure I'm following best practices for training data and following the guidelines for which images I use to train my image classifier. What types of images can I use for training my model? Are there specific image databases or resources recommended by Apple that are safe to use for Swift Student Challenge submissions? I'm currently considering images from Wikipedia as well as my own images.
Replies: 1 · Boosts: 0 · Views: 502 · Activity: Feb ’25
Broken compatibility in tensorflow-metal with tensorflow 2.18
Issue type: Bug
tensorflow-metal version: 1.1.1
TensorFlow version: 2.18
OS platform and distribution: macOS 15.2
Python version: 3.11.11
GPU model and memory: Apple M2 Max GPU, 38 cores

Standalone code to reproduce the issue:

    import tensorflow as tf

    if __name__ == '__main__':
        gpus = tf.config.experimental.list_physical_devices('GPU')
        print(gpus)

Current behavior: an Apple silicon GPU with tensorflow-metal==1.1.0 and Python 3.11 works fine with tensorboard==2.17.0. This is the normal output:

    /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/bin/python /Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py
    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
    Process finished with exit code 0

But if I upgrade TensorFlow to 2.18, I get this error:

    /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/bin/python /Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py
    Traceback (most recent call last):
      File "/Users/mspanchenko/VSCode/cryptoNN/ml/core_second_window/test_tensorflow_gpus.py", line 1, in <module>
        import tensorflow as tf
      File "/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/__init__.py", line 437, in <module>
        _ll.load_library(_plugin_dir)
      File "/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library
        py_tf.TF_LoadLibrary(lib)
    tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Symbol not found: __ZN3tsl8internal10LogMessageC1EPKcii
      Referenced from: <D2EF42E3-3A7F-39DD-9982-FB6BCDC2853C> /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow-plugins/libmetal_plugin.dylib
      Expected in: <2814A58E-D752-317B-8040-131217E2F9AA> /Users/mspanchenko/anaconda3/envs/cryptoNN_ml_core/lib/python3.11/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so

    Process finished with exit code 1
Replies: 3 · Boosts: 3 · Views: 1.7k · Activity: Feb ’25
Feature Request – Support for GS1 DataBar Stacked in Vision Framework
Dear Apple Developer Team, I am writing to request the addition of GS1 DataBar Stacked (both regular and expanded variants) to the barcode symbologies supported by the Vision framework (VNBarcodeSymbology) and VisionKit's DataScannerViewController. Currently, Vision supports several GS1 DataBar formats, such as:

VNBarcodeSymbology.gs1DataBar
VNBarcodeSymbology.gs1DataBarExpanded
VNBarcodeSymbology.gs1DataBarLimited

However, GS1 DataBar Stacked is widely used in industries such as retail, pharmaceuticals, and logistics, where space constraints prevent the use of the standard GS1 DataBar format. Many businesses rely on this symbology to encode GTINs and other product data, but Apple's barcode scanning API does not explicitly support it.

Why This Feature Matters:

Essential for Small Packaging: GS1 DataBar Stacked is commonly used on small product labels where a standard linear barcode does not fit.
Widespread Industry Adoption: Many point-of-sale (POS) systems and inventory management tools require this symbology.
Improves iOS Adoption for Enterprise Use: Adding support would make Apple’s Vision framework a more viable solution for businesses that currently rely on third-party barcode scanning SDKs.

Feature Request: Please add GS1 DataBar Stacked and GS1 DataBar Expanded Stacked to the recognized symbologies in:

VNBarcodeSymbology (for the Vision framework)
DataScannerViewController (for VisionKit)

This addition would enhance the versatility of Apple’s barcode scanning tools and reduce the need for third-party libraries. I appreciate your consideration of this request and would be happy to provide more details or test implementations if needed. Thank you for your time and support! Best regards
Replies: 2 · Boosts: 5 · Views: 691 · Activity: Feb ’25
What special features does Apple officially have that use ML or AI?
I am an app designer and I am curious about which specific ML or AI technologies Apple used to develop features in the system. As far as I know, Apple's hand-raising detection, destination recommendations in Maps, and exercise types in Fitness all use ML. Are there more specific application examples of ML or AI? Does Apple have a document specifically introducing examples of specific applications of ML or AI technology in the system?
Replies: 1 · Boosts: 0 · Views: 620 · Activity: Feb ’25
Unexpectedly slow CreateML text classifier training (limited GPU/CPU usage)
While training a text classifier model with a few thousand samples completes in seconds, when using 100,000 or 1 million samples, CreateML's training time increases exponentially (to hours or days). During these hours/days, GPU usage is low and almost every CPU core is idle. When using the Swift APIs for model training, resource utilization does not increase. I'm using Xcode 16.2, macOS 15.2 on either an M2 Ultra 64 GB or an M3 Max 48 GB laptop (both using built-in SSD with ~500 GB free) running no other applications. Is there a setting I've missed to allow training to take over more of my computing resources? Is this expected of CreateML (i.e., when looking to exploit a larger corpus, I should move to other tooling)? I'd love to speed up my iteration cycle time.
Replies: 1 · Boosts: 0 · Views: 670 · Activity: Feb ’25
Apple Intelligence stuck on "preparing" for 6 days.
On October 28, the release day of Apple Intelligence, I opted in. My iPhone and iPad immediately went to "waitlist" and within 2 to 3 hours were ready to initialize Apple Intelligence. My MacBook Pro 14" with an M3 Pro processor and 18 GB of RAM has been stuck on "preparing" since release day (6 days now). I've tried numerous workarounds that I found on forums, as well as talking to Apple Support, who basically had me repeat the workarounds that I found on forums. I've tried changing the region to an area that does not have Apple Intelligence and then back to the US, I've changed the Siri language to an unsupported one and back to a supported one, I've tried disabling background/startup apps, and I've disabled and re-enabled Siri. Oh, and I've restarted a bunch and left the Mac alone for hours at a time. I've noticed that my selected Siri voice seems to not download. Finally, after several chats and calls with Apple Support, I was told that it's beta software, they can't help me, and I should try the developer forums... so here I am. Any advice?
Replies: 9 · Boosts: 1 · Views: 3.1k · Activity: Feb ’25
The YOLO11 object detection model I exported to Core ML stopped working in the macOS 15.2 beta
After updating to the macOS 15.2 beta, the YOLO11 object detection model exported to Core ML outputs incorrect and abnormal bounding boxes. It also doesn't work in iOS apps built on a macOS 15.2 Mac. The same model worked fine on macOS 14.1. When training a YOLO11 custom model in Python, exporting it to Core ML, and testing it in the preview tab of the mlpackage on macOS 15.2 and Xcode 16.0, the above result is obtained.
Replies: 6 · Boosts: 1 · Views: 1.5k · Activity: Feb ’25
Issues with using ClassifyImageRequest() on an Xcode simulator
Hello, I am developing an app for the Swift Student Challenge; however, I keep encountering an error when using ClassifyImageRequest from the Vision framework in Xcode:

    VTEST: error: perform(_:): inside 'for await result in resultStream' error: internalError("Error Domain=NSOSStatusErrorDomain Code=-1 \"Failed to create espresso context.\" UserInfo={NSLocalizedDescription=Failed to create espresso context.}")

It works perfectly when testing it on a physical device, and I saw on another thread that ClassifyImageRequest doesn't work on simulators. Will this cause problems with my submission to the challenge? Thanks
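Submission-wise, one common pattern is to guard the classification path so the app still launches in the Simulator (where the espresso context error above occurs) and does the real work on device. A sketch, assuming the Swift Vision API's ClassifyImageRequest and ClassificationObservation types:

```swift
import Vision
import CoreGraphics

// Sketch: skip classification when running in the Simulator and run the real
// request on device. Type names assume the Swift Vision API (iOS 18+).
func classify(_ image: CGImage) async -> [ClassificationObservation] {
    #if targetEnvironment(simulator)
    return [] // or canned results so the UI still works when previewed on a Mac
    #else
    do {
        let request = ClassifyImageRequest()
        return try await request.perform(on: image)
    } catch {
        print("Classification failed: \(error)")
        return []
    }
    #endif
}
```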
Replies: 5 · Boosts: 1 · Views: 807 · Activity: Feb ’25
Efficient Clustering of Images Using VNFeaturePrintObservation.computeDistance
Hi everyone, I'm working with VNFeaturePrintObservation in Swift to compute the similarity between images. The computeDistance function allows me to calculate the distance between two images, and I want to cluster similar images based on these distances.

Current approach: Right now, I'm using a brute-force approach where I compare every image against every other image in the dataset. This results in O(n^2) complexity, which quickly becomes a bottleneck. With 5000 images, it takes around 10 seconds to complete, which is too slow for my use case.

Question: Are there any efficient algorithms or data structures I can use to improve performance? If anyone has experience with optimizing feature vector clustering or has suggestions on how to scale this efficiently, I'd really appreciate your insights. Thanks!
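One standard way to avoid the full pairwise pass is greedy leader clustering: compare each feature print only against one representative per existing cluster, which is O(n·k) for k clusters instead of O(n²). A sketch using the same computeDistance(_:to:) call; the distance threshold is an assumption to tune for your data set:

```swift
import Vision

// Sketch: greedy "leader" clustering over feature prints. Each image is compared
// only to one representative per cluster, so the work is O(n * k) rather than O(n^2).
// The threshold is illustrative; tune it for your images.
func cluster(_ prints: [VNFeaturePrintObservation], threshold: Float = 0.5) -> [[Int]] {
    var clusters: [[Int]] = []                      // indices into `prints`
    var leaders: [VNFeaturePrintObservation] = []   // one representative per cluster

    for (index, observation) in prints.enumerated() {
        var assigned = false
        for (c, leader) in leaders.enumerated() {
            var distance: Float = 0
            if (try? observation.computeDistance(&distance, to: leader)) != nil, distance < threshold {
                clusters[c].append(index)
                assigned = true
                break
            }
        }
        if !assigned {
            leaders.append(observation)
            clusters.append([index])
        }
    }
    return clusters
}
```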
Replies: 0 · Boosts: 0 · Views: 540 · Activity: Feb ’25
Swift playgrounds (.swiftpm) and CoreML
Hey guys, I've been having difficulties transferring my Xcode project to a Swift playground (.swiftpm) for the Swift Student Challenge. I keep getting the errors below, and none of the views can find the model in scope:

    "TrashDetector 1.mlmodel: No predominant language detected. Set COREML_CODEGEN_LANGUAGE to preferred language."

    Unexpected duplicate tasks: Target 'TrashQuest' (project 'TrashQuest') has write command with output /Users/kmcph3/Library/Developer/Xcode/DerivedData/TrashQuest-glvzskunedgtakfrdmsxdoplondj/Build/Intermediates.noindex/TrashQuest.build/Debug-iphonesimulator/TrashQuest.build/0a4ef2429d66360920ddb4f16e65e233.sb

I've gone through multiple posts with these exact problems, but they all seem to be talking about ".playground" files due to the "Resources" folder (mind you, I did try exactly what they said). Is there anyone who can help? (Quick side note: why does it need to be a .swiftpm file for the SSC? Why can't we just send a zip of our Xcode project?)
Replies: 2 · Boosts: 0 · Views: 850 · Activity: Feb ’25