Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Posts under Metal tag

148 Posts

Post

Replies

Boosts

Views

Activity

Metal toolchain compilation error after migration to MacOS 26 and XCode 26.1
Hello. I migrated yesterday two machines from OS15 to OS26 (same Ram; M1Max chip). On the first one, a MBP, everything is allright, and my old Metal projects can compile and run, even with MacOS12 as a destination. On the second computer (MacStudio) I got a compile "error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain". I spent hours on many forums and tried all proposed solutions and still get the same error. Any idea? Thanks
2
0
88
12h
CoreVideo + Rosetta still clamps at 60Hz (since macOS 12)
We set the CVDisplayLink on macOS to 0 or 120, and get the following. This then clamps maximum refresh to 60Hz on the 120Hz ProMotion display on a MBP M2 Max laptop. How is this not fixed in 4 macOS releases? CoreVideo: currentVBLDelta returned 200000 for display 1 -- ignoring unreasonable value CoreVideo: [0x7fe2fb816020] Bad CurrentVBLDelta for display 1 is zero. defaulting to 60Hz.
3
0
421
20h
NSScreen's maximumExtendedDynamicRangeColorComponentValue does not seem to provide the proper value after sleep/wake on third party HDR displays even when there is EDR content on screen in macOS Tahoe
The maximumExtendedDynamicRangeColorComponentValue should provide some value between 1.0 and maximumPotentialExtendedDynamicRangeColorComponentValue depending on the available EDR headroom if there is any content on-screen that uses EDR. This works fine in most scenarios but in macOS 26 Tahoe (including in 26.2) this seemingly breaks down when a third party external display is in HDR mode and the Mac goes to sleep and wakes up. After wake only a value of 1.0 is provided by the third party external display's NSScreen object, no matter what (although when the SDR peak brightness is being changed using the brightness slider, didChangeScreenParametersNotification is firing and the system should provide a proper updated headroom value). This makes dynamic tone-mapping that adapts to actual screen brightness impossible. Everything works fine in Sequoia. In Tahoe the user needs to turn off HDR, then go through a sleep/wake cycle and turn HDR back on to have this fixed, which is obviously not a sustainable workaround.
1
0
128
22h
How to apply a geometry modifier to a VideoMaterial in RealityKit?
I have a model that uses a video material as the surface shader and I need to also use a geometry modifier on the material. This seemed like it would be promising (adapted from https://developer.apple.com/wwdc21/10075 ~5m 50s). // Did the setup for the video and AVPlayer eventually leading me to let videoMaterial = VideoMaterial(avPlayer: avPlayer) // Assign the material to the entity entity.model!.materials = [videoMaterial] // The part shown in WWDC: Set up the library and geometry modifier before, so now try to map the new custom material to the video material entity.model!.materials = entity.model!.materials.map { baseMaterial in       try! CustomMaterial(from: baseMaterial, geometryModifier: geometryModifier)     } But, I get the following error Thread 1: Fatal error: 'try!' expression unexpectedly raised an error: RealityFoundation.CustomMaterialError.defaultSurfaceShaderForMaterialNotFound How can I apply a geometry modifier to a VideoMaterial? Or, if I can't do that, is there an easy way to route the AVPlayer video data into the baseColor of CustomMaterial?
2
0
1.1k
3d
Changing Frame Rate of External Display on iPad
Hello, As far as I know and in all of my testing there is no way for a user or a developer to change the frame rate of the video output on iPadOS. If you connect an iPad via a USB Hub or a USB to HDMI Adaptor and then connect it to an external monitor it will output at 59.94fps. I have a video app where a user monitors live video at 25fps and 30fps, they often output to an external display and there are times when the external display will stutter due to the mismatch in frame rate, ie. using 25fps and outputting at 59.94fps. I thought it was impossible to change the video output frame rate, then in V3.1 of the Blackmagic Camera App I saw an interesting change in their release notes: ‘Support for HDMI Monitoring at Sensor Rate and Resolution’ This means there is some way to modify it, not sure if this is done via a Private API that Apple has allowed Blackmagic to use. If so, how can we access this or is there a way to enable this that is undocumented? Thanks!
5
0
666
4d
Metal: Intersection results unstable when reusing Instance Acceleration Structures
Hi all, I'm encountering an issue with Metal raytracing on my M5 MacBook Pro regarding Instance Acceleration Structure (IAS). Intersection tests suddenly stop working after a certain point in the sampling loop. Situation I implemented an offline GPU path tracer that runs the same kernel multiple times per pixel (sampleCount) using metal::raytracing. Intersection tests are performed using an IAS. Since this is an offline path tracer, geometries inside the IAS never changes across samples (no transforms or updates). As sampleCount increases, there comes a point where the number of intersections drops to zero, and remains zero for all subsequent samples. Here's a code sketch: let sampleCount: UInt16 = 1024 for sampleIndex: UInt16 in 0..<sampleCount { // ... do { let commandBuffer = commandQueue.makeCommandBuffer() // Dispatch the intersection kernel. await commandBuffer.completed() } do { let commandBuffer = commandQueue.makeCommandBuffer() // Use the intersection test results from the previous command buffer. await commandBuffer.completed() } // ... } kernel void intersectAlongRay( const metal::uint32_t threadIndex [[thread_position_in_grid]], // ... const metal::raytracing::instance_acceleration_structure accelerationStructure [[buffer(2)]], // ... ) { // ... const auto result = intersector.intersect(ray, accelerationStructure); switch (result.type) { case metal::raytracing::intersection_type::triangle: { // Write intersection result to device buffers. break; } default: break; } Observations Encoding both the intersection kernel and the subsequent result usage in the same command buffer does not resolve the problem. Switching from IAS to Primitive Acceleration Structure (PAS) fixes the problem. Rebuilding the IAS for each sample also resolves the issue. Intersections produce inconsistent results even though the IAS and rays are identical — Image 1 shows a hit, while Image 2 shows a miss. Questions Am I misusing IAS in some way ? Could this be a Metal bug ? Any guidance or confirmation would be greatly appreciated.
1
0
265
5d
Race conditions when changing CAMetalLayer.drawableSize?
Is the pseudocode below thread-safe? Imagine that the Main thread sets the CAMetalLayer's drawableSize to a new size meanwhile the rendering thread is in the middle of rendering into an existing MTLDrawable which does still have the old size. Is the change of metalLayer.drawableSize thread-safe in the sense that I can present an old MTLDrawable which has a different resolution than the current value of metalLayer.drawableSize? I assume that setting the drawableSize property informs Metal that the next MTLDrawable offered by the CAMetalLayer should have the new size, right? Is it valid to assume that "metalLayer.drawableSize = newSize" and "metalLayer.nextDrawable()" are internally synchronized, so it cannot happen that metalLayer.nextDrawable() would produce e.g. a MTLDrawable with the old width but with the new height (or a completely invalid resolution due to potential race conditions)? func onWindowResized(newSize: CGSize) { // Called on the Main thread metalLayer.drawableSize = newSize } func onVsync(drawable: MTLDrawable) { // Called on a background rendering thread renderer.renderInto(drawable: drawable) }
0
0
125
6d
App Freezes on iPadOS 26.x - GPU Metal Errors
I work on a Qt/QML app that uses Esri Maps SDK for Qt and that is deployed to both Windows and iPads. With a recent iPad OS upgrade to 26.1, many iPad users are reporting the application freezing after panning and/or identifying features in the map. It runs fine for our Windows users. I was able to reproduce this and grabbed the following error messages when the freeze happens: IOGPUMetalError: Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault) IOGPUMetalError: Invalid Resource (00000009:kIOGPUCommandBufferCallbackErrorInvalidResource) Environment: Qt 6.5.4 (Qt for iOS) Esri Maps SDK for Qt 200.3 iPadOS 26.1 Because it appears to be a Metal error, I tried using OpenGL (Qt offers a way to easily set hte target graphics api): QQuickWindow::setGraphicsApi(QSGRendererInterface::GraphicsApi::OpenGL) Which worked! No more freezing. But I'm seeing many posts that OpenGL has been deprecated by Apple. I've seen posts that Apple deprecated OpenGL ES. But it seems to still be available with iPadOS 26.1. If so, will this fix (above) just cause problems with a future iPadOS update? Any other suggestions to address this issue? Upgrading our version of Qt + Esri SDK to the latest version is not an option for us. We are in the process to upgrade the full application, but it is a year or two out. So, we just need a fix to buy us some time for now. Appreciate any thoughts/insights....
3
0
415
6d
iOS Metal system delayed one Vsync period to really display the frame on the screen
View Layout Add the following views in a view controller: Label View A, with a subview of the same size: MTKView A View B, with a subview of the same size: MTKView B Refresh Rates of Each View The label view refreshes at 60fps (driven by CADisplayLink). MTKView A and B refresh at 15fps. MTKView Implementation Details The corresponding CAMetalLayer's maximumDrawableCount is set to 2, changed to double buffering. The scheduling mechanism is modified; drawing is not driven by the internal loop but is done manually. The draw call is triggered immediately upon receiving a frame. self.metalView.enableSetNeedsDisplay = NO; self.metalView.paused = YES; A new high-priority queue is created for drawing, instead of handling it on the main queue. MTKView Latency Tracking The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer. The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView. Testing shows that T2 - T1 > 16.6ms (the Vsync period at 60Hz). This means that after the GPU rendering in MTLView is finished, the frame is not actually displayed at the next Vsync instruction but only at the Vsync instruction after that. I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism. Observation from Instruments From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details. Questions According to a beginner's understanding, after MTKView's GPU rendering is finished, the next Vsync instruction should officially display (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another Vsync cycle to be drawn to the actual display buffer? The label updates its text at 60fps, so the entire interface should be displayed at 60fps. Is the content of MTKView not synchronized when the display happens? Explanation of the Reasoning Behind Some MTKView Code Details Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering. Not using MTKView's own scheduling mechanism but using manual triggering of the draw method is because MTKView's own scheduling mechanism is driven by CADisplayLink. Therefore, if a frame falls within a Vsync window, it needs to wait for the next Vsync window to trigger the draw operation, which introduces waiting latency.
3
0
481
1w
Metal Debugger only captures static on XCode 16.2
I was encountering visual artifacts in my Unity game only on iOS devices and wanted to use the Metal Debugger to diagnose it. However, it seems to only capture static noise which is not helpful as shown below For reference, this is a frame of the visual artifact and also the location where I asked Metal to capture the frame Am I missing any settings in Unity/Xcode?
4
0
192
1w
xCode26.x Metal4 classes do not compile
Hi, I am using xCode26.x. But my Metal4 classes are not compiling. I downloaded the sample code from Apple's website - https://developer.apple.com/documentation/Metal/processing-a-texture-in-a-compute-function. For example, I am getting errors like "Cannot find protocol declaration for 'MTL4CommandQueue'; I have hit a deadline. Any recommendations are very welcome. I have downloaded the Metal Tool chain. When I run the following commands on the terminal - xcodebuild -showComponent metalToolchain ; xcrun -f metal ; xcrun metal --version I get the following response - Asset Path: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/86fbaf7b114a899754307896c0bfd52ffbf4fded.asset/AssetData Build Version: 17A321 Status: installed Toolchain Identifier: com.apple.dt.toolchain.Metal.32023 Toolchain Search Path: /Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded /Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded/Metal.xctoolchain/usr/bin/metal Apple metal version 32023.830 (metalfe-32023.830.2) Target: air64-apple-darwin24.6.0 Thread model: posix InstalledDir: /Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded/Metal.xctoolchain/usr/metal/current/bin
1
0
547
2w
Getting CoreML to run inference on already allocated gpu buffers
I am running some experiments with WebGPU using the wgpu crate in rust. I have some Buffers already allocated in the GPU. Is it possible to use those already existing buffers directly as inputs to a predict call in CoreML? I want to prevent gpu to cpu download time as much as possible. Or are there any other ways to do something like this. Is this only possible using the latest Tensor object which came out with Metal 4 ?
0
0
414
2w
Metal 4 Argument Tables
I am puzzled by the setAddress(_:attributeStride:index:) of MTL4ArgumentTable. Can anyone please explain what the attributeStride parameter is for? The doc says that it is "The stride between attributes in the buffer." but why? Who uses this for what? On the C++ side in the shaders the stride is determined by the C++ type, as far as I know. What am I missing here? Thanks!
0
0
429
2w
tensorflow-metal fails with tensorflow > 2.18.1
Also submitted as feedback (ID: FB20612561). Tensorflow-metal fails on tensorflow versions above 2.18.1, but works fine on tensorflow 2.18.1 In a new python 3.12 virtual environment: pip install tensorflow pip install tensor flow-metal python -c "import tensorflow as tf" Prints error: Traceback (most recent call last): File "", line 1, in File "/Users//pt/venv/lib/python3.12/site-packages/tensorflow/init.py", line 438, in _ll.load_library(_plugin_dir) File "/Users//pt/venv/lib/python3.12/site-packages/tensorflow/python/framework/load_library.py", line 151, in load_library py_tf.TF_LoadLibrary(lib) tensorflow.python.framework.errors_impl.NotFoundError: dlopen(/Users//pt/venv/lib/python3.12/site-packages/tensorflow-plugins/libmetal_plugin.dylib, 0x0006): Library not loaded: @rpath/_pywrap_tensorflow_internal.so Referenced from: <8B62586B-B082-3113-93AB-FD766A9960AE> /Users//pt/venv/lib/python3.12/site-packages/tensorflow-plugins/libmetal_plugin.dylib Reason: tried: '/Users//pt/venv/lib/python3.12/site-packages/tensorflow-plugins/../_solib_darwin_arm64/_U@local_Uconfig_Utf_S_S_C_Upywrap_Utensorflow_Uinternal___Uexternal_Slocal_Uconfig_Utf/_pywrap_tensorflow_internal.so' (no such file), '/Users//pt/venv/lib/python3.12/site-packages/tensorflow-plugins/../_solib_darwin_arm64/_U@local_Uconfig_Utf_S_S_C_Upywrap_Utensorflow_Uinternal___Uexternal_Slocal_Uconfig_Utf/_pywrap_tensorflow_internal.so' (no such file), '/opt/homebrew/lib/_pywrap_tensorflow_internal.so' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/opt/homebrew/lib/_pywrap_tensorflow_internal.so' (no such file)
6
4
1.9k
2w
Deterministic RNG behaviour across Mac M1 CPU and Metal GPU – BigCrush pass & structural diagnostics
Hello, I am currently working on a research project under ENINCA Consulting, focused on advanced diagnostic tools for pseudorandom number generators (structural metrics, multi-seed stability, cross-architecture reproducibility, and complementary indicators to TestU01). To validate this diagnostic framework, I prototyped a small non-linear 64-bit PRNG (not as a goal in itself, but simply as a vehicle to test the methodology). During these evaluations, I observed something interesting on Apple Silicon (Mac M1): • bit-exact reproducibility between M1 ARM CPU and M1 Metal GPU, • full BigCrush pass on both CPU and Metal backends, • excellent p-values, • stable behaviour across multiple seeds and runs. This was not the intended objective, the goal was mainly to validate the diagnostic concepts, but these results raised some questions about deterministic compute behaviour in Metal. My question: Is there any official guidance on achieving (or expecting) deterministic RNG or compute behaviour across CPU ↔ Metal GPU on Apple Silicon? More specifically: • Are deterministic compute kernels expected or guaranteed on Metal for scientific workloads? • Are there recommended patterns or best practices to ensure reproducibility across GPU generations (M1 → M2 → M3 → M4)? • Are there known Metal features that can introduce non-determinism? I am not sharing the internal recurrence (this work is proprietary), but I can discuss the high-level diagnostic observations if helpful. Thank you for any insight, very interested in how the Metal engineering team views deterministic compute patterns on Apple Silicon. Pascal ENINCA Consulting
0
0
124
2w
CoreML regression between macOS 26.0.1 and macOS 26.1 Beta causing scrambled tensor outputs
We’ve encountered what appears to be a CoreML regression between macOS 26.0.1 and macOS 26.1 Beta. In macOS 26.0.1, CoreML models run and produce correct results. However, in macOS 26.1 Beta, the same models produce scrambled or corrupted outputs, suggesting that tensor memory is being read or written incorrectly. The behavior is consistent with a low-level stride or pointer arithmetic issue — for example, using 16-bit strides on 32-bit data or other mismatches in tensor layout handling. Reproduction Install ON1 Photo RAW 2026 or ON1 Resize 2026 on macOS 26.0.1. Use the newest Highest Quality resize model, which is Stable Diffusion–based and runs through CoreML. Observe correct, high-quality results. Upgrade to macOS 26.1 Beta and run the same operation again. The output becomes visually scrambled or corrupted. We are also seeing similar issues with another Stable Diffusion UNet model that previously worked correctly on macOS 26.0.1. This suggests the regression may affect multiple diffusion-style architectures, likely due to a change in CoreML’s tensor stride, layout computation, or memory alignment between these versions. Notes The affected models are exported using standard CoreML conversion pipelines. No custom operators or third-party CoreML runtime layers are used. The issue reproduces consistently across multiple machines. It would be helpful to know if there were changes to CoreML’s tensor layout, precision handling, or MLCompute backend between macOS 26.0.1 and 26.1 Beta, or if this is a known regression in the current beta.
5
3
1.6k
2w
How can I assign priorities to my app’s GPU workloads?
My app has a number of heterogeneous GPU workloads that all run concurrently. Some of these should be executed with the highest priority because the app’s responsiveness depends on them, while others are triggered by file imports and the like which should have a low priority. If this was running on the CPU I’d assign the former User Interactive QoS and the latter Utility QoS. Is there an equivalent to this for GPU work?
0
0
271
2w
I built apple.PHASE with Unity and targeted with visionOS, but Reverb does not sound.
Environment Versions ・macOS15.6.1 ・visionOS26.0.1 ・Xcode16.1 or 26.0.1 ・unity6000.2.9f1 ・Apple.core3.2.0 ・Apple.PHASE1.2.7 ・polyspatial2.4.2 With the above environment, after installing Apple.PHASE into unity and building to a visionOS device, Audio is available and distance attention works, but Early Reflection and Late Reverb produce no audible change even when checked and their parameters are adjusted. What is required to make Early Reflection and Late Reverb take effect on a visionOS device build? action taken ・created a SoundEvent. ・in composer, created a Sampler and a SpatialMixer; attached an AudioClip to the Sampler; enabled Direct Path, Early Reflection, and Late Reverb on the SpatialMixer. ・attached a PHASE Source to the object to be played, attached the created SoundEvent to it, and set non-zero values for Early Reflection and Late Reverb. ・attached a PHASE Listener to the mainCamera and set the ReverbPreset to a value other than None. ・in project settings > Audio, set Spatializer plugin to PHASE Spatializer. ・from there, build for visionOS.
0
0
702
3w
RenderBox Framework Warning
Unable to open mach-O at path: /AppleInternal/Library/BuildRoots/4~B5FIugA1pgyNPFl0-ZGG8fewoBL0-6a_xWhpzsk/Library/Caches/com.apple.xbs/Binaries/RenderBox/install/TempContent/Root/System/Library/PrivateFrameworks/RenderBox.framework/Versions/A/Resources/default.metallib Error:2 This happens only on macOS Sequoia - not on macOS Tahoe. I have got a noticeable amount of lag in the animations of my App where this Warning arises I've tried to isolate the respective animations from the main thread too - still getting the same issue with the lag Is it possible to resolve it, as I want backwards compatibility with my app for the users
0
0
50
3w
Can MPSGraphExecutable automatically leverage Apple Neural Engine (ANE) for inference?
Hi, I'm currently using Metal Performance Shaders Graph (MPSGraphExecutable) to run neural network inference operations as part of a metal rendering pipeline. I also tried to profile the usage of neural engine when running inference using MPSGraphExecutable but the graph shows no sign of neural engine usage. However, when I used the coreML model inspection tool in xcode and run performance report, it was able to use ANE. Does MPSGraphExecutable automatically utilize the Apple Neural Engine (ANE) when running inference operations, or does it only execute on GPU? My model (Core ML Package) was converted from a pytouch model using coremltools with ML program type and support iOS17.0+. Any insights or documentation references would be greatly appreciated!
0
0
352
3w