I mean…I want to use defaults rather than launching apps via open with the saved environment variables.
This is pretty easy on iOS and other platforms. So what about in macOS?
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hi everyone,
This project uses PyTorch on an Apple Silicon Mac (M1/M2/etc.), and the goal is to use the MPS backend for GPU acceleration, notes Apple Developer. However, the workflow depends on Float64 (double-precision) floating-point numbers for certain computations, notes PyTorch Forums.
The error "Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead" has been encountered, notes GitHub. It seems that the MPS backend doesn't currently support Float64 for direct GPU computation.
Questions for the community:
Are there any known workarounds or best practices for handling Float64-dependent operations when using the MPS backend with PyTorch?
For those working with high-precision tasks on Apple Silicon, what strategies are being used to balance performance with the need for Float64?
Offloading to the CPU is an option, and it's of interest to know if there are any specific techniques or libraries within the Apple ecosystem that could streamline this process while aiming for optimal performance.
Any insights, tips, or experiences would be appreciated.
Thanks in advance,
Jonaid
MacBook Pro M3 Max
Topic:
Graphics & Games
SubTopic:
Metal
Hello,
I am experiencing an issue with programmatically capturing a GPU trace using MTLCaptureManager. The .gputrace file that is generated appears to be corrupted, and I'm looking for guidance or a solution.
Description of the Problem:
I am using MTLCaptureManager.sharedCaptureManager to capture a Metal frame and save it to disk.
The generated .gputrace file is consistently reported as 0 bytes in size by the file system.
Crucially, when I compress this 0-byte .gputrace file into a .zip archive, the resulting archive contains the full, expected data. After unzipping, the file can be opened and viewed correctly in Xcode.
However,When inspecting the file's contents using NSFileManager in Objective-C (treating it as a directory), the internal structure is different from a .gputrace file captured directly from Xcode's Metal Debugger.
capture in xcode
capture in file
Finally,When capturing multiple frames programmatically, the first captured frame contains valid buffer data. However, for subsequent frames (starting from the second frame), the corresponding buffer contents are all zero-filled.
Frame 1: All MTLBuffer data is correctly captured and populated.
Frame 2 and onward: The same MTLBuffer objects are present in the trace, but their contents are entirely 0 (i.e., the data is not captured or is corrupted).
In this case, the on-screen display is normal, but the captured frame is incorrect. The frame captured directly in Xcode is also correct. Only the frame captured to a file is abnormal.
Topic:
Graphics & Games
SubTopic:
Metal
If I compile a compute kernel with a call to texture.read(), it fails with the following error: "Error Domain=AGXMetalG13X Code=3 "Encountered unlowered function call to air.get_read_sampler" UserInfo={NSLocalizedDescription=Encountered unlowered function call to air.get_read_sampler}."
This error occurs on both macOS and iOS 26 Beta 5, but not when running on a simulator or in a playground. It does not occur on a macOS Sequoia VM. It occurs whether I use the old metal 3 or new metal 4 compilation method.
A workaround would be to use a sampler, but according to the feature tables, all platforms support reading from textures of all formats.
Below is a minimal example which produces the error:
let device = MTLCreateSystemDefaultDevice()!
let library = device.makeDefaultLibrary()!
let computeFunction = library.makeFunction(name: "compute_test")!
do {
let pipeline = try device.makeComputePipelineState(function: computeFunction)
debugPrint(pipeline)
} catch {
debugPrint("Metal 3 failed with error:\n\(error)")
}
#import <metal_stdlib>
using namespace metal;
kernel void compute_test(uint2 gid [[thread_position_in_grid]],
texture2d<float, access::read> in [[texture(0)]],
texture2d<float, access::write> out [[texture(1)]]) {
out.write(in.read(gid), gid);
}
I filed feedback FB19530049.
I am developing a macOS terminal app, running on an M4 Pro, and using Metal.
I am not able use float8 or float16, both reporting Variable has incomplete type 'float16' (aka '__Reserved_Name__Do_not_use_float16').
Based on the system I should be able to use these. Either it is because it is also compiling to Intel, which they are not allowed, or something else. Either way I have not been able to figure out how to get past this.
IIs there a compiler setting I need to set to make this work? if so which one and what setting do I need? I only want to run this on M processes, on the latest version of OS so not interested in Intel version or backward compatibility.
Hi there,
Is it possible to customize the Metal Performance HUD on Apple TV, similar to how it can be done on iPhone & iPad?
Would like to see things like Compiled Shaders for my Apps on tvOS
.
Our APP has integrated 3D function, in order to reduce the memory occupation of the APP in the background, we will uninstall the 3D after the APP enters the background. However, the uninstall also causes problems. When the uninstall process is executed in the background, the app will briefly trigger the background GPU rendering error warning with the error warning code:
OGPUMetalError: Insufficient Permission (to submit GPU work from background) (00000006:kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted)
Execution of the command buffer was aborted due to an error during execution. Insufficient Permission (to submit GPU Work from background) (00000006: kIOGPUCommandBufferCallbackErrorBackgroundExecutionNotPermitted) excuse me this warning system will tighten APP permissions background? For example, limit or shorten the background survival time of the APP. In addition, will the background refresh function fail, resulting in the failure of Bluetooth Ibeacon activation?
Topic:
Graphics & Games
SubTopic:
Metal
I have really enjoyed looking through the code and videos related to Metal 4. Currently, my interest is to update a ReSTIR Project and take advantage of more robust ways to refit acceleration Structures and more powerful ways to access resources.
I am working in Swift and have encountered a couple of puzzles:
What is the 'accepted' way to create a MTL4BufferRange to store indices and vertices?
How do I properly rewrite Swift code to build and compact an Acceleration Structure?
I do realize that this is all in Beta and will happily look through Code Samples this Fall. If other guidance is available earlier, that would be fabulous!
Thank you
Hello!
I'm developing a GPU (shader) language, where I aim to target multiple backends with a common frontend. I wanted to avoid having to round trip through Metal, and go straight to IR just like I have with SPIRV, in order to have a fast and efficient compilation process.
I've been looking for a reference page where I can read about Metals IR, and as far as I'm aware, it exists, but I can't seem to find it anywhere.
Furthermore, if such a reference is available, is there also a toolkit where I can run validation on the output IR, and perhaps even run optimizations, much like spv-tools for SPIRV?
Any help would be appreciated!
Thanks,
Gustav
Hi, developers,
I maintain a shipped app that uses string concatenation to construct Metal shader and compile on-device. Beta 4 seems disabled __asm keyword, resulting the compilation failure.
The error is:
v2/GEMMKernel.cpp:229: error: program_source:23:9: error: illegal string literal in 'asm'
__asm("air.simdgroup_async_copy_1d.p3i8.p1i8");
The relevant code is available at https://github.com/liuliu/ccv/blob/unstable/lib/nnc/mfa/v2/GEMMHeaders.cpp#L30 although any __asm will trip this.
Please give us guidance on whether this is a regression or this will be something enforced in 26 release. Personally, I would consider this as a bug given it won't impact anything "compiled" shaders.
Thanks for your patience reading this!
I would love to use Background GPU Access to do some video processing in the background.
However the documentation of BGContinuedProcessingTaskRequest.Resources.gpu clearly states:
Not all devices support background GPU use. For more information, see Performing long-running tasks on iOS and iPadOS.
Is there a list available of currently released devices that do (or don't) support GPU background usage? That would help to understand what part of our user base can use this feature. (And what hardware we need to test this on as developers.)
For example it seems that it isn't supported on an iPad Pro M1 with the current iOS 26 beta. The simulators also seem to not support the background GPU resource. So would be great to understand what hardware is capable of using this feature!
Hello everyone,
I must have missed something but why isn't there a depthAttachmentPixelFormat to the new Metal 4 MTL4RenderPipelineDescriptor, unlike the old MTLRenderPipelineDescriptor?
So how do you set the depth pixel format?
Thanks in advance!
Hello, I'm tracking down a bug where useResource doesn't seem to apply proper synchronization when a resource is produced by the render pass then consumed by the compute pass, but when I use MTLFence between the to signal and wait between the render/compute encoders, the artifact goes away.
The resource is created with MTLHazardTrackingModeTracked and useResource is called on the compute encoder after the render pass. Metal API Validation doesn't report any warnings/errors.
Am I misunderstanding the difference between the two APIs? I dug through the Metal documentation and it looks like useResource should handle synchronization given the resource has MTLHazardTrackingModeTracked but on the other hand, MTLFence should be used to ensure proper synchronization between command encoders. Can someone can clarify the difference between the two APIs and when to use them.
I am using Unreal Engine 5.6 on a MacBook Pro with an M3 chip and macOS 15.5. I’ve installed Xcode and accepted the license, but Unreal is not detecting the latest Metal Shader Standard (Metal v3.0). The maximum version Unreal sees is Metal v2.4, even though the hardware and OS should support Metal 3.0. I’ve also run sudo xcode-select -s /Applications/Xcode.app and accepted the license via Terminal. Is there anything in Xcode settings, SDK availability, or system permissions that could be preventing access to Metal 3.0 features?"
Hi ,
My application meet below crash backtrace at very low repro rate from the public users, i do not see it relate to a specific iOS version or iPhone model. The last code line from my application is calling CAMetalLayer nextDrawable API.
I did some basic studying, suppose it may relate to the wrong CAMetaLayer configuration, like
frame property w or h <= 0.0
bounds property w or h <= 0.0
drawableSize w or h <= 0.0 or w or h > max value (like 16384)
Not sure my above thinking is right or not? Will the UIView which my CAMetaLayer attached will cause such nextDrawable crash or not ?
Thanks a lot
Main Thread - Crashed
libsystem_kernel.dylib
__pthread_kill
libsystem_c.dylib
abort
libsystem_c.dylib
__assert_rtn
Metal
MTLReportFailure.cold.1
Metal
MTLReportFailure
Metal
_MTLMessageContextEnd
Metal
-[MTLTextureDescriptorInternal validateWithDevice:]
AGXMetalA13
0x245b1a000 + 4522096
QuartzCore
allocate_drawable_texture(id<MTLDevice>, __IOSurface*, unsigned int, unsigned int, MTLPixelFormat, unsigned long long, CAMetalLayerRotation, bool, NSString*, unsigned long)
QuartzCore
get_unused_drawable(_CAMetalLayerPrivate*, CAMetalLayerRotation, bool, bool)
QuartzCore
CAMetalLayerPrivateNextDrawableLocked(CAMetalLayer*, CAMetalDrawable**, unsigned long*)
QuartzCore
-[CAMetalLayer nextDrawable]
SpaceApp
-[MetalRender renderFrame:] MetalRenderer.mm:167
SpaceApp
-[FrameBuffer acceptFrame:] VideoRender.mm:173
QuartzCore
CA::Display::DisplayLinkItem::dispatch_(CA::SignPost::Interval<(CA::SignPost::CAEventCode)835322056>&)
QuartzCore
CA::Display::DisplayLink::dispatch_items(unsigned long long, unsigned long long, unsigned long long)
QuartzCore
CA::Display::DisplayLink::dispatch_deferred_display_links(unsigned int)
UIKitCore
_UIUpdateSequenceRun
UIKitCore
schedulerStepScheduledMainSection
UIKitCore
runloopSourceCallback
CoreFoundation
__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__
CoreFoundation
__CFRunLoopDoSource0
CoreFoundation
__CFRunLoopDoSources0
CoreFoundation
__CFRunLoopRun
CoreFoundation
CFRunLoopRunSpecific
GraphicsServices
GSEventRunModal
UIKitCore
-[UIApplication _run]
UIKitCore
UIApplicationMain
When I take a frame capture of my application in Xcode, it shows a warning that reads "Your application created separate command encoders which can be combined into a single encoder. By combining these encoders you may reduce your application's load/store bandwidth usage."
In the minimal reproduction case I've identified for this warning, I have two render pipeline states: The first writes to the current drawable, the depth buffer, and a secondary color buffer. The second writes only to the current drawable.
Because these are writing to a different set of outputs, I was initially creating two separate render command encoders to handle the draws under each of these states.
My understanding is that Xcode is telling me I could only create one, however when I try to do that, I get runtime asserts when attempting to apply the second render pipeline state since it doesn't have a matching attachment configured for the second color buffer or for the depth buffer, so I can't just combine the encoders.
Is the only solution here to detect and propagate forward the color/depth attachments from the first state into the creation of the second state?
Is there any way to suppress this specific warning in Xcode?
Topic:
Graphics & Games
SubTopic:
Metal
The sample code here, has code like:
// Create a display link capable of being used with all active displays
cvReturn = CVDisplayLinkCreateWithActiveCGDisplays(&_displayLink);
But that function's doc says it's deprecated and to use NSView/NSWindow/NSScreen displayLink instead. That returns CADisplayLink, not CVDisplayLink.
Also the documentation for that displayLink method is completely empty. I'm not sure if I'm supposed to add it to run loop, or what, after I get it.
It would be nice to get an updated version of this sample project and/or have some documentation in NSView.displayLink
Topic:
Graphics & Games
SubTopic:
Metal
Im new in the Mac area but for sure not UE. Windows is a long process to packaging but it could be done. All the documentation for Epic and from the internet is basically non existent with exactly how to package a project within UE. I have Xcode installed which makes sense, agreed to terms and install for MacOS, I've been able to make a project for several weeks now and want to package for a test run for my friends to play on Windows. Now I just get this in the log:
UATHelper: Packaging (Mac): ERROR: Failed to finalize the .app with Xcode. Check the log for more information
UATHelper: Packaging (Mac): Trace written to file /Users/rileysleger/Library/Logs/Unreal Engine/LocalBuildLogs/UBA-ProjectNightTerror-Mac-Development.uba with size 12.6kb
UATHelper: Packaging (Mac): Total time in Unreal Build Accelerator local executor: 8.12 seconds
UATHelper: Packaging (Mac): Result: Failed (OtherCompilationError)
UATHelper: Packaging (Mac): Total execution time: 9.71 seconds
PackagingResults: Error: Failed to finalize the .app with Xcode. Check the log for more information
UATHelper: Packaging (Mac): Took 9.77s to run dotnet, ExitCode=6
UATHelper: Packaging (Mac): UnrealBuildTool failed. See log for more details. (/Users/rileysleger/Library/Logs/Unreal Engine/LocalBuildLogs/UBA-ProjectNightTerror-Mac-Development.txt)
UATHelper: Packaging (Mac): AutomationTool executed for 0h 0m 10s
UATHelper: Packaging (Mac): AutomationTool exiting with ExitCode=6 (6)
UATHelper: Packaging (Mac): RunUAT ERROR: AutomationTool was unable to run successfully. Exited with code: 6
PackagingResults: Error: AutomationTool was unable to run successfully. Exited with code: 6
PackagingResults: Error: Unknown Error
This absolutely makes no sense to me. Anyone have ideas?
Description:
In the official visionOS 26 Hover Effect sample code project , I encountered an issue where the event.trackingAreaIdentifier returned by onSpatialEvent does not reset as expected.
Steps to Reproduce:
Select an object with trackingAreaID = 6 in the sample app.
Look at a blank space (outside any tracking area) and perform a pinch gesture .
Expected Behavior:
The event.trackingAreaIdentifier should return 0 when interacting with a non-tracking area.
Actual Behavior:
The event.trackingAreaIdentifier still returns 6, even after restarting the app or killing the process. This persists regardless of where the pinch gesture is performed
Hi,
What's the best way to handle drastic changes in scene charateristics with the new MTLFXTemporalDenoisedScaler?
Let's say a visible object of the scene radically changes its material properties. I can modify the albedo and roughness textures consequently. But I suspect the history will be corrupted. Blending visual information between the new frame and the previous ones might be a nonsense.
I guess the problem should be the same when objects appear or disappear instantly.
Is the upsacler manage these events for us (by lowering blending), or should we use the reactive or the denoise strength mask or something like that to handle them?