Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Metal Documentation

Posts under Metal subtopic

Post

Replies

Boosts

Views

Created

Xcode Vulkan is opening two windows instead of one.
I'm a newbee at Vulkan and Xcode. I have my project on github https://github.com/flocela/OrangeSpider/ Whenever I run, two windows open instead of only one. I added testing, which means I have an OrangeSpider.xctestplan in the OrangeSpider/TestsOrangeSpider/ folder. This is my first time adding testing to an XCode project, so I think this may be where the problem is. I also get this error message: ViewBridge to RemoteViewService Terminated: Error Domain=com.apple.ViewBridge Code=18 "(null)" UserInfo={com.apple.ViewBridge.error.hint=this process disconnected remote view controller -- benign unless unexpected, com.apple.ViewBridge.error.description=NSViewBridgeErrorCanceled}
4
0
193
Jun ’25
WWDC25 Metal & game technologies group lab
Hello, Thank you for attending today’s Metal & game technologies group lab at WWDC25! We were delighted to answer many questions from developers and energized by the community engagement. We hope you enjoyed it and welcome your feedback. We invite you to carry on the conversation here, particularly if your question appeared in Slido and we were unable to answer it during the lab. If your question received feedback let us know if you need clarification. You may want to ask your question again in a different lab e.g. visionOS tomorrow. (We realize that this can be confusing when frameworks interoperate) We have a lot to learn from each other so let’s get to Q&A and make the best of WWDC25! 😃 Looking forward to your questions posted in new threads.
2
0
493
Jun ’25
Unable to compile Core Image filter on Xcode 26 due to missing Metal toolchain
I have a Core Image filter in my app that uses Metal. I cannot compile it because it complains that the executable tool metal is not available, but I have installed it in Xcode. If I go to the "Components" section of Xcode Settings, it shows it as downloaded. And if I run the suggested command, it also shows it as installed. Any advice? Xcode Version Version 26.0 beta (17A5241e) Build Output Showing All Errors Only Build target Lessons of project StudyJapanese with configuration Light RuleScriptExecution /Users/chris/Library/Developer/Xcode/DerivedData/StudyJapanese-glbneyedpsgxhscqueifpekwaofk/Build/Intermediates.noindex/StudyJapanese.build/Light-iphonesimulator/Lessons.build/DerivedSources/OtsuThresholdKernel.ci.air /Users/chris/Code/SerpentiSei/Shared/iOS/CoreImage/OtsuThresholdKernel.ci.metal normal undefined_arch (in target 'Lessons' from project 'StudyJapanese') cd /Users/chris/Code/SerpentiSei/StudyJapanese /bin/sh -c xcrun\ metal\ -w\ -c\ -fcikernel\ \"\$\{INPUT_FILE_PATH\}\"\ -o\ \"\$\{SCRIPT_OUTPUT_FILE_0\}\"' ' error: error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain /Users/chris/Code/SerpentiSei/StudyJapanese/error:1:1: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain Build failed 6/9/25, 8:31 PM 27.1 seconds Result of xcodebuild -downloadComponent MetalToolchain (after switching Xcode-beta.app with xcode-select) xcodebuild -downloadComponent MetalToolchain Beginning asset download... Downloaded asset to: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/4d77809b60771042e514cfcf39662c6d1c195f7d.asset/AssetData/Restore/022-19457-035.dmg Done downloading: Metal Toolchain (17A5241c). Screenshots from Xcode Result of "Copy Information" Metal Toolchain 26.0 [com.apple.MobileAsset.MetalToolchain: 17.0 (17A5241c)] (Installed)
25
0
3.7k
Jun ’25
Sparse Texture Writes
Hey, I've been struggling with this for some days now. I am trying to write to a sparse texture in a compute shader. I'm performing the following steps: Set up a sparse heap and create a texture from it Map the whole area of the sparse texture using updateTextureMapping(..) Overwrite every value with the value "4" in a compute shader Blit the texture to a shared buffer Assert that the values in the buffer are "4". I have a minimal example (which is still pretty long unfortunately). It works perfectly when removing the line heapDesc.type = .sparse. What am I missing? I could not find any information that writes to sparse textures are unsupported. Any help would be greatly appreciated. import Metal func sparseTexture64x64Demo() throws { // ── Metal objects guard let device = MTLCreateSystemDefaultDevice() else { throw NSError(domain: "SparseNotSupported", code: -1) } let queue = device.makeCommandQueue()! let lib = device.makeDefaultLibrary()! let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!) // ── Texture descriptor let width = 64, height = 64 let format: MTLPixelFormat = .r32Uint // 4 B per texel let desc = MTLTextureDescriptor() desc.textureType = .type2D desc.pixelFormat = format desc.width = width desc.height = height desc.storageMode = .private desc.usage = [.shaderWrite, .shaderRead] // ── Sparse heap let bytesPerTile = device.sparseTileSizeInBytes let meta = device.heapTextureSizeAndAlign(descriptor: desc) let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile let heapDesc = MTLHeapDescriptor() heapDesc.type = .sparse heapDesc.storageMode = .private heapDesc.size = heapBytes let heap = device.makeHeap(descriptor: heapDesc)! let tex = heap.makeTexture(descriptor: desc)! // ── CPU buffers let bytesPerPixel = MemoryLayout<UInt32>.stride let rowStride = width * bytesPerPixel let totalBytes = rowStride * height let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)! let cb = queue.makeCommandBuffer()! let fence = device.makeFence()! // 2. Map the sparse tile, then signal the fence let rse = cb.makeResourceStateCommandEncoder()! rse.updateTextureMapping( tex, mode: .map, region: MTLRegionMake2D(0, 0, width, height), mipLevel: 0, slice: 0) rse.update(fence) // ← capture all work so far rse.endEncoding() let ce = cb.makeComputeCommandEncoder()! ce.waitForFence(fence) ce.setComputePipelineState(pipeline) ce.setTexture(tex, index: 0) let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1) let tgCount = MTLSize(width: (width + 7) / 8, height: (height + 7) / 8, depth: 1) ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG) ce.updateFence(fence) ce.endEncoding() // Blit texture into shared buffer let blit = cb.makeBlitCommandEncoder()! blit.waitForFence(fence) blit.copy( from: tex, sourceSlice: 0, sourceLevel: 0, sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0), sourceSize: MTLSize(width: width, height: height, depth: 1), to: dstBuf, destinationOffset: 0, destinationBytesPerRow: rowStride, destinationBytesPerImage: totalBytes) blit.endEncoding() cb.commit() cb.waitUntilCompleted() assert(cb.error == nil, "GPU error: \(String(describing: cb.error))") // ── Verify a few texels let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height) print("first three texels:", out[0], out[1], out[width]) // 0 1 64 assert(out[0] == 4 && out[1] == 4 && out[width] == 4) } Metal shader: #include <metal_stdlib> using namespace metal; kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]], uint2 gid [[thread_position_in_grid]]) { tex.write(4, gid); }
1
0
152
May ’25
vsync, drawable present, instrument gui
hi When analyzing our game using Instruments, I've always been confused about the two items "Drawable Present" and "Drawable Presented" in the GPU column. The timing of Drawable Present seems to be when the CPU layer calls commandbuffer:present, rather than when the actual encoding is completed on the GPU. Also, what does drawable presented specifically mean? In our case, when a CPU stall occurs, it appears that the vsync interval changes in the next frame, and a surface that has already been calculated is not displayed. Why is this happening?
0
0
181
May ’25
iOS Metal system delayed one Vsync period to really display the frame on the screen
View Layout Add the following views in a view controller: Label View A, with a subview of the same size: MTKView A View B, with a subview of the same size: MTKView B Refresh Rates of Each View The label view refreshes at 60fps (driven by CADisplayLink). MTKView A and B refresh at 15fps. MTKView Implementation Details The corresponding CAMetalLayer's maximumDrawableCount is set to 2, changed to double buffering. The scheduling mechanism is modified; drawing is not driven by the internal loop but is done manually. The draw call is triggered immediately upon receiving a frame. self.metalView.enableSetNeedsDisplay = NO; self.metalView.paused = YES; A new high-priority queue is created for drawing, instead of handling it on the main queue. MTKView Latency Tracking The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer. The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView. Testing shows that T2 - T1 > 16.6ms (the Vsync period at 60Hz). This means that after the GPU rendering in MTLView is finished, the frame is not actually displayed at the next Vsync instruction but only at the Vsync instruction after that. I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism. Observation from Instruments From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details. Questions According to a beginner's understanding, after MTKView's GPU rendering is finished, the next Vsync instruction should officially display (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another Vsync cycle to be drawn to the actual display buffer? The label updates its text at 60fps, so the entire interface should be displayed at 60fps. Is the content of MTKView not synchronized when the display happens? Explanation of the Reasoning Behind Some MTKView Code Details Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering. Not using MTKView's own scheduling mechanism but using manual triggering of the draw method is because MTKView's own scheduling mechanism is driven by CADisplayLink. Therefore, if a frame falls within a Vsync window, it needs to wait for the next Vsync window to trigger the draw operation, which introduces waiting latency.
3
0
616
Apr ’25
How to use MTKTextureLoader to load png data
I am trying to load some PNG data with MTKTextureLoader newTextureWithData,but the result shows wrong at the alpha area. Here is the code. I have an image URL, after it downloads successfully, I try to use the data or UIImagePNGRepresentation (image), they all show wrong. UIImage *tempImg = [UIImage imageWithData:data]; CGImageRef cgRef = tempImg.CGImage; MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:device]; id<MTLTexture> temp1 = [loader newTextureWithData:data options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; NSData *tempData = UIImagePNGRepresentation(tempImg); id<MTLTexture> temp2 = [loader newTextureWithData:tempData options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; id<MTLTexture> temp3 = [loader newTextureWithCGImage:cgRef options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; }] resume];
5
0
708
Dec ’24
OS choosing performance state poorly for GPU use case
I am building a MacOS desktop app (https://anukari.com) that is using Metal compute to do real-time audio/DSP processing, as I have a problem that is highly parallelizable and too computationally expensive for the CPU. However it seems that the way in which I am using the GPU, even when my app is fully compute-limited, the OS never increases the power/performance state. Because this is a real-time audio synthesis application, it's a huge problem to not be able to take advantage of the full clock speeds that the GPU is capable of, because the app can't keep up with real-time. I discovered this issue while profiling the app using Instrument's Metal tracing (and Game tracing) modes. In the profiling configuration under "Metal Application" there is a drop-down to select the "Performance State." If I run the application under Instruments with Performance State set to Maximum, it runs amazingly well, and all my problems go away. For comparison, when I run the app on its own, outside of Instruments, the expensive GPU computation it's doing takes around 2x as long to complete, meaning that the app performs half as well. I've done a ton of work to micro-optimize my Metal compute code, based on every scrap of information from the WWDC videos, etc. A problem I'm running into is that I think that the more efficient I make my code, the less it signals to the OS that I want high GPU clock speeds! I think part of why the OS is confused is that in most use cases, my computation can be done using only a small number of Metal threadgroups. I'm guessing that the OS heuristics see that only a small fraction of the GPU is saturated and fail to scale up the power/clock state. I'm not sure what to do here; I'm in a bit of a bind. One possibility is that I intentionally schedule busy work -- spin threadgroups just to waste energy and signal to the OS that I need higher clock speeds. This is obviously a really bad idea, but it might work. Is there any other (better) way for my app to signal to the OS that it is doing real-time latency-sensitive computation on the GPU and needs the clock speeds to be scaled up? Note that game mode is not really an option, as my app also runs as an AU plugin inside hosts like Garageband, so it can't be made fullscreen, etc.
6
0
1k
Nov ’24
Metal runtime shader library compilation and linking issue
In my project I need to do the following: In runtime create metal Dynamic library from source. In runtime create metal Executable library from source and Link it with my previous created Dynamic library. Create compute pipeline using those two libraries created above. But I get the following error at the third step: Error Domain=AGXMetalG15X_M1 Code=2 "Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel " UserInfo={NSLocalizedDescription=Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel } import Foundation import Metal class MetalShaderCompiler { let device = MTLCreateSystemDefaultDevice()! var pipeline: MTLComputePipelineState! func compileDylib() -> MTLDynamicLibrary { let source = """ #include <metal_stdlib> using namespace metal; half3 noise() { return half3(1, 0, 1); } """ let option = MTLCompileOptions() option.libraryType = .dynamic option.installName = "@executable_path/libFoundation.metallib" let library = try! device.makeLibrary(source: source, options: option) let dylib = try! device.makeDynamicLibrary(library: library) return dylib } func compileExlib(dylib: MTLDynamicLibrary) -> MTLLibrary { let source = """ #include <metal_stdlib> using namespace metal; extern half3 noise(); kernel void OnTheFlyKernel(texture2d<half, access::read> src [[texture(0)]], texture2d<half, access::write> dst [[texture(1)]], ushort2 gid [[thread_position_in_grid]]) { half4 rgba = src.read(gid); rgba.rgb += noise(); dst.write(rgba, gid); } """ let option = MTLCompileOptions() option.libraryType = .executable option.libraries = [dylib] let library = try! self.device.makeLibrary(source: source, options: option) return library } func runtime() { let dylib = self.compileDylib() let exlib = self.compileExlib(dylib: dylib) let pipelineDescriptor = MTLComputePipelineDescriptor() pipelineDescriptor.computeFunction = exlib.makeFunction(name: "OnTheFlyKernel") pipelineDescriptor.preloadedLibraries = [dylib] pipeline = try! device.makeComputePipelineState(descriptor: pipelineDescriptor, options: .bindingInfo, reflection: nil) } }
5
0
1.1k
Sep ’24
Xcode Vulkan is opening two windows instead of one.
I'm a newbee at Vulkan and Xcode. I have my project on github https://github.com/flocela/OrangeSpider/ Whenever I run, two windows open instead of only one. I added testing, which means I have an OrangeSpider.xctestplan in the OrangeSpider/TestsOrangeSpider/ folder. This is my first time adding testing to an XCode project, so I think this may be where the problem is. I also get this error message: ViewBridge to RemoteViewService Terminated: Error Domain=com.apple.ViewBridge Code=18 "(null)" UserInfo={com.apple.ViewBridge.error.hint=this process disconnected remote view controller -- benign unless unexpected, com.apple.ViewBridge.error.description=NSViewBridgeErrorCanceled}
Replies
4
Boosts
0
Views
193
Activity
Jun ’25
WWDC25 Metal & game technologies group lab
Hello, Thank you for attending today’s Metal & game technologies group lab at WWDC25! We were delighted to answer many questions from developers and energized by the community engagement. We hope you enjoyed it and welcome your feedback. We invite you to carry on the conversation here, particularly if your question appeared in Slido and we were unable to answer it during the lab. If your question received feedback let us know if you need clarification. You may want to ask your question again in a different lab e.g. visionOS tomorrow. (We realize that this can be confusing when frameworks interoperate) We have a lot to learn from each other so let’s get to Q&A and make the best of WWDC25! 😃 Looking forward to your questions posted in new threads.
Replies
2
Boosts
0
Views
493
Activity
Jun ’25
Unable to compile Core Image filter on Xcode 26 due to missing Metal toolchain
I have a Core Image filter in my app that uses Metal. I cannot compile it because it complains that the executable tool metal is not available, but I have installed it in Xcode. If I go to the "Components" section of Xcode Settings, it shows it as downloaded. And if I run the suggested command, it also shows it as installed. Any advice? Xcode Version Version 26.0 beta (17A5241e) Build Output Showing All Errors Only Build target Lessons of project StudyJapanese with configuration Light RuleScriptExecution /Users/chris/Library/Developer/Xcode/DerivedData/StudyJapanese-glbneyedpsgxhscqueifpekwaofk/Build/Intermediates.noindex/StudyJapanese.build/Light-iphonesimulator/Lessons.build/DerivedSources/OtsuThresholdKernel.ci.air /Users/chris/Code/SerpentiSei/Shared/iOS/CoreImage/OtsuThresholdKernel.ci.metal normal undefined_arch (in target 'Lessons' from project 'StudyJapanese') cd /Users/chris/Code/SerpentiSei/StudyJapanese /bin/sh -c xcrun\ metal\ -w\ -c\ -fcikernel\ \"\$\{INPUT_FILE_PATH\}\"\ -o\ \"\$\{SCRIPT_OUTPUT_FILE_0\}\"' ' error: error: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain /Users/chris/Code/SerpentiSei/StudyJapanese/error:1:1: cannot execute tool 'metal' due to missing Metal Toolchain; use: xcodebuild -downloadComponent MetalToolchain Build failed 6/9/25, 8:31 PM 27.1 seconds Result of xcodebuild -downloadComponent MetalToolchain (after switching Xcode-beta.app with xcode-select) xcodebuild -downloadComponent MetalToolchain Beginning asset download... Downloaded asset to: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/4d77809b60771042e514cfcf39662c6d1c195f7d.asset/AssetData/Restore/022-19457-035.dmg Done downloading: Metal Toolchain (17A5241c). Screenshots from Xcode Result of "Copy Information" Metal Toolchain 26.0 [com.apple.MobileAsset.MetalToolchain: 17.0 (17A5241c)] (Installed)
Replies
25
Boosts
0
Views
3.7k
Activity
Jun ’25
Sparse Texture Writes
Hey, I've been struggling with this for some days now. I am trying to write to a sparse texture in a compute shader. I'm performing the following steps: Set up a sparse heap and create a texture from it Map the whole area of the sparse texture using updateTextureMapping(..) Overwrite every value with the value "4" in a compute shader Blit the texture to a shared buffer Assert that the values in the buffer are "4". I have a minimal example (which is still pretty long unfortunately). It works perfectly when removing the line heapDesc.type = .sparse. What am I missing? I could not find any information that writes to sparse textures are unsupported. Any help would be greatly appreciated. import Metal func sparseTexture64x64Demo() throws { // ── Metal objects guard let device = MTLCreateSystemDefaultDevice() else { throw NSError(domain: "SparseNotSupported", code: -1) } let queue = device.makeCommandQueue()! let lib = device.makeDefaultLibrary()! let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!) // ── Texture descriptor let width = 64, height = 64 let format: MTLPixelFormat = .r32Uint // 4 B per texel let desc = MTLTextureDescriptor() desc.textureType = .type2D desc.pixelFormat = format desc.width = width desc.height = height desc.storageMode = .private desc.usage = [.shaderWrite, .shaderRead] // ── Sparse heap let bytesPerTile = device.sparseTileSizeInBytes let meta = device.heapTextureSizeAndAlign(descriptor: desc) let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile let heapDesc = MTLHeapDescriptor() heapDesc.type = .sparse heapDesc.storageMode = .private heapDesc.size = heapBytes let heap = device.makeHeap(descriptor: heapDesc)! let tex = heap.makeTexture(descriptor: desc)! // ── CPU buffers let bytesPerPixel = MemoryLayout<UInt32>.stride let rowStride = width * bytesPerPixel let totalBytes = rowStride * height let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)! let cb = queue.makeCommandBuffer()! let fence = device.makeFence()! // 2. Map the sparse tile, then signal the fence let rse = cb.makeResourceStateCommandEncoder()! rse.updateTextureMapping( tex, mode: .map, region: MTLRegionMake2D(0, 0, width, height), mipLevel: 0, slice: 0) rse.update(fence) // ← capture all work so far rse.endEncoding() let ce = cb.makeComputeCommandEncoder()! ce.waitForFence(fence) ce.setComputePipelineState(pipeline) ce.setTexture(tex, index: 0) let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1) let tgCount = MTLSize(width: (width + 7) / 8, height: (height + 7) / 8, depth: 1) ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG) ce.updateFence(fence) ce.endEncoding() // Blit texture into shared buffer let blit = cb.makeBlitCommandEncoder()! blit.waitForFence(fence) blit.copy( from: tex, sourceSlice: 0, sourceLevel: 0, sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0), sourceSize: MTLSize(width: width, height: height, depth: 1), to: dstBuf, destinationOffset: 0, destinationBytesPerRow: rowStride, destinationBytesPerImage: totalBytes) blit.endEncoding() cb.commit() cb.waitUntilCompleted() assert(cb.error == nil, "GPU error: \(String(describing: cb.error))") // ── Verify a few texels let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height) print("first three texels:", out[0], out[1], out[width]) // 0 1 64 assert(out[0] == 4 && out[1] == 4 && out[width] == 4) } Metal shader: #include <metal_stdlib> using namespace metal; kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]], uint2 gid [[thread_position_in_grid]]) { tex.write(4, gid); }
Replies
1
Boosts
0
Views
152
Activity
May ’25
vsync, drawable present, instrument gui
hi When analyzing our game using Instruments, I've always been confused about the two items "Drawable Present" and "Drawable Presented" in the GPU column. The timing of Drawable Present seems to be when the CPU layer calls commandbuffer:present, rather than when the actual encoding is completed on the GPU. Also, what does drawable presented specifically mean? In our case, when a CPU stall occurs, it appears that the vsync interval changes in the next frame, and a surface that has already been calculated is not displayed. Why is this happening?
Replies
0
Boosts
0
Views
181
Activity
May ’25
iOS Metal system delayed one Vsync period to really display the frame on the screen
View Layout Add the following views in a view controller: Label View A, with a subview of the same size: MTKView A View B, with a subview of the same size: MTKView B Refresh Rates of Each View The label view refreshes at 60fps (driven by CADisplayLink). MTKView A and B refresh at 15fps. MTKView Implementation Details The corresponding CAMetalLayer's maximumDrawableCount is set to 2, changed to double buffering. The scheduling mechanism is modified; drawing is not driven by the internal loop but is done manually. The draw call is triggered immediately upon receiving a frame. self.metalView.enableSetNeedsDisplay = NO; self.metalView.paused = YES; A new high-priority queue is created for drawing, instead of handling it on the main queue. MTKView Latency Tracking The GPU completion time T1 is observed through the addCompletedHandler callback of the CommandBuffer. The presentation time T2 of the frame is observed through the addPresentedHandler callback of the currentDrawable in MTKView. Testing shows that T2 - T1 > 16.6ms (the Vsync period at 60Hz). This means that after the GPU rendering in MTLView is finished, the frame is not actually displayed at the next Vsync instruction but only at the Vsync instruction after that. I believe there is an extra 16.6ms of latency here, which I want to eliminate by adjusting the rendering mechanism. Observation from Instruments From Instruments, the Surface presentation aligns with the above test results. After the Metal encoder finishes, the Surface in Display switches only after the next-next Vsync instruction. See the image in the link for details. Questions According to a beginner's understanding, after MTKView's GPU rendering is finished, the next Vsync instruction should officially display (make it visible). However, this is not what is observed. Does the subview MTKView need to wait for another Vsync cycle to be drawn to the actual display buffer? The label updates its text at 60fps, so the entire interface should be displayed at 60fps. Is the content of MTKView not synchronized when the display happens? Explanation of the Reasoning Behind Some MTKView Code Details Changing from the default triple buffering to double buffering helps reduce the latency introduced by rendering. Not using MTKView's own scheduling mechanism but using manual triggering of the draw method is because MTKView's own scheduling mechanism is driven by CADisplayLink. Therefore, if a frame falls within a Vsync window, it needs to wait for the next Vsync window to trigger the draw operation, which introduces waiting latency.
Replies
3
Boosts
0
Views
616
Activity
Apr ’25
How to use MTKTextureLoader to load png data
I am trying to load some PNG data with MTKTextureLoader newTextureWithData,but the result shows wrong at the alpha area. Here is the code. I have an image URL, after it downloads successfully, I try to use the data or UIImagePNGRepresentation (image), they all show wrong. UIImage *tempImg = [UIImage imageWithData:data]; CGImageRef cgRef = tempImg.CGImage; MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:device]; id<MTLTexture> temp1 = [loader newTextureWithData:data options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; NSData *tempData = UIImagePNGRepresentation(tempImg); id<MTLTexture> temp2 = [loader newTextureWithData:tempData options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; id<MTLTexture> temp3 = [loader newTextureWithCGImage:cgRef options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil]; }] resume];
Replies
5
Boosts
0
Views
708
Activity
Dec ’24
OS choosing performance state poorly for GPU use case
I am building a MacOS desktop app (https://anukari.com) that is using Metal compute to do real-time audio/DSP processing, as I have a problem that is highly parallelizable and too computationally expensive for the CPU. However it seems that the way in which I am using the GPU, even when my app is fully compute-limited, the OS never increases the power/performance state. Because this is a real-time audio synthesis application, it's a huge problem to not be able to take advantage of the full clock speeds that the GPU is capable of, because the app can't keep up with real-time. I discovered this issue while profiling the app using Instrument's Metal tracing (and Game tracing) modes. In the profiling configuration under "Metal Application" there is a drop-down to select the "Performance State." If I run the application under Instruments with Performance State set to Maximum, it runs amazingly well, and all my problems go away. For comparison, when I run the app on its own, outside of Instruments, the expensive GPU computation it's doing takes around 2x as long to complete, meaning that the app performs half as well. I've done a ton of work to micro-optimize my Metal compute code, based on every scrap of information from the WWDC videos, etc. A problem I'm running into is that I think that the more efficient I make my code, the less it signals to the OS that I want high GPU clock speeds! I think part of why the OS is confused is that in most use cases, my computation can be done using only a small number of Metal threadgroups. I'm guessing that the OS heuristics see that only a small fraction of the GPU is saturated and fail to scale up the power/clock state. I'm not sure what to do here; I'm in a bit of a bind. One possibility is that I intentionally schedule busy work -- spin threadgroups just to waste energy and signal to the OS that I need higher clock speeds. This is obviously a really bad idea, but it might work. Is there any other (better) way for my app to signal to the OS that it is doing real-time latency-sensitive computation on the GPU and needs the clock speeds to be scaled up? Note that game mode is not really an option, as my app also runs as an AU plugin inside hosts like Garageband, so it can't be made fullscreen, etc.
Replies
6
Boosts
0
Views
1k
Activity
Nov ’24
Metal runtime shader library compilation and linking issue
In my project I need to do the following: In runtime create metal Dynamic library from source. In runtime create metal Executable library from source and Link it with my previous created Dynamic library. Create compute pipeline using those two libraries created above. But I get the following error at the third step: Error Domain=AGXMetalG15X_M1 Code=2 "Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel " UserInfo={NSLocalizedDescription=Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel } import Foundation import Metal class MetalShaderCompiler { let device = MTLCreateSystemDefaultDevice()! var pipeline: MTLComputePipelineState! func compileDylib() -> MTLDynamicLibrary { let source = """ #include <metal_stdlib> using namespace metal; half3 noise() { return half3(1, 0, 1); } """ let option = MTLCompileOptions() option.libraryType = .dynamic option.installName = "@executable_path/libFoundation.metallib" let library = try! device.makeLibrary(source: source, options: option) let dylib = try! device.makeDynamicLibrary(library: library) return dylib } func compileExlib(dylib: MTLDynamicLibrary) -> MTLLibrary { let source = """ #include <metal_stdlib> using namespace metal; extern half3 noise(); kernel void OnTheFlyKernel(texture2d<half, access::read> src [[texture(0)]], texture2d<half, access::write> dst [[texture(1)]], ushort2 gid [[thread_position_in_grid]]) { half4 rgba = src.read(gid); rgba.rgb += noise(); dst.write(rgba, gid); } """ let option = MTLCompileOptions() option.libraryType = .executable option.libraries = [dylib] let library = try! self.device.makeLibrary(source: source, options: option) return library } func runtime() { let dylib = self.compileDylib() let exlib = self.compileExlib(dylib: dylib) let pipelineDescriptor = MTLComputePipelineDescriptor() pipelineDescriptor.computeFunction = exlib.makeFunction(name: "OnTheFlyKernel") pipelineDescriptor.preloadedLibraries = [dylib] pipeline = try! device.makeComputePipelineState(descriptor: pipelineDescriptor, options: .bindingInfo, reflection: nil) } }
Replies
5
Boosts
0
Views
1.1k
Activity
Sep ’24