Reply to Unexpected behavior for shared MTLBuffer during CPU work
extension MTLCommandBuffer {
    func encodeCPUExecution(for sharedEvent: MTLSharedEvent, listener: MTLSharedEventListener, work: @escaping () -> Void) {
        let value = sharedEvent.signaledValue
        sharedEvent.notify(listener, atValue: value + 1) { event, _ in
            work()
            event.signaledValue = value + 2
        }
        encodeSignalEvent(sharedEvent, value: value + 1)
        encodeWaitForEvent(sharedEvent, value: value + 2)
    }
}

This is the code for encodeCPUExecution; my mistake for not making that clear enough. The GPU does in fact wait on value + 2 as you described, yet the behavior still occurs. The issue is that the computation is well suited to CPU execution (it can take advantage of dynamic programming to run in O(n) time) and poorly suited to GPU execution, though I suppose a single GPU thread could write the result out much as the CPU does (which is probably even more performant). I would still like to figure out why this behavior exists in the first place, even if the computation is pushed to a single thread on the GPU.
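For context, the extension above is meant to be used with a shared event and listener created up front. A minimal sketch of that setup follows; the names (device, queue label, the placeholder GPU work) are my own illustrative assumptions, not the project's actual code:

```swift
import Metal

// Illustrative setup; all names here are assumptions, not from the project.
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!
let sharedEvent = device.makeSharedEvent()!

// The notify block (i.e. the CPU-side `work`) runs on this dispatch queue.
let listener = MTLSharedEventListener(dispatchQueue:
    DispatchQueue(label: "com.example.shared-event-listener"))

let commandBuffer = commandQueue.makeCommandBuffer()!
// ... encode GPU work that produces data the CPU will read ...
commandBuffer.encodeCPUExecution(for: sharedEvent, listener: listener) {
    // CPU-side work on the shared MTLBuffer goes here.
}
// ... encode GPU work that consumes the CPU's results ...
commandBuffer.commit()
```

Note that the CPU closure runs on the listener's dispatch queue, not the thread that committed the command buffer, which matters when reasoning about where the shared buffer is read and written.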
Topic: Programming Languages SubTopic: Swift Tags:
Feb ’22
Reply to Unexpected behavior for shared MTLBuffer during CPU work
Currently, since this project is a work in progress, the image pipeline executes exactly once; there is no loop. During that single execution, the MTLCaptureManager captures the command buffer, and the capture is analyzed. Within the image processing pipeline, this is the only spot where GPU-CPU synchronization occurs with the shared event. The shared event, like the other resources in the pipeline, is created before the command buffer. All resources used in the pipeline are tracked by Metal (hazardTrackingMode = .tracked), though I hope to change this in the future and use heaps for more efficiency. Here is a brief overview of how the code is organized:

preloadResources()

// 1. Let Core Image render the CGImage into the Metal texture
let commandBufferDescriptor = /// ... enable `encoderExecutionStatus` to capture errors
let ciCommandBuffer = commandQueue.makeCommandBuffer(descriptor: commandBufferDescriptor)!

let ciSourceImage = CIImage(cgImage: sourceImage)
ciContext.render(ciSourceImage,
                 to: sourceImageTexture,
                 commandBuffer: ciCommandBuffer,
                 bounds: sourceImageTexture.bounds2D,
                 colorSpace: CGColorSpaceCreateDeviceRGB())
ciCommandBuffer.commit()

// 2. Do the rest of the image processing
let commandBuffer = commandQueue.makeCommandBuffer(descriptor: commandBufferDescriptor)!

try imageProcessorA.encode(commandBuffer: commandBuffer,
                           sourceTexture: sourceImageTexture,
                           destinationTexture: sourceImageIntermediateTexture)
try imageProcessorA.encode(commandBuffer: curveDetectionCommandBuffer,
                           sourceTexture: sourceImageIntermediateTexture,
                           destinationTexture: destinationImageTexture)
commandBuffer.commit()

imageProcessorA contains kernelA and kernelB and performs the synchronization described above. I suppose I could schedule a technical review session with an engineer to provide more details of the project if more context is needed to resolve the problem.
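For reference, the elided commandBufferDescriptor setup could look roughly like this. This is a sketch of the standard MTLCommandBufferDescriptor API, not the project's exact code, and the completed handler is only one way to read back the per-encoder status:

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!

// Sketch: opt in to extended per-encoder error information.
let commandBufferDescriptor = MTLCommandBufferDescriptor()
commandBufferDescriptor.errorOptions = .encoderExecutionStatus

let commandBuffer = commandQueue.makeCommandBuffer(descriptor: commandBufferDescriptor)!
commandBuffer.addCompletedHandler { buffer in
    // On failure, each encoder reports whether it completed, faulted, etc.
    if let error = buffer.error as NSError?,
       let infos = error.userInfo[MTLCommandBufferEncoderInfoErrorKey]
           as? [MTLCommandBufferEncoderInfo] {
        for info in infos {
            print(info.label, info.errorState)
        }
    }
}
```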
Topic: Programming Languages SubTopic: Swift Tags:
Mar ’22
Reply to Metal Quadgroups Example Usage
I found the tech talk "Discover advances in A15 Bionic", which describes one use case for quadgroups and quadgroup functions at around the 21:00 mark, where they're used to reduce texture reads. If anyone has other use cases, let us know.
Topic: Programming Languages SubTopic: Swift Tags:
Jan ’22
Reply to Unexpected behavior for shared MTLBuffer during CPU work
extension MTLCommandBuffer {     func encodeCPUExecution(for sharedEvent: MTLSharedEvent, listener: MTLSharedEventListener, work: @escaping () -> Void) {         let value = sharedEvent.signaledValue         sharedEvent.notify(listener, atValue: value + 1) { event, _ in             work()             event.signaledValue = value + 2         }         encodeSignalEvent(sharedEvent, value: value + 1)         encodeWaitForEvent(sharedEvent, value: value + 2)     } } This is the code for encodeCPUExecution my mistake for not making it clear enough. In fact the GPU does wait on value + 2 as you described, yet the behavior still exists. The issue is that the computation is quite suited for CPU execution (it can actually take advantage of dynamic programming for O(n) time) and is not suited for GPU execution, though I suppose you could have a single thread write the result out in a similar way the CPU does (which is probably more performant even) I would still like to figure out why this behavior exists in the first place, even if the computation is pushed to a single thread on the GPU
Topic: Programming Languages SubTopic: Swift Tags:
Replies
Boosts
Views
Activity
Feb ’22