I posted this question to StackOverflow. Perhaps it is better suited here where Apple developers are more likely to see it.
I was looking through the sample project linked from the page "Selecting Device Objects for Compute Processing" in the Metal documentation (https://developer.apple.com/documentation/metal/gpu_selection_in_macos/selecting_device_objects_for_compute_processing). There, I noticed a clever use of threadgroup memory that I hope to adopt in my own particle simulator. Before I do so, however, I need to understand a particular aspect of threadgroup memory and what the developers are doing in this scenario.
The code contains a segment like so:
```metal
// In AAPLKernels.metal

// Parameter of the kernel
threadgroup float4* sharedPosition [[threadgroup(0)]]

// Body
...
// For each particle / body
for(i = 0; i < params.numBodies; i += numThreadsInGroup)
{
    // Because sharedPosition uses the threadgroup address space, 'numThreadsInGroup' elements
    // of sharedPosition will be initialized at once (not just one element at lid as it
    // may look like)
    sharedPosition[threadInGroup] = oldPosition[sourcePosition];

    j = 0;
    while(j < numThreadsInGroup)
    {
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
    } // while

    sourcePosition += numThreadsInGroup;
} // for
```
In particular, I found the comment just before the assignment to sharedPosition (the one starting with "Because...") confusing. I haven't read anywhere that threadgroup memory writes happen simultaneously across all threads in a threadgroup; in fact, I thought a barrier would be needed before reading from the shared memory pool again to avoid undefined behavior, since *each* thread subsequently reads from the entire pool of threadgroup memory after the assignment (the assignment being a write, of course). Why is a barrier unnecessary here?
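For concreteness, here is a sketch of the pattern I would have expected instead, reusing the same names as the sample (`sharedPosition`, `threadInGroup`, `numThreadsInGroup`, etc.); `threadgroup_barrier(mem_flags::mem_threadgroup)` is the Metal Shading Language call for synchronizing threadgroup memory:

```metal
// What I expected: synchronize around the shared write (sketch only).
sharedPosition[threadInGroup] = oldPosition[sourcePosition];

// Wait until every thread in the group has published its element
// before any thread starts reading the whole pool.
threadgroup_barrier(mem_flags::mem_threadgroup);

j = 0;
while(j < numThreadsInGroup)
{
    acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
}

// And again before the next outer-loop iteration overwrites sharedPosition.
threadgroup_barrier(mem_flags::mem_threadgroup);

sourcePosition += numThreadsInGroup;
```

The sample code has neither barrier, which is exactly what I am trying to understand.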