Memory ordering

Question

Created Jul ’20

Replies 2

Boosts 0

Participants 2

I have a complex CAS loop with branches that essentially implements a mutex, and I'm porting it from CUDA to Metal. I'm looking for the equivalent of CUDA's __threadfence().

=> docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#synchronization-functions

Unfortunately, metal::threadgroup_barrier(metal::mem_flags::mem_device) also implies execution synchronization: it has to be reached ("seen") by all threads, otherwise the kernel deadlocks.

I need to have one store to buffer A happen before another store to buffer B.
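To make the requirement concrete, here is a minimal host-side sketch of the same ordering in standard C++. It is not Metal code and says nothing about what Metal supports. The names `bufferA`, `bufferB`, `writer`, `reader` and `publish_and_observe` are made up for illustration, and the two atomics merely stand in for the device buffers. It only shows the guarantee being asked for: a fence between two relaxed stores, so that anyone who observes the store to B also observes the store to A.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for the two device buffers in the question.
std::atomic<int> bufferA{0};  // payload
std::atomic<int> bufferB{0};  // flag that announces the payload

// Writer: the store to A must become visible before the store to B.
void writer(int value) {
    bufferA.store(value, std::memory_order_relaxed);
    // Release fence: orders the store above before the store below.
    std::atomic_thread_fence(std::memory_order_release);
    bufferB.store(1, std::memory_order_relaxed);
}

// Reader: once B is observed as set, A must hold the payload.
int reader() {
    while (bufferB.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();  // spin until the flag is published
    }
    // Acquire fence pairs with the writer's release fence.
    std::atomic_thread_fence(std::memory_order_acquire);
    return bufferA.load(std::memory_order_relaxed);
}

// Runs one writer and one reader; returns what the reader saw in A.
int publish_and_observe(int value) {
    bufferA.store(0, std::memory_order_relaxed);
    bufferB.store(0, std::memory_order_relaxed);
    int seen = -1;
    std::thread r([&seen] { seen = reader(); });
    std::thread w(writer, value);
    w.join();
    r.join();
    assert(seen == value);  // follows from the fence pairing
    return seen;
}
```

In the CUDA version, __threadfence() sits roughly where the release fence sits in `writer`. The open question is what can take that place in Metal without also acting as an execution barrier.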

The only atomic memory order option available is "relaxed".

How can I get this ordering guarantee?

Answer 1

rianflo OP

Aug ’20

What does this undocumented builtin do?

__metal_atomic_fence(int, int, int) -> air.atomic.fence(i32 0, i32 0, i32 0)

Answer 2

iaomw OP

Jan ’22

Hi, did you solve that problem?
