rianflo’s Profile | Apple Developer Forums

Memory ordering

I have a complex CAS loop with branches that essentially implements a mutex, and I'm porting it from CUDA to Metal. I'm looking for the equivalent of CUDA's __threadfence() => docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#synchronization-functions Unfortunately, metal::threadgroup_barrier(metal::mem_flags::mem_device) also implies execution synchronization and needs to be "seen" by all threads, otherwise it deadlocks. I need one store to buffer A to happen before another store to buffer B. There is only one atomic memory order option: "relaxed". How can I get this ordering guarantee?
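To make the requested guarantee concrete, here is a minimal host-side C++ sketch of the pattern: a store to A, a fence with no execution barrier, then a relaxed store to B. It is only an analogy, not Metal or CUDA code. The names bufferA, bufferB, writer, reader and run_once are hypothetical, and std::atomic_thread_fence is only roughly comparable to __threadfence().

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-ins for "buffer A" and "buffer B" from the post.
int bufferA = 0;               // plain payload
std::atomic<int> bufferB{0};   // flag, touched only with relaxed atomics

void writer() {
    bufferA = 42;                                         // store to A
    std::atomic_thread_fence(std::memory_order_release);  // ordering only, no barrier
    bufferB.store(1, std::memory_order_relaxed);          // store to B
}

int reader() {
    // Spin until the store to B becomes visible.
    while (bufferB.load(std::memory_order_relaxed) == 0) { }
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
    return bufferA;                                       // sees the store to A
}

// Runs one writer and one reader thread and returns what the reader saw in A.
int run_once() {
    bufferA = 0;
    bufferB.store(0, std::memory_order_relaxed);
    int seen = -1;
    std::thread r([&] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    return seen;
}
```

The fence orders the two stores without requiring every thread to reach it, which is the property the barrier-based Metal call lacks.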

Graphics & Games Metal Metal Performance Shaders Metal

949

Jul ’20

Address space qualifiers are a bit overboard

Metal is great. I love the API's design. One thing about MSL, though, is that its explicit address space qualifiers make preexisting C++ functions unusable. Was that decision made for the sake of lower compile times? CUDA, which isn't too dissimilar, can infer the address space of a pointer. Let's say I have a very large collection of useful headers as a git submodule from a source I do not maintain myself. In MSL I cannot just use them; I have to modify each and every function declaration so that the argument list includes the address space qualifier (ASQ). Example:
// some header only lib
void usefulFunction(MyType& a) { ... }
...
// calling kernel code
device MyType* x;
MyType x0 = x[0];
usefulFunction(x[0]);
usefulFunction(x0);
None of the above works. In CUDA this isn't an issue.
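A rough standard C++ analogy of the problem, with volatile standing in for an address space qualifier, since standard C++ has no address spaces. A fixed signature rejects the qualified argument, while a template deduces the qualifier as part of the type. MyType's field, usefulFunctionFixed and demo are hypothetical, and whether the MSL compiler deduces address spaces the same way is an assumption not verified here.

```cpp
struct MyType { int value; };

// Fixed signature, as in a third-party header: accepts only unqualified MyType.
inline int usefulFunctionFixed(MyType& a) { return a.value + 1; }

// Template variant: T is deduced together with its qualifiers.
template <typename T>
int usefulFunction(T& a) { return a.value + 1; }

int demo() {
    volatile MyType q{41};  // volatile plays the role of an address space qualifier
    MyType plain{1};
    // usefulFunctionFixed(q);  // would not compile: qualifier mismatch
    return usefulFunction(q) + usefulFunction(plain);
}
```

This still means touching the header's declarations, so it illustrates the complaint rather than removing it.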

Graphics & Games Metal Metal Performance Shaders Metal

1.1k

Jul ’20
