Reply to [Metal] What's wrong with simdgroup_load or simdgroup_store?
Hi PYNing, you should understand how SIMD instructions work. A SIMD instruction must be the same for all threads in a SIMD group, unless the API specifies otherwise, and it must be executed by all threads in the SIMD group. mmmetal already clarified that in your case only 1 thread is active and executes the code. For correct behaviour, however, all threads in the SIMD group should execute these instructions. On Apple GPUs the SIMD size is fixed at 32, so on M1 all 32 threads must take the same code path to produce a correct result. Also be careful on the CPU side: you should probably dispatch work in multiples of the SIMD group size.
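For the CPU-side point, here is a minimal sketch of rounding a thread count up to a whole number of SIMD groups. It assumes the width of 32 mentioned above; the helper names `roundUpToSimdGroup` and `simdGroupCount` are hypothetical and not part of any Metal API. In a real app you would probably read the width from the pipeline state's `threadExecutionWidth` rather than hard-coding it.

```cpp
#include <cstddef>

// Assumption: SIMD-group width of 32, as stated in the reply for Apple GPUs.
constexpr std::size_t kSimdWidth = 32;

// Hypothetical helper: round a requested thread count up to the next
// multiple of the SIMD-group width, so no SIMD group is partially filled.
// Assumes width > 0.
constexpr std::size_t roundUpToSimdGroup(std::size_t threads,
                                         std::size_t width = kSimdWidth) {
    return (threads + width - 1) / width * width;
}

// Hypothetical helper: number of whole SIMD groups needed to cover `threads`.
constexpr std::size_t simdGroupCount(std::size_t threads,
                                     std::size_t width = kSimdWidth) {
    return (threads + width - 1) / width;
}
```

With these, a request for a single thread becomes a dispatch of 32 threads (one full SIMD group). The kernel still has to make sure every one of those threads reaches the simdgroup_load / simdgroup_store calls, instead of guarding them behind a per-thread branch.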
Topic: Graphics & Games SubTopic: General Tags:
Apr ’22
Reply to Any Demo for Metal Struct simdgroup_float8x8 and simdgroup_half8x8 ?
Hi, have you tried MPSMatrixMultiplication? It should use these features when possible, and it supports fp16/fp32 precision.
Topic: Graphics & Games SubTopic: Metal Tags:
Jan ’22