Howdy All,
I've been wrestling with this issue for a few days now, and eventually gave up on using the MTLComputeCommandEncoder dispatchThreads method altogether, just to be safe. The problem is calling dispatchThreads with a threadsSize that is smaller in some dimension than threadsPerThreadgroup. On my oldish laptop I get a width of 32 threads, for example. If I use that default value to build the threadsPerThreadgroup size, which I believe is pretty much what the documentation says to do, and then start an operation less than 32 threads wide, it brings my laptop to a total halt.
I'm totally guessing, but maybe an unsigned value is underflowing somewhere in the implementation; those tend to make loops run for a very long time, at least.
As a workaround, it seems possible to clamp threadsPerThreadgroup myself, using the minimum of the threadsSize and threadsPerThreadgroup values in each dimension. But seeing something as simple as this fail tells me not to touch this convenience method until it works with basic inputs.
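For anyone wanting to try the workaround, here is a rough sketch of the per-dimension clamp. Size3 is a made-up stand-in for MTLSize so the snippet runs without Metal, and clampedThreadgroupSize is a hypothetical helper name, not anything from the framework. I have not confirmed that this avoids the hang on affected hardware.

```swift
// Stand-in for MTLSize (same width/height/depth fields) so this sketch
// runs without importing Metal. Hypothetical type, not part of the framework.
struct Size3: Equatable {
    var width: Int
    var height: Int
    var depth: Int
}

// Hypothetical helper: shrink the preferred threadgroup size so that no
// dimension exceeds the corresponding grid dimension.
func clampedThreadgroupSize(grid: Size3, preferred: Size3) -> Size3 {
    return Size3(
        width: min(grid.width, preferred.width),
        height: min(grid.height, preferred.height),
        depth: min(grid.depth, preferred.depth)
    )
}

// A 20-thread-wide dispatch with a preferred 32-wide threadgroup.
let grid = Size3(width: 20, height: 1, depth: 1)
let preferred = Size3(width: 32, height: 1, depth: 1)
let safe = clampedThreadgroupSize(grid: grid, preferred: preferred)
print(safe.width, safe.height, safe.depth)  // prints "20 1 1"
```

With real Metal types the same min would be applied to the MTLSize fields before passing the result as threadsPerThreadgroup to dispatchThreads.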
This may not affect all Apple computers, but a late 2013 MacBook Pro with the integrated Intel Iris Pro GPU totally dislikes the situation described.
--
H