Hi jcookie!
When you use threadgroup memory in your compute kernel, you are using the same on-chip "local" memory that imageblocks use. That is exactly what Harsh mentioned: you can use tile memory explicitly by declaring a threadgroup memory allocation. For reference, other APIs call this type of memory "shared" or "local" memory.
Here is a basic compute example (it does not use threadgroup memory, but it shows the general idea): https://developer.apple.com/documentation/metal/processing_a_texture_in_a_compute_function?language=objc
I can't immediately find a threadgroup memory example on the official Apple Developer website, but the idea is this: first bring the texture/buffer data into threadgroup memory in your shader, then do all the calculations ONLY on that local threadgroup memory. Because threads usually read values that other threads in the same group loaded, you need a threadgroup barrier between the load and the calculations. Since this memory is much faster than device memory (though you should keep bank conflicts in mind), you can pack in more ALU work and spend less time waiting on memory. At the end of the computation, you write (flush) the results from threadgroup memory back to device memory. That is what Harsh called a "flush", and you have to do it yourself in the compute kernel.
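As a rough sketch of the load / compute-locally / flush pattern, here is a 1D box blur. To keep it runnable anywhere it is plain C++ that simulates one threadgroup per outer loop iteration, not an actual Metal kernel; `box_blur_tiled`, `kGroupSize`, `kRadius` and the blur itself are hypothetical choices for illustration. In real Metal Shading Language the `tile` array would live in the `threadgroup` address space, each thread would load its own element(s), and you would call `threadgroup_barrier(mem_flags::mem_threadgroup)` before the compute phase.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical parameters for this sketch.
constexpr std::size_t kGroupSize = 8;              // "threads" per threadgroup
constexpr std::size_t kRadius = 1;                 // blur radius = halo width
constexpr std::size_t kWindow = 2 * kRadius + 1;   // taps per output

// CPU simulation of a compute kernel that stages data in threadgroup memory.
// Each outer iteration plays the role of one threadgroup; the inner loops
// play the role of the threads in that group.
std::vector<float> box_blur_tiled(const std::vector<float>& device_in) {
    const std::size_t n = device_in.size();
    std::vector<float> device_out(n, 0.0f);

    for (std::size_t group_start = 0; group_start < n; group_start += kGroupSize) {
        // 1. Load: copy this group's slice plus a halo of kRadius elements on
        //    each side into "threadgroup" storage (a `threadgroup float[]` in MSL).
        std::array<float, kGroupSize + 2 * kRadius> tile{};
        for (std::size_t i = 0; i < tile.size(); ++i) {
            const long src = static_cast<long>(group_start + i) - static_cast<long>(kRadius);
            // Clamp to the edge, similar to a clamp-to-edge sampler.
            const long clamped = std::clamp(src, 0L, static_cast<long>(n) - 1);
            tile[i] = device_in[static_cast<std::size_t>(clamped)];
        }
        // On a GPU: threadgroup_barrier(mem_flags::mem_threadgroup) goes here.
        // This loop is sequential, so the barrier is implicit.

        // 2. Compute: read only from the tile, never from device_in.
        // 3. Flush: write each result back to device memory.
        for (std::size_t t = 0; t < kGroupSize && group_start + t < n; ++t) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < kWindow; ++k) {
                sum += tile[t + k];  // tile[t] holds input element group_start + t - kRadius
            }
            device_out[group_start + t] = sum / static_cast<float>(kWindow);
        }
    }
    return device_out;
}
```

The halo is why staging pays off: neighbouring outputs reuse the same staged values, so each group reads `kGroupSize + 2 * kRadius` elements from device memory instead of `kWindow` reads per output.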