Hi Eugene, thanks for your answer. I think I understand the basics, but what eludes me is how to copy the texture into threadgroup memory efficiently. For example, imagine the following scenario: my compute shader works on 16x16 pixels at a time and needs access to a certain neighbourhood around them, so I want to load a 32x32 block from the texture. I can then imagine the following pseudocode:
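threadgroup pixel data[32 * 32];
if (thread_index_in_threadgroup == 0) {
    // a single thread copies the whole 32x32 region
    copy_texture_region(texture, data, <base offset>, 32, 32);
}
// wait until the copied data is visible to every thread in the group
threadgroup_barrier(mem_flags::mem_threadgroup);
output.write(compute_value_for_pixel(thread_position_in_grid, data), thread_position_in_grid);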
but I have no idea how to do the copy_texture_region() part efficiently. The talk implies that there is an efficient block-transfer function for this, but I can't find it in the documentation. A naive way would be to have each thread load a handful of pixels, but that makes everything ridiculously complicated...
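To make the naive variant concrete, here is roughly what I have in mind, written as plain C++ that emulates a single threadgroup on the CPU rather than as real Metal code. All names and sizes here (load_tile_cooperatively, kApron, a single-channel float texture, clamping at the edges) are my own assumptions for illustration, not anything from the talk or the documentation. With 256 threads and 1024 texels, each thread ends up loading 4 texels in a strided pattern:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes from the scenario above: a 16x16 threadgroup that
// needs a 32x32 region, i.e. an 8-pixel apron on every side.
constexpr int kGroupSize  = 16;
constexpr int kRegionSize = 32;
constexpr int kApron      = (kRegionSize - kGroupSize) / 2;
constexpr int kThreads    = kGroupSize * kGroupSize;    // 256 threads
constexpr int kTexels     = kRegionSize * kRegionSize;  // 1024 texels

using Tile = std::array<float, kTexels>;

// Emulates the load phase of one threadgroup on the CPU.
// `texture` is a row-major, single-channel image (an assumption made to
// keep the sketch short); out-of-range coordinates are clamped to the edge.
Tile load_tile_cooperatively(const std::vector<float>& texture,
                             int width, int height,
                             int group_x, int group_y) {
    assert(texture.size() == static_cast<std::size_t>(width) * height);

    Tile tile{};
    // Top-left corner of the 32x32 region in texture coordinates.
    const int origin_x = group_x * kGroupSize - kApron;
    const int origin_y = group_y * kGroupSize - kApron;

    // On the GPU this outer loop would not exist: every thread would run
    // the inner loop with its own thread index.
    for (int thread = 0; thread < kThreads; ++thread) {
        // Strided loop: thread t handles texels t, t + 256, t + 512, t + 768.
        for (int i = thread; i < kTexels; i += kThreads) {
            const int tx = i % kRegionSize;  // column inside the tile
            const int ty = i / kRegionSize;  // row inside the tile
            const int sx = std::max(0, std::min(origin_x + tx, width - 1));
            const int sy = std::max(0, std::min(origin_y + ty, height - 1));
            tile[i] = texture[sy * width + sx];
        }
    }
    return tile;
}
```

In an actual kernel I assume the inner loop would start at thread_index_in_threadgroup, the tile would live in threadgroup memory, and a threadgroup_barrier(mem_flags::mem_threadgroup) would have to follow the loop before any thread reads texels loaded by another thread. It works out to only a few lines, but I still don't know whether it is anywhere near as fast as the block transfer the talk hints at.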
P.S. Sorry for the terrible formatting, the forum collapses my comment in a very awkward way...