Reusing one big buffer instead of many seems like a great idea; the fewer moving parts, the better here.
Why the vertices are expensive to compute is mostly a question of scale. For instance, with indirect draw calls I can handle 4000+ chunks of terrain and 25 million+ vertices at 60 fps. At much lower terrain distances the CPU meshing keeps up fine, but when I push the distance there is a lot of frame dropping while moving over the terrain and generating meshes for dozens or hundreds of chunks at a time. Part of this is that my meshing is single-threaded on the CPU, but my experience with Swift multi-threading has so far been hit and miss for this kind of thing.
As such, parallelism is the entire point of trying to offload this to a compute shader.
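For what it's worth, CPU-side parallelism could look something like the sketch below, assuming each chunk can be meshed independently of the others (Chunk, Mesh, and meshChunk here are placeholders, not my real types):

import Dispatch

struct Chunk { /* block data */ }
struct Mesh { /* interleaved vertices + indices */ }

// Placeholder standing in for the per-chunk meshing loop shown further down.
func meshChunk(_ chunk: Chunk) -> Mesh {
    return Mesh()
}

func meshAll(_ chunks: [Chunk]) -> [Mesh?] {
    var meshes = [Mesh?](repeating: nil, count: chunks.count)
    meshes.withUnsafeMutableBufferPointer { out in
        // Each iteration writes only its own slot, so no locking is needed.
        DispatchQueue.concurrentPerform(iterations: chunks.count) { i in
            out[i] = meshChunk(chunks[i])
        }
    }
    return meshes
}

That independence between results is the same property that should make the compute shader version work.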
The process of vertex generation is pretty standard for block/Minecraft-style terrain. We loop through all the blocks in a chunk; if a block is solid, we check each of its neighbours, and for every neighbour that is air we emit a quad on the face between them. Roughly:
for y in 0..<CHUNK_SIZE_Y {
    for z in 0..<CHUNK_SIZE {
        for x in 0..<CHUNK_SIZE {
            let index = blockIndex(x, y, z)
            let block = block(at: index)
            if block.type != BLOCK_TYPE_AIR {
                let blockPosition = SIMD3<Int>(x, y, z)
                for faceOffset in blockFaceOffsets {
                    let neighbourPosition = blockPosition + faceOffset
                    if shouldDrawFace(at: neighbourPosition, in: chunk) {
                        // add interleaved vertices plus indices for the face
                    }
                }
            }
        }
    }
}
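On the GPU, that triple loop maps naturally to a 3D compute grid with one thread per block, each thread reading its (x, y, z) from its grid position. Here is a minimal host-side sketch of how I'm picturing the dispatch, assuming a kernel named meshChunkKernel in the default library and a device that supports non-uniform threadgroups (the kernel name and the three buffers are hypothetical):

import Metal

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let library = device.makeDefaultLibrary()!
let pipeline = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "meshChunkKernel")!)

let commandBuffer = queue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(blockBuffer, offset: 0, index: 0)   // chunk's block data
encoder.setBuffer(vertexBuffer, offset: 0, index: 1)  // output vertices + indices
encoder.setBuffer(counterBuffer, offset: 0, index: 2) // atomic append counter

// The x/y/z loops become the grid dimensions: one thread per block.
let grid = MTLSize(width: CHUNK_SIZE, height: CHUNK_SIZE_Y, depth: CHUNK_SIZE)
let threadsPerGroup = MTLSize(width: 8, height: 8, depth: 8)
encoder.dispatchThreads(grid, threadsPerThreadgroup: threadsPerGroup)
encoder.endEncoding()
commandBuffer.commit()

The fiddly part is that threads can't append to a shared vertex array the way the CPU loop does, hence the atomic counter to reserve slots in the output buffer.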
There is no doubt room for optimization in what I am doing, but the bigger limitation is that none of it runs in parallel. I think a compute shader is appropriate here, since each block's result does not depend on any other's. I'll experiment with using a single buffer, plus per-chunk offsets, for copying out the compute shader results, as sketched below.
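Concretely, the plan would be to reserve a fixed worst-case region per chunk inside one shared vertex buffer, so every chunk's kernel invocation writes at its own offset and no two chunks ever overlap. A rough sketch of that layout (Vertex, chunkCount, device, and encoder are stand-ins for my real setup):

// Absolute worst case is 6 faces x 4 vertices per block; a lower
// practical cap would shrink the buffer considerably.
let maxVerticesPerChunk = 24 * CHUNK_SIZE * CHUNK_SIZE * CHUNK_SIZE_Y
let rawRegionSize = maxVerticesPerChunk * MemoryLayout<Vertex>.stride
// Round each region up to 256 bytes so the setBuffer offsets stay aligned.
let chunkRegionSize = (rawRegionSize + 255) / 256 * 256

// One big buffer; chunk i owns bytes [i * chunkRegionSize, (i + 1) * chunkRegionSize).
let sharedVertexBuffer = device.makeBuffer(
    length: chunkCount * chunkRegionSize,
    options: .storageModePrivate)!

for chunkIndex in 0..<chunkCount {
    // Rebinding the same buffer at this chunk's base offset lets the kernel
    // append within its own region via a per-chunk atomic counter.
    encoder.setBuffer(sharedVertexBuffer,
                      offset: chunkIndex * chunkRegionSize,
                      index: 1)
    // ... encode this chunk's dispatch ...
}

The per-chunk counts the kernels write out could then feed the indirect draw arguments directly, which would slot into the indirect draw calls I'm already using.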
I'm also curious whether there are any other obvious approaches I'm missing for taking advantage of the parallelism of GPU shaders!