There are multiple CUs processing both vertex and fragment work, so that's what you are seeing in the capture.
You really shouldn't have one draw per command buffer. Instead, use one command buffer (or a small number of them), enqueued in the order you want the queue to process them. Then encode a series of render passes on that command buffer, one per set of render targets, each containing the draws that pertain to those targets. The render pass encoders within a command buffer run in sequence, but command buffers are allowed to execute out of order if there are no dependencies between them.
Command buffers aren't cheap: each one you create and commit adds CPU and scheduling overhead.
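A minimal Swift sketch of that structure, assuming macOS or iOS with a Metal-capable device: one command buffer, one render pass (one `MTLRenderCommandEncoder`) per set of render targets, and a single commit. `encodeFrame` and `makeTarget` are hypothetical helper names, and the passes here only clear their targets; the comment inside the loop marks where a real renderer would set its pipeline state and issue all of the draws for that target.

```swift
import Metal

// Hypothetical helper: creates a texture usable as a color render target.
func makeTarget(device: MTLDevice, width: Int, height: Int) -> MTLTexture? {
    let desc = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm, width: width, height: height, mipmapped: false)
    desc.usage = [.renderTarget]
    desc.storageMode = .private
    return device.makeTexture(descriptor: desc)
}

// Hypothetical helper: encodes one render pass per target into a single
// command buffer, then commits that buffer once.
func encodeFrame(queue: MTLCommandQueue, targets: [MTLTexture]) -> MTLCommandBuffer? {
    // One command buffer for the whole batch of passes, not one per draw.
    guard let commandBuffer = queue.makeCommandBuffer() else { return nil }

    for target in targets {
        // Each render pass covers one set of render targets (here, one color attachment).
        let pass = MTLRenderPassDescriptor()
        pass.colorAttachments[0].texture = target
        pass.colorAttachments[0].loadAction = .clear
        pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
        pass.colorAttachments[0].storeAction = .store

        guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) else {
            return nil
        }
        // A real renderer would call encoder.setRenderPipelineState(...) and
        // issue every draw for this target here (e.g. encoder.drawPrimitives(...)).
        encoder.endEncoding()
    }

    // Commit once; the passes above were encoded in order within this buffer.
    commandBuffer.commit()
    return commandBuffer
}

// Example usage: three render targets, three passes, one command buffer.
if let device = MTLCreateSystemDefaultDevice(),
   let queue = device.makeCommandQueue() {
    let targets = (0..<3).compactMap { _ in makeTarget(device: device, width: 256, height: 256) }
    if let commandBuffer = encodeFrame(queue: queue, targets: targets) {
        commandBuffer.waitUntilCompleted()
        print("completed:", commandBuffer.status == .completed)
    }
}
```

The point of this layout is that adding draws adds encoder calls inside an existing pass rather than new command buffers, so the per-frame command buffer count stays at one (or a small, fixed number).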