Is it possible that this is caused by the Ultra putting certain cores to sleep while the queue still has work assigned to those cores? It seems to happen only with smaller problems (core usage is <= 4 of 20), which opens the possibility that parts of the chip are asleep. Another possibility, I suppose, is cache coherency between the chips: the call to "wake" the queue so that it finishes the "in-flight" work goes missing.
GROUP_FAIL
<OS_dispatch_queue_concurrent: QUEUE_NAME[ADDR] = { xref = 1, ref = 9, sref = 1, target = com.apple.root.default-qos[ADDR], width = 0xffe, state = 0x00000c1000000000, in-flight = 4}>
<OS_dispatch_group: group[ADDR] = { xref = 1, ref = 2, count = 4, gen = 0, waiters = 1, notifs = 0 }>
NORMAL_OPERATION
<OS_dispatch_queue_concurrent: QUEUE_NAME[ADDR] = { xref = 1, ref = 1, sref = 1, target = com.apple.root.default-qos[ADDR], width = 0xffe, state = 0x0000041000000000, in-flight = 0}>
<OS_dispatch_group: group[ADDR] = { xref = 1, ref = 1, count = 0, gen = 0, waiters = 0, notifs = 0 }>
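
For anyone who wants to capture the same two states, here is a minimal Swift sketch of one way to do it: wait on the group with a timeout and print the debug descriptions of the queue and group if the wait times out. The queue label `com.example.work`, the 5-second timeout, and the placeholder work are assumptions for illustration, not the code that produced the dumps above, and the exact text printed for the queue and group may differ between OS versions.

```swift
import Dispatch
import Foundation

// Hypothetical concurrent queue and group, standing in for the real ones.
let queue = DispatchQueue(label: "com.example.work", attributes: .concurrent)
let group = DispatchGroup()

// Submit four small work items, mirroring the "<= 4 of 20 cores" case.
for _ in 0..<4 {
    queue.async(group: group) {
        // Placeholder work; replace with the real task.
        _ = (0..<1_000).reduce(0, +)
    }
}

// Wait with a timeout instead of forever, so a stalled group can be inspected.
switch group.wait(timeout: .now() + .seconds(5)) {
case .success:
    print("NORMAL_OPERATION")
case .timedOut:
    // Dump the state of both objects at the moment the wait gave up.
    print("GROUP_FAIL")
    print(String(reflecting: queue))
    print(String(reflecting: group))
}
```

If the group stalls, the dump should show a nonzero `count` on the group and a nonzero `in-flight` on the queue, as in the GROUP_FAIL case above.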