Right now you're running in O(n^2), so the larger the screen (1080p vs 2K vs 4K vs 8K), the longer your runtime becomes. Moving this to Metal or OpenGL is likely the better option, since GPUs can process many pixels in parallel, which suits per-pixel work like this far better than a CPU loop. You could also explore replacing the for loops with something recursive, though that on its own won't reduce the number of pixels you touch. The BGRAPallet and the pixel buffer pointer should also be the same size; otherwise the first N BGRAPallets are processed, and then you're left just updating position 0 of pixalBufferPtr for another million-plus cycles.
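As a rough illustration of the size check and a single linear pass over the pixels, here is a minimal sketch in C. Your original loop isn't shown here, so everything in it is an assumption: `BGRAPixel` is a hypothetical stand-in for whatever BGRAPallet actually holds, and `copy_pixels` is a made-up helper name, not part of Metal, OpenGL, or any other API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte BGRA pixel; the real BGRAPallet layout may differ. */
typedef struct {
    uint8_t b, g, r, a;
} BGRAPixel;

/*
 * Copies src into dst in one linear pass (one write per pixel).
 * Returns 0 on success, or -1 without writing anything if the two
 * buffers do not hold the same number of pixels.
 */
int copy_pixels(BGRAPixel *dst, size_t dst_count,
                const BGRAPixel *src, size_t src_count)
{
    /* Passing NULL is treated as a programming error. */
    assert(dst != NULL && src != NULL);

    /* Refuse mismatched sizes instead of silently writing to one slot. */
    if (dst_count != src_count) {
        return -1;
    }

    for (size_t i = 0; i < src_count; ++i) {
        dst[i] = src[i];
    }
    return 0;
}
```

Checking the counts up front means a mismatch fails immediately rather than spending the remaining iterations rewriting the same position. The loop itself is still CPU-bound; it only shows the shape of the per-pixel work you would hand to a GPU kernel.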