How to run MPS kernels in one compute encoder?
The MPS API encodes kernels into an MTLCommandBuffer. Is it possible to create a single MTLComputeCommandEncoder and run several kernels in it, without a separate encoder being created for each kernel under the hood? Something like:

    // Create command buffer
    // Create encoder
    kernel1.encode(encoder: encoder, sourceTexture: source, destinationTexture: k1Destination)
    kernel2.encode(encoder: encoder, sourceTexture: k1Destination, destinationTexture: destination)
    encoder.endEncoding()
    commandBuffer.commit()
Replies: 0, Boosts: 0, Views: 607, Activity: Dec ’21
CPU-based transform or GPU-based Affine 3D Transform or Linear 2D Transform + 2D Translation through `fma`, what is more efficient?
I'm working on a 2D drawing application. I receive CGPoints from UITouches and transform them to the Metal coordinate space. In most cases I have to create several vertices from one CGPoint, apply a transformation to them, and convert them to the Metal coordinate space. I use simd and vector-matrix multiplication. So I have 4 options:

1. Create an affine matrix (matrix_float3x3) that combines the linear transform (scale/rotation in my case) with the translation, and perform the vector-matrix multiplication on the CPU side using simd.
2. Create the same affine transform and perform the multiplication on the GPU side in the vertex function.
3. Create a uniform with a separate matrix_float2x2 linear transformation and a simd_float2 translation, and perform an fma operation with the 2D vector, the linear 2D matrix, and the 2D translation vector on the CPU side using Accelerate.
4. The same as the third option, but perform the fma on the GPU side in the vertex function.

Which is more efficient, and what are the best practices in GPU programming? As I understand it, fma and vector-matrix multiplication each use one processor instruction. Am I right? I have no more than 10 CGPoints, which produce about 40-80 vertices on every draw call.
Replies: 1, Boosts: 0, Views: 1.1k, Activity: May ’21
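
For the transform question above, a minimal sketch of the two formulations may help make the comparison concrete. It only shows that the 3x3 homogeneous form and the 2x2-plus-translation form produce the same point; it says nothing about which is faster. The type names `Affine2D` and `LinearPlusTranslation` and the sample values are hypothetical, and the sketch uses the Swift standard library's `SIMD2`/`SIMD3` with column vectors instead of the simd framework's `matrix_float3x3` and `matrix_float2x2` mentioned in the post.

```swift
// Hypothetical helper types for illustration; matrices are stored as columns.

// Options 1 and 2: one 3x3 homogeneous matrix (linear part plus translation).
struct Affine2D {
    var c0, c1, c2: SIMD3<Float>

    func apply(_ p: SIMD2<Float>) -> SIMD2<Float> {
        // Matrix * (p.x, p.y, 1): the third column is added unscaled because w = 1.
        let h = c0 * p.x + c1 * p.y + c2
        return SIMD2(h.x, h.y)
    }
}

// Options 3 and 4: a 2x2 linear matrix and a separate 2D translation.
struct LinearPlusTranslation {
    var c0, c1: SIMD2<Float>
    var t: SIMD2<Float>

    func apply(_ p: SIMD2<Float>) -> SIMD2<Float> {
        // t + c0 * p.x + c1 * p.y, written as two chained multiply-adds.
        return t.addingProduct(c0, p.x).addingProduct(c1, p.y)
    }
}

// Sample transform (assumed values): rotate by 90 degrees, scale by 2,
// then translate by (10, 20).
let affine = Affine2D(
    c0: SIMD3<Float>(0, 2, 0),
    c1: SIMD3<Float>(-2, 0, 0),
    c2: SIMD3<Float>(10, 20, 1)
)
let split = LinearPlusTranslation(
    c0: SIMD2<Float>(0, 2),
    c1: SIMD2<Float>(-2, 0),
    t: SIMD2<Float>(10, 20)
)

let point = SIMD2<Float>(3, 4)
let viaAffine = affine.apply(point)
let viaSplit = split.apply(point)
print(viaAffine, viaSplit)
```

Both calls map the point (3, 4) to (2, 26). Whether the compiler or the GPU turns either form into fused multiply-add instructions depends on the toolchain and hardware, so any efficiency claim would need to be measured with the real vertex counts.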