CoreML Inference Acceleration

Question

Created Sep ’25

Replies 2

Participants 2

Hello everyone, I have a convolutional vision model and a video that has been decoded into many frames. Running inference on each frame in a loop is a bit slow, so I started 4 threads that each run inference simultaneously. However, the total time is the same as with serial inference, because every individual forward pass becomes slower. I checked GPU utilization with the mactop tool, and it was only around 20%. Is this normal? How can I speed it up?

Answer 1

Hugoo OP

Sep ’25

Hello, can anyone help me, please?

Answer 2

moto_not_apple OP

Sep ’25

Instruments is your friend. Check this WWDC video: https://developer.apple.com/videos/play/wwdc2023/10049.

Core ML used to serialize predictions per MLModel instance. In recent years this per-instance lock has been relaxed, but the optimization is often available only for the newer model type (ML Program) and the newer API usage (async predictions).

Using Instruments, we can see which activities are serialized and make an informed decision about how to use the available compute resources.
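To illustrate the async prediction path mentioned above, here is a minimal sketch rather than a definitive implementation. It assumes macOS 14 / iOS 17 or later, where `MLModel` offers the async `prediction(from:)` method, and an already loaded model, ideally an ML Program. The helper name `predictAll` and the `maxInFlight` parameter are hypothetical, and each frame is assumed to be already wrapped in an `MLFeatureProvider`. Whether this actually overlaps work on your machine is something to confirm in Instruments.

```swift
import CoreML

// Hypothetical helper (not a Core ML API): runs predictions for all inputs
// on ONE shared MLModel instance, keeping at most `maxInFlight` requests
// in flight, and returns the outputs in input order.
@available(macOS 14.0, iOS 17.0, *)
func predictAll(model: MLModel,
                inputs: [MLFeatureProvider],
                maxInFlight: Int = 4) async throws -> [MLFeatureProvider] {
    var results = [MLFeatureProvider?](repeating: nil, count: inputs.count)

    try await withThrowingTaskGroup(of: (Int, MLFeatureProvider).self) { group in
        var next = 0

        // Start the first few predictions.
        while next < min(maxInFlight, inputs.count) {
            let i = next
            group.addTask { (i, try await model.prediction(from: inputs[i])) }
            next += 1
        }

        // Each time one finishes, store its output and start the next frame.
        while let (i, output) = try await group.next() {
            results[i] = output
            if next < inputs.count {
                let j = next
                group.addTask { (j, try await model.prediction(from: inputs[j])) }
                next += 1
            }
        }
    }

    return results.compactMap { $0 }
}
```

Capping the number of in-flight predictions keeps memory bounded when there are many frames; the value 4 is only a starting point taken from the thread count in the question. Under strict concurrency checking, the compiler may ask for additional `Sendable` annotations on the captured values.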