Massive CoreML latency spike on live AVFoundation camera feed vs. offline inference (CPU+ANE)

Hello,

I’m experiencing a severe performance degradation when running CoreML models on a live AVFoundation video feed compared to offline or synthetic inference. This happens across multiple models I've converted (including SCI, RTMPose, and RTMW) and affects multiple devices.

The Environment

OS: macOS 26.3, iOS 26.3, iPadOS 26.3

Hardware: Mac14,6 (M2 Max), iPad Pro 11 M1, iPhone 13 mini

Compute Units: cpuAndNeuralEngine

The Numbers

When testing my SCI_output_image_int8.mlpackage model, the inference timings are drastically different:

Synthetic/Offline Inference: ~1.34 ms

Live Camera Inference: ~15.96 ms

Preprocessing is completely ruled out as the bottleneck. My profiling shows that total preprocessing (nearest-neighbor resize + feature provider creation) takes only ~0.4 ms in camera mode. Furthermore, no frames are being dropped.

What I've Tried
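For reference, the per-stage timings above come from wrapping just the prediction call (preprocessing is measured separately). A minimal sketch of that harness; the function name and parameters are my placeholders, not code from the repro:

```swift
import CoreML
import QuartzCore

// Sketch of a timing harness: measures only the Core ML prediction call,
// excluding resize and feature-provider creation. Illustrative names only.
func timedPredictionMs(model: MLModel,
                       input: MLFeatureProvider,
                       options: MLPredictionOptions) throws -> Double {
    let start = CACurrentMediaTime()
    _ = try model.prediction(from: input, options: options)
    return (CACurrentMediaTime() - start) * 1000.0  // milliseconds
}
```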

I am building a latency-critical app and have implemented almost every recommended optimization, but the camera-feed penalty remains:

  • Matched the AVFoundation camera output format exactly to the model input (640x480 at 30/60fps).

  • Used IOSurface-backed pixel buffers for everything (camera output, synthetic buffer, and resize buffer).

  • Enabled outputBackings.

  • Loaded the model once and reused it for all predictions.

  • Configured MLModelConfiguration with optimizationHints.reshapeFrequency = .frequent and optimizationHints.specializationStrategy = .fastPrediction.

  • Wrapped inference in ProcessInfo.processInfo.beginActivity(options: .latencyCritical, reason: "CoreML_Inference").

  • Set DispatchQueue to qos: .userInteractive.

  • Disabled the idle timer and enabled iOS Game Mode.

  • Exported models using coremltools 9.0 (deployment target iOS 26) with ImageType inputs/outputs and INT8 quantization.
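Putting the buffer- and configuration-related items above together, this is roughly the setup I mean. A sketch, not the actual repro code; the output feature name "output_image" and the 640x480 BGRA format are placeholders for my model's actual I/O:

```swift
import CoreML
import CoreVideo

// Model loaded once, with the compute units and optimization hints listed above.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine
config.optimizationHints.reshapeFrequency = .frequent
config.optimizationHints.specializationStrategy = .fastPrediction
// let model = try MLModel(contentsOf: compiledModelURL, configuration: config)

// IOSurface-backed pixel buffers for the resize target and the output backing.
let ioSurfaceAttrs = [kCVPixelBufferIOSurfacePropertiesKey as String: [String: Any]()] as CFDictionary

var resizeBuffer: CVPixelBuffer?
CVPixelBufferCreate(kCFAllocatorDefault, 640, 480,
                    kCVPixelFormatType_32BGRA, ioSurfaceAttrs, &resizeBuffer)

var outputBuffer: CVPixelBuffer?
CVPixelBufferCreate(kCFAllocatorDefault, 640, 480,
                    kCVPixelFormatType_32BGRA, ioSurfaceAttrs, &outputBuffer)

// Output backing so Core ML writes directly into the preallocated buffer.
let options = MLPredictionOptions()
options.outputBackings = ["output_image": outputBuffer!]  // output name is a placeholder
```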

Reproduction

To completely rule out UI or rendering overhead, I wrote a standalone Swift CLI script that isolates the AVFoundation and CoreML pipeline. The script clearly demonstrates the ~15ms latency on live camera frames versus the ~1ms latency on synthetic buffers.
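For context, the camera path in the CLI repro is essentially the standard sample-buffer delegate. A minimal sketch (class, input name, and the omitted resize step are my placeholders, not the attached script):

```swift
import AVFoundation
import CoreML
import QuartzCore

// Sketch of the camera-path measurement: prediction timed directly in the
// sample-buffer callback. Identifiers are illustrative.
final class CameraBenchmark: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    let model: MLModel
    let options: MLPredictionOptions

    init(model: MLModel, options: MLPredictionOptions) {
        self.model = model
        self.options = options
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // ~0.4 ms nearest-neighbor resize into an IOSurface-backed buffer omitted here.
        guard let input = try? MLDictionaryFeatureProvider(
            dictionary: ["input_image": MLFeatureValue(pixelBuffer: pixelBuffer)]) else { return }

        let start = CACurrentMediaTime()
        _ = try? model.prediction(from: input, options: options)
        print("camera inference: \((CACurrentMediaTime() - start) * 1000) ms")
    }
}
```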

(I have attached camera_coreml_benchmark.swift and the Core ML model (a very light low-light enhancement model) to this GitHub repo: https://github.com/pzoltowski/apple-coreml-camera-latency-repro.)

My Question: Is this massive overhead expected behavior for AVFoundation + Core ML on live feeds, or is this a framework/runtime bug? If expected, what is the Apple-recommended pattern to bypass this camera-only inference slowdown?

One thing I found interesting: when running in debug mode, inference was faster (not as fast as in the offline benchmark, but faster than 16 ms). Also, if I ran some dummy calculation on a different DispatchQueue, the model seemed to get slightly faster. So maybe this is related to an ANE power-state issue (jitter / SoC wake): the ANE goes to sleep too quickly and takes a long time to wake back up? Doing a dummy calculation on a background thread is probably not a real solution, though.
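If the power-state theory holds, a slightly more contained workaround than a busy background queue might be a periodic throwaway prediction that keeps the ANE from idling between frames. A sketch only, not a recommendation; the interval and identifiers are my assumptions:

```swift
import CoreML
import Dispatch

// Sketch: periodically run a dummy prediction so the ANE never idles long
// enough to power down between camera frames. Placeholder identifiers.
func startKeepWarm(model: MLModel,
                   dummyInput: MLFeatureProvider,
                   every interval: DispatchTimeInterval = .milliseconds(100)) -> DispatchSourceTimer {
    let timer = DispatchSource.makeTimerSource(queue: .global(qos: .utility))
    timer.schedule(deadline: .now(), repeating: interval)
    timer.setEventHandler {
        _ = try? model.prediction(from: dummyInput)  // result discarded
    }
    timer.resume()
    return timer
}
```

The caller would need to retain the returned timer and cancel it when the camera session stops.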

Thanks in advance for any insights!
