Hello,
I’m experiencing a severe performance degradation when running CoreML models on a live AVFoundation video feed compared to offline or synthetic inference. This happens across multiple models I've converted (including SCI, RTMPose, and RTMW) and affects multiple devices.
The Environment
OS: macOS 26.3, iOS 26.3, iPadOS 26.3
Hardware: Mac14,6 (M2 Max), iPad Pro 11 M1, iPhone 13 mini
Compute Units: cpuAndNeuralEngine
The Numbers
When testing my SCI_output_image_int8.mlpackage model, the inference timings are drastically different:
Synthetic/Offline Inference: ~1.34 ms
Live Camera Inference: ~15.96 ms
Preprocessing is completely ruled out as the bottleneck. My profiling shows total preprocessing (nearest-neighbor resize + feature provider creation) takes only ~0.4 ms in camera mode. Furthermore, no frames are being dropped.
What I've Tried
I am building a latency-critical app and have implemented almost every recommended optimization to try and fix this, but the camera-feed penalty remains:
- Matched the AVFoundation camera output format exactly to the model input (640x480 at 30/60fps).
- Used IOSurface-backed pixel buffers for everything (camera output, synthetic buffer, and resize buffer).
- Enabled outputBackings.
- Loaded the model once and reused it for all predictions.
- Configured MLModelConfiguration with reshapeFrequency = .frequent and specializationStrategy = .fastPrediction.
- Wrapped inference in ProcessInfo.processInfo.beginActivity(options: .latencyCritical, reason: "CoreML_Inference").
- Set DispatchQueue to qos: .userInteractive.
- Disabled the idle timer and enabled iOS Game Mode.
- Exported models using coremltools 9.0 (deployment target iOS 26) with ImageType inputs/outputs and INT8 quantization.
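For reference, the configuration-related items in the list above roughly correspond to the sketch below. It is a simplified illustration rather than the exact code from the benchmark: the function names and the queue label are hypothetical, and it assumes the two hints are set through MLOptimizationHints on the configuration.

```swift
import CoreML
import Foundation

// Hypothetical helper: builds a configuration matching the list above.
// specializationStrategy requires macOS 15 / iOS 18 or later.
@available(macOS 15.0, iOS 18.0, *)
func makeLowLatencyConfiguration() -> MLModelConfiguration {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine

    var hints = MLOptimizationHints()
    hints.reshapeFrequency = .frequent
    hints.specializationStrategy = .fastPrediction
    config.optimizationHints = hints
    return config
}

// Hypothetical helper: runs `body` inside a latency-critical activity
// and always ends the activity afterwards.
func withLatencyCriticalActivity<T>(_ body: () throws -> T) rethrows -> T {
    let token = ProcessInfo.processInfo.beginActivity(
        options: .latencyCritical,
        reason: "CoreML_Inference"
    )
    defer { ProcessInfo.processInfo.endActivity(token) }
    return try body()
}

// Queue used for inference; the label is an arbitrary placeholder.
let inferenceQueue = DispatchQueue(label: "inference", qos: .userInteractive)
```

The model itself is loaded once with this configuration and every prediction is dispatched onto inferenceQueue inside withLatencyCriticalActivity.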
Reproduction
To completely rule out UI or rendering overhead, I wrote a standalone Swift CLI script that isolates the AVFoundation and CoreML pipeline. The script clearly demonstrates the ~15ms latency on live camera frames versus the ~1ms latency on synthetic buffers.
(I have attached camera_coreml_benchmark.swift and the Core ML model, a very lightweight low-light enhancement model, to this GitHub repo: https://github.com/pzoltowski/apple-coreml-camera-latency-repro).
My Question: Is this massive overhead expected behavior for AVFoundation + Core ML on live feeds, or is this a framework/runtime bug? If expected, what is the Apple-recommended pattern to bypass this camera-only inference slowdown?
One thing I found interesting: when running in debug mode, the model was faster (not as fast as in the performance benchmark, but faster than 16 ms). Also, when I ran some dummy calculation on a different DispatchQueue, the model seemed to get slightly faster. So maybe it's related to ANE power state issues (jitter/SoC wake), with the ANE going to sleep too quickly and taking a long time to wake up? Doing a dummy calculation in the background, though, is probably not a solution.
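One way to check the power-state idea without involving the camera at all might be to pace the synthetic benchmark at camera-like intervals. Below is a minimal sketch of such a harness. The function name medianLatencyMs and the placeholder workload are hypothetical; the closure would be replaced with the real prediction call on a synthetic buffer. If the synthetic latency also climbs once there is a ~16 ms or ~33 ms idle gap between calls, that would point at idle/wake behavior rather than at AVFoundation itself.

```swift
import Foundation
import Dispatch

/// Runs `work` `iterations` times, sleeping `gap` seconds before each call,
/// and returns the median wall-clock duration of `work` in milliseconds.
func medianLatencyMs(iterations: Int, gap: TimeInterval, work: () -> Void) -> Double {
    precondition(iterations > 0, "need at least one iteration")
    var samples: [Double] = []
    samples.reserveCapacity(iterations)
    for _ in 0..<iterations {
        // Simulate waiting for the next camera frame.
        if gap > 0 { Thread.sleep(forTimeInterval: gap) }
        let start = DispatchTime.now().uptimeNanoseconds
        work()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000)
    }
    samples.sort()
    let mid = samples.count / 2
    return samples.count % 2 == 1
        ? samples[mid]
        : (samples[mid - 1] + samples[mid]) / 2
}

// Gaps of 0 ms (back-to-back), ~16.7 ms (60 fps) and ~33.3 ms (30 fps).
for gapMs in [0.0, 16.7, 33.3] {
    let ms = medianLatencyMs(iterations: 10, gap: gapMs / 1000) {
        // Placeholder workload; swap in the model prediction here.
        _ = (0..<10_000).reduce(0.0) { $0 + Double($1).squareRoot() }
    }
    print("gap \(gapMs) ms -> median \(String(format: "%.3f", ms)) ms")
}
```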
Thanks in advance for any insights!