CoreML GPU NaN bug with fused QKV attention on macOS Tahoe

Problem: CoreML produces NaN on GPU (works fine on CPU) when running transformer attention with fused QKV projection on macOS 26.2.

Root cause: The common::fuse_transpose_matmul optimization pass triggers a Metal kernel bug when sliced tensors feed into matmul(transpose_y=True).
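For context, the op pattern that hits this is roughly the following (a NumPy sketch of fused-QKV attention; names and shapes are illustrative, not taken from the repro):

```python
import numpy as np

# Illustrative shapes (hypothetical, not from the repro)
batch, tokens, dim = 2, 4, 8

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, tokens, dim))
w_qkv = rng.standard_normal((dim, 3 * dim))  # single fused QKV projection weight

qkv = x @ w_qkv                      # one big projection instead of three
q, k, v = np.split(qkv, 3, axis=-1)  # sliced tensors feed the downstream matmuls

# q @ k^T: coremltools fuses the transpose into matmul(transpose_y=True),
# and sliced inputs to that fused op are what trip the Metal kernel on GPU
scores = (q @ k.transpose(0, 2, 1)) / np.sqrt(dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
out = weights @ v
```

The math itself is fine (no NaN here); the failure is in how the converted Metal kernel handles the slice-then-transposed-matmul graph, which is why removing the fusion pass avoids it.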

Workaround: drop the offending pass from the pipeline before conversion:

    pipeline = ct.PassPipeline.DEFAULT
    pipeline.remove_passes(['common::fuse_transpose_matmul'])
    mlmodel = ct.convert(model, ..., pass_pipeline=pipeline)

Minimal repro: https://github.com/imperatormk/coreml-birefnet/blob/main/apple_bug_repro.py

Affected: Any ViT/Swin/transformer with fused QKV attention (BiRefNet, etc.)

Has anyone else hit this? I've also filed a Feedback (FB) report with Apple.

Saw the same class of bug with a Whisper-based encoder: attention outputs were garbled on GPU, fine on CPU. Your fuse_transpose_matmul removal is the right fix. I also tried forcing .cpuAndGPU as a workaround, but that kills ANE scheduling entirely, which tanks throughput.
