Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets

Summary

The Metal driver AGXMetalG17X 351.2 on macOS 26.5 (25F71) for the M5 Pro chip crashes with kIOGPUCommandBufferCallbackErrorOutOfMemory (00000008) when running LLM inference workloads with working sets as small as ~1.5GB, despite 24GB of unified memory being available and Apple Diagnostics confirming the hardware is fully functional.

This affects multiple tools: MLX, llama.cpp (Metal backend), and native apps using Metal for inference.

System

Component    Value
Model        MacBook Pro (Mac17,9)
Chip         Apple M5 Pro (applegpu_g17s)
GPU Cores    16
RAM          24 GB LPDDR5
macOS        26.5 (25F71)
Metal        Metal 4
GPU Driver   AGXMetalG17X 351.2
Xcode        26.5 (17F42)

Reproduction

MLX (Python)

pip install mlx mlx-lm
python -m mlx_lm.generate \
  --model mlx-community/Qwen2.5-3B-Instruct-4bit \
  --max-tokens 10 \
  --prompt "Hello"

Expected: Normal text generation
Actual: Crash with:

libc++abi: terminating due to uncaught exception of type std::runtime_error: 
[METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
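
When collecting results across many runs, it can help to pull the status code and symbolic name out of the captured stderr rather than eyeballing each log. The following is a minimal sketch that assumes the error suffix always has the shape shown above, eight hex digits, a colon, then a kIOGPU name in parentheses. The helper names (parse_gpu_status, is_metal_oom) are hypothetical and not part of MLX or Metal.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of the suffix in the log above: "(<8 hex digits>:<kIOGPU name>)".
_STATUS_RE = re.compile(r"\((?P<code>[0-9A-Fa-f]{8}):(?P<name>kIOGPU\w+)\)")

OOM_NAME = "kIOGPUCommandBufferCallbackErrorOutOfMemory"


class GpuStatus(NamedTuple):
    code: int   # numeric status, parsed from the hex field
    name: str   # symbolic name as printed in the log


def parse_gpu_status(stderr_text: str) -> Optional[GpuStatus]:
    """Return the first GPU status found in stderr_text, or None if absent."""
    match = _STATUS_RE.search(stderr_text)
    if match is None:
        return None
    return GpuStatus(code=int(match.group("code"), 16), name=match.group("name"))


def is_metal_oom(stderr_text: str) -> bool:
    """True if the log contains the out-of-memory status reported in this issue."""
    status = parse_gpu_status(stderr_text)
    return status is not None and status.name == OOM_NAME
```

Feeding it the two stderr lines above should yield code 8 with the out-of-memory name; any log without the suffix returns None.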

llama.cpp

brew install llama.cpp
llama-cli --model model.gguf --prompt "Hello" --n-predict 20 --n-gpu-layers 99

Expected: Fast GPU generation
Actual: Process hangs indefinitely
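
Because the llama.cpp failure is a hang rather than a crash, a reproduction script needs a timeout to tell the two apart. The sketch below wraps any command with Python's subprocess module and labels the outcome. The function name classify_run, the label strings, and the 120 second default are assumptions chosen for illustration. Standard input is redirected from the null device so that a tool waiting for interactive input is less likely to be mistaken for a GPU hang.

```python
import subprocess
import sys
from typing import List

OOM_NAME = "kIOGPUCommandBufferCallbackErrorOutOfMemory"


def classify_run(cmd: List[str], timeout_s: float = 120.0) -> str:
    """Run cmd once and return 'ok', 'metal_oom', 'crash', or 'hang'."""
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # never block waiting for user input
            capture_output=True,
            text=True,
            timeout=timeout_s,          # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "hang"
    if proc.returncode == 0:
        return "ok"
    if OOM_NAME in proc.stderr:
        return "metal_oom"
    return "crash"


if __name__ == "__main__":
    # Usage: python classify_run.py <command> [args...]
    print(classify_run(sys.argv[1:]))
```

For example, classify_run(["llama-cli", "--model", "model.gguf", "--prompt", "Hello", "--n-predict", "20", "--n-gpu-layers", "99"]) would be expected to return "hang" on the affected machine, while the same command with "--n-gpu-layers", "0" should return "ok".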

Test Results

Tool       Model               Peak Memory  Result
MLX        Qwen2.5-0.5B-4bit   0.36 GB      ✅ Works
MLX        Qwen2.5-1.5B-4bit   0.98 GB      ✅ Works
MLX        Qwen3-1.7B-4bit     1.01 GB      ✅ Works
MLX        Qwen2.5-3B-4bit     ~1.5 GB      ❌ Metal OOM crash
MLX        Qwen3-4B-4bit       ~2.1 GB      ❌ Metal OOM crash
MLX        Qwen3-8B-4bit       ~4.5 GB      ❌ Metal OOM crash
llama.cpp  Qwen2.5-0.5B GGUF   ~0.5 GB      ❌ Hangs with GPU
llama.cpp  Qwen2.5-0.5B GGUF   ~0.5 GB      ✅ Works with CPU only

Key Evidence

  1. Hardware is healthy — Apple Diagnostics passed all tests
  2. Basic Metal works — matmul, array ops work fine
  3. CPU inference works — llama.cpp with -ngl 0 runs correctly
  4. The error is NOT about actual memory exhaustion: kIOGPUCommandBufferCallbackErrorOutOfMemory means the kernel rejects the Metal memory commit, not that physical memory is full. The system reports 17.76GB available for the Metal working set.
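
To put the last point in numbers, the sketch below computes what fraction of the reported 17.76GB working set each failing MLX run would occupy. The peak values are the approximate figures from the Test Results section, so the percentages are rough. The names used here are illustrative only.

```python
# Approximate peak memory of the failing MLX runs, in GB (from Test Results).
FAILING_PEAKS_GB = {
    "Qwen2.5-3B-4bit": 1.5,
    "Qwen3-4B-4bit": 2.1,
    "Qwen3-8B-4bit": 4.5,
}

# Metal working set size reported by the system, in GB.
REPORTED_WORKING_SET_GB = 17.76


def working_set_fraction(peak_gb: float,
                         limit_gb: float = REPORTED_WORKING_SET_GB) -> float:
    """Return peak_gb as a fraction of the reported working set limit."""
    if limit_gb <= 0:
        raise ValueError("limit_gb must be positive")
    return peak_gb / limit_gb


if __name__ == "__main__":
    for model, peak in FAILING_PEAKS_GB.items():
        share = working_set_fraction(peak)
        print(f"{model}: {share:.1%} of the reported working set")
```

Even the largest failing model stays at roughly a quarter of the reported limit, which is why the failures do not look like genuine exhaustion.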

Crash Log Extract

Thread 31 Crashed:
0   libsystem_kernel.dylib    __pthread_kill + 8
1   libsystem_pthread.dylib   pthread_kill + 296
2   libsystem_c.dylib         abort + 148
3   Metal                     MTLReportFailure.cold.1 + 48
4   Metal                     MTLReportFailure + 576
5   Metal                     -[_MTLCommandBuffer addCompletedHandler:] + 104
...
Exception Type: EXC_CRASH (SIGABRT)
Termination Reason: Namespace SIGNAL, Code 6, Abort trap: 6

Related Issues

Request

Please investigate the AGXMetalG17X driver for M5 Pro on macOS 26.5. The driver appears to incorrectly reject Metal memory commits for LLM inference workloads, even when the working set is well within the system's reported limits (1.5GB requested vs 17.76GB available).

Happy to provide full crash logs, sysdiagnose archives, or run additional tests.
