Post

Replies

Boosts

Views

Created

Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets
Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets Summary The Metal driver AGXMetalG17X 351.2 on macOS 26.5 (25F71) for the M5 Pro chip crashes with kIOGPUCommandBufferCallbackErrorOutOfMemory (00000008) when running LLM inference workloads with working sets as small as ~1.5GB, despite 24GB of unified memory being available and Apple Diagnostics confirming the hardware is fully functional. This affects multiple tools: MLX, llama.cpp (Metal backend), and native apps using Metal for inference. System Component Value Model MacBook Pro (Mac17,9) Chip Apple M5 Pro (applegpu_g17s) GPU Cores 16 RAM 24 GB LPDDR5 macOS 26.5 (25F71) Metal Metal 4 GPU Driver AGXMetalG17X 351.2 Xcode 26.5 (17F42) Reproduction MLX (Python) pip install mlx mlx-lm python -m mlx_lm.generate \ --model mlx-community/Qwen2.5-3B-Instruct-4bit \ --max-tokens 10 \ --prompt "Hello" Expected: Normal text generation Actual: Crash with: libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory) llama.cpp brew install llama.cpp llama-cli --model model.gguf --prompt "Hello" --n-predict 20 --n-gpu-layers 99 Expected: Fast GPU generation Actual: Process hangs indefinitely Test Results Tool Model Peak Memory Result MLX Qwen2.5-0.5B-4bit 0.36 GB ✅ Works MLX Qwen2.5-1.5B-4bit 0.98 GB ✅ Works MLX Qwen3-1.7B-4bit 1.01 GB ✅ Works MLX Qwen2.5-3B-4bit ~1.5 GB ❌ Metal OOM crash MLX Qwen3-4B-4bit ~2.1 GB ❌ Metal OOM crash MLX Qwen3-8B-4bit ~4.5 GB ❌ Metal OOM crash llama.cpp Qwen2.5-0.5B GGUF ~0.5 GB ❌ Hangs with GPU llama.cpp Qwen2.5-0.5B GGUF ~0.5 GB ✅ Works with CPU only Key Evidence Hardware is healthy — Apple Diagnostics passed all tests Basic Metal works — matmul, array ops work fine CPU inference works — llama.cpp with -ngl 0 runs correctly The error is NOT about actual memory exhaustion — kIOGPUCommandBufferCallbackErrorOutOfMemory means the kernel rejects the Metal memory commit, not that physical memory is full. The system reports 17.76GB available for Metal working set. Crash Log Extract Thread 31 Crashed: 0 libsystem_kernel.dylib __pthread_kill + 8 1 libsystem_pthread.dylib pthread_kill + 296 2 libsystem_c.dylib abort + 148 3 Metal MTLReportFailure.cold.1 + 48 4 Metal MTLReportFailure + 576 5 Metal -[_MTLCommandBuffer addCompletedHandler:] + 104 ... Exception Type: EXC_CRASH (SIGABRT) Termination Reason: Namespace SIGNAL, Code 6, Abort trap: 6 Related Issues ml-explore/mlx#3586 — Metal compiler regression on macOS 26.5 ml-explore/mlx#3534 — M5 float32 precision issue ml-explore/mlx#3568 — M5 random divergence ml-explore/mlx#3539 — Metal residency OOM (M4 Max) Request Please investigate the AGXMetalG17X driver for M5 Pro on macOS 26.5. The driver appears to incorrectly reject Metal memory commits for LLM inference workloads, even when the working set is well within the system's reported limits (1.5GB requested vs 17.76GB available). Happy to provide full crash logs, sysdiagnose archives, or run additional tests.
0
0
22
13h
Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets
Metal GPU Driver Crash on M5 Pro + macOS 26.5 — kIOGPUCommandBufferCallbackErrorOutOfMemory with <2GB working sets Summary The Metal driver AGXMetalG17X 351.2 on macOS 26.5 (25F71) for the M5 Pro chip crashes with kIOGPUCommandBufferCallbackErrorOutOfMemory (00000008) when running LLM inference workloads with working sets as small as ~1.5GB, despite 24GB of unified memory being available and Apple Diagnostics confirming the hardware is fully functional. This affects multiple tools: MLX, llama.cpp (Metal backend), and native apps using Metal for inference. System Component Value Model MacBook Pro (Mac17,9) Chip Apple M5 Pro (applegpu_g17s) GPU Cores 16 RAM 24 GB LPDDR5 macOS 26.5 (25F71) Metal Metal 4 GPU Driver AGXMetalG17X 351.2 Xcode 26.5 (17F42) Reproduction MLX (Python) pip install mlx mlx-lm python -m mlx_lm.generate \ --model mlx-community/Qwen2.5-3B-Instruct-4bit \ --max-tokens 10 \ --prompt "Hello" Expected: Normal text generation Actual: Crash with: libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory) llama.cpp brew install llama.cpp llama-cli --model model.gguf --prompt "Hello" --n-predict 20 --n-gpu-layers 99 Expected: Fast GPU generation Actual: Process hangs indefinitely Test Results Tool Model Peak Memory Result MLX Qwen2.5-0.5B-4bit 0.36 GB ✅ Works MLX Qwen2.5-1.5B-4bit 0.98 GB ✅ Works MLX Qwen3-1.7B-4bit 1.01 GB ✅ Works MLX Qwen2.5-3B-4bit ~1.5 GB ❌ Metal OOM crash MLX Qwen3-4B-4bit ~2.1 GB ❌ Metal OOM crash MLX Qwen3-8B-4bit ~4.5 GB ❌ Metal OOM crash llama.cpp Qwen2.5-0.5B GGUF ~0.5 GB ❌ Hangs with GPU llama.cpp Qwen2.5-0.5B GGUF ~0.5 GB ✅ Works with CPU only Key Evidence Hardware is healthy — Apple Diagnostics passed all tests Basic Metal works — matmul, array ops work fine CPU inference works — llama.cpp with -ngl 0 runs correctly The error is NOT about actual memory exhaustion — kIOGPUCommandBufferCallbackErrorOutOfMemory means the kernel rejects the Metal memory commit, not that physical memory is full. The system reports 17.76GB available for Metal working set. Crash Log Extract Thread 31 Crashed: 0 libsystem_kernel.dylib __pthread_kill + 8 1 libsystem_pthread.dylib pthread_kill + 296 2 libsystem_c.dylib abort + 148 3 Metal MTLReportFailure.cold.1 + 48 4 Metal MTLReportFailure + 576 5 Metal -[_MTLCommandBuffer addCompletedHandler:] + 104 ... Exception Type: EXC_CRASH (SIGABRT) Termination Reason: Namespace SIGNAL, Code 6, Abort trap: 6 Related Issues ml-explore/mlx#3586 — Metal compiler regression on macOS 26.5 ml-explore/mlx#3534 — M5 float32 precision issue ml-explore/mlx#3568 — M5 random divergence ml-explore/mlx#3539 — Metal residency OOM (M4 Max) Request Please investigate the AGXMetalG17X driver for M5 Pro on macOS 26.5. The driver appears to incorrectly reject Metal memory commits for LLM inference workloads, even when the working set is well within the system's reported limits (1.5GB requested vs 17.76GB available). Happy to provide full crash logs, sysdiagnose archives, or run additional tests.
Replies
0
Boosts
0
Views
22
Activity
13h