Product: macOS
Version: macOS 26.4.1 (public release)
Hardware: Apple M5 Pro, 48 GB unified memory
Summary

On macOS 26.4.1, the MPS backend consistently reports approximately 40 GiB of "other allocations" on a 48 GB M5 Pro machine, even on a freshly rebooted system with minimal user applications running. This leaves insufficient memory for large GPU tensor operations that previously succeeded on earlier macOS versions. The failure manifests as:

RuntimeError: MPS backend out of memory (MPS allocated: 17.60 GiB, other allocations: 40.17 GiB, max allowed: 63.65 GiB). Tried to allocate 7.63 GiB on private pool.

The "other allocations: 40.17 GiB" value is consistent across reboots and does not change materially when user applications are quit. This suggests macOS 26.4.1 has increased its baseline GPU/unified memory consumption compared to prior releases in a way that is visible to the MPS allocator.
Steps to Reproduce
- Fresh reboot of M5 Pro, 48 GB, macOS 26.4.1
- Launch a PyTorch 2.11.0 application using MPS as the compute device
- Load a large model into MPS memory (~17 GiB, e.g. a VAE encoder in bfloat16)
- Attempt to allocate an additional ~7.6 GiB workspace tensor for a matrix multiplication operation (torch.bmm)
Result: RuntimeError: MPS backend out of memory, with "other allocations" reported at ~40 GiB despite no large user processes holding GPU memory.

Expected: The operation should succeed. 17.60 GiB + 7.63 GiB = 25.23 GiB, well within the machine's 48 GiB of physical memory.
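For concreteness, a minimal sketch of the failing pattern. The tensor shapes below are illustrative stand-ins chosen to approximate the reported sizes, not the actual VAE workload:

```python
import torch

assert torch.backends.mps.is_available()
dev = torch.device("mps")

GiB = 1024**3

# Hold ~10 GiB of bfloat16 data on MPS (5 Gi elements x 2 bytes); together
# with the bmm inputs below this approximates the 17.60 GiB reported
resident = torch.empty(5 * GiB, dtype=torch.bfloat16, device=dev)

# Batched matmul whose ~8 GiB output tensor is the allocation that fails
a = torch.empty((64, 8192, 2048), dtype=torch.bfloat16, device=dev)  # 2 GiB
b = torch.empty((64, 2048, 8192), dtype=torch.bfloat16, device=dev)  # 2 GiB
out = torch.bmm(a, b)  # (64, 8192, 8192) bf16 ~ 8 GiB -> OOM on macOS 26.4.1
```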
Additional Observations

• vm_stat on a clean boot shows ~24 GB of free system RAM before the PyTorch application launches, consistent with normal OS usage. The 40 GiB figure reported by the MPS allocator as "other allocations" does not correspond to identifiable user processes.
• The "max allowed: 63.65 GiB" ceiling reported by MPS exceeds the machine's physical 48 GiB, suggesting MPS is using a memory-limit calculation that does not account for actual physical constraints on unified memory architectures.
• macOS 26.4 introduced a related regression (a deterministic RuntimeError: MPSGraph does not support tensor dims larger than INT_MAX) in the same MPS buffer stride arithmetic path. That specific error was resolved in 26.4.1, but the OOM regression described here persists.
• This operation succeeded on the same hardware under earlier macOS releases. The increased "other allocations" baseline appears to be specific to macOS 26.x.
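To make those numbers easy to capture, the allocator's view can be dumped directly. A short sketch using PyTorch's MPS introspection calls, plus the standard library for physical RAM:

```python
import os
import torch

GiB = 1024**3

# Memory PyTorch itself holds on the MPS device
print(f"MPS allocated:    {torch.mps.current_allocated_memory() / GiB:.2f} GiB")
# Memory the Metal driver reports as allocated (a superset of the above)
print(f"driver allocated: {torch.mps.driver_allocated_memory() / GiB:.2f} GiB")
# Ceiling the allocator enforces ("max allowed" in the OOM message)
print(f"recommended max:  {torch.mps.recommended_max_memory() / GiB:.2f} GiB")
# Physical unified memory, for comparison against the 63.65 GiB ceiling
phys = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
print(f"physical RAM:     {phys / GiB:.2f} GiB")
```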
Impact

Machine learning workloads that previously ran successfully on 48 GB Apple Silicon machines are failing on macOS 26.4.1 due to this increased baseline GPU memory consumption. Applications using PyTorch MPS, Core ML, and potentially Metal Performance Shaders directly may be affected.
Workaround

None identified at the time of the original report (but see the update below). Reducing application model size or splitting operations into smaller chunks does not resolve the issue, because the constraint is the "other allocations" baseline, not the application's own allocations.
Update: Workaround found -- and a clearer diagnosis of the bug for Apple engineers
Following up on my original report. I have now confirmed both the root cause and a working workaround.
The bug in one sentence: macOS 26.4.1 reports ~40 GiB of "other allocations" in the MPS memory pool at a clean system baseline, consuming memory that should be available to ML workloads.
How to reproduce it:
1. Fresh restart, nothing open except Terminal.
2. Disable all background processes and login items.
3. Run sudo purge && vm_stat.
4. Note that vm_stat shows ~15-20 GB of free pages -- the physical memory is there.
5. Launch any MPS workload that allocates >15 GB and observe the OOM error:
RuntimeError: MPS backend out of memory (MPS allocated: 17.60 GiB, other allocations: 40.17 GiB, max allowed: 63.65 GiB). Tried to allocate 7.63 GiB on private pool.

The "other allocations: 40.17 GiB" figure is the bug. On a 48 GB machine with nothing running, the MPS allocator believes 40 GiB is already committed elsewhere. This figure does not change regardless of which user processes are running or killed; it is intrinsic to macOS 26.4.1 on this hardware. vm_stat confirms the physical memory exists and is free. The MPS allocator is accounting for it incorrectly.
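A blunt way to demonstrate step 5 without a full ML pipeline is to allocate until the allocator refuses. A sketch (the 2 GiB chunk size is arbitrary):

```python
import torch

GiB = 1024**3
chunks = []
try:
    while True:
        # 2 GiB per chunk: 1 Gi bfloat16 elements x 2 bytes each
        chunks.append(torch.empty(GiB, dtype=torch.bfloat16, device="mps"))
        print(f"held: {torch.mps.current_allocated_memory() / GiB:.1f} GiB")
except RuntimeError as err:
    # On 26.4.1 this fails far below physical capacity, citing ~40 GiB
    # of "other allocations" that no user process accounts for
    print(err)
```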
Workaround for ComfyUI and similar VAE-based pipelines: if you are running ComfyUI and hitting this OOM, add --cpu-vae to your launch command. This routes VAE encode/decode entirely to the CPU, bypassing the MPS allocator for the largest single allocation in the pipeline. The VAE loads at ~319 MB in float32 on CPU rather than 17.60 GiB in bfloat16 on MPS, leaving MPS free for the diffusion steps. The pipeline runs to completion with zero OOM errors and zero INT_MAX errors. The tradeoff is a roughly 20x slowdown for VAE operations (yes, the CPU path really is that slow): approximately 185-200 seconds per generation versus ~10 seconds with the VAE on MPS. This is a workaround, not a fix; people can use it until Apple resolves the underlying bug.
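For hand-rolled PyTorch pipelines outside ComfyUI, the same device split can be applied manually. A minimal sketch with toy stand-in modules (the real UNet/VAE are far larger; the module shapes here are hypothetical):

```python
import torch
import torch.nn as nn

mps = torch.device("mps")

# Toy stand-ins: the "denoiser" stays on MPS, the "VAE decoder" is pinned to
# CPU, mirroring what ComfyUI's --cpu-vae flag does at pipeline scale
denoiser = nn.Conv2d(4, 4, 3, padding=1).to(mps, dtype=torch.bfloat16)
vae_decoder = nn.ConvTranspose2d(4, 3, 8, stride=8).to("cpu", dtype=torch.float32)

latents = torch.randn(1, 4, 128, 128, device=mps, dtype=torch.bfloat16)
with torch.no_grad():
    for _ in range(20):  # "diffusion" steps all run on MPS
        latents = denoiser(latents)
    # Only the decode, the largest single allocation, hops to CPU float32
    image = vae_decoder(latents.to("cpu", dtype=torch.float32))
print(image.shape)  # torch.Size([1, 3, 1024, 1024])
```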
The fix required from Apple: the MPS allocator's "other allocations" accounting needs to be corrected in macOS 26.4.1. The allocator is reserving or reporting as committed approximately 40 GiB of unified memory that is physically free and should be available to MPS workloads. This regression is present in macOS 26.4.1 (build 25E253), the current public release. It is very likely to affect all Apple Silicon users running large ML workloads on this OS version.
**This regression was introduced in macOS 26.4.** My ML VAE pipeline ran correctly on macOS 26.3. The automatic upgrade to macOS 26.4 happened overnight, and I started experiencing failures the next morning. Upon discovering the failure I updated to macOS 26.4.1 hoping it contained a fix, but it does not. The bug is present in both 26.4 and 26.4.1; macOS 26.3 is the last known good version for me.
Hardware: Mac17,8 (M5 Pro), 48 GB unified memory, macOS 26.4.1 (25E253).