Reply to MPS backend reports ~40 GiB 'other allocations' on 48 GB M5 Pro under macOS 26.4.1, blocking large tensor operations (PyTorch)
**Update: workaround found -- and a clearer diagnosis of the bug for Apple engineers**

Following up on my original report. I have now confirmed both the root cause and a working workaround.

**The bug in one sentence:** macOS 26.4.1 reports ~40 GiB of "other allocations" in the MPS memory pool at a clean system baseline, consuming memory that should be available to ML workloads.

**How to reproduce it:**

1. Fresh restart, nothing open except Terminal
2. Disable all background processes and login items
3. Run `sudo purge && vm_stat`
4. Note that `vm_stat` shows ~15-20 GB of free pages -- the physical memory is there
5. Launch any MPS workload that allocates >15 GB (a minimal repro sketch is at the end of this post) and observe the OOM error:

```
RuntimeError: MPS backend out of memory (MPS allocated: 17.60 GiB, other allocations: 40.17 GiB, max allowed: 63.65 GiB). Tried to allocate 7.63 GiB on private pool.
```

The "other allocations: 40.17 GiB" figure is the bug. On a 48 GB machine with nothing running, the MPS allocator believes 40 GiB is already committed elsewhere. The figure does not change regardless of which user processes are running or killed; it is intrinsic to macOS 26.4.1 on this hardware. `vm_stat` confirms the physical memory exists and is free. The MPS allocator is accounting for it incorrectly.

**Workaround for ComfyUI and similar VAE-based pipelines:** if you are running ComfyUI and hitting this OOM, add `--cpu-vae` to your launch command (example at the end of this post). This routes the VAE encode/decode entirely to the CPU, bypassing the MPS allocator for the largest single allocation in the pipeline. The VAE loads at ~319 MB in float32 on CPU rather than 17.60 GB in bfloat16 on MPS, leaving MPS free for the diffusion steps. The pipeline runs to completion with zero OOM errors and zero INT_MAX errors. The tradeoff is a ~20x slowdown for VAE operations (yes, the CPU path really is that slow): approximately 185-200 seconds per image versus ~10 seconds with the MPS VAE. This is a workaround, not a solution -- something people can use until Apple fixes the underlying bug.

**The fix required from Apple:** the MPS allocator's "other allocations" accounting needs to be corrected. The allocator is reserving, or reporting as committed, approximately 40 GiB of unified memory that is physically free and should be available to MPS workloads. This regression is present in macOS 26.4.1 (build 25E253), the current public release, and very likely affects all Apple Silicon users running large ML workloads on this OS version.

**This regression was introduced in macOS 26.4.** My ML VAE pipeline ran correctly on macOS 26.3. The automatic upgrade to macOS 26.4 happened overnight, and I started seeing failures the next morning. On discovering the failure I updated to macOS 26.4.1 hoping it contained a fix, but it does not. The bug is present in both 26.4 and 26.4.1; macOS 26.3 is the last known good version for me.

**Hardware:** Mac17,8 (M5 Pro), 48 GB unified memory, macOS 26.4.1 (25E253).
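For anyone who wants to reproduce step 5 without installing a full pipeline, here is a minimal sketch of the kind of workload that trips the error. The chunk sizes and names are mine for illustration, not from my actual pipeline; any MPS workload that pushes past ~15 GB should behave the same on an affected machine:

```python
# Minimal repro sketch (illustrative, not my production pipeline).
# Allocates ~16 GiB on the MPS device in 2 GiB chunks. On an affected
# macOS 26.4/26.4.1 machine, one of these allocations should raise the
# "MPS backend out of memory ... other allocations: ~40 GiB" RuntimeError
# even though vm_stat shows the physical memory is free.
import torch

assert torch.backends.mps.is_available(), "MPS backend not available"

# Eight tensors of 2**29 float32 elements (2 GiB each) = 16 GiB total.
chunks = [torch.empty(2**29, dtype=torch.float32, device="mps") for _ in range(8)]

total = sum(t.numel() * t.element_size() for t in chunks)
print(f"allocated {total / 2**30:.2f} GiB on MPS")  # only reached if unaffected
```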
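It is also worth capturing what the allocator itself believes at the clean baseline, before any tensors exist. Assuming a recent PyTorch build that exposes the `torch.mps` memory introspection helpers (check your version), something like this shows the allocator's view:

```python
# Diagnostic sketch: query the MPS allocator's view at a clean baseline.
# On my machine both figures are near zero here, yet large allocations
# still fail with ~40 GiB reported as "other allocations" in the OOM
# message -- which is what points at the OS-level accounting, not PyTorch.
import torch

print(f"current_allocated: {torch.mps.current_allocated_memory() / 2**30:.2f} GiB")
print(f"driver_allocated:  {torch.mps.driver_allocated_memory() / 2**30:.2f} GiB")
```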
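And for the ComfyUI workaround, the launch line looks like this (assuming the standard `main.py` entry point run from the ComfyUI directory; adjust for your own install or launcher):

```sh
# --cpu-vae moves VAE encode/decode to the CPU, bypassing MPS for it.
python main.py --cpu-vae
```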