MPS backend reports ~40 GiB 'other allocations' on 48 GB M5 Pro under macOS 26.4.1, blocking large tensor operations (PyTorch)
Product: macOS
Version: macOS 26.4.1 (public release)
Hardware: Apple M5 Pro, 48 GB unified memory

Summary:
On macOS 26.4.1, the MPS backend consistently reports approximately 40 GiB of “other allocations” on a 48 GB M5 Pro machine, even on a freshly rebooted system with minimal user applications running. This leaves insufficient memory for large GPU tensor operations that previously succeeded on earlier macOS versions. The failure manifests as:

RuntimeError: MPS backend out of memory (MPS allocated: 17.60 GiB, other allocations: 40.17 GiB, max allowed: 63.65 GiB). Tried to allocate 7.63 GiB on private pool.

The “other allocations: 40.17 GiB” value is consistent across reboots and does not change materially when user applications are quit. This suggests macOS 26.4.1 has increased its baseline GPU/unified memory consumption compared to prior releases in a way that is visible to the MPS allocator.

Steps to Reproduce:
1. Fresh reboot of an M5 Pro, 48 GB, running macOS 26.4.1.
2. Launch a PyTorch 2.11.0 application using MPS as the compute device.
3. Load a large model into MPS memory (~17 GiB, e.g. a VAE encoder in bfloat16).
4. Attempt to allocate an additional ~7.6 GiB workspace tensor for a matrix multiplication operation (torch.bmm). A minimal repro sketch follows this report.

Result: RuntimeError: MPS backend out of memory, with “other allocations” reported at ~40 GiB despite no large user processes holding GPU memory.

Expected: The operation should succeed. 17.60 GiB + 7.63 GiB = 25.23 GiB, which is well within the machine’s 48 GiB of physical memory.

Additional Observations:
• vm_stat on a clean boot shows ~24 GB of free system RAM before the PyTorch application launches, consistent with normal OS usage. The 40 GiB figure reported by the MPS allocator as “other allocations” does not correspond to identifiable user processes.
• The “max allowed: 63.65 GiB” ceiling reported by MPS exceeds the machine’s physical 48 GiB, suggesting MPS is using a memory-limit calculation that does not account for actual physical constraints on unified memory architectures.
• macOS 26.4 introduced a related regression (a deterministic “RuntimeError: MPSGraph does not support tensor dims larger than INT_MAX”) in the same MPS buffer stride arithmetic path. That specific error was resolved in 26.4.1, but the OOM regression described here persists.
• This operation succeeded on the same hardware under earlier macOS releases. The increased “other allocations” baseline appears to be specific to macOS 26.x.

Impact:
Machine learning workloads that previously ran successfully on 48 GB Apple Silicon machines are failing on macOS 26.4.1 due to this increased baseline GPU memory consumption. Applications using PyTorch MPS, Core ML, and potentially Metal Performance Shaders directly may be affected.

Workaround:
None identified. Reducing the application’s model size or splitting operations into smaller chunks does not resolve the issue, because the constraint is the “other allocations” baseline, not the application’s own allocations.
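A minimal repro sketch for steps 3–4. The tensor shapes below are my own stand-ins, not the actual application code: ~9e9 bfloat16 elements (~16.8 GiB) approximate the resident model footprint, and a torch.bmm whose output is 16 x 16384 x 15625 bfloat16 elements (~7.63 GiB) approximates the workspace allocation that fails.

# Repro sketch; shapes are approximations of the memory footprints described above.
import torch

assert torch.backends.mps.is_available()
device = torch.device("mps")

# Stand-in for the ~17 GiB model (e.g. a VAE encoder held in bfloat16).
resident = torch.empty(9_000, 1_000_000, dtype=torch.bfloat16, device=device)

# Batched matmul whose ~7.63 GiB output triggers the additional allocation.
a = torch.empty(16, 16_384, 512, dtype=torch.bfloat16, device=device)
b = torch.empty(16, 512, 15_625, dtype=torch.bfloat16, device=device)
out = torch.bmm(a, b)  # fails on macOS 26.4.1: "other allocations: 40.17 GiB"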
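To observe the baseline independently of the failure, the MPS allocator counters can be read directly. This sketch assumes PyTorch’s torch.mps introspection functions (current_allocated_memory, driver_allocated_memory, recommended_max_memory); how the error message’s “MPS allocated” / “other allocations” / “max allowed” fields map onto these counters is my reading, not something documented by Apple or PyTorch.

# Sketch for dumping the MPS allocator's view of memory on a fresh boot
# and again after loading the model.
import torch

GIB = 1024 ** 3

def dump_mps_memory(tag: str) -> None:
    print(f"[{tag}]")
    print(f"  current_allocated : {torch.mps.current_allocated_memory() / GIB:6.2f} GiB")
    print(f"  driver_allocated  : {torch.mps.driver_allocated_memory() / GIB:6.2f} GiB")
    print(f"  recommended_max   : {torch.mps.recommended_max_memory() / GIB:6.2f} GiB")

dump_mps_memory("fresh boot, before loading model")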