Metal

MTL4FXTemporalDenoisedScaler initialization

I’m trying to use MTL4FXTemporalDenoisedScaler, and I’m seeing a crash during initialization even with a very simple sample app. I created a minimal sample here: https://github.com/tatsuya-ogawa/MetalFXInitExample

The exception is:

NSException: "-[AGXG16XFamilyHeap baseObject]: unrecognized selector sent to instance ..."

What I found is:
• This works: descriptor.makeTemporalDenoisedScaler(device: device)
• This crashes: descriptor.makeTemporalDenoisedScaler(device: device, compiler: metal4Compiler)

So the issue seems to happen only with the Metal4FX version.

For testing, I’m using an iPhone 15 Pro. According to the Metal Feature Set Tables, MetalFX denoised upscaling should be supported on Apple9 and later, so I believe the device itself should meet the requirements. Reference: https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf

Has anyone seen this before, or knows what might be causing it? I’d appreciate any advice. Thanks.

Graphics & Games Metal MetalFX

0

29

5h

Question on setVertexBytes

I think that if your data is smaller than 4 KB, it's recommended to use setVertexBytes. My question is: can I keep hammering on setVertexBytes as the primary method for issuing multiple draw calls within a render buffer, and rely on Metal to figure out how to orphan and replace the target buffer? A lot of the primitives I am drawing are smaller than 4 KB, and wiring down larger segments of memory as individual buffers for each draw call seems to be a negative. It's also just simpler to copy, submit, and forget about buffer synchronization.
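As a rough illustration of the size rule mentioned above, the sketch below picks between an inline copy (the setVertexBytes path) and a dedicated buffer. It is plain C++ with no Metal calls; `UploadPath`, `chooseUploadPath`, and `kInlineBytesLimit` are hypothetical names, and the 4096-byte limit is taken from the guidance as described in the post rather than verified here.

```cpp
#include <cassert>
#include <cstddef>

// Assumed threshold: the post cites "less than 4k" as the recommended
// upper bound for data passed inline via setVertexBytes.
constexpr std::size_t kInlineBytesLimit = 4096;

// Hypothetical enum naming the two ways per-draw data could be supplied.
enum class UploadPath { InlineBytes, DedicatedBuffer };

// Returns InlineBytes for data strictly smaller than the limit,
// DedicatedBuffer otherwise.
UploadPath chooseUploadPath(std::size_t byteCount) {
    return byteCount < kInlineBytesLimit ? UploadPath::InlineBytes
                                         : UploadPath::DedicatedBuffer;
}
```

A per-draw uniform block of a few hundred bytes would take the inline path under this rule, while anything at or above 4096 bytes would fall back to a buffer that the application manages itself.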

Graphics & Games Metal

1

0

448

1w

Can a compute pipeline be as efficient as a render pipeline for rasterization?

I'm new to graphics and game design, and I would like to know whether a compute pipeline can be as efficient as a render pipeline for rasterization, along with an explanation of how and why. Also, is it possible to perform rasterization manually with a render pipeline, that is, to manipulate individual pixel data in a Metal texture yourself?

0

161

1w

Metal Shader inside Swift Package not found?

Hello everyone! I am trying to wrap a ViewModifier inside a Swift Package that bundles a metal shader file to be used in the modifier. Everything works as expected in the Preview, in the Simulator and on a real device for iOS. It also works in Preview and in the Simulator for tvOS but not on a real AppleTV. I have tried this on a 4th generation Apple TV running tvOS 26.3 using Xcode 26.2.0.

The metallib is processed and exists in the bundle. Xcode logs the following (the same message appears six times):

    Compiler failed to build request
    precondition failure: pipeline error: custom_effect-fg2a5cia7fmha4: error: unresolved visible function reference: custom_fn
    Reason: visible function not loaded

Contents of Package.swift:

    import PackageDescription

    let package = Package(
        name: "Test",
        platforms: [
            .iOS(.v17),
            .tvOS(.v17)
        ],
        products: [
            .library(
                name: "Test",
                targets: [
                    "Test"
                ]
            )
        ],
        targets: [
            .target(
                name: "Test",
                resources: [
                    .process("Shaders")
                ]
            ),
            .testTarget(
                name: "TestTests",
                dependencies: [
                    "Test"
                ]
            )
        ]
    )

Content of my metal file:

    #include <metal_stdlib>
    using namespace metal;

    [[ stitchable ]] float2 complexWave(float2 position, float time, float2 size, float speed, float strength, float frequency) {
        float2 normalizedPosition = position / size;
        float moveAmount = time * speed;
        position.x += sin((normalizedPosition.x + moveAmount) * frequency) * strength;
        position.y += cos((normalizedPosition.y + moveAmount) * frequency) * strength;
        return position;
    }

And my ViewModifier:

    import MetalKit
    import SwiftUI

    extension ShaderFunction {
        static let complexWave: ShaderFunction = {
            ShaderFunction(
                library: .bundle(.module),
                name: "complexWave"
            )
        }()
    }

    extension Shader {
        static func complexWave(arguments: [Shader.Argument]) -> Shader {
            Shader(function: .complexWave, arguments: arguments)
        }
    }

    struct WaveModifier: ViewModifier {
        let start: Date = .now

        func body(content: Content) -> some View {
            TimelineView(.animation) { context in
                let delta = context.date.timeIntervalSince(start)
                content
                    .visualEffect { view, proxy in
                        view.distortionEffect(
                            .complexWave(
                                arguments: [
                                    .float(delta),
                                    .float2(proxy.size),
                                    .float(0.5),
                                    .float(8),
                                    .float(10)
                                ]
                            ),
                            maxSampleOffset: .zero
                        )
                    }
            }
            .onAppear {
                let paths = Bundle.module.paths(forResourcesOfType: "metallib", inDirectory: nil)
                print(paths)
            }
        }
    }

    extension View {
        public func wave() -> some View {
            modifier(WaveModifier())
        }
    }

    #Preview {
        Image(systemName: "cart")
            .wave()
    }

Any help is appreciated.

Graphics & Games Metal MetalKit SwiftUI Swift Packages

0

328

2w

BGContinuedProcessingTask GPU access — no iPhone support?

We are developing a video processing app that applies CIFilter chains to video frames. To not force the user to keep the app foregrounded, we were happy to see the introduction of BGContinuedProcessingTask to continue processing when backgrounded.

With iOS 26, I was excited to see the com.apple.developer.background-tasks.continued-processing.gpu entitlement, which should allow GPU access in the background. Even the article in the documentation provides "exporting video in a film-editing app" or "applying visual filters (HDR, etc) or compressing images for social media posts" as use cases.

However, when I check BGTaskScheduler.shared.supportedResources.contains(.gpu) at runtime, it returns false on every iPhone I've tested (including iPhone 15 Pro and iPhone 16 Pro). From forum responses I've seen, it sounds like background GPU access is currently limited to iPad only.

If that's the case, I have a few questions:
• Is this an intentional, permanent limitation, or is iPhone support planned for a future iOS release?
• What is the recommended approach for GPU-dependent background work on iPhone? My custom CIKernels are written in Metal (as Apple recommends since CIKL is deprecated), but Metal CIKernels cannot fall back to CPU rendering. This creates a situation where Apple's own deprecation guidance (migrate to Metal) conflicts with background processing realities (no GPU on iPhone).
• Should developers maintain deprecated CIKL kernel versions alongside Metal kernels purely as a CPU fallback for background execution? That feels like it defeats the purpose of the migration.

It seems like a gap in the platform: the API exists, the entitlement exists, but the hardware support isn't there for the most common device category. Any clarity on Apple's direction here would be very helpful.

Graphics & Games Metal iOS Metal Background Tasks Core Image

2

0

181

3w

Using Metal compute for scientific simulation (lattice QCD gauge theory)

I've been using Metal compute shaders for lattice quantum chromodynamics simulations and wanted to share the experience in case others are doing scientific computing on Metal.

The workload involves SU(2) matrix operations on 4D lattice grids: lots of 2x2 and 3x3 complex matrix multiplies, reductions over lattice sites, and nearest-neighbor stencil operations. The implementation bridges a C++ scientific framework (Grid) to Metal via Objective-C++ .mm files, with MSL kernels compiled into .metallib archives during the build.

Things that work well:
• Shared memory on M-series eliminates the CPU↔GPU copy overhead that dominates in CUDA workflows
• The .metallib compilation integrates cleanly with autotools builds using xcrun
• Float4 packing for SU(2) matrices maps naturally to MSL vector types

Things I'm still figuring out:
• Optimal threadgroup sizes for stencil operations on 4D grids
• Whether to use MTLHeap for gauge field storage or stick with individual buffers
• Best practices for double precision: some measurements need float64, but Metal's double support varies by hardware

The application is measuring chromofield flux distributions between static quarks, ultimately targeting multi-quark systems. Production runs are on MacBook Pro M-series and Mac Studio.

Code: https://github.com/ThinkOffApp/multiquark-lattice-qcd
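One way to read the float4 packing mentioned above: an SU(2) matrix can be written as U = a0·I + i(a1·σ1 + a2·σ2 + a3·σ3), with four real numbers whose squares sum to 1, so a matrix product reduces to a quaternion-style product of two 4-vectors. The sketch below shows that product in plain C++ rather than MSL. The `SU2` struct and the `mul` and `normSquared` functions are hypothetical names, and the linked repository may use a different layout or sign convention.

```cpp
#include <cassert>
#include <cmath>

// Assumed packing: U = a0*I + i*(a1*s1 + a2*s2 + a3*s3), s_k = Pauli matrices.
struct SU2 {
    float a0, a1, a2, a3;
};

// Matrix product U*V expressed on the four real parameters.
// Scalar part: a0*b0 - dot(a, b).
// Vector part: a0*b + b0*a - cross(a, b).
SU2 mul(const SU2& u, const SU2& v) {
    return {
        u.a0 * v.a0 - (u.a1 * v.a1 + u.a2 * v.a2 + u.a3 * v.a3),
        u.a0 * v.a1 + v.a0 * u.a1 - (u.a2 * v.a3 - u.a3 * v.a2),
        u.a0 * v.a2 + v.a0 * u.a2 - (u.a3 * v.a1 - u.a1 * v.a3),
        u.a0 * v.a3 + v.a0 * u.a3 - (u.a1 * v.a2 - u.a2 * v.a1),
    };
}

// Equals the determinant of U in this parameterization; 1 for SU(2) elements.
float normSquared(const SU2& u) {
    return u.a0 * u.a0 + u.a1 * u.a1 + u.a2 * u.a2 + u.a3 * u.a3;
}
```

In this form a multiply costs 16 real multiplications instead of the 32 needed for a general 2x2 complex product, and the unit-norm condition gives a cheap check that accumulated products have not drifted off the group.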

Graphics & Games Metal Metal Metal Shader Converter

0

105

3w

Metal 4 (validation / debug layer): residency set requirement mismatch for memoryless attachments

Setup: MSAA rendering using a memoryless texture as the color attachment (render_image) and a "normal" texture as the resolve attachment (resolve_image). MTL_DEBUG_LAYER / API validation is enabled for this.

When trying to add the memoryless texture to a residency set, I get the following error:

-[MTLDebugResidencySet validateResource:], line 114: error 'residency sets do not support memoryless resources.

This is as expected and identical to Metal 3. However, if I don't add it to the residency set, I then get the following error when committing to the command queue:

-[MTL4DebugCommandQueue commit:count:options:], line 67: error 'Commit With Options Validation Attachment texture (Label: render_image) used in command buffer (at index 0) is not added to any residency set on the command buffer or command queue.

So which way around is actually correct in Metal 4? Either way, this makes the use of memoryless textures/attachments impossible right now when validation is enabled. FWIW: when disabling all validation, either way seems to work just fine.

Tested on: M1 Max, macOS 26.3, Xcode 26.2 & 26.4b2

Graphics & Games Metal

0

60

3w

Xcode Metal Capture crash when using MTLSamplerState

The sample code just draws a triangle and samples a texture. Both samples draw the triangle and sample the texture as expected, and there are no error messages in the terminal. The sample that uses a constexpr sampler can be captured and replayed without problems. The sample that uses an argumentTable to bind an MTLSamplerState crashes during Metal capture and replay in Xcode. Here are the sample codes. Sample Code Test Environment: M1 Pro macOS 26.3 (25D125) Xcode Version 26.2 (17C52) Feedback ID: FB22031701

Graphics & Games Metal Metal MetalKit Xcode Graphical Debugger

0

100

3w

The description of set_indices in the MSL reference seems incorrect.

I'm currently learning Metal. While reading the reference, I came across a strange description. Page 78 in Version 4 Reference (2025-10-25) says:

It is legal to call the following set_indices functions to set the indices if the position in the index buffer is valid and if the position in the index buffer is a multiple of 2 (uchar2 overload) or 2 (uchar4 overload). The index I needs to be in the range [0, max_indices). void set_indices(uint I, uchar2 v); void set_indices(uint I, uchar4 v);

However, it seems that the uchar4 overload should require a multiple of 4. Furthermore, there is no explanation of what these methods actually do. I believe they set two or four consecutive indices at once, but there is no mention of that here. I would like to know if the above understanding is correct.
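The reading proposed in the post can be made concrete with a small CPU-side model. The sketch below is plain C++, not MSL, and is not taken from the specification: `IndexBuffer` and `setPacked` are hypothetical names that model the assumption that the uchar2 overload writes two consecutive indices starting at a multiple of 2, and the uchar4 overload writes four consecutive indices starting at a multiple of 4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a mesh index buffer holding max_indices entries.
struct IndexBuffer {
    std::vector<std::uint8_t> indices;

    explicit IndexBuffer(std::size_t maxIndices) : indices(maxIndices, 0) {}

    // Models the assumed behavior of the packed overloads: write N consecutive
    // indices starting at position i. The write is rejected unless i is a
    // multiple of N and the whole run fits inside [0, max_indices).
    template <std::size_t N>
    bool setPacked(std::size_t i, const std::array<std::uint8_t, N>& v) {
        if (i % N != 0 || i + N > indices.size()) {
            return false;
        }
        for (std::size_t k = 0; k < N; ++k) {
            indices[i + k] = v[k];
        }
        return true;
    }
};
```

Under this model, position 2 is acceptable for a two-wide write but not for a four-wide one, which is the distinction the quoted sentence appears to lose by listing 2 for both overloads.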

Graphics & Games Metal Metal

0

104

4w

Metal runtime shader library compilation and linking issue

In my project I need to do the following:

1. At runtime, create a Metal dynamic library from source.
2. At runtime, create a Metal executable library from source and link it with the previously created dynamic library.
3. Create a compute pipeline using the two libraries created above.

But I get the following error at the third step:

Error Domain=AGXMetalG15X_M1 Code=2 "Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel " UserInfo={NSLocalizedDescription=Undefined symbols: _Z5noisev, referenced from: OnTheFlyKernel }

    import Foundation
    import Metal

    class MetalShaderCompiler {
        let device = MTLCreateSystemDefaultDevice()!
        var pipeline: MTLComputePipelineState!

        func compileDylib() -> MTLDynamicLibrary {
            let source = """
            #include <metal_stdlib>
            using namespace metal;

            half3 noise() {
                return half3(1, 0, 1);
            }
            """
            let option = MTLCompileOptions()
            option.libraryType = .dynamic
            option.installName = "@executable_path/libFoundation.metallib"
            let library = try! device.makeLibrary(source: source, options: option)
            let dylib = try! device.makeDynamicLibrary(library: library)
            return dylib
        }

        func compileExlib(dylib: MTLDynamicLibrary) -> MTLLibrary {
            let source = """
            #include <metal_stdlib>
            using namespace metal;

            extern half3 noise();

            kernel void OnTheFlyKernel(texture2d<half, access::read> src [[texture(0)]],
                                       texture2d<half, access::write> dst [[texture(1)]],
                                       ushort2 gid [[thread_position_in_grid]]) {
                half4 rgba = src.read(gid);
                rgba.rgb += noise();
                dst.write(rgba, gid);
            }
            """
            let option = MTLCompileOptions()
            option.libraryType = .executable
            option.libraries = [dylib]
            let library = try! self.device.makeLibrary(source: source, options: option)
            return library
        }

        func runtime() {
            let dylib = self.compileDylib()
            let exlib = self.compileExlib(dylib: dylib)
            let pipelineDescriptor = MTLComputePipelineDescriptor()
            pipelineDescriptor.computeFunction = exlib.makeFunction(name: "OnTheFlyKernel")
            pipelineDescriptor.preloadedLibraries = [dylib]
            pipeline = try! device.makeComputePipelineState(descriptor: pipelineDescriptor, options: .bindingInfo, reflection: nil)
        }
    }
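The symbol in the error message, `_Z5noisev`, looks like an Itanium C++ ABI mangled name. The sketch below is host-side C++ rather than anything Metal-specific: it decodes the name with `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang), wrapped in a hypothetical `demangle` helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-mangled
// symbol, or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller must release it with free.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Passing `"_Z5noisev"` yields `noise()`, which suggests the pipeline build is looking for the parameterless function that the kernel source declares `extern`. That narrows the question to whether the dynamic library actually exports that symbol when the pipeline is created.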

Graphics & Games Metal Metal

5

0

1.1k

Feb ’26

Terminal Codes

Hello Apple developers and users, I am writing to ask for help with performance settings or Terminal commands I can use on my MacBook. I recently installed macOS Tahoe 26.2, and it has been very glitchy and laggy, both when gaming and during normal use. I have tried using an FPS unlocker and downloading Metal 4. The FPS unlocker hasn't worked at all; I am still stuck at the normal 60 FPS and need some advice. Thank you. Kind regards, Zachary

Graphics & Games Metal Developer Tools

0

165

Feb ’26

Unable to find intelgpu_kbl_gt2r0 slice or a compatible one in binary archive

Unable to find intelgpu_kbl_gt2r0 slice or a compatible one in binary archive 'file:///System/Library/PrivateFrameworks/IconRendering.framework/Resources/binary.metallib' available slices: applegpu_g13g, applegpu_g13s, applegpu_g13d, applegpu_g14g, applegpu_g14s, applegpu_g14d, applegpu_g15g, applegpu_g15s, applegpu_g15d, applegpu_g16g, applegpu_g16s, applegpu_g17g, applegpu_g15g, applegpu_g15s, applegpu_g15d, applegpu_g16s Is it related to performance of applications in macOS 26.2 on Intel Macs?

Graphics & Games Metal Metal

3

0

293

Feb ’26

Open Shading Language (OSL) in Metal

Hi. I'm a 3D designer, using Blender for most of my work. The most recent Blender conference discussed utilizing the Open Shading Language (OSL) in their latest versions, which allows designers to write custom shaders for their workflows. At the moment, only Nvidia Optix GPUs can utilize this language for rendering (from what I understand), but Blender developers stated they are waiting on other GPU manufacturers to implement this feature as well. I'm not sure if there are any licensing issues here, but would this be something Apple could implement in Metal to make their hardware more attractive to the 3D design community? Any help or knowledge on this topic would be greatly appreciated.

Graphics & Games Metal Metal Metal Performance Shaders

0

259

Feb ’26

Optimizing HZB Mip-Chain Generation and Bindless Argument Tables in a Custom Metal Engine

Hi everyone,

I’ve been developing a custom, end-to-end 3D rendering engine called Crescent from scratch using C++20 and Metal-cpp (targeting macOS and visionOS). My primary goal is to build a zero-bottleneck, GPU-driven pipeline that maximizes the potential of Apple Silicon’s Unified Memory and TBDR architecture. While the fundamental systems are stable, I am looking for architectural feedback from Metal framework engineers regarding specific synchronization and latency challenges.

Current Core Implementations:
• GPU-Driven Instance Culling: High-performance occlusion culling using a Hierarchical Z-Buffer (HZB) approach via Compute Shaders.
• Clustered Forward Shading: Support for high-count dynamic lights through view-space clustering.
• Temporal Stability: Custom TAA with history rejection and Motion Blur resolve.
• Asset Infrastructure: Robust GUID-based scene serialization and a JSON-driven ECS hierarchy.

The Architectural Challenge: I am currently seeing slight synchronization overhead when generating the HZB mip-chain. On Apple Silicon, I am evaluating the cost of encoder transitions versus cache-friendly barriers.

    [...] && m_hzbInitPipeline && m_hzbDownsamplePipeline && !m_hzbMipViews.empty();
    if (canBuildHzb) {
        MTL::ComputeCommandEncoder* hzbInit = commandBuffer->computeCommandEncoder();
        hzbInit->setComputePipelineState(m_hzbInitPipeline);
        hzbInit->setTexture(m_depthTexture, 0);
        hzbInit->setTexture(m_hzbMipViews[0], 1);
        if (m_pointClampSampler) {
            hzbInit->setSamplerState(m_pointClampSampler, 0);
        } else if (m_linearClampSampler) {
            hzbInit->setSamplerState(m_linearClampSampler, 0);
        }
        const uint32_t hzbWidth = m_hzbMipViews[0]->width();
        const uint32_t hzbHeight = m_hzbMipViews[0]->height();
        const uint32_t threads = 8;
        MTL::Size tgSize = MTL::Size(threads, threads, 1);
        MTL::Size gridSize = MTL::Size((hzbWidth + threads - 1) / threads * threads,
                                       (hzbHeight + threads - 1) / threads * threads,
                                       1);
        hzbInit->dispatchThreads(gridSize, tgSize);
        hzbInit->endEncoding();

        for (size_t mip = 1; mip < m_hzbMipViews.size(); ++mip) {
            MTL::Texture* src = m_hzbMipViews[mip - 1];
            MTL::Texture* dst = m_hzbMipViews[mip];
            if (!src || !dst) {
                continue;
            }
            MTL::ComputeCommandEncoder* downEncoder = commandBuffer->computeCommandEncoder();
            downEncoder->setComputePipelineState(m_hzbDownsamplePipeline);
            downEncoder->setTexture(src, 0);
            downEncoder->setTexture(dst, 1);
            const uint32_t mipWidth = dst->width();
            const uint32_t mipHeight = dst->height();
            MTL::Size downGrid = MTL::Size((mipWidth + threads - 1) / threads * threads,
                                           (mipHeight + threads - 1) / threads * threads,
                                           1);
            downEncoder->dispatchThreads(downGrid, tgSize);
            downEncoder->endEncoding();
        }

        if (m_instanceCullHzbPipeline) {
            dispatchInstanceCulling(m_instanceCullHzbPipeline, true);
        }
    }

My Questions:
• Encoder Synchronization: Would you recommend moving this loop into a single ComputeCommandEncoder using MTLBarrier between dispatches to maintain L2 cache residency, or is the overhead of separate encoders negligible for depth-downsampling on TBDR?
• visionOS Bindless Latency: For stereo rendering on visionOS, what are the best practices for managing MTL4ArgumentTable updates at 90Hz+? I want to ensure that updating bindless resources for each eye doesn't introduce unnecessary CPU-to-GPU latency.
• Memory Management: Are there specific hints for Memoryless textures that could be applied to intermediate HZB levels to save bandwidth during this process?

I’ve attached a screenshot of a scene rendered with the engine (PBR, SSR, and IBL).
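The grid-size arithmetic in the listing, and the mip dimensions the downsample loop walks through, can be checked on the CPU. The following is a minimal sketch in plain C++ with no Metal calls. `roundUpToMultiple` and `mipChainSizes` are hypothetical helper names, and halving each level with a floor of 1 is an assumption about how the HZB views are sized, since the post does not show their creation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Same rounding as the listing: (value + multiple - 1) / multiple * multiple.
std::uint32_t roundUpToMultiple(std::uint32_t value, std::uint32_t multiple) {
    return (value + multiple - 1) / multiple * multiple;
}

// Assumed mip layout: each level halves the previous one, clamped to 1,
// until a 1 x 1 level is reached. Returns (width, height) per level.
std::vector<std::pair<std::uint32_t, std::uint32_t>> mipChainSizes(
    std::uint32_t width, std::uint32_t height) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> sizes;
    while (true) {
        sizes.emplace_back(width, height);
        if (width == 1 && height == 1) {
            break;
        }
        width = std::max<std::uint32_t>(1, width / 2);
        height = std::max<std::uint32_t>(1, height / 2);
    }
    return sizes;
}
```

For a 1024 x 512 base level this gives 11 levels, so the loop in the listing would create 10 downsample encoders per frame on top of the init pass. With 8 x 8 threadgroups, every level below 8 x 8 still rounds up to a full 8 x 8 grid, so the smallest levels launch mostly out-of-range threads that the kernel has to reject.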

Graphics & Games Metal Graphics and Games Metal MetalKit metal-cpp

0

430

Feb ’26

Xcode Metal Trace

The code is downloaded from Apple's official Metal 4 sample [https://developer.apple.com/documentation/metal/drawing-a-triangle-with-metal-4?language=objc]. I enabled Metal GPU trace in the macOS scheme and traced a frame in Xcode. Xcode may show a segmentation fault in the app from some 'GTTrace' function when the trace button is clicked. When replaying a .gputrace file, Xcode may crash, throw an internal error, or report an XPC error. The example code that uses the old metal-renderer can be traced without any problem, and everything works fine. Test Environment: Xcode Version 26.2 (17C52) macOS 26.2 (25C56) M1 Pro 16GB A2442

Graphics & Games Metal Metal Xcode metal-cpp

2

0

505

Jan ’26

Has anyone been able to create a window/portal using metal.

I am trying to create a simple portal like the one in RealityKit, but using Metal instead of RealityKit. Has anyone been able to create a window or portal-like object that shows a skybox outside in mixed reality?

Graphics & Games Metal

0

218

Jan ’26

Metal 4: Proper usage of requestResidency() with unique per-frame textures at 120fps

Hello, I have some confusion regarding ResidencySet. Specifically, about the requestResidency() function: how often should we call it? I have a captureOutput(_:didOutput:from:) method that is triggered at 60 or 120 fps. Inside this method, I am calling the following code every frame:

    computeResidencySet.removeAllAllocations()
    computeResidencySet.addAllocation(TextureA)
    computeResidencySet.addAllocation(TextureB)
    computeResidencySet.addAllocation(TextureC)
    computeResidencySet.commit()
    computeResidencySet.requestResidency() // Should we call it every frame?

Please keep in mind that TextureA, TextureB, and TextureC are unique for each call (new instances are provided on every frame).

Graphics & Games Metal Metal AVFoundation

1

0

605

Jan ’26

MetalFX FrameInterpolator assertion: `Color texture width mismatch from descriptor` even when all texture sizes match

I am integrating MetalFX FrameInterpolator into a custom Unity RenderGraph–based render pipeline (C++ native plugin + C# render passes), and I am hitting the following assertion at runtime:

/MetalFXDebugError.h:29: failed assertion `Color texture width mismatch from descriptor'

What makes this confusing is that all input/output textures have the correct width and height, and they exactly match the values specified in the MTLFXFrameInterpolatorDescriptor.

Setup
• Input resolution: 1024 x 512
• Output resolution: 2048 x 1024
• MTLFXTemporalScaler is created first and then passed into MTLFXFrameInterpolator
• The TemporalScaler and FrameInterpolator descriptors use the same input/output sizes and formats
• All Metal textures have no parentTexture, are 2D textures, and match the descriptor sizes exactly (verified via logging)

Texture bindings at encode time

    frameInterpolator.colorTexture     = mtlTexColor;     // 1024 x 512
    frameInterpolator.prevColorTexture = mtlTexPrevColor; // 1024 x 512
    frameInterpolator.motionTexture    = mtlTexMotion;    // 1024 x 512
    frameInterpolator.depthTexture     = mtlTexDepth;     // 1024 x 512
    frameInterpolator.uiTexture        = mtlTexUI;        // 2048 x 1024
    frameInterpolator.outputTexture    = mtlTexOutput;    // 2048 x 1024

All widths/heights are logged and match:

    Color     : 1024 x 512  (input)
    PrevColor : 1024 x 512  (input)
    Motion    : 1024 x 512  (input)
    Depth     : 1024 x 512  (input)
    UI        : 2048 x 1024 (output)
    Output    : 2048 x 1024 (output)

The TemporalScaler works correctly on its own. The assertion only occurs when using FrameInterpolator.

Important detail about colorTexture

Originally, colorTexture was copied from BuiltinRenderTextureType.CurrentActive. After reading that this might violate MetalFX semantics, I changed the pipeline so that:
• colorTexture now comes from a dedicated private RenderGraph texture
• It is not the backbuffer
• It is not a drawable
• It is not used as a final output
• It is created before UI rendering

Despite this, the assertion still occurs.

Question
• Can uiTexture for MTLFXFrameInterpolator legally come from a texture copied from BuiltinRenderTextureType.CurrentActive?
• More generally: are there additional hidden constraints on colorTexture / prevColorTexture (such as Metal usage, storageMode, aliasing, or hazard tracking) that could cause this assertion, even when sizes match?
• Does FrameInterpolator require colorTexture and prevColorTexture to be created in a very specific way (e.g. non-aliased, ShaderRead usage, identical Metal resource properties)?

Any clarification on the exact semantic requirements for colorTexture, prevColorTexture, or uiTexture in MetalFX FrameInterpolator would be greatly appreciated.

Graphics & Games Metal MetalFX

4

0

630

Jan ’26

[CRITICAL] Metal RHI Memory Leak - Resource exhaustion vulnerability (CWE-400) - Bug Report

[CRITICAL] Metal API Memory Leak - Heap Memory Never Released to OS (CWE-400)

Security Classification

This issue constitutes a resource exhaustion vulnerability (CWE-400):

    Aspect         Details
    Type           Uncontrolled Resource Consumption
    CWE            CWE-400
    Vector         Local (any Metal application)
    Impact         System instability, denial of service
    User Control   None - no mitigation available
    Recovery       Requires application restart

Summary

Metal heap allocations are never released back to macOS, even when the memory is entirely unused. This causes continuous, unbounded memory growth until system instability or crash. The issue affects any application using Metal API heap allocation. This was discovered in Unreal Engine 5, but reproduces in a completely blank UE5 project with zero application code - confirming this is Metal framework behavior, not application-level.

Environment
• OS: macOS Tahoe 26.2
• Hardware: Apple Silicon M4 Max (also reproduced on M1, M2, M3)
• API: Metal

Reproduction Steps
1. Run any Metal application that allocates and deallocates GPU buffers via Metal heaps
2. Open Activity Monitor and observe the application's memory usage
3. Let the application run idle (no user interaction required)
4. Observe memory growing continuously at ~1-2 MB per second
5. Memory never plateaus or stabilizes
6. Eventually system becomes unstable

For testing: Any Unreal Engine 5.4+ project on macOS will reproduce this. Even a blank project with no gameplay code exhibits the leak. (Tested on UE 5.7.1)

Observed Behavior

Memory Analysis

Using Unreal's memreport -full command, two reports taken 86 seconds apart:

    Metric              Report 1 (183s)   Report 2 (269s)   Delta
    Process Physical    4373.64 MB        4463.39 MB        +89.75 MB
    Metal Heap Buffer   7168 MB           8192 MB           +1024 MB
    Unused Heap         3453 MB           4477 MB           +1024 MB
    Object Count        73,840            73,840            0 (no change)

Key Finding

Metal Heap grew by exactly 1 GB while "Unused Heap" also grew by 1 GB. This demonstrates:
• Metal is allocating new heap blocks in ~1 GB increments
• Previously allocated heap memory becomes "unused" but is never released
• The unused memory accumulates indefinitely
• No application-level objects are leaking (count remains constant)

Memory Growth Pattern
• Continuous growth while idle (no user interaction)
• Growth rate: approximately 1-2 MB per second
• No plateau or stabilization occurs
• Metal allocates new 1 GB heap blocks rather than reusing freed space
• Eventually leads to system instability and crash

What is NOT Causing This

We verified the following are NOT the source:
• Application objects - Object count remains constant
• Application code - Blank project with no code reproduces the issue
• Texture streaming - Disabling texture streaming had no effect
• CPU garbage collection - Running GC has no effect (this is GPU memory)

Mitigations Attempted (None Worked)
• setPurgeableState: Setting resources to purgeable state before release: [buffer setPurgeableState:MTLPurgeableStateEmpty]; Result: Metal ignores this hint and does not reclaim heap memory.
• Avoiding Heap Pooling: Forcing individual buffer allocations instead of heap-based pooling. Result: Leak persists - Metal still manages underlying allocations.
• Aggressive Buffer Compaction: Attempting to compact/defragment buffers within heaps every frame. Result: Only moves data between existing heaps. Does NOT release heaps back to OS.
• Reducing Pool Sizes: Minimizing all buffer pool sizes to force more frequent reuse. Result: Slightly slows the leak rate but does not stop it.

Root Cause Analysis

How Metal Heap Allocation Appears to Work
1. Metal allocates GPU heap blocks in large chunks (~1 GB observed)
2. Application requests buffers from these heaps
3. When application releases buffers, memory becomes "unused" within the heap
4. Metal does NOT release heap blocks back to macOS, even when entirely unused
5. When fragmentation prevents reuse, Metal allocates new heap blocks
6. Result: Continuous memory growth with no upper bound

The Core Problem

There appears to be no Metal API to force heap memory release. The only way to reclaim this memory is to destroy the Metal device entirely, which requires restarting the application.

Expected Behavior

Metal should:
• Release unused heaps - When a heap block is entirely unused, release it back to macOS
• Respect purgeable hints - Honor setPurgeableState calls from applications
• Compact allocations - Defragment heap allocations to reduce fragmentation
• Provide control APIs - Allow applications to request heap compaction or release
• Enforce limits - Have configurable maximum heap memory consumption

Security Implications
• Local Denial of Service - Any Metal application can exhaust system memory, causing instability affecting all running applications
• Memory Pressure Attack - Forces other applications to swap to disk, degrading system-wide performance
• No Upper Bound - Memory consumption continues until system failure
• Unmitigable - End users have no way to prevent or limit the leak
• Affects All Metal Apps - Any application using Metal heaps is potentially affected

Impact
• Applications become unstable after extended use
• System-wide performance degrades as memory pressure increases
• Users must periodically restart applications
• Developers cannot work around this at the application level
• Long-running applications (games, creative tools, servers) are particularly affected

Request
• Investigate Metal heap memory management behavior
• Implement heap release when blocks become entirely unused
• Honor setPurgeableState hints from applications
• Consider providing an API for applications to request heap compaction
• Document any intended behavior or workarounds

Additional Notes

This issue has been observed across multiple Unreal Engine versions (5.4, 5.7) and multiple Apple Silicon generations (M1 through M4). The behavior is consistent and reproducible. The Unreal Engine team has implemented various CVars to attempt mitigation (rhi.Metal.HeapBufferBytesToCompact, rhi.Metal.ResourcePurgeInPool, etc.) but none successfully address the issue because the root cause is at the Metal framework level.

Tested: January 2026
Platform: macOS Tahoe 26.2, Apple Silicon (M1/M2/M3/M4)
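The deltas in the memory table and the quoted growth rate can be cross-checked with a few lines of arithmetic. The sketch below is plain C++; `MemReport`, `kReport1`, `kReport2`, and `growthRateMBPerSecond` are hypothetical names, and the input numbers are copied from the two memreport snapshots in the post.

```cpp
#include <cassert>
#include <cmath>

// One memreport snapshot, using the figures quoted in the post.
struct MemReport {
    double timeSeconds;
    double processPhysicalMB;
    double metalHeapBufferMB;
    double unusedHeapMB;
};

constexpr MemReport kReport1{183.0, 4373.64, 7168.0, 3453.0};
constexpr MemReport kReport2{269.0, 4463.39, 8192.0, 4477.0};

// Average growth between two readings, in MB per second.
double growthRateMBPerSecond(double beforeMB, double afterMB, double elapsedSeconds) {
    return (afterMB - beforeMB) / elapsedSeconds;
}
```

With these inputs, process physical memory grows by 89.75 MB over 86 seconds, about 1.04 MB per second, which sits within the 1-2 MB per second range quoted in the report. The heap buffer and unused heap figures both rise by exactly 1024 MB in the same window, so the heap jump is a single block-sized step rather than a reflection of the steady rate.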

Graphics & Games Metal Metal Metal Performance Shaders metal-cpp

5

2

1.1k

Jan ’26

Post

Replies

Boosts

Views

Activity