Hi everyone,
I’ve been developing a custom, end-to-end 3D rendering engine called Crescent from scratch using C++20 and Metal-cpp (targeting macOS and visionOS). My primary goal is to build a zero-bottleneck, GPU-driven pipeline that maximizes the potential of Apple Silicon’s Unified Memory and TBDR architecture.
While the fundamental systems are stable, I am looking for architectural feedback from Metal framework engineers regarding specific synchronization and latency challenges.
Current Core Implementations:
GPU-Driven Instance Culling: High-performance occlusion culling using a Hierarchical Z-Buffer (HZB) approach via Compute Shaders.
Clustered Forward Shading: Support for high-count dynamic lights through view-space clustering.
Temporal Stability: Custom TAA with history rejection and Motion Blur resolve.
Asset Infrastructure: Robust GUID-based scene serialization and a JSON-driven ECS hierarchy.
The Architectural Challenge:
I am currently seeing slight synchronization overhead when generating the HZB mip-chain. On Apple Silicon, I am evaluating the cost of encoder transitions versus cache-friendly barriers.
&& m_hzbInitPipeline && m_hzbDownsamplePipeline && !m_hzbMipViews.empty();
if (canBuildHzb) {
    // Pass 1: copy scene depth into HZB mip 0.
    MTL::ComputeCommandEncoder* hzbInit = commandBuffer->computeCommandEncoder();
    hzbInit->setComputePipelineState(m_hzbInitPipeline);
    hzbInit->setTexture(m_depthTexture, 0);
    hzbInit->setTexture(m_hzbMipViews[0], 1);
    if (m_pointClampSampler) {
        hzbInit->setSamplerState(m_pointClampSampler, 0);
    } else if (m_linearClampSampler) {
        hzbInit->setSamplerState(m_linearClampSampler, 0);
    }
    const uint32_t hzbWidth = m_hzbMipViews[0]->width();
    const uint32_t hzbHeight = m_hzbMipViews[0]->height();
    const uint32_t threads = 8;
    MTL::Size tgSize = MTL::Size(threads, threads, 1);
    // Grid is rounded up to a multiple of the threadgroup size.
    MTL::Size gridSize = MTL::Size((hzbWidth + threads - 1) / threads * threads,
                                   (hzbHeight + threads - 1) / threads * threads,
                                   1);
    hzbInit->dispatchThreads(gridSize, tgSize);
    hzbInit->endEncoding();

    // Pass 2..N: downsample mip - 1 into mip, one compute encoder per level.
    for (size_t mip = 1; mip < m_hzbMipViews.size(); ++mip) {
        MTL::Texture* src = m_hzbMipViews[mip - 1];
        MTL::Texture* dst = m_hzbMipViews[mip];
        if (!src || !dst) {
            continue;
        }
        MTL::ComputeCommandEncoder* downEncoder = commandBuffer->computeCommandEncoder();
        downEncoder->setComputePipelineState(m_hzbDownsamplePipeline);
        downEncoder->setTexture(src, 0);
        downEncoder->setTexture(dst, 1);
        const uint32_t mipWidth = dst->width();
        const uint32_t mipHeight = dst->height();
        MTL::Size downGrid = MTL::Size((mipWidth + threads - 1) / threads * threads,
                                       (mipHeight + threads - 1) / threads * threads,
                                       1);
        downEncoder->dispatchThreads(downGrid, tgSize);
        downEncoder->endEncoding();
    }

    // The finished HZB feeds the occlusion-culling dispatch.
    if (m_instanceCullHzbPipeline) {
        dispatchInstanceCulling(m_instanceCullHzbPipeline, true);
    }
}
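To put a number on how many encoder transitions the loop above produces, here is a minimal standalone sketch that computes the mip chain and the threadgroup counts for a given HZB size. It does not touch Metal. The names hzbMipChain, threadgroupsFor and totalThreadgroups are hypothetical, and the sketch assumes each level halves the previous one (rounding down, clamped to 1), which may differ from how m_hzbMipViews is actually built.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Extent of one HZB level in texels.
struct MipExtent {
    uint32_t width;
    uint32_t height;
};

// Threadgroups needed to cover `extent` texels along one axis (ceiling division).
constexpr uint32_t threadgroupsFor(uint32_t extent, uint32_t threadsPerGroup) {
    return (extent + threadsPerGroup - 1) / threadsPerGroup;
}

// Extents of every level, halving each axis (clamped to 1) until 1x1 is reached.
// Assumption: floor halving; the engine's real chain may stop earlier.
std::vector<MipExtent> hzbMipChain(uint32_t width, uint32_t height) {
    assert(width > 0 && height > 0);
    std::vector<MipExtent> chain;
    chain.push_back({width, height});
    while (width > 1 || height > 1) {
        width = std::max<uint32_t>(1, width / 2);
        height = std::max<uint32_t>(1, height / 2);
        chain.push_back({width, height});
    }
    return chain;
}

// Total threadgroups launched across the chain for an N x N threadgroup.
uint64_t totalThreadgroups(const std::vector<MipExtent>& chain, uint32_t threadsPerGroup) {
    assert(threadsPerGroup > 0);
    uint64_t total = 0;
    for (const MipExtent& mip : chain) {
        total += uint64_t(threadgroupsFor(mip.width, threadsPerGroup)) *
                 threadgroupsFor(mip.height, threadsPerGroup);
    }
    return total;
}
```

For a 1024 x 512 mip 0, hzbMipChain returns 11 levels. With the structure in the post that is 11 compute encoders (1 init plus 10 downsample), while a single-encoder variant would need 10 synchronization points between dispatches. The lower levels are a single threadgroup each, so the per-level fixed cost is likely to matter more than the shader work there.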
My Questions:
Encoder Synchronization: Would you recommend moving this loop into a single ComputeCommandEncoder using MTLBarrier between dispatches to maintain L2 cache residency, or is the overhead of separate encoders negligible for depth-downsampling on TBDR?
visionOS Bindless Latency: For stereo rendering on visionOS, what are the best practices for managing MTL4ArgumentTable updates at 90Hz+? I want to ensure that updating bindless resources for each eye doesn't introduce unnecessary CPU-to-GPU latency.
Memory Management: Are there specific hints for Memoryless textures that could be applied to intermediate HZB levels to save bandwidth during this process?
I’ve attached a screenshot of a scene rendered with the engine (PBR, SSR, and IBL).
Hi!
I hope everyone reading is doing well. I am working on developing a reinforcement learning agent that involves sending scan codes to a window, which I've been doing by sending virtual scan codes with CGEventCreateKeyboardEvent per the docs. There is no event source when I send the keyboard events.
However, when many keyboard events are happening (with the keys 'q', 'w', 'e', 'r', 'f', 'd', 's', space, arrow keys) in quick succession (<250ms), the enable dictation popup or the function button emojis popup appear for seemingly no reason. I have verified that I am using the correct scan codes for these keypresses, so I am wondering what else could cause this to happen. It is as if I am choosing to press f5 or fn. It does not happen when 'a' is the only button being pressed in quick succession.
One thing that I have not been able to easily find is the scan codes for dictation or the function button. Do these scan codes overlap somehow?
Thank you all for the help!
Hunter
I recently published my first game on the App Store. It uses SceneKit with a SpriteKit overlay. All crashes Xcode downloaded for it so far are related to some SpriteKit/SceneKit internals.
The most common crash is caused by SKCShapeNode::_NEW_copyRenderPathData. What could cause such a crash?
While developing this game (and the BoardGameKit framework that appears in the crash log) over the years I experienced many crashes presumably caused by the SpriteKit overlay (I opened a post SceneKit app randomly crashes with EXC_BAD_ACCESS in jet_context::set_fragment_texture about such a crash in September 2024), and other people on the internet also mention that they experience crashes when using SpriteKit as a SceneKit overlay. Should I use a separate SKView and lay it on top of SCNView rather than setting SCNView.overlaySKScene? That seemed to solve the crashes for a guy on stackoverflow, but is it also encouraged by Apple?
I know SceneKit is deprecated, but according to Apple critical bugs would still be fixed. Could this be considered a critical bug?
When trying to play with friends, Krazy Krownz doesn’t allow me to click multiplayer, even though my Apple Game Center account is connected and my friends' accounts are connected as well. I even tried sending an invite from Apple Game Center to friends, and Krazy Krownz doesn’t show up in the list of available multiplayer games.
I’ve signed out and back in, but the same issue remains.
I’ve tried to contact the game developer, but the website doesn’t work.
Topic: Graphics & Games
SubTopic: GameKit
The code is downloaded from Apple's official Metal 4 sample:
[https://developer.apple.com/documentation/metal/drawing-a-triangle-with-metal-4?language=objc]
Enable Metal GPU trace in the macOS scheme and capture a frame in Xcode.
Xcode may report a segmentation fault in the app, from some 'GTTrace' function, when the trace button is clicked.
When replaying a .gputrace file, Xcode may crash, throw an internal error, or throw an XPC error.
The sample code that uses the old Metal renderer can be traced without any problem, and everything works fine.
Test Environment:
Xcode Version 26.2 (17C52)
macOS 26.2 (25C56)
M1 Pro 16GB A2442
I am trying to create a simple portal like the one in RealityKit, but using Metal instead of RealityKit. Has anyone been able to create a window or portal-like effect that shows a skybox outside in mixed reality?
Topic: Graphics & Games
SubTopic: Metal
In my turn-based game, I receive GKListener event receivedTurnEventForMatch and decode the match.matchData. On occasion, the matchData is clearly stale and is from the previous turn. If I call the MatchMaker ViewController up and select that same match, the data is not stale, so it's not a matter of not calling endTurn.
I have tried both loadMatchWithID and loadMatchesWithCompletionHandler after receiving the receivedTurnEventForMatch, but the data is still stale.
Advice?
Topic: Graphics & Games
SubTopic: GameKit
I have an odd bug. If I use initWithFrame as the init routine for an NSView subclass that uses layers, I don't see it.
But if I embed this view in a storyboard with a .nib file and use initWithCoder, I need to return true from (BOOL) contentsAreFlipped in the NSView subclass.
If I don't, the CALayer actually renders from 0,0 of the view upwards and off the window.
The frame sizes for the NSView and the CALayer are good when I inspect them in updateLayer.
Obviously I have a fix, but I would like to understand why.
Topic: Graphics & Games
SubTopic: General
We are developing a standalone AI avatar application for hospital reception kiosks using Mac mini (M2/M4). The app runs on SwiftUI + RealityKit, displays on a 75-inch monitor, and utilizes a USB-connected 4K camera and external sensors (LiDAR/mmWave).
We have several technical concerns regarding the transition from iPadOS to macOS. Could you please provide insights on the following?
ARKit/Vision Framework on macOS with External Camera
On iPadOS, ARKit provides robust Face Tracking. On macOS with an external USB 4K camera:
Can we achieve real-time face tracking (expression/gaze/depth) with Vision framework or ARKit comparable to iPadOS performance?
Are there any specific limitations for accessing the Neural Engine via Vision framework for real-time 4K video analysis on macOS?
Accessing External Hardware (LiDAR/Sensors) in Sandbox
We plan to connect external LiDAR and mmWave sensors (e.g., Akara) via USB/Bluetooth.
Is it feasible to communicate with these custom drivers/devices within the App Sandbox environment?
Would DriverKit be required, or can we use standard serial communication APIs?
On-Device LLM (MLX) & Thermals
We intend to run a local LLM (e.g., Llama 3 using MLX framework) for offline conversation, alongside 3D rendering.
With the M2/M4 Mac mini fan design, is there a risk of thermal throttling during 10+ hours of continuous operation (simultaneous CoreML + 3D rendering)?
Is the Mac Studio recommended over the Mac mini for this thermal profile?
Long-running Speech API
Are there any known issues (memory leaks, API limits) when using the Speech framework and AVSpeechSynthesizer continuously for over 10 hours daily?
3D Display Output
Are there any macOS constraints for rendering a SwiftUI window in a specific 3D format (e.g., Side-by-Side) and outputting it via HDMI to a 3D digital signage display (fixed refresh rate/resolution)?
Thank you for your assistance.
Topic: Graphics & Games
SubTopic: RealityKit
I am integrating MetalFX FrameInterpolator into a custom Unity RenderGraph–based render pipeline (C++ native plugin + C# render passes), and I am hitting the following assertion at runtime:
/MetalFXDebugError.h:29: failed assertion `Color texture width mismatch from descriptor'
What makes this confusing is that all input/output textures have the correct width and height, and they exactly match the values specified in the MTLFXFrameInterpolatorDescriptor.
Setup
Input resolution: 1024 x 512
Output resolution: 2048 x 1024
MTLFXTemporalScaler is created first and then passed into MTLFXFrameInterpolator
The TemporalScaler and FrameInterpolator descriptors use the same input/output sizes and formats
All Metal textures:
Have no parentTexture
Are 2D textures
Match the descriptor sizes exactly (verified via logging)
Texture bindings at encode time
frameInterpolator.colorTexture = mtlTexColor; // 1024 x 512
frameInterpolator.prevColorTexture = mtlTexPrevColor; // 1024 x 512
frameInterpolator.motionTexture = mtlTexMotion; // 1024 x 512
frameInterpolator.depthTexture = mtlTexDepth; // 1024 x 512
frameInterpolator.uiTexture = mtlTexUI; // 2048 x 1024
frameInterpolator.outputTexture = mtlTexOutput; // 2048 x 1024
All widths/heights are logged and match:
Color : 1024 x 512 (input)
PrevColor : 1024 x 512 (input)
Motion : 1024 x 512 (input)
Depth : 1024 x 512 (input)
UI : 2048 x 1024 (output)
Output : 2048 x 1024 (output)
The TemporalScaler works correctly on its own.
The assertion only occurs when using FrameInterpolator.
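The size verification described above can be written as a small table-driven check, so that every bound texture is compared against the extent its role implies. This is a minimal sketch only: BoundTexture, firstSizeMismatch and bindingsFromPost are hypothetical names, and the mapping of each texture to the input or output size is taken from the comments in this post, not from MetalFX documentation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Extent {
    uint32_t width;
    uint32_t height;
};

// One texture binding: its property name, its logged size, and the size expected for its role.
struct BoundTexture {
    std::string name;
    Extent actual;
    Extent expected;
};

// Returns the name of the first binding whose size differs from its expected size,
// or an empty string when every binding matches.
std::string firstSizeMismatch(const std::vector<BoundTexture>& bindings) {
    for (const BoundTexture& b : bindings) {
        assert(b.expected.width > 0 && b.expected.height > 0);
        if (b.actual.width != b.expected.width || b.actual.height != b.expected.height) {
            return b.name;
        }
    }
    return std::string();
}

// The bindings and sizes exactly as logged in this post.
// Assumption: color/prevColor/motion/depth use the input size, ui/output the output size.
std::vector<BoundTexture> bindingsFromPost() {
    const Extent input{1024, 512};
    const Extent output{2048, 1024};
    return {
        {"colorTexture", {1024, 512}, input},
        {"prevColorTexture", {1024, 512}, input},
        {"motionTexture", {1024, 512}, input},
        {"depthTexture", {1024, 512}, input},
        {"uiTexture", {2048, 1024}, output},
        {"outputTexture", {2048, 1024}, output},
    };
}
```

With the logged values, firstSizeMismatch(bindingsFromPost()) returns an empty string, which is why the assertion is surprising: under this role mapping nothing is mismatched. If the framework expects a different role mapping for any of these textures, only the expected column would need to change.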
Important detail about colorTexture
Originally, colorTexture was copied from BuiltinRenderTextureType.CurrentActive.
After reading that this might violate MetalFX semantics, I changed the pipeline so that:
colorTexture now comes from a dedicated private RenderGraph texture
It is not the backbuffer
It is not a drawable
It is not used as a final output
It is created before UI rendering
Despite this, the assertion still occurs.
Question
Can uiTexture for MTLFXFrameInterpolator legally come from a texture copied from BuiltinRenderTextureType.CurrentActive?
More generally:
Are there additional hidden constraints on colorTexture / prevColorTexture (such as Metal usage, storageMode, aliasing, or hazard tracking) that could cause this assertion, even when sizes match?
Does FrameInterpolator require colorTexture and prevColorTexture to be created in a very specific way (e.g. non-aliased, ShaderRead usage, identical Metal resource properties)?
Any clarification on the exact semantic requirements for colorTexture, prevColorTexture, or uiTexture in MetalFX FrameInterpolator would be greatly appreciated.
Hello,
In our game we enforce an age gate before showing Game Center sign‑in. Only after the user passes the age gate do we call GKLocalPlayer.localPlayer.authenticateHandler.
The reason I’m asking is that we want to reliably detect if the game was launched from a Game Center activity in the Games app (iOS 26+). If the user prefers to enter via activities, we don’t want to miss that event during cold start.
Our current proposal is:
Register a GKLocalPlayerListener early in didFinishLaunchingWithOptions: so the app is ready to catch events.
Queue any incoming events in our dispatcher.
Only process those events after the user passes the age gate and authentication succeeds.
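To make the proposal concrete, here is a minimal, platform-neutral sketch of the "register early, queue, process after the gate" dispatcher. It is written in C++ purely for illustration and does not call GameKit. GatedEventDispatcher and its methods are hypothetical names, and the string payload stands in for whatever the real listener callback would deliver.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Holds incoming activity events until both gates are open, then delivers them in order.
class GatedEventDispatcher {
public:
    using Handler = std::function<void(const std::string&)>;

    explicit GatedEventDispatcher(Handler handler) : handler_(std::move(handler)) {
        assert(handler_ != nullptr);
    }

    // Called from the early-registered listener; safe to call before either gate opens.
    void onEvent(std::string activityId) {
        pending_.push(std::move(activityId));
        flushIfReady();
    }

    void setAgeGatePassed() {
        ageGatePassed_ = true;
        flushIfReady();
    }

    void setAuthenticated() {
        authenticated_ = true;
        flushIfReady();
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    // Delivers queued events only once the age gate AND authentication have both succeeded.
    void flushIfReady() {
        if (!ageGatePassed_ || !authenticated_) {
            return;
        }
        while (!pending_.empty()) {
            std::string id = std::move(pending_.front());
            pending_.pop();
            handler_(id);
        }
    }

    Handler handler_;
    std::queue<std::string> pending_;
    bool ageGatePassed_ = false;
    bool authenticated_ = false;
};
```

An event that arrives during cold start stays in the queue until both setAgeGatePassed() and setAuthenticated() have been called, in either order. Events arriving afterwards are delivered immediately. Whether the system delivers the activity callback before authentication at all is exactly the open question below; the sketch only shows that the queueing side does not lose events if it does.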
My questions are:
Does player:wantsToPlayGameActivity:completionHandler: ever fire before authentication, or only after the local player is authenticated?
If it only fires after authentication, is our “register early but gate processing” approach the correct way to ensure we don’t miss activity launches?
Is there any recommended pattern to distinguish “activity launch” vs. “normal launch” in this age‑gate scenario?
We want to respect Apple’s age gate requirements, but also ensure activity launches are not lost if the user prefers that entry point.
Sorry if this is a stupid question — I just want to be sure we’re following the right pattern.
Thanks for any clarification or best‑practice guidance!
I updated my app to include turn-based matches. Beta testing through TestFlight went well between iOS 18.x and 26.2 devices. One beta tester upgraded to 26.2 during beta testing, and now when the MatchMaker VC is opened, it does not show existing matches. Worse, he can create new matches and play his turn, but the new match won't even show up in the MMVC, even after the opponent takes a turn.
My app has been reviewed and is ready for release, but I'd like to know how to solve this before I release. He has tried re-installing the app, including an updated TestFlight version that is the same as the about-to-be-released reviewed version.
Topic: Graphics & Games
SubTopic: GameKit
Leaderboards work fine in iOS 26.1 but seem to be broken in 26.2 and also in the 26.3 developer beta. Players cannot submit scores, nor can they view scores on Apple's default leaderboards. Custom leaderboards that rely on pulling information using GameKit APIs also fail.
Is there a workaround or patch for this?
Hello, I have some confusion regarding ResidencySet. Specifically, about the requestResidency() function: how often should we call it?
I have a captureOutput(_:didOutput:from:) method that is triggered at 60 or 120 fps. Inside this method, I am calling the following code every frame:
computeResidencySet.removeAllAllocations()
computeResidencySet.addAllocation(TextureA)
computeResidencySet.addAllocation(TextureB)
computeResidencySet.addAllocation(TextureC)
computeResidencySet.commit()
computeResidencySet.requestResidency() // Should we call it every frame?
Please keep in mind that TextureA, TextureB, and TextureC are unique for each call (new instances are provided on every frame).
Hello Apple, through this message I want to draw your attention to some problems with GPTK and Rosetta. Some games, like Marvel's Spider-Man 2, have broken animations and T-pose issues, and others, like Uncharted and The Last of Us, have severe memory leak issues. My request is that you please fix these as soon as possible.
Hi fellow devs, I have a quick question: is it possible to have virtual controllers on Mac? For instance, can my app exclusively manage a physical controller and feed its output into the Game Controller framework, creating a virtual controller that allows for features such as controller emulation, haptic control, and others?
Hi!
I'd like to share a technical sample app, SKRenderer Demo.
This app demonstrates:
Setting up SKRenderer
Recording SpriteKit scenes to image sequences
Recording SpriteKit scenes to video using IOSurface and AVFoundation
Applying Core Image filters
Exploring SpriteKit's simulation timing and physics determinism
Use Case
Record SpriteKit simulations as video or images for sharing and creating content.
I explored several approaches, including the excellent view.texture(from:crop:) for live recording from SKView. The SKRenderer approach assumes recording happens asynchronously: you capture user interactions as commands during live interaction, then replay those commands through an offline render pass to generate the final output.
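The record-then-replay idea can be sketched independently of SpriteKit: timestamped commands are captured during the live session, then applied inside a fixed-timestep loop that produces one state per output frame. This is a minimal illustration in C++, not the demo app's code. Command, SimState and replay are hypothetical names, the "simulation" is a single body moved by impulses, and commands are assumed to be sorted by time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A recorded user interaction: an impulse applied at `time` seconds into the session.
struct Command {
    double time;
    float impulse;
};

// Toy simulation state standing in for a scene.
struct SimState {
    float position = 0.0f;
    float velocity = 0.0f;
};

// Replays `commands` (sorted by time) with a fixed timestep of 1 / fps and
// returns the state after every frame. Each frame would be rendered offline.
std::vector<SimState> replay(const std::vector<Command>& commands, double duration, double fps) {
    assert(fps > 0.0 && duration >= 0.0);
    const double dt = 1.0 / fps;
    const std::size_t frameCount = static_cast<std::size_t>(duration * fps + 0.5);

    std::vector<SimState> frames;
    frames.reserve(frameCount);
    SimState state;
    std::size_t next = 0;

    for (std::size_t frame = 0; frame < frameCount; ++frame) {
        const double frameTime = static_cast<double>(frame) * dt;
        // Apply every command whose timestamp has been reached by this frame.
        while (next < commands.size() && commands[next].time <= frameTime) {
            state.velocity += commands[next].impulse;
            ++next;
        }
        state.position += state.velocity * static_cast<float>(dt);
        frames.push_back(state);
    }
    return frames;
}
```

Because the loop depends only on the command list and the fixed timestep, two replays of the same recording produce the same frames, regardless of how fast the offline pass actually runs. That property is what makes the approach suitable for video export.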
I hope this helps others working on replay systems, simulation capture, or SpriteKit projects in general!
I am developing a custom app for Apple Vision Pro using Compositor Services to stream content from NVIDIA Omniverse. The app is based on:
https://github.com/NVIDIA-Omniverse/apple-configurator-sample
Environment:
Device: Apple Vision Pro
OS Version: visionOS 26.2
Xcode Version: 26.2
The Issue: The application crashes hard (__abort_with_payload) in "libsystem_kernel.dylib" on Task 6 immediately after initialization. This appears to be a deliberate abort triggered by the compositor, not a typical crash.
The issue occurs on both physical device and simulator.
Important detail:
The console output shows a specific CLIENT BUG assertion. By checking the metadata of the warning, I found that it is related to "Library: CompositorNonUI".
Relevant console output before abort:
Missed 'FrameLimiter' target of 90.0 Hz
running compositor services to get IPD, FOV, etc
fence tx observer 14f27 timed out after 0.600000
fence tx observer bc1b timed out after 0.600000
BUG IN CLIENT: For mixed reality experiences please use cp_drawable_compute_projection API
I noticed that when the render command encoder adds no draw calls, an app's memory usage seems to grow unboundedly. I am using a super simple MTKView-based renderer with the following delegate (code at end).
If I add the simplest of draw calls, e.g., a single vertex, the app's memory usage is normal, around 100-ish MB.
I am attaching a couple of screenshots, one from Xcode and one from Instruments.
What's going on here? Is this an illegal program? If yes, why does it not crash, as it would if the encoder or command buffer weren't ended?
Or is there some race condition at play here due to the lack of draws?
class Renderer: NSObject, MTKViewDelegate {
    var device: MTLDevice
    var commandQueue: MTL4CommandQueue
    var commandBuffer: MTL4CommandBuffer
    var allocator: MTL4CommandAllocator

    override init() {
        guard let d = MTLCreateSystemDefaultDevice(),
              let queue = d.makeMTL4CommandQueue(),
              let cmdBuffer = d.makeCommandBuffer(),
              let alloc = d.makeCommandAllocator()
        else {
            fatalError("unable to create metal 4 objects")
        }
        self.device = d
        self.commandQueue = queue
        self.commandBuffer = cmdBuffer
        self.allocator = alloc
        super.init()
    }

    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}

    func draw(in view: MTKView) {
        guard let drawable = view.currentDrawable else { return }
        commandBuffer.beginCommandBuffer(allocator: allocator)
        guard let descriptor = view.currentMTL4RenderPassDescriptor,
              let encoder = commandBuffer.makeRenderCommandEncoder(
                  descriptor: descriptor
              )
        else {
            fatalError("unable to create encoder")
        }
        encoder.endEncoding()
        commandBuffer.endCommandBuffer()
        commandQueue.waitForDrawable(drawable)
        commandQueue.commit([commandBuffer])
        commandQueue.signalDrawable(drawable)
        drawable.present()
    }
}
Topic: Graphics & Games
SubTopic: Metal
[CRITICAL] Metal API Memory Leak - Heap Memory Never Released to OS (CWE-400)
Security Classification
This issue constitutes a resource exhaustion vulnerability (CWE-400):
Aspect | Details
Type | Uncontrolled Resource Consumption
CWE | CWE-400
Vector | Local (any Metal application)
Impact | System instability, denial of service
User Control | None - no mitigation available
Recovery | Requires application restart
Summary
Metal heap allocations are never released back to macOS, even when the memory is entirely unused. This causes continuous, unbounded memory growth until system instability or crash. The issue affects any application using Metal API heap allocation.
This was discovered in Unreal Engine 5, but reproduces in a completely blank UE5 project with zero application code - confirming this is Metal framework behavior, not application-level.
Environment
OS: macOS Tahoe 26.2
Hardware: Apple Silicon M4 Max (also reproduced on M1, M2, M3)
API: Metal
Reproduction Steps
Run any Metal application that allocates and deallocates GPU buffers via Metal heaps
Open Activity Monitor and observe the application's memory usage
Let the application run idle (no user interaction required)
Observe memory growing continuously at ~1-2 MB per second
Memory never plateaus or stabilizes
Eventually system becomes unstable
For testing: Any Unreal Engine 5.4+ project on macOS will reproduce this. Even a blank project with no gameplay code exhibits the leak. (Tested on UE 5.7.1)
Observed Behavior
Memory Analysis
Using Unreal's memreport -full command, two reports taken 86 seconds apart:
Metric | Report 1 (183s) | Report 2 (269s) | Delta
Process Physical | 4373.64 MB | 4463.39 MB | +89.75 MB
Metal Heap Buffer | 7168 MB | 8192 MB | +1024 MB
Unused Heap | 3453 MB | 4477 MB | +1024 MB
Object Count | 73,840 | 73,840 | 0 (no change)
Key Finding
Metal Heap grew by exactly 1 GB while "Unused Heap" also grew by 1 GB. This demonstrates:
Metal is allocating new heap blocks in ~1 GB increments
Previously allocated heap memory becomes "unused" but is never released
The unused memory accumulates indefinitely
No application-level objects are leaking (count remains constant)
Memory Growth Pattern
Continuous growth while idle (no user interaction)
Growth rate: approximately 1-2 MB per second
No plateau or stabilization occurs
Metal allocates new 1 GB heap blocks rather than reusing freed space
Eventually leads to system instability and crash
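The deltas and rates quoted above follow directly from the two memreport snapshots. The sketch below reproduces that arithmetic so it can be rerun on other report pairs. MemReport, Growth and growthBetween are hypothetical names, and the rates are simple averages over the interval between the two reports.

```cpp
#include <cassert>

// One memreport snapshot, values in MB, timestamp in seconds since launch.
struct MemReport {
    double seconds;
    double processPhysicalMB;
    double metalHeapMB;
    double unusedHeapMB;
};

// Differences between two snapshots plus average growth rates.
struct Growth {
    double elapsedSeconds;
    double processDeltaMB;
    double heapDeltaMB;
    double unusedDeltaMB;
    double processRateMBps;
    double heapRateMBps;
};

// Computes deltas (later minus earlier) and average MB/s over the interval.
Growth growthBetween(const MemReport& earlier, const MemReport& later) {
    assert(later.seconds > earlier.seconds);
    Growth g{};
    g.elapsedSeconds = later.seconds - earlier.seconds;
    g.processDeltaMB = later.processPhysicalMB - earlier.processPhysicalMB;
    g.heapDeltaMB = later.metalHeapMB - earlier.metalHeapMB;
    g.unusedDeltaMB = later.unusedHeapMB - earlier.unusedHeapMB;
    g.processRateMBps = g.processDeltaMB / g.elapsedSeconds;
    g.heapRateMBps = g.heapDeltaMB / g.elapsedSeconds;
    return g;
}
```

Feeding in the two reports (183 s and 269 s) gives an 86 s interval, a process physical growth of 89.75 MB, or about 1.04 MB/s, which is consistent with the stated 1-2 MB per second. The Metal heap figure averages to roughly 11.9 MB/s over the same interval, but since the heap appears to grow in 1 GB steps, that average depends on where a step happens to fall between the two reports.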
What is NOT Causing This
We verified the following are NOT the source:
Application objects - Object count remains constant
Application code - Blank project with no code reproduces the issue
Texture streaming - Disabling texture streaming had no effect
CPU garbage collection - Running GC has no effect (this is GPU memory)
Mitigations Attempted (None Worked)
setPurgeableState
Setting resources to purgeable state before release:
[buffer setPurgeableState:MTLPurgeableStateEmpty];
Result: Metal ignores this hint and does not reclaim heap memory.
Avoiding Heap Pooling
Forcing individual buffer allocations instead of heap-based pooling.
Result: Leak persists - Metal still manages underlying allocations.
Aggressive Buffer Compaction
Attempting to compact/defragment buffers within heaps every frame.
Result: Only moves data between existing heaps. Does NOT release heaps back to OS.
Reducing Pool Sizes
Minimizing all buffer pool sizes to force more frequent reuse.
Result: Slightly slows the leak rate but does not stop it.
Root Cause Analysis
How Metal Heap Allocation Appears to Work
Metal allocates GPU heap blocks in large chunks (~1 GB observed)
Application requests buffers from these heaps
When application releases buffers, memory becomes "unused" within the heap
Metal does NOT release heap blocks back to macOS, even when entirely unused
When fragmentation prevents reuse, Metal allocates new heap blocks
Result: Continuous memory growth with no upper bound
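The hypothesized behavior can be illustrated with a toy model. This is not Metal's allocator and makes no claim about its internals. It is a deliberately pessimistic sketch in which freed space inside a block is never reused (the worst case of "fragmentation prevents reuse") and blocks are never returned. ToyHeapPool and its methods are hypothetical names, and sizes are in arbitrary units (MB in the example).

```cpp
#include <cassert>
#include <cstdint>

// Toy pool: fixed-size blocks, bump allocation, no reuse of freed space, no block release.
class ToyHeapPool {
public:
    explicit ToyHeapPool(uint64_t blockSize) : blockSize_(blockSize) {
        assert(blockSize > 0);
    }

    // Places the allocation in the newest block, or reserves a new block if it does not fit.
    void allocate(uint64_t size) {
        assert(size > 0 && size <= blockSize_);
        if (blockCount_ == 0 || cursor_ + size > blockSize_) {
            ++blockCount_;
            cursor_ = 0;
        }
        cursor_ += size;
        live_ += size;
    }

    // Marks `size` units as no longer in use; the space is NOT made available again.
    void release(uint64_t size) {
        assert(size <= live_);
        live_ -= size;
    }

    uint64_t reserved() const { return blockCount_ * blockSize_; }
    uint64_t live() const { return live_; }
    uint64_t unused() const { return reserved() - live_; }

private:
    uint64_t blockSize_;
    uint64_t blockCount_ = 0;
    uint64_t cursor_ = 0;  // Bump offset inside the newest block.
    uint64_t live_ = 0;    // Units currently held by the application.
};
```

With 1024-unit blocks, allocating and immediately releasing a 64-unit buffer 32 times leaves live() at 0 while reserved() and unused() both reach 2048. That is the same shape as the report's numbers: heap size and unused heap rising together while the application's object count stays flat. A real allocator that reuses freed ranges would plateau instead, so the model only shows what the observed growth would imply, not why it happens.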
The Core Problem
There appears to be no Metal API to force heap memory release. The only way to reclaim this memory is to destroy the Metal device entirely, which requires restarting the application.
Expected Behavior
Metal should:
Release unused heaps - When a heap block is entirely unused, release it back to macOS
Respect purgeable hints - Honor setPurgeableState calls from applications
Compact allocations - Defragment heap allocations to reduce fragmentation
Provide control APIs - Allow applications to request heap compaction or release
Enforce limits - Have configurable maximum heap memory consumption
Security Implications
Local Denial of Service - Any Metal application can exhaust system memory, causing instability affecting all running applications
Memory Pressure Attack - Forces other applications to swap to disk, degrading system-wide performance
No Upper Bound - Memory consumption continues until system failure
Unmitigable - End users have no way to prevent or limit the leak
Affects All Metal Apps - Any application using Metal heaps is potentially affected
Impact
Applications become unstable after extended use
System-wide performance degrades as memory pressure increases
Users must periodically restart applications
Developers cannot work around this at the application level
Long-running applications (games, creative tools, servers) are particularly affected
Request
Investigate Metal heap memory management behavior
Implement heap release when blocks become entirely unused
Honor setPurgeableState hints from applications
Consider providing an API for applications to request heap compaction
Document any intended behavior or workarounds
Additional Notes
This issue has been observed across multiple Unreal Engine versions (5.4, 5.7) and multiple Apple Silicon generations (M1 through M4). The behavior is consistent and reproducible.
The Unreal Engine team has implemented various CVars to attempt mitigation (rhi.Metal.HeapBufferBytesToCompact, rhi.Metal.ResourcePurgeInPool, etc.) but none successfully address the issue because the root cause is at the Metal framework level.
Tested: January 2026
Platform: macOS Tahoe 26.2, Apple Silicon (M1/M2/M3/M4)