Delve into the world of graphics and game development. Discuss creating stunning visuals, optimizing game mechanics, and share resources for game developers.

All subtopics
Posts under Graphics & Games topic

Post

Replies

Boosts

Views

Created

Optimizing HZB Mip-Chain Generation and Bindless Argument Tables in a Custom Metal Engine
Hi everyone, I’ve been developing a custom, end-to-end 3D rendering engine called Crescent from scratch using C++20 and Metal-cpp (targeting macOS and visionOS). My primary goal is to build a zero-bottleneck, GPU-driven pipeline that maximizes the potential of Apple Silicon’s Unified Memory and TBDR architecture. While the fundamental systems are stable, I am looking for architectural feedback from Metal framework engineers regarding specific synchronization and latency challenges. Current Core Implementations: GPU-Driven Instance Culling: High-performance occlusion culling using a Hierarchical Z-Buffer (HZB) approach via Compute Shaders. Clustered Forward Shading: Support for high-count dynamic lights through view-space clustering. Temporal Stability: Custom TAA with history rejection and Motion Blur resolve. Asset Infrastructure: Robust GUID-based scene serialization and a JSON-driven ECS hierarchy. The Architectural Challenge: I am currently seeing slight synchronization overhead when generating the HZB mip-chain. On Apple Silicon, I am evaluating the cost of encoder transitions versus cache-friendly barriers. && m_hzbInitPipeline && m_hzbDownsamplePipeline && !m_hzbMipViews.empty(); if (canBuildHzb) { MTL::ComputeCommandEncoder* hzbInit = commandBuffer->computeCommandEncoder(); hzbInit->setComputePipelineState(m_hzbInitPipeline); hzbInit->setTexture(m_depthTexture, 0); hzbInit->setTexture(m_hzbMipViews[0], 1); if (m_pointClampSampler) { hzbInit->setSamplerState(m_pointClampSampler, 0); } else if (m_linearClampSampler) { hzbInit->setSamplerState(m_linearClampSampler, 0); } const uint32_t hzbWidth = m_hzbMipViews[0]->width(); const uint32_t hzbHeight = m_hzbMipViews[0]->height(); const uint32_t threads = 8; MTL::Size tgSize = MTL::Size(threads, threads, 1); MTL::Size gridSize = MTL::Size((hzbWidth + threads - 1) / threads * threads, (hzbHeight + threads - 1) / threads * threads, 1); hzbInit->dispatchThreads(gridSize, tgSize); hzbInit->endEncoding(); for (size_t mip = 1; mip < m_hzbMipViews.size(); ++mip) { MTL::Texture* src = m_hzbMipViews[mip - 1]; MTL::Texture* dst = m_hzbMipViews[mip]; if (!src || !dst) { continue; } MTL::ComputeCommandEncoder* downEncoder = commandBuffer->computeCommandEncoder(); downEncoder->setComputePipelineState(m_hzbDownsamplePipeline); downEncoder->setTexture(src, 0); downEncoder->setTexture(dst, 1); const uint32_t mipWidth = dst->width(); const uint32_t mipHeight = dst->height(); MTL::Size downGrid = MTL::Size((mipWidth + threads - 1) / threads * threads, (mipHeight + threads - 1) / threads * threads, 1); downEncoder->dispatchThreads(downGrid, tgSize); downEncoder->endEncoding(); } if (m_instanceCullHzbPipeline) { dispatchInstanceCulling(m_instanceCullHzbPipeline, true); } } My Questions: Encoder Synchronization: Would you recommend moving this loop into a single ComputeCommandEncoder using MTLBarrier between dispatches to maintain L2 cache residency, or is the overhead of separate encoders negligible for depth-downsampling on TBDR? visionOS Bindless Latency: For stereo rendering on visionOS, what are the best practices for managing MTL4ArgumentTable updates at 90Hz+? I want to ensure that updating bindless resources for each eye doesn't introduce unnecessary CPU-to-GPU latency. Memory Management: Are there specific hints for Memoryless textures that could be applied to intermediate HZB levels to save bandwidth during this process? I’ve attached a screenshot of a scene rendered with the engine (PBR, SSR, and IBL).
0
0
89
14h
Many CGECreateKeyboardEvent in quick succession causing function and dictation buttons to be pressed?
Hi! I hope everyone reading is doing well. I am working on developing a reinforcement learning agent that involves sending scan codes to a window, which I've been doing by sending virtual scan codes with CGEventCreateKeyboardEvent per the docs. There is no event source when I send the keyboard events. However, when many keyboard events are happening (with the keys 'q', 'w', 'e', 'r', 'f', 'd', 's', space, arrow keys) in quick succession (<250ms), the enable dictation popup or the function button emojis popup appear for seemingly no reason. I have verified that I am using the correct scan codes for these keypresses, so I am wondering what else could cause this to happen. It is as if I am choosing to press f5 or fn. It does not happen when 'a' is the only button being pressed in quick succession. One thing that I have not been able to easily find is the scan code inputs for dictation nor the function button. do these scan codes overlap somehow? Thank you all for the help! Hunter
2
0
370
3d
SpriteKit scene used as SCNView.overlaySKScene crashes due to SKShapeNode
I recently published my first game on the App Store. It uses SceneKit with a SpriteKit overlay. All crashes Xcode downloaded for it so far are related to some SpriteKit/SceneKit internals. The most common crash is caused by SKCShapeNode::_NEW_copyRenderPathData. What could cause such a crash? crash.crash While developing this game (and the BoardGameKit framework that appears in the crash log) over the years I experienced many crashes presumably caused by the SpriteKit overlay (I opened a post SceneKit app randomly crashes with EXC_BAD_ACCESS in jet_context::set_fragment_texture about such a crash in September 2024), and other people on the internet also mention that they experience crashes when using SpriteKit as a SceneKit overlay. Should I use a separate SKView and lay it on top of SCNView rather than setting SCNView.overlaySKScene? That seemed to solve the crashes for a guy on stackoverflow, but is it also encouraged by Apple? I know SceneKit is deprecated, but according to Apple critical bugs would still be fixed. Could this be considered a critical bug?
4
0
516
4d
Krazy Krownz multiplayer option not working for Game Center!
When trying to play with friends Krazy Krownz doesn’t allow me to click multiplayer even though my Apple Game Center connected and my friends Apple game center connected as well. I even tried sending an invite from Apple Game Center to friends and Krazy Krownz doesn’t even show up on the list of available multiplayer games. I’ve signed out and back in the same issue remain. I’ve try to contact the game developer, but the website doesn’t work.
1
0
215
5d
Xcode Metal Trace
Code is download from apple official metal4 sample [https://developer.apple.com/documentation/metal/drawing-a-triangle-with-metal-4?language=objc] enable metal gpu trace in macOS schema and trace a frame in Xcode. Xcode may show segment fault on App from some 'GTTrace' function when click trace button. When replay a .gputrace file, Xcode may crash , throw an internal error or a XPC error. The example code using old metal-renderer can trace without any problem and everything works fine. Test Environment: Xcode Version 26.2 (17C52) macOS 26.2 (25C56) M1 Pro 16GB A2442
2
0
402
5d
receivedTurnEventForMatch giving stale data
In my turn-based game, I receive GKListener event receivedTurnEventForMatch and decode the match.matchData. On occasion, the matchData is clearly stale and is from the previous turn. If I call the MatchMaker ViewController up and select that same match, the data is not stale, so it's not a matter of not calling endTurn. I have tried both loadMatchWithID and loadMatchesWithCompletionHandler after receiving the receivedTurnEventForMatch, but the data is still stale. Advice?
2
0
460
6d
- (BOOL) contentsAreFlipped needs to be true for .nib layouts
I have an odd bug, if I use initWithFrame as the init routine for NSView subclass that uses layers I don't see this bug. But if I embedded this view into a storyboard with a .nib file and use initWithCoder, I need to return true on (BOOL) contentsAreFlipped From the NSView subclass If I don't the CALayer actually renders from 0,0 from the view upwards and off the window. The frame sizes for the NSView and the CALayer are good.. when I see them in updateLayer. Obviously I have a fix.. but I would like to understand why.
0
0
203
6d
macOS SwiftUI app with external 4K camera & sensors for Hospital Avatar: ARKit, MLX, and Thermal feasibility?
We are developing a standalone AI avatar application for hospital reception kiosks using Mac mini (M2/M4). The app runs on SwiftUI + RealityKit, displays on a 75-inch monitor, and utilizes a USB-connected 4K camera and external sensors (LiDAR/mmWave). We have several technical concerns regarding the transition from iPadOS to macOS. Could you please provide insights on the following? ARKit/Vision Framework on macOS with External Camera On iPadOS, ARKit provides robust Face Tracking. On macOS with an external USB 4K camera: Can we achieve real-time face tracking (expression/gaze/depth) with Vision framework or ARKit comparable to iPadOS performance? Are there any specific limitations for accessing the Neural Engine via Vision framework for real-time 4K video analysis on macOS? Accessing External Hardware (LiDAR/Sensors) in Sandbox We plan to connect external LiDAR and mmWave sensors (e.g., Akara) via USB/Bluetooth. Is it feasible to communicate with these custom drivers/devices within the App Sandbox environment? Would DriverKit be required, or can we use standard serial communication APIs? On-Device LLM (MLX) & Thermals We intend to run a local LLM (e.g., Llama 3 using MLX framework) for offline conversation, alongside 3D rendering. With the M2/M4 Mac mini fan design, is there a risk of thermal throttling during 10+ hours of continuous operation (simultaneous CoreML + 3D rendering)? Is the Mac Studio recommended over the Mac mini for this thermal profile? Long-running Speech API Are there any known issues (memory leaks, API limits) when using Spherch framework and AVSpeechSynthesizer continuously for over 10 hours daily? 3D Display Output Are there any macOS constraints for rendering a SwiftUI window in a specific 3D format (e.g., Side-by-Side) and outputting it via HDMI to a 3D digital signage display (fixed refresh rate/resolution)? Thank you for your assistance.
0
0
215
1w
MetalFX FrameInterpolator assertion: `Color texture width mismatch from descriptor` even when all texture sizes match
I am integrating MetalFX FrameInterpolator into a custom Unity RenderGraph–based render pipeline (C++ native plugin + C# render passes), and I am hitting the following assertion at runtime: /MetalFXDebugError.h:29: failed assertion `Color texture width mismatch from descriptor' What makes this confusing is that all input/output textures have the correct width and height, and they exactly match the values specified in the MTLFXFrameInterpolatorDescriptor. Setup Input resolution: 1024 x 512 Output resolution: 2048 x 1024 MTLFXTemporalScaler is created first and then passed into MTLFXFrameInterpolator The TemporalScaler and FrameInterpolator descriptors use the same input/output sizes and formats All Metal textures: Have no parentTexture Are 2D textures Match the descriptor sizes exactly (verified via logging) Texture bindings at encode time frameInterpolator.colorTexture = mtlTexColor; // 1024 x 512 frameInterpolator.prevColorTexture = mtlTexPrevColor; // 1024 x 512 frameInterpolator.motionTexture = mtlTexMotion; // 1024 x 512 frameInterpolator.depthTexture = mtlTexDepth; // 1024 x 512 frameInterpolator.uiTexture = mtlTexUI; // 2048 x 1024 frameInterpolator.outputTexture = mtlTexOutput; // 2048 x 1024 All widths/heights are logged and match: Color : 1024 x 512 (input) PrevColor : 1024 x 512 (input) Motion : 1024 x 512 (input) Depth : 1024 x 512 (input) UI : 2048 x 1024 (output) Output : 2048 x 1024 (output) The TemporalScaler works correctly on its own. The assertion only occurs when using FrameInterpolator. Important detail about colorTexture Originally, colorTexture was copied from BuiltinRenderTextureType.CurrentActive. After reading that this might violate MetalFX semantics, I changed the pipeline so that: colorTexture now comes from a dedicated private RenderGraph texture It is not the backbuffer It is not a drawable It is not used as a final output It is created before UI rendering Despite this, the assertion still occurs. Question Can uiTexture for MTLFXFrameInterpolator legally come from a texture copied from BuiltinRenderTextureType.CurrentActive? More generally: Are there additional hidden constraints on colorTexture / prevColorTexture (such as Metal usage, storageMode, aliasing, or hazard tracking) that could cause this assertion, even when sizes match? Does FrameInterpolator require colorTexture and prevColorTexture to be created in a very specific way (e.g. non-aliased, ShaderRead usage, identical Metal resource properties)? Any clarification on the exact semantic requirements for colorTexture, prevColorTexture, or uiTexture in MetalFX FrameInterpolator would be greatly appreciated.
4
0
523
1w
Clarifying when Game Center activity events fire relative to authentication
Hello, In our game we enforce an age gate before showing Game Center sign‑in. Only after the user passes the age gate do we call GKLocalPlayer.localPlayer.authenticateHandler. The reason I’m asking is that we want to reliably detect if the game was launched from a Game Center activity in the Games app (iOS 26+). If the user prefers to enter via activities, we don’t want to miss that event during cold start. Our current proposal is: Register a GKLocalPlayerListener early in didFinishLaunchingWithOptions: so the app is ready to catch events. Queue any incoming events in our dispatcher. Only process those events after the user passes the age gate and authentication succeeds. My questions are: Does player:wantsToPlayGameActivity:completionHandler: ever fire before authentication, or only after the local player is authenticated? If it only fires after authentication, is our “register early but gate processing” approach the correct way to ensure we don’t miss activity launches? Is there any recommended pattern to distinguish “activity launch” vs. “normal launch” in this age‑gate scenario? We want to respect Apple’s age gate requirements, but also ensure activity launches are not lost if the user prefers that entry point. Sorry if this is a stupid question — I just want to be sure we’re following the right pattern. Thanks for any clarification or best‑practice guidance!
1
0
207
1w
MatchMaker VC not showing existing matches after upgrade.
Updated my app to include turn-based matches. Beta testing through FlightTest and all was well between iOS 18.x and 26.2 devices. One beta tester upgraded to 26.2 during beta testing and now when the MatchMaker VC is opened, it does not show existing matches. Worse, he can create new matches and play his turn, but the new match won't even show up in MMVC, even after opponent takes turn. My app has been reviewed and is ready for release, but I'd like to know how to solve this before I release. He has tried re-installing the app, including an updated FlightTest version that is the same as the about-to-be-released reviewed version.
4
0
465
1w
GameKit Leaderboards not working in iOS 26.2
Leaderboards working fine in iOS 26.1 but seem to be broken in 26.2 and also in the 26.3 developer beta. Players cannot submit scores and neither can they view scores on Apple's default leaderboards. Custom leaderboards that rely on pulling information using GameKit APIs also fail. Is there a workaround or patch for this?
2
0
272
1w
Metal 4: Proper usage of requestResidency() with unique per-frame textures at 120fps
Hello, I have some confusion regarding ResidencySet. Specifically, about the requestResidency() function: how often should we call it? I have a captureOutput(_:didOutput:from:) method that is triggered at 60 or 120 fps. Inside this method, I am calling the following code every frame: computeResidencySet.removeAllAllocations() сomputeResidencySet.addAllocation(TextureA) computeResidencySet.addAllocation(TextureB) computeResidencySet.addAllocation(TextureC) computeResidencySet.commit() computeResidencySet.requestResidency() // Should we call it every frame? Please keep in mind that TextureA, TextureB, and TextureC are unique for each call (new instances are provided on every frame)."
1
0
465
1w
SpriteKit Offline Rendering with SKRenderer
Hi! I'd like to share a technical sample app, SKRenderer Demo. This app demonstrates: Setting up SKRenderer Recording SpriteKit scenes to image sequences Recording SpriteKit scenes to video using IOSurface and AVFoundation Applying Core Image filters Exploring SpriteKit's simulation timing and physics determinism Use Case Record SpriteKit simulations as video or images for sharing and creating content. I explored several approaches, including the excellent view.texture(from:crop:) for live recording from SKView. The SKRenderer approach assumes recording happens asynchronously: you capture user interactions as commands during live interaction, then replay those commands through an offline render pass to generate the final output. I hope this helps others working on replay systems, simulation capture, or SpriteKit projects in general!
0
0
119
2w
'__abort_with_payload' from CompositorNonUI on visionOS 26.2 (device + simulator, Omniverse streaming)
I am developing a custom app for Apple Vision Pro using Compositor Services to stream content from NVIDIA Omniverse. The app is based on: https://github.com/NVIDIA-Omniverse/apple-configurator-sample Environment: Device: Apple Vision Pro OS Version: visionOS 26.2 Xcode Version: 26.2 The Issue: The application crashes hard (__abort_with_payload) in "libsystem_kernel.dylib" on Task 6 immediately after initialization. This appears to be a deliberate abort triggered by the compositor, not a typical crash. The issue occurs on both physical device and simulator. Important detail: The console output shows a specific CLIENT BUG assertion. By checking the metadata of the warning, I found that it is related to "Library: CompositorNonUI". Relevant console output before abort: Missed 'FrameLimiter' target of 90.0 Hz running compositor services to get IPD, FOV, etc fence tx observer 14f27 timed out after 0.600000 fence tx observer bc1b timed out after 0.600000 BUG IN CLIENT: For mixed reality experiences please use cp_drawable_compute_projection API
0
0
78
2w
Memory leak when no draw calls issued to encoder
I noticed that when the render command encoder adds no draw calls an apps memory usage seems to grow unboundedly. Using a super simple MTKView-based drawing with the following delegate (code at end). If I add the simplest of draw calls, e.g., a single vertex, the app's memory usage is normal, around 100-ish MBs. I am attaching a couple screenshot, one from Xcode and one from Instruments. What's going on here? Is this an illegal program? If yes, why does it not crash, such as if the encode or command buffer weren't ended. Or is there some race condition at play here due to the lack of draws? class Renderer: NSObject, MTKViewDelegate { var device: MTLDevice var commandQueue: MTL4CommandQueue var commandBuffer: MTL4CommandBuffer var allocator: MTL4CommandAllocator override init() { guard let d = MTLCreateSystemDefaultDevice(), let queue = d.makeMTL4CommandQueue(), let cmdBuffer = d.makeCommandBuffer(), let alloc = d.makeCommandAllocator() else { fatalError("unable to create metal 4 objects") } self.device = d self.commandQueue = queue self.commandBuffer = cmdBuffer self.allocator = alloc super.init() } func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {} func draw(in view: MTKView) { guard let drawable = view.currentDrawable else { return } commandBuffer.beginCommandBuffer(allocator: allocator) guard let descriptor = view.currentMTL4RenderPassDescriptor, let encoder = commandBuffer.makeRenderCommandEncoder( descriptor: descriptor ) else { fatalError("unable to create encoder") } encoder.endEncoding() commandBuffer.endCommandBuffer() commandQueue.waitForDrawable(drawable) commandQueue.commit([commandBuffer]) commandQueue.signalDrawable(drawable) drawable.present() } }
3
0
374
2w
# [CRITICAL] Metal RHI Memory Leak - Resource exhaustion vulnerability (CWE-400) - Bug Report
[CRITICAL] Metal API Memory Leak - Heap Memory Never Released to OS (CWE-400) Security Classification This issue constitutes a resource exhaustion vulnerability (CWE-400): Aspect Details Type Uncontrolled Resource Consumption CWE CWE-400 Vector Local (any Metal application) Impact System instability, denial of service User Control None - no mitigation available Recovery Requires application restart Summary Metal heap allocations are never released back to macOS, even when the memory is entirely unused. This causes continuous, unbounded memory growth until system instability or crash. The issue affects any application using Metal API heap allocation. This was discovered in Unreal Engine 5, but reproduces in a completely blank UE5 project with zero application code - confirming this is Metal framework behavior, not application-level. Environment OS: macOS Tahoe 26.2 Hardware: Apple Silicon M4 Max (also reproduced on M1, M2, M3) API: Metal Reproduction Steps Run any Metal application that allocates and deallocates GPU buffers via Metal heaps Open Activity Monitor and observe the application's memory usage Let the application run idle (no user interaction required) Observe memory growing continuously at ~1-2 MB per second Memory never plateaus or stabilizes Eventually system becomes unstable For testing: Any Unreal Engine 5.4+ project on macOS will reproduce this. Even a blank project with no gameplay code exhibits the leak. (Tested on UE 5.7.1) Observed Behavior Memory Analysis Using Unreal's memreport -full command, two reports taken 86 seconds apart: Metric Report 1 (183s) Report 2 (269s) Delta Process Physical 4373.64 MB 4463.39 MB +89.75 MB Metal Heap Buffer 7168 MB 8192 MB +1024 MB Unused Heap 3453 MB 4477 MB +1024 MB Object Count 73,840 73,840 0 (no change) Key Finding Metal Heap grew by exactly 1 GB while "Unused Heap" also grew by 1 GB. This demonstrates: Metal is allocating new heap blocks in ~1 GB increments Previously allocated heap memory becomes "unused" but is never released The unused memory accumulates indefinitely No application-level objects are leaking (count remains constant) Memory Growth Pattern Continuous growth while idle (no user interaction) Growth rate: approximately 1-2 MB per second No plateau or stabilization occurs Metal allocates new 1 GB heap blocks rather than reusing freed space Eventually leads to system instability and crash What is NOT Causing This We verified the following are NOT the source: Application objects - Object count remains constant Application code - Blank project with no code reproduces the issue Texture streaming - Disabling texture streaming had no effect CPU garbage collection - Running GC has no effect (this is GPU memory) Mitigations Attempted (None Worked) setPurgeableState Setting resources to purgeable state before release: [buffer setPurgeableState:MTLPurgeableStateEmpty]; Result: Metal ignores this hint and does not reclaim heap memory. Avoiding Heap Pooling Forcing individual buffer allocations instead of heap-based pooling. Result: Leak persists - Metal still manages underlying allocations. Aggressive Buffer Compaction Attempting to compact/defragment buffers within heaps every frame. Result: Only moves data between existing heaps. Does NOT release heaps back to OS. Reducing Pool Sizes Minimizing all buffer pool sizes to force more frequent reuse. Result: Slightly slows the leak rate but does not stop it. Root Cause Analysis How Metal Heap Allocation Appears to Work Metal allocates GPU heap blocks in large chunks (~1 GB observed) Application requests buffers from these heaps When application releases buffers, memory becomes "unused" within the heap Metal does NOT release heap blocks back to macOS, even when entirely unused When fragmentation prevents reuse, Metal allocates new heap blocks Result: Continuous memory growth with no upper bound The Core Problem There appears to be no Metal API to force heap memory release. The only way to reclaim this memory is to destroy the Metal device entirely, which requires restarting the application. Expected Behavior Metal should: Release unused heaps - When a heap block is entirely unused, release it back to macOS Respect purgeable hints - Honor setPurgeableState calls from applications Compact allocations - Defragment heap allocations to reduce fragmentation Provide control APIs - Allow applications to request heap compaction or release Enforce limits - Have configurable maximum heap memory consumption Security Implications Local Denial of Service - Any Metal application can exhaust system memory, causing instability affecting all running applications Memory Pressure Attack - Forces other applications to swap to disk, degrading system-wide performance No Upper Bound - Memory consumption continues until system failure Unmitigable - End users have no way to prevent or limit the leak Affects All Metal Apps - Any application using Metal heaps is potentially affected Impact Applications become unstable after extended use System-wide performance degrades as memory pressure increases Users must periodically restart applications Developers cannot work around this at the application level Long-running applications (games, creative tools, servers) are particularly affected Request Investigate Metal heap memory management behavior Implement heap release when blocks become entirely unused Honor setPurgeableState hints from applications Consider providing an API for applications to request heap compaction Document any intended behavior or workarounds Additional Notes This issue has been observed across multiple Unreal Engine versions (5.4, 5.7) and multiple Apple Silicon generations (M1 through M4). The behavior is consistent and reproducible. The Unreal Engine team has implemented various CVars to attempt mitigation (rhi.Metal.HeapBufferBytesToCompact, rhi.Metal.ResourcePurgeInPool, etc.) but none successfully address the issue because the root cause is at the Metal framework level. Tested: January 2026 Platform: macOS Tahoe 26.2, Apple Silicon (M1/M2/M3/M4)
5
2
907
3w