Reply to Metal texture allocated size versus actual image data size
It's not really padding, it's page alignment. On console, you could sometimes pack buffer data into the unused bytes. Android has similar hardware and alignment requirements. Sometimes mips have to be aligned to the page size, but maybe the Metal team can relay more here. It's gpu-specific, so I doubt they'll want to commit to details. Hardware also often has a packed mip tail in order to cut page use for the smaller mip sizes. Is your output showing the buffers that you generated to upload to Vulkan textures? Those hold texture data in linear block order, not the tiled block order used by the hw, which may be aligned to texture pages. Some systems also have to pad mips out to a power-of-two size. On desktop the tile size is 64KB. On iOS the tile size is 16KB. Here are the tile sizes:

Format        Desktop (64K)   Mobile (16K)
ASTC/BC7      256x256         128x128
BC1/ETCr11    512x256         256x128
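The tile sizes listed above fall out of simple arithmetic: page bytes divided by bytes per 4x4 block gives the block count per tile, and that count is laid out as a square or 2:1 grid. Here is a minimal sketch of that calculation. The struct and function names are mine, and it assumes 4x4 blocks (16 bytes for BC7 and ASTC 4x4, 8 bytes for BC1) and a power-of-two block count. Real hardware may lay tiles out differently.

```cpp
#include <cstdint>

// Width and height of one tile, in texels.
struct TileSize {
    uint32_t width;
    uint32_t height;
};

// Hypothetical helper: tile dimensions for a 4x4 block-compressed format.
// pageBytes  - tile/page size in bytes (64 KB desktop, 16 KB mobile per the post).
// blockBytes - bytes per 4x4 block (16 for BC7 / ASTC 4x4, 8 for BC1).
// Assumes pageBytes / blockBytes is a power of two.
constexpr TileSize tileSizeForBlockFormat(uint32_t pageBytes, uint32_t blockBytes) {
    uint32_t blocks = pageBytes / blockBytes;
    uint32_t blocksX = 1;
    uint32_t blocksY = 1;
    // Grow width first, then height, so an odd power of two ends up 2:1 wide.
    while (blocksX * blocksY < blocks) {
        if (blocksX <= blocksY) {
            blocksX *= 2;
        } else {
            blocksY *= 2;
        }
    }
    // Each block covers 4x4 texels.
    return TileSize{blocksX * 4, blocksY * 4};
}
```

For example, tileSizeForBlockFormat(64 * 1024, 8) gives the 512x256 desktop entry for BC1, and tileSizeForBlockFormat(16 * 1024, 16) gives the 128x128 mobile entry for ASTC/BC7.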
Topic: Graphics & Games SubTopic: Metal Tags:
Mar ’22
Reply to nextDrawable stalls commit of command buffer
That sounds like even more of a stall if one is trying to use double-buffering. Apple's examples always fall back to triple-buffering to buy more time, so I'd encourage fixing this api. Vulkan doesn't stall in vkAcquireNextImageKHR for long at all, less than 3ms. Metal can take 30-40ms when the gpu is saturated. In Vulkan, that allows a single command buffer to hold both the backbuffer acquire and the present to the backbuffer.
Topic: Graphics & Games SubTopic: General Tags:
Feb ’22
Reply to How does one get a universal macOS library from a Swift package dependency?
The proposed solution of explicitly setting x86_64 arm64 on macOS targets doesn't work for me either with Xcode 13.1. My .a library files and frameworks all report that they are missing the arm64 architecture when trying to build for "Any Mac". This is because the scheme is "Debug", and that activates active arch only (in my case x86_64). The debug setting of "build for active arch only" is not being overridden by the setting to build a universal app when "Any Mac" is set. Setting the "Build" target to "Release" avoids all the crazy errors when set to "Any Mac". But then there's a crazy error from my use of modules. mm_malloc.h must not have an equivalent on macOS arm64, or the module isn't built properly for the arm64 build. This was originally from SSE intrinsics, so I'll look to replace that with something universal.

#include <mm_malloc.h>
Module '_Builtin_intrinsics.intel.mm_malloc' requires feature 'x86'
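One possible universal replacement for the mm_malloc.h dependency is posix_memalign, which is available on both x86_64 and arm64 macOS. This is only a sketch of what I mean by "something universal". The names alignedAlloc and alignedFree are hypothetical stand-ins for _mm_malloc and _mm_free, not anything from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdlib.h>  // posix_memalign, free (POSIX, not x86-specific)

// Hypothetical stand-in for _mm_malloc(size, alignment).
// Returns nullptr if the allocation fails.
inline void* alignedAlloc(size_t size, size_t alignment) {
    // posix_memalign requires a power-of-two alignment that is
    // at least sizeof(void*).
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    void* ptr = nullptr;
    if (posix_memalign(&ptr, alignment, size) != 0) {
        return nullptr;
    }
    return ptr;
}

// Hypothetical stand-in for _mm_free(ptr).
// Memory from posix_memalign is released with plain free().
inline void alignedFree(void* ptr) {
    free(ptr);
}
```

Callers that used _mm_malloc(bytes, 16) would call alignedAlloc(bytes, 16) instead, and the x86-only module import goes away.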
Topic: Programming Languages SubTopic: Swift Tags:
Jan ’22
Reply to nextDrawable stalls commit of command buffer
Also, isn't CADisplayLink supposed to synchronize the frames? And we have a semaphore that counts down from the drawable count before we request nextDrawable, which also seems redundant. The problem is that the program thread wants to get onto the next frame, but can't due to the requirement to call nextDrawable. Even Apple's docs state this should be called as late in the frame as possible, but when that call takes 24ms, it feels like it's doing the job of the display link.
Topic: Graphics & Games SubTopic: General Tags:
Jan ’22
Reply to nextDrawable stalls commit of command buffer
The supplied best practices example isn't performant. It incurs all the stalls by using a single command buffer.
https://developer.apple.com/library/archive/documentation/3DDrawing/Conceptual/MTLBestPracticesGuide/Drawables.html

Here's a more realistic example of what happens using a single command buffer.

beginCommandBuffer
  beginEncoding
    98% of commands to offscreen
  endEncoding   <- this is where I currently end/commit the 98% command buffer to start driver processing
  20ms+ stall on nextDrawable under heavy gpu load
  beginEncoding
    2% of commands to drawable (say a blit from offscreen to drawable)
  endEncoding
  1ms+ stall on presentDrawable (some stall then drawable.present added to addScheduledHandler)
endCommandBuffer
[cb commit]   <- this is where commands are sent to queue and the driver

Ideally nextDrawable and presentDrawable should be off in their own thread on a little core, so the main thread on a big core isn't stalled out. The case we have is 90 alpha-blended quads on iOS that cause 11ms of gpu time plus the rest of rendering. This then stalls nextDrawable while it waits for drawables to return to the pool, and with a single command buffer that stalls processing the next frame and getting to the next cpu update. There is also still no test for isDrawableAvailable in the pool.
Topic: Graphics & Games SubTopic: General Tags:
Jan ’22
Reply to MTLDevice.currentAllocatedSize incorrect on late 2014 iMacs
Those functions don't deal with the sub-allocation strategy used for small allocations, which share 128K buffers. So gpu capture tends to just report 128K for the sizes, regardless of whether several buffers share the same 128K buffer. The tool needs to dedupe those allocations. I filed a forum post and a Feedback Assistant report on this, and Apple was looking to fix the issue.
Topic: Graphics & Games SubTopic: General Tags:
Dec ’21
Reply to Granting full-disk access to my sandboxed app not working
I'm trying to read gltf files, which consist of json, a bin file, and a series of png images all in the same folder. How do I ship a viewer for this if the viewer must be sandboxed, but the sandboxed app can only read the .gltf file that was supplied? The "Related Items" seems to imply the same name with different extensions, but here the image names can be anything. Apple's omission of gltf reader support in ModelIO requires me to supply my own reader, and I can't ask users to run usdzconvert first. But usdz has a similar file structure with separate images. I'm all for sandboxing, except when it doesn't work. Also, trying to submit this question in Chrome results in "Bad Message 431 reason: Request Header Fields Too Large", so I had to switch to Safari. Basically, do users need to pick the folder containing the gltf, instead of the gltf file itself? Then the sandbox grants access to that folder and I can read any contents therein read-only. It's rather unintuitive to pick that from the open panel. I've shipped popular tools that had to be pulled from the App Store due to sandboxing.
Topic: App & System Services SubTopic: Core OS Tags:
Oct ’21
Reply to rendering thousands of small meshes
DrawIndirect doesn't work well on iOS or macOS, since you can only submit one draw at a time referencing the buffer and offset above. There's also no stride to store additional instance data, and no drawIndirectCount like in Vulkan, where the GPU supplies the count of things to draw. So it's not really saving much over making the draw calls themselves. If you can target A9, which is where DrawIndirect started, then look into IndirectCommandBuffer, which can supply a range of draw calls and is the only way to submit a batch of commands as one submission.
Topic: Graphics & Games SubTopic: General Tags:
Sep ’21
Reply to Metal Validation flags all read-write textures as invalid
That's not exactly going to work. Terrain systems use R16Unorm for the precision, and it's the only format storable in PNG, which only has 8u and 16u. Photoshop has been using 16u for a long time to store color, so I'm a little shocked that it's not exposed on desktop even if Apple Silicon can't handle it. R16Float is only 10 bits of precision, and R16Uint doesn't often support filtering. But worst case, the PNG data could be read into R16Uint. The Apple sample code should reflect valid use cases though. If this is suddenly a per-format query, then Metal needs a call to query whether read-write support is possible on every format. Those docs are helpful, but a runtime query is needed. Otherwise, how would an app test that R16Uint is supported and not R16Unorm?
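To put rough numbers on the precision gap, here is a small sketch comparing the step between adjacent values in each format. The function names are mine. It relies only on R16Unorm having 65535 equal steps across [0, 1] and IEEE half floats storing 10 mantissa bits, and it ignores denormals.

```cpp
// Step between adjacent R16Unorm values: 65535 equal steps across [0, 1].
constexpr double unorm16Step() {
    return 1.0 / 65535.0;
}

// Step between adjacent half-float values in [2^exponent, 2^(exponent + 1)).
// With 10 stored mantissa bits the step is 2^(exponent - 10).
// Normal values only (exponent in [-14, 15]).
constexpr double halfStep(int exponent) {
    double step = 1.0;
    for (int i = exponent - 10; i < 0; ++i) {
        step /= 2.0;  // negative power of two
    }
    for (int i = exponent - 10; i > 0; --i) {
        step *= 2.0;  // positive power of two
    }
    return step;
}
```

For heights normalized into [0.5, 1), halfStep(-1) is 1/2048, roughly 32 times coarser than the 1/65535 step of R16Unorm, which is why R16Float is a poor substitute for terrain data.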
Aug ’21