Reply to Metal draw indirect missing draw count
And here's the similar D3D12 call. It has MaxCommandCount:
    void ExecuteIndirect(
        ID3D12CommandSignature *pCommandSignature,
        UINT MaxCommandCount,                      // <-
        ID3D12Resource *pArgumentBuffer,
        UINT64 ArgumentBufferOffset,
        ID3D12Resource *pCountBuffer,
        UINT64 CountBufferOffset
    );
Topic: Graphics & Games SubTopic: General Tags:
Jun ’21
Reply to Metal draw indirect missing draw count
ICBs are only available on A9+, which is why I was adopting the indirect draw (ID) calls and not the ICB calls. Our game title supports A7+. And yes, the ICBs have a range, so I don't understand why the ID calls don't. I also haven't found any sample code showing how to use indirect draw. This is a simple use case: collecting disjoint index buffer offsets/sizes for draws into a series of indirect draw calls, so that it's a single draw call instead of 1000. The materials are all the same for a given range of ID submissions to the buffer. Without a count, one can't even accumulate different materials. Drawing 8 with an offset of 0, then 10 with an offset of 8, into the same MTLBuffer sized for 80 draws results in drawing 80 and then 72 indirect draws. But I need 8 and then 10.
Topic: Graphics & Games SubTopic: General Tags:
Jun ’21
Reply to Metal draw indirect missing draw count
Also, when I looked into ICBs, they lock down all sorts of pipeline data and have a raster pipeline state inheritance model that makes them unusable except in that Apple sample code. Here I just want to record draws either on the CPU or GPU, and control the draw count of multiple index buffer ranges. We also target iOS 9, but the ICB calls seem to be iOS 12/13 level.
Topic: Graphics & Games SubTopic: General Tags:
Jun ’21
Reply to Metal draw indirect missing draw count
Am I misunderstanding this call? I see it used in MoltenVK. Does the drawIndexedPrimitives call only draw one indirect draw call out of the buffer at the offset? Is that why the "count" is missing? If so, I need to call it once for each draw in each of my drawCounts, and I might be able to salvage what I have.
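A minimal sketch of that workaround, assuming each indirect drawIndexedPrimitives call consumes exactly one argument record at the given byte offset, as the question above supposes. The struct mirrors the field layout of Metal's MTLDrawIndexedPrimitivesIndirectArguments but is redeclared here so the sketch builds without Metal; `indirectOffsets` is a hypothetical helper, not part of any API. It only computes the byte offsets you would pass as indirectBufferOffset, one per call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the layout of Metal's MTLDrawIndexedPrimitivesIndirectArguments
// (five 32-bit fields). Redeclared here so the sketch compiles anywhere.
struct DrawIndexedIndirectArgs {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t indexStart;
    int32_t  baseVertex;
    uint32_t baseInstance;
};
static_assert(sizeof(DrawIndexedIndirectArgs) == 20,
              "expected a tightly packed 20-byte record");

// Hypothetical helper: byte offsets of records [firstDraw, firstDraw + drawCount)
// in an argument buffer that holds bufferCapacity records.
std::vector<std::size_t> indirectOffsets(std::size_t firstDraw,
                                         std::size_t drawCount,
                                         std::size_t bufferCapacity) {
    // The range must fit inside the argument buffer.
    assert(firstDraw + drawCount <= bufferCapacity);
    std::vector<std::size_t> offsets;
    offsets.reserve(drawCount);
    for (std::size_t i = 0; i < drawCount; ++i) {
        offsets.push_back((firstDraw + i) * sizeof(DrawIndexedIndirectArgs));
    }
    return offsets;
}
```

For the case described earlier, `indirectOffsets(0, 8, 80)` would give the 8 offsets for the first material and `indirectOffsets(8, 10, 80)` the 10 offsets for the second, with one indirect draw encoded per offset.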
Topic: Graphics & Games SubTopic: General Tags:
Jun ’21
Reply to Shader hotloading broken - newLibraryWithData on metallib returns cached not new metallib
So I did finally get time to write a test case and newLibraryWithData did indeed hotload. In the engine, it looks like the render/compute pipelines were not reset to point to the new MTLFunction. That's why the old shaders were used despite recompiling and replacing them with new MTLFunction objects. The URL load path should still be fixed to test modstamp and/or hash of each shader. And some path to update the shaders on existing render/compute pipelines would be helpful. Maybe this will save others some pain. I also have hotloading in my ktx/ktx2 tool called kram. Shader hotloading should be a pervasive part of all Apple demos, but I never see it used anywhere.
Topic: Graphics & Games SubTopic: General Tags:
Jul ’21
Reply to 16" MBP w/AMD doesn't support MTLCounterSamplingPointAtStageBoundary
Yes, I ended up using the draw stage boundaries around all of our renderPasses, and on iOS I use the stage boundary calls. I thought I was going to have to set draw boundary data on each draw call, but the draw stage boundaries were really just a timestamp to inject into the command stream. The WWDC video was helpful. MTLParallelRenderCommandEncoders weren't supported, but I was able to define timers around the sub-encoders on those. It was a ton of code and tricky to support both macOS and iOS, and I had to deal with 3 different encoders and adjust the timestamps on macOS Intel. It's done now and working, at least for macOS 11+ and iOS 14+. It should also solve M1 timings.
Topic: Graphics & Games SubTopic: General Tags:
Nov ’21
Reply to Setting TCP_QUICKACK for iOS socket
I'm seeing select() take up to 200ms or more because of this. This then hitches our game, since we're trying to communicate with a remote manager. I tried setting TCP_NODELAY to fix this, to no effect. So we must also need TCP_QUICKACK, but that's not defined. Since OS X was a BSD OS, it seems odd that this is missing and that full support isn't there. I probably can't share links on this forum, but this is the issue: "Nagle's Algorithm and Delayed ACK Do Not Play Well Together in a TCP/IP Network".
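For reference, a minimal sketch of the TCP_NODELAY step mentioned above, using standard POSIX socket calls. The function names are hypothetical, and `noDelayRoundTrip` is only a self-check on a fresh, unconnected socket. It confirms the option was accepted, not that the stall goes away: TCP_NODELAY only disables Nagle on the sending side, while delayed ACK is decided by the peer.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Disable Nagle's algorithm on a TCP socket. Returns true on success.
bool setNoDelay(int fd) {
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) == 0;
}

// Read the option back to confirm the stack accepted it.
bool isNoDelay(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0) {
        return false;
    }
    return value != 0;
}

// Hypothetical self-check: create a socket, set the option, verify it, clean up.
bool noDelayRoundTrip() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return false;
    }
    bool ok = setNoDelay(fd) && isNoDelay(fd);
    close(fd);
    return ok;
}
```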
Dec ’21
Reply to nextDrawable stalls commit of command buffer
This leads to 5-8ms of CPU driver processing that overlaps with the 10ms to 26ms nextDrawable wait. I can't post a picture from Metal System Trace here, but it seems that one should be able to completely commit one CB before getting stalled by the API. Having to use two just to work around the nextDrawable stall isn't great, but it's my workaround for now. No stall is seen, since it's all using triple buffering. If I switched to double, then it gets unusable. Very little of the render command buffer submission depends on the drawable. The framebuffer CB reads from the results of the offscreen in that CB just to display to the UIView. So the nextDrawable stall basically prevents that work from being submitted in the single CB case.
Topic: Graphics & Games SubTopic: General Tags:
Jan ’22
Reply to thread_policy_set(1) returned 46
Same problem here trying to set the affinity mask/hint. What is return code 46? Is Apple trying to prevent use of this API on iOS? The macOS side correctly returns 0. We're using the following code as per Apple's documentation. It doesn't work for any mask value. This is as good as we can do without real affinity support: just hint and hope.
    thread_affinity_policy_data_t policy = { (int)( mask & 0xFFFFFFFF ) };
    int rc = thread_policy_set( pthread_self(), THREAD_AFFINITY_POLICY, (thread_policy_t)&policy, 1 );
May ’22
Reply to thread_policy_set(1) returned 46
I have no idea where to find that 46 means not supported, but thanks as always Quinn. You are super helpful. I guess it's from sys/errno.h. What is the alternative? QoS control isn't even remotely the same as affinity. The system will always have higher QoS levels unavailable to us, so the system should always be responsive even if we use affinity in game. At least Android has affinity control, and it hasn't destroyed the platform. And when you're building a game, and want to run jobs consistently on cores and monitor them in captures, then not having any affinity control on macOS or iOS is a problem. We use affinity control for cores on all products except Apple's, and the workarounds for this aren't ideal for optimizing performance. I somehow feel this is like removing dll hotloading on iOS in iOS 12. We used to be able to reload our C++ game code, and now Apple requires the app devs to completely relaunch builds. That kills iteration. Look at Unity or UE4/5 having to do the same. Which iOS version removed affinity hinting? This came out in macOS 10.5, and the call is available on iOS. It just seems like the call has been disabled of late. Maybe it's available to set other values, but at this point it's a little late for being experimental. https://developer.apple.com/library/archive/releasenotes/Performance/RN-AffinityAPI/#//apple_ref/doc/uid/TP40006635-CH1-DontLinkElementID_2 We have 2 big and 4 little cores. The 4 little cores run 2-3x slower than the big. We'd like to prioritize tasks on the big cores and then see those tasks running there. Maybe even ignore the little cores so that we hit our frame rate. There are no scheduler examples from Apple on how to do this. Having 50 queues going to libdispatch also isn't the correct model. Also, we're running iOS builds on macOS M1. Does this call work there?
May ’22
Reply to thread_policy_set(1) returned 46
Seems like the new Swift threading model is all about no more threads than cores, and keeping each core busy with work. So that implementation is locking threads to cores via affinity. That's exactly why we also need affinity control. And this API doesn't appear to be supplied for C++ code. I hadn't seen that presentation, so I'm digging into it. Thanks!
May ’22