Post, Replies, Boosts, Views, Activity

Shader compiler issue with any() use on iOS
We target MSL 1.1 on iOS 9, and are seeing non-equivalence between the two forms below. The upper code generates bad pixels on iOS but is the more efficient form. macOS (on AMD 5500M) is fine. I will log this to Feedback Assistant, but am posting it here too. The code was compiled with -O2, so this could be an iOS optimizer bug.

    #if 1
        if ( all( greaterThanEqual(pos.xy, v_clip.xy )) && all( lessThanEqual(pos.xy, v_clip.zw )) )
    #else
        if ( pos.x >= v_clip.x && pos.x <= v_clip.z && pos.y >= v_clip.y && pos.y <= v_clip.w )
    #endif

This is codegen out of spirv-cross. Mac and iOS codegen is the same for this chunk. These results are on iOS.

With #if 1, this doesn't work:

    fsmain_out out = {};
    float4 color = float4(0.0);
    float2 pos = gl_FragCoord.xy;
    bool _35 = all(pos >= in.v_clip.xy);
    bool _43;
    if (_35) { _43 = all(pos <= in.v_clip.zw); } else { _43 = _35; }
    if (_43) ...

With #if 0, this works:

    fsmain_out out = {};
    float4 color = float4(0.0);
    float2 pos = gl_FragCoord.xy;
    bool _38 = pos.x >= in.v_clip.x;
    bool _47;
    if (_38) { _47 = pos.x <= in.v_clip.z; } else { _47 = _38; }
    bool _56;
    if (_47) { _56 = pos.y >= in.v_clip.y; } else { _56 = _47; }
    bool _65;
    if (_56) { _65 = pos.y <= in.v_clip.w; } else { _65 = _56; }
    if (_65)
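A CPU-side reference can help confirm that the two forms of the clip test (one vector comparison per corner versus four scalar comparisons) are meant to agree, so that any difference on device points at the compiler or driver. The sketch below is plain C++, not MSL. The `Float2` and `Float4` types, the function names, and the sample grid are hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for the shader vector types (these are not MSL types).
struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };

// Vector form: all(pos >= clip.xy) && all(pos <= clip.zw).
bool insideClipVector(Float2 pos, Float4 clip) {
    const bool geMin = (pos.x >= clip.x) && (pos.y >= clip.y);
    const bool leMax = (pos.x <= clip.z) && (pos.y <= clip.w);
    return geMin && leMax;
}

// Scalar form: four comparisons, in the order used by the #else branch.
bool insideClipScalar(Float2 pos, Float4 clip) {
    return pos.x >= clip.x && pos.x <= clip.z &&
           pos.y >= clip.y && pos.y <= clip.w;
}

// Counts sample points where the two forms disagree. Samples sit at pixel
// centers (n + 0.5), the way gl_FragCoord reports them.
int countMismatches(Float4 clip) {
    int mismatches = 0;
    for (int iy = -2; iy <= 12; ++iy) {
        for (int ix = -2; ix <= 12; ++ix) {
            const Float2 pos = { ix + 0.5f, iy + 0.5f };
            if (insideClipVector(pos, clip) != insideClipScalar(pos, clip)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

If the reference reports zero mismatches for a given clip rectangle while the device output differs between the two branches, that supports the optimizer-bug theory rather than a logic error in the shader source.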
2
0
724
Jun ’21
os_signpost use adds +20ms to 30ms renders
I tried converting our Android ATrace scopes to use os_signpost, but this seems to add 20ms of CPU time to every frame. On Android, ATrace_isEnabled is only true when AGI (Android GPU Inspector) takes a capture, but there don't seem to be flags that indicate when an Instruments capture is being taken. AGI gives us nice tracks in Perfetto of CPU and GPU timings, with pseudo-coloring and text in each track that help interpret the frame, and without a 20ms hit. Instruments gives microscopically tiny tracks that are all blue, with no text in the os_signpost widget. I have to hover over every track, each about 2 pixels high, to see the timings, and the timing reported for each frame is 400ms instead of the actual 50ms. Is there a better method to see scoped CPU timings on macOS/iOS, considering dtrace isn't available, or some way to reduce the performance hit?
2
0
1.1k
Jul ’22
iOS/M1 does not generate consistent depth from multiple passes
We are using first-pass depth. I know it's not recommended, but we have one and need it. Deferred renderers use this, and we do too. We've tried setting [[invariant]] on the position, and are now resorting to slope and depth biasing the second pass. We even set -fpreserve-invariance on the compiler. This whole construct is confusing: "invariant" was added in MSL 2.1, but setting that compiler flag requires iOS 13, and other code states that the flag must be set for iOS 14 and macOS 11 SDK use (minSDK? buildSDK?). We also tried disabling fast math (-fno-fast-math) to no avail. But why is a simple v = v * m calculation different once polys hit the near plane or the viewport edges? The polys then seem to z-fight per tile. Some tiles have stripes of z, and some are just completely missing. These are the same tris going through two shaders that do the same vertex calc. That shouldn't be happening, unless the tiles are computing gradients incorrectly from one pass to the next. On long clipped tris, it looks like a hardware/driver bug in computing consistent depths across the same triangles. This was tested on older (iPhone 6) and newer iOS devices, and on an M1 MBP.
2
0
1k
May ’22
Xcode Key Bindings has no way to clear a binding?
There used to be a "-" icon on the right side of each key binding, but that's been removed, and there is no help text on how to clear a key binding. Even in the "conflicts" list, there's no help as to how to clear them. When I highlight a binding and hit the "delete" key (or shift + delete), Xcode just enters that as a conflicting key binding. How about just honoring the "delete" key and clearing the binding?
2
1
859
Feb ’24
16" MBP w/AMD doesn't support MTLCounterSamplingPointAtStageBoundary
This is the latest Intel Mac running with an AMD 5500, and it can't sample timings at stage boundaries? How are we supposed to write timing consistently for macOS and iOS if that's not the case? So I then have to add several thousand samples per draw call and accumulate them? I don't remember the docs or sample code pointing this out. Our app compiles to deploy on macOS 10.15. Does setting that higher help with this?

    MTLCounterSamplingPointAtStageBoundary is not supported, startOfVertexSampleIndex must be MTLCounterDontSample.
    MTLCounterSamplingPointAtStageBoundary is not supported, startOfFragmentSampleIndex must be MTLCounterDontSample
3
0
875
Nov ’21
How do MTKView/CAMetalLayer and extended colorspaces work?
These make no sense. Several of the presentations on wide gamut lack specific example code. I would assume that if I have linear rgba16f data, I could specify the srgb or linearSrgb colorspace and get the same output to the screen, but that is not the case. There is no documentation except for the briefest of comments on each color space, and nothing on how MTKView actually transforms the colors. There is even less documentation on extended color spaces, and on what to do when they fail to display expected results. When we set one of these, the gamma is totally off. And it's unclear what to set so we go from HDR back to EDR.

    src srgb8 -> srgbColorSpace -> gamma 2.2 -> incorrect, doubly applied srgb?, why can't the layer just do passthrough
    rgba16f -> srgbColorSpace -> gamma 2.2 -> correct, seems to apply gamma and composites properly with rest of AppKit
    rgba16f -> linearSrgbColorSpace -> gamma 1.0 -> incorrect, isn't my data linear?
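To make the "doubly applied srgb" suspicion concrete, the sketch below shows what the standard sRGB transfer function does to a value when it is applied once versus twice. It only illustrates the math. It makes no claim about what MTKView or CAMetalLayer actually do internally, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Standard sRGB encode: linear light -> sRGB-encoded value (roughly gamma 2.2).
float srgbEncode(float linear) {
    return linear <= 0.0031308f
        ? 12.92f * linear
        : 1.055f * std::pow(linear, 1.0f / 2.4f) - 0.055f;
}

// Standard sRGB decode: sRGB-encoded value -> linear light.
float srgbDecode(float encoded) {
    return encoded <= 0.04045f
        ? encoded / 12.92f
        : std::pow((encoded + 0.055f) / 1.055f, 2.4f);
}

// What a pipeline produces if already-encoded data is encoded a second time.
float srgbEncodeTwice(float linear) {
    return srgbEncode(srgbEncode(linear));
}
```

For mid-gray input, a single encode brightens the stored value and a second encode brightens it again, which on screen reads as washed-out output. Comparing a known test value against these two results is one way to tell whether a given colorspace setting is adding an extra encode.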
4
1
2.3k
Feb ’23
CoreVideo + Rosetta still clamps at 60Hz (since macOS 12)
We set the CVDisplayLink on macOS to 0 or 120, and get the following. This then clamps maximum refresh to 60Hz on the 120Hz ProMotion display on an MBP M2 Max laptop. How is this not fixed in 4 macOS releases?

    CoreVideo: currentVBLDelta returned 200000 for display 1 -- ignoring unreasonable value
    CoreVideo: [0x7fe2fb816020] Bad CurrentVBLDelta for display 1 is zero. defaulting to 60Hz.
5
0
639
4h
Metal draw indirect missing draw count
Why is there no count on any of these draw indirect calls? I am appending draws to a single MTLBuffer on the CPU, but can't limit how many are drawn out of the buffer. An offset isn't enough to specify a range. Can this be supplied in some bind call?

    - (void)drawIndexedPrimitives:(MTLPrimitiveType)primitiveType
                        indexType:(MTLIndexType)indexType
                      indexBuffer:(id <MTLBuffer>)indexBuffer
                indexBufferOffset:(NSUInteger)indexBufferOffset
                   indirectBuffer:(id <MTLBuffer>)indirectBuffer
             indirectBufferOffset:(NSUInteger)indirectBufferOffset API_AVAILABLE(macos(10.11), ios(9.0));

Contrast this with the Vulkan call, which has an offset and a count.

    vkCmdDrawIndexedIndirect( m_encoder, indirectBuffer, drawBufferOffset, drawCount, sizeof( VkDrawIndexedIndirectCommand ) );
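One CPU-side workaround, sketched here as an assumption rather than a documented Metal pattern, is to track the draw count yourself and issue one indirect draw per entry, stepping the indirect buffer offset by the size of one argument record. The `IndexedIndirectArgs` struct below is a hypothetical mirror of the layout of `MTLDrawIndexedPrimitivesIndirectArguments`, and `indirectOffsets` is an illustrative helper, not a Metal API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirror of the five 32-bit fields Metal reads for an
// indexed indirect draw (MTLDrawIndexedPrimitivesIndirectArguments).
struct IndexedIndirectArgs {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t indexStart;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};
static_assert(sizeof(IndexedIndirectArgs) == 20,
              "expected five tightly packed 32-bit fields");

// Returns the byte offset to pass as indirectBufferOffset for each of the
// first drawCount records, starting baseOffset bytes into the buffer.
std::vector<std::size_t> indirectOffsets(std::size_t baseOffset,
                                         std::size_t drawCount) {
    std::vector<std::size_t> offsets;
    offsets.reserve(drawCount);
    for (std::size_t i = 0; i < drawCount; ++i) {
        offsets.push_back(baseOffset + i * sizeof(IndexedIndirectArgs));
    }
    return offsets;
}
```

Each returned offset would feed one drawIndexedPrimitives call with the same indirect buffer. This costs one encoder call per draw, so it emulates the Vulkan count parameter without matching its efficiency.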
8
0
1.6k
Jun ’21
dlopen() reloads original instead of new dylib after changes
We have a C++ library that we hotload on macOS. This uses dlopen() and dlclose() and worked up until recent versions of Catalina. We don't use thread_local and don't have Objective-C code in the library. dlopen() succeeds, we use the original dylib. Then for hotloading we dlclose() the original dylib and then dlopen() the new dylib. All this succeeds, and no dlerror occurs. All of the dyld output indicates that the library is being unloaded and loaded back in. But after changing the sources, and building a new dylib, the app returns the original dylib and not the new one. This seems to be a problem in the dyld layer itself, and not our sources. On older macOS builds, the hotloading works correctly. Given the lack of edit+continue in Xcode, this is the only way to iterate quickly on source code changes. How do we fix this? We are not using the hardened runtime. This is failing on macOS 10.15.7 with Xcode 12.2 (and 12.3).
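A workaround often used by hot-reload systems is to copy each rebuilt library to a unique path and dlopen() the copy, so the loader never sees the same path twice. Whether that sidesteps this particular dyld behavior is untested here, so treat the sketch below as an assumption. The helper names `versionedPath` and `loadFreshCopy` are hypothetical, and the caller is assumed to bump the generation counter on every rebuild.

```cpp
#include <cassert>
#include <cstddef>
#include <dlfcn.h>
#include <fstream>
#include <string>

// Builds a per-reload file name, e.g. "libgame.dylib" + 3 -> "libgame.3.dylib".
// A dot inside a directory name is not treated as an extension.
std::string versionedPath(const std::string& path, unsigned generation) {
    const std::size_t dot = path.rfind('.');
    const std::size_t slash = path.rfind('/');
    const bool hasExt = dot != std::string::npos &&
                        (slash == std::string::npos || dot > slash);
    const std::string gen = "." + std::to_string(generation);
    return hasExt ? path.substr(0, dot) + gen + path.substr(dot) : path + gen;
}

// Copies the freshly built library to a unique path and loads that copy.
// Returns nullptr if the copy fails or dlopen() fails (see dlerror()).
void* loadFreshCopy(const std::string& builtPath, unsigned generation) {
    const std::string copyPath = versionedPath(builtPath, generation);
    {
        std::ifstream src(builtPath, std::ios::binary);
        if (!src) return nullptr;
        std::ofstream dst(copyPath, std::ios::binary | std::ios::trunc);
        if (!dst) return nullptr;
        dst << src.rdbuf();
        if (!dst) return nullptr;
    }
    return dlopen(copyPath.c_str(), RTLD_NOW | RTLD_LOCAL);
}
```

The old handle would still be passed to dlclose() before or after loading the new copy, and stale copies need to be cleaned up separately.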
9
0
3.9k
Apr ’21
Shader hotloading broken - newLibraryWithData on metallib returns cached not new metallib
This breaks shader hotloading and has been a persistent bug in Metal for many years. Metal holds onto some existing lib and returns it without checking whether the data content has changed. Similar bugs happen with Metal's shader cache not checking modification timestamps. In my case, I'm just changing a color in the shader from float3(1,0,0) to float3(1,1,0), and then never seeing the result of the shader change. The new metallib is loaded from disk and handed off to newLibraryWithData. I can tell that it's returning a cached metallib, because we set a label on the MTLFunction that is returned. That label is nil on the first load of the shader, but after the hotload of the new metallib the label is already non-nil. So we just see the old shader content. This is a very important Radar to fix.
9
0
2.2k
Sep ’21
Is device.currentAllocatedSize and gpu capture buffer memory accurate on iOS/iPad?
I see reasonable numbers from this on macOS, but on iPad I see really large numbers, both from this and in the GPU capture, that don't add up. This is Xcode 12.2 and an iPad on 14.0.1. Textures and Buffers add up to 261MB, which is close to the macOS numbers. The memory summary and the "other" area in the buffers area report 573MB when I hover over that. Also, device.currentAllocatedSize reports 868MB total. I assume the buffer size is skewing the memory totals, since Xcode reports 620MB for the entire app. I would attach a screenshot of the GPU capture showing the memory capture, but it seems that the new forums don't support this, and not being able to search categories anymore is rather limiting.

    Non-volatile 261, Volatile 0
    Textures 195, Buffers 66 <- but hover over "other" reports 573
    Private 184, Shared 77
    Used 166, Unused 95
12
0
2.6k
Jan ’22