Reply to Accelerate simd library isn't complete for C++/ObjC/ObjC++
Also noticing that many of the oft-used routines, like matrix transpose, only have SIMD paths for SSE. Do the Neon paths on these go to scalar or SIMD ops?

    static simd_float2x2 SIMD_CFUNC simd_transpose(simd_float2x2 __x) {
    #if defined __SSE__  // why no Neon path?
      simd_float4 __x0, __x1;
      __x0.xy = __x.columns[0];
      __x1.xy = __x.columns[1];
      simd_float4 __r01 = _mm_unpacklo_ps(__x0, __x1);
      return simd_matrix(__r01.lo, __r01.hi);
    #else
      return simd_matrix((simd_float2){__x.columns[0][0], __x.columns[1][0]},
                         (simd_float2){__x.columns[0][1], __x.columns[1][1]});
    #endif
    }

Also, two abs ops that are part of AVX512 are used under the __SSE4_1__ and __AVX2__ flags.

    static inline SIMD_CFUNC simd_long2 simd_abs(simd_long2 x) {
    #if defined __arm64__
      return vabsq_s64(x);
    #elif defined __SSE4_1__  // should be __AVX512F__
      return (simd_long2) _mm_abs_epi64((__m128i)x);
    #else
      simd_long2 mask = x >> 63;
      return (x ^ mask) - mask;
    #endif
    }

    static inline SIMD_CFUNC simd_long4 simd_abs(simd_long4 x) {
    #if defined __AVX2__  // should be __AVX512F__
      return _mm256_abs_epi64(x);
    #else
      return simd_make_long4(simd_abs(x.lo), simd_abs(x.hi));
    #endif
    }
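The scalar fallback in the `simd_long2` overload can be checked in isolation. Here is a minimal sketch in portable C++, assuming a hypothetical two-lane `long2` struct standing in for `simd_long2`. The names `long2`, `abs_i64`, `abs_long2`, and `check_abs_examples` are made up for illustration and are not part of the simd headers. Unlike the header, this version does the subtraction in unsigned arithmetic so the most negative value does not overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for simd_long2: two 64-bit lanes, no SIMD types needed.
struct long2 { std::int64_t lane[2]; };

// Branchless abs: mask is all ones for negative x and zero otherwise, so
// (x ^ mask) - mask flips the bits and adds one only when x is negative.
// Assumes >> on a negative value is an arithmetic shift (guaranteed since C++20).
inline std::int64_t abs_i64(std::int64_t x) {
    const std::uint64_t mask = static_cast<std::uint64_t>(x >> 63);
    const std::uint64_t ux = static_cast<std::uint64_t>(x);
    return static_cast<std::int64_t>((ux ^ mask) - mask);
}

// Apply the scalar trick to each lane, as the #else branch does per vector.
inline long2 abs_long2(long2 v) {
    return long2{{abs_i64(v.lane[0]), abs_i64(v.lane[1])}};
}

// Small usage example.
inline void check_abs_examples() {
    const long2 r = abs_long2(long2{{-5, 7}});
    assert(r.lane[0] == 5);
    assert(r.lane[1] == 7);
    assert(abs_i64(0) == 0);
}
```

This only covers the fallback path. It says nothing about which instructions the compiler emits for the Neon or SSE branches.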
Topic: App & System Services SubTopic: Core OS Tags:
Mar ’21
Reply to Accelerate simd library isn't complete for C++/ObjC/ObjC++
I think for now, we'll just offer some simpler functional ctors. It seems that, because of ObjC bridging and all that, these typedefs can't be derived from. It just means our vector ops can't use any member functions; they all have to be functional constructs. We can derive from and provide members on the matrix types, since those are struct { float3 }. This includes operator[] and a non-initializing ctor. All the float3x3/4x4 ctors init the columns in the void ctor, which is safer but is often unnecessary work that simd_float3x3/4x4 don't do.

    float3 float3m(float a) { return {a, a, a}; }

    float3 v = float3m(3.0f);  // { 3, 3, 3 }: use the function, and discourage the form below
    float3 v = { 3.0f };       // { 3, 0, 0 }: avoid, this is the most dangerous construct of this library; it doesn't match MSL or most vecmath libs
    v = normalize(v);          // v.normalize(): can't use members
    v = float3_zero;           // float3::zero: can't use class constants
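The difference between the two initializers can be reproduced without the simd headers. This is a rough sketch, assuming a plain aggregate `float3` struct as a stand-in for the simd typedef. The struct, `length`, and `check_float3_examples` are hypothetical, and the real type is a compiler vector extension rather than a struct. It only illustrates the free-function style and the zero-filled lanes.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a simd float3 typedef: a plain aggregate with no
// members, constructors, or class constants.
struct float3 { float x, y, z; };

// Functional "splat" constructor: every lane gets the same value.
inline float3 float3m(float a) { return {a, a, a}; }

// Free constant in place of float3::zero.
constexpr float3 float3_zero = {0.0f, 0.0f, 0.0f};

inline float length(float3 v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Free function in place of v.normalize(). Returns zero for a zero-length input.
inline float3 normalize(float3 v) {
    const float len = length(v);
    if (len == 0.0f) return float3_zero;
    return {v.x / len, v.y / len, v.z / len};
}

// Small usage example.
inline void check_float3_examples() {
    const float3 splat = float3m(3.0f);  // all three lanes are 3
    const float3 partial = {3.0f};       // only x is 3; y and z are zero-initialized
    assert(splat.x == 3.0f && splat.y == 3.0f && splat.z == 3.0f);
    assert(partial.x == 3.0f && partial.y == 0.0f && partial.z == 0.0f);
    const float3 n = normalize(float3{0.0f, 4.0f, 0.0f});
    assert(n.y == 1.0f);
}
```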
Topic: App & System Services SubTopic: Core OS Tags:
Mar ’21
Reply to Does MTLBlitEncoder allow uploading more than one array texture per call?
Here's the most basic op on a 2D array, where in GL one can easily specify all the array layers. This is going up the mip chain in one call for each mip, not each mip x each layer. Metal should offer the same ability. The buffer can then be twiddled to the private texture as needed. Here's a ticket on it: https://feedbackassistant.apple.com/feedback/9009192 See the Khronos wiki page on Array Textures, since I can't include the link.

    glBindTexture(GL_TEXTURE_2D_ARRAY, tex);
    glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, format, width, height, num_layers, ...);
    glTexImage3D(GL_TEXTURE_2D_ARRAY, 1, format, width/2, height/2, num_layers, ...);
    glTexImage3D(GL_TEXTURE_2D_ARRAY, 2, format, width/4, height/4, num_layers, ...);
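To put numbers on "one call for each mip, not each mip x each layer", here is a small sketch that counts upload calls for a full mip chain. The helper names are hypothetical, and it assumes a complete chain down to 1x1 and mip levels below 32.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of mip levels in a full chain down to 1x1.
inline std::uint32_t mip_count(std::uint32_t width, std::uint32_t height) {
    std::uint32_t levels = 1;
    std::uint32_t size = std::max(width, height);
    while (size > 1) {
        size /= 2;
        ++levels;
    }
    return levels;
}

// Size of one dimension at a given mip level, clamped to 1 (level must be < 32).
inline std::uint32_t mip_dim(std::uint32_t base, std::uint32_t level) {
    return std::max<std::uint32_t>(1u, base >> level);
}

// Upload calls when each call covers every layer of one mip.
inline std::uint32_t calls_per_mip(std::uint32_t width, std::uint32_t height) {
    return mip_count(width, height);
}

// Upload calls when each call covers a single layer of a single mip.
inline std::uint32_t calls_per_mip_and_layer(std::uint32_t width,
                                             std::uint32_t height,
                                             std::uint32_t layers) {
    return mip_count(width, height) * layers;
}
```

For a 256x256 array with 2048 layers, that is 9 calls in the first scheme and 9 * 2048 in the second.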
Topic: Graphics & Games SubTopic: General Tags:
Feb ’21
Reply to Does MTLBlitEncoder allow uploading more than one array texture per call?
I already have an entire level of the array texture loaded into the buffer with a single memcpy. KTX and KTX2 store all the images of one mip level together. I can upload them all at once in GL, so why not in Metal? There should be no iteration on a blit encoder. I should be able to upload 2048 array layers at 1x1 or 2x2 each in a single call, but there is no such call in Metal from what I see. I think this would be a big improvement to the MTLBlitEncoder. I shouldn’t have to upload faces, array layers, and slices one at a time. For mips, yes, I can iterate, but that count is much smaller than the array count.
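The per-slice iteration amounts to walking one contiguous level buffer and issuing a copy per layer. A sketch of the bookkeeping, assuming uncompressed texels, tightly packed rows, and layers stored back to back within the level; real KTX data may carry alignment or padding that this ignores. `SliceCopy` and `slice_copies_for_level` are hypothetical names. The fields correspond to what one would pass as source offset, bytes per row, bytes per image, and destination slice to a per-slice buffer-to-texture copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parameters for one buffer-to-texture copy of a single array layer.
struct SliceCopy {
    std::size_t source_offset;    // byte offset of the layer in the level buffer
    std::size_t bytes_per_row;    // tightly packed row size
    std::size_t bytes_per_image;  // size of one layer at this mip
    std::size_t slice;            // destination array layer
};

// Build one copy description per layer for a single mip level.
// Assumes layers are contiguous in the buffer with no padding between them.
inline std::vector<SliceCopy> slice_copies_for_level(std::size_t width,
                                                     std::size_t height,
                                                     std::size_t layers,
                                                     std::size_t bytes_per_texel) {
    const std::size_t bytes_per_row = width * bytes_per_texel;
    const std::size_t bytes_per_image = bytes_per_row * height;
    std::vector<SliceCopy> copies;
    copies.reserve(layers);
    for (std::size_t layer = 0; layer < layers; ++layer) {
        copies.push_back({layer * bytes_per_image, bytes_per_row, bytes_per_image, layer});
    }
    return copies;
}
```

With 2048 layers at 2x2 and 4 bytes per texel, this produces 2048 copies of 16 bytes each, which is the overhead a single whole-level call would avoid.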
Topic: Graphics & Games SubTopic: General Tags:
Feb ’21
Reply to Xcode defaults ARC to off. CMake-based builds leak large amounts of memory.
CMake says this needs to be fixed by the Xcode team, and the Xcode team says the issue needs to be fixed by CMake. So until one side fixes this, I'll continue to use the workaround, but I've let the CMake team know about the response here. Defaulting ARC to off in a newly created Xcode project seems incorrect for modern builds, and having the GUI creation templates override the default value seems like a poor workaround for the underlying problem.
Topic: Graphics & Games SubTopic: General Tags:
Feb ’21