a2they’s Profile | Apple Developer Forums

Compute unit specification for function runs

When we specialize with preferredComputeUnitKind: .neuralEngine, the resolved options' allowedComputeUnitKinds still returns all three units, and sometimes a function we intend for the ANE ends up on the GPU. We can't find any API that reports where a function actually ran, but system resource utilization shows a GPU spike. Is there a supported way to confirm the actual compute unit at runtime? How does your prioritization work if we prefer running on the ANE rather than the GPU, and can we disallow certain compute units? How does this compute unit selection map for someone coming from Core ML, where the requested compute units were honored?

Machine Learning & AI Core AI

Replies: 1, Boosts: 0, Views: 101, Activity: 2d

Mixing Core AI and Core ML in one pipeline

We built a setup where a model split into an encoder and a decoder can run each part on a different backend, using our own component protocols. Is mixing Core AI and Core ML within a single inference pass something you would recommend, and what is the realistic cost at the boundary where we convert between MLMultiArray / MLTensor and NDArray? Is there a way to keep the encoder output resident on the GPU or ANE so it does not need a host round trip into the other backend?

Machine Learning & AI Core AI

Replies: 2, Boosts: 1, Views: 182, Activity: 2d

Compute unit specification for function runs

When we specialize with preferredComputeUnitKind: .neuralEngine, the resolved options' allowedComputeUnitKinds still returns all three units, and sometimes a function we intend for the ANE ends up on the GPU. We can't find any API that reports where a function actually ran, but system resource utilization shows a GPU spike. Is there a supported way to confirm the actual compute unit at runtime? How does your prioritization work if we prefer running on the ANE rather than the GPU, and can we disallow certain compute units? How does this compute unit selection map for someone coming from Core ML, where the requested compute units were honored?

Machine Learning & AI Core AI

Replies: 2, Boosts: 6, Views: 92, Activity: 2d

Specialized models across OS updates

The docs say we can delete the source .aimodel after a .persistent specialize and keep the bookmark to save space. But an OS update always invalidates the cache and bookmarks, so it looks like anyone who deleted the source has to re-download the whole model after every update. For large models, that's a lot of bandwidth, and it hurts the first-run experience after the update. Is that the intended trade-off, or does the cache hold enough to re-specialize itself? Does every minor OS bump (27.1 -> 27.2) invalidate the cache for a .persistent specialization, or only major ones? Also, can the user delete a .persistent entry themselves through Settings or storage management, or can only the app do that? We need to know whether our "model is ready" state can disappear without the app knowing.

Machine Learning & AI Core AI

Replies: 1, Boosts: 0, Views: 136, Activity: 2d

Specialized models across OS updates

The docs say we can delete the source .aimodel after a .persistent specialize and keep the bookmark to save space. But an OS update always invalidates the cache and bookmarks, so it looks like anyone who deleted the source has to re-download the whole model after every update. For large models, that's a lot of bandwidth, and it hurts the first-run experience after the update. Is that the intended trade-off, or does the cache hold enough to re-specialize itself? Does every minor OS bump (27.1 -> 27.2) invalidate the cache for a .persistent specialization, or only major ones? Also, can the user delete a .persistent entry themselves through Settings or storage management, or can only the app do that? We need to know whether our "model is ready" state can disappear without the app knowing.

Machine Learning & AI Core AI

Replies: 3, Boosts: 6, Views: 112, Activity: 2d

Mixing Core AI and Core ML in one pipeline

We built a setup where a model split into an encoder and a decoder can run each part on a different backend, using our own component protocols. Is mixing Core AI and Core ML within a single inference pass something you would recommend, and what is the realistic cost at the boundary where we convert between MLMultiArray / MLTensor and NDArray? Is there a way to keep the encoder output resident on the GPU or ANE so it does not need a host round trip into the other backend?

Machine Learning & AI Core AI

Replies: 1, Boosts: 6, Views: 65, Activity: 2d

a2they

Post

Replies

Boosts

Views

Activity