Post

Replies

Boosts

Views

Activity

Compute unit specification for function runs
When we specialize with preferredComputeUnitKind: .neuralEngine, the resolved options' allowedComputeUnitKinds returns all three units, and sometimes a function we intend for the ANE ends up on the GPU. We can't find any API that reports where a function actually ran, but system resource utilization shows a GPU spike. Is there a supported way to confirm the actual compute unit at runtime? How does your prioritization work if we prefer running on the ANE rather than the GPU, and can we disallow certain compute units? How does this compute unit selection map for someone coming from Core ML, where the desired compute units were honored?
1
0
101
2d
Mixing Core AI and Core ML in one pipeline
We built a setup where a model split into an encoder and a decoder can run each part on a different backend, using our own component protocols. Is mixing Core AI and Core ML within a single inference pass something you would recommend, and what is the realistic cost at the boundary where we convert between MLMultiArray / MLTensor and NDArray? Is there a way to keep the encoder output resident on the GPU or ANE so it does not need a host round trip into the other backend?
2
1
182
2d
Compute unit specification for function runs
When we specialize with preferredComputeUnitKind: .neuralEngine, the resolved options' allowedComputeUnitKinds returns all three units, and sometimes a function we intend for the ANE ends up on the GPU. We can't find any API that reports where a function actually ran, but system resource utilization shows a GPU spike. Is there a supported way to confirm the actual compute unit at runtime? How does your prioritization work if we prefer running on the ANE rather than the GPU, and can we disallow certain compute units? How does this compute unit selection map for someone coming from Core ML, where the desired compute units were honored?
2
6
92
2d
Specialized models across OS updates
The docs say we can delete the source .aimodel after a .persistent specialize and keep the bookmark to save space. But an OS update always invalidates the cache and bookmarks, so it looks like anyone who deleted the source has to re-download the whole model after every update. For large models, that's a lot of bandwidth, and it hurts the first-run experience after the update. Is that the intended trade-off, or does the cache hold enough to re-specialize itself? Does every minor OS bump (27.1 -> 27.2) invalidate the cache when using .persistent, or only major ones? Also, can the user delete a .persistent entry themselves through Settings or storage management, or can only the app do that? We need to know whether our "model is ready" state can disappear without the app knowing.
1
0
136
2d
Specialized models across OS updates
The docs say we can delete the source .aimodel after a .persistent specialize and keep the bookmark to save space. But an OS update always invalidates the cache and bookmarks, so it looks like anyone who deleted the source has to re-download the whole model after every update. For large models, that's a lot of bandwidth, and it hurts the first-run experience after the update. Is that the intended trade-off, or does the cache hold enough to re-specialize itself? Does every minor OS bump (27.1 -> 27.2) invalidate the cache when using .persistent, or only major ones? Also, can the user delete a .persistent entry themselves through Settings or storage management, or can only the app do that? We need to know whether our "model is ready" state can disappear without the app knowing.
3
6
112
2d
Mixing Core AI and Core ML in one pipeline
We built a setup where a model split into an encoder and a decoder can run each part on a different backend, using our own component protocols. Is mixing Core AI and Core ML within a single inference pass something you would recommend, and what is the realistic cost at the boundary where we convert between MLMultiArray / MLTensor and NDArray? Is there a way to keep the encoder output resident on the GPU or ANE so it does not need a host round trip into the other backend?
1
6
65
2d