hello, I got a question about coreml.
I loaded the coreml model in the project and set the computing unit to CPU+GPU.
When I used instruments to analyze the performance, I found that there was an overhead of prepare gpu request before each inference. I also checked the freezing point graph and found that memory was frequently allocated.
Is this as expected? Is there any way to avoid frequent prepares?
I have tried some methods, such as memory sharing of predict interface input parameters, but it seems to be ineffective.
Selecting any option will automatically load the page