I found the relevant descriptions in the coremltools documentation.
On newer hardware, e.g. the iPhone 15 Pro (A17 Pro), increased int8-int8 compute is available on the Neural Engine.
Impact on Latency and Compute Unit Considerations https://apple.github.io/coremltools/docs-guides/source/quantization-overview.html#impact-on-latency-and-compute-unit-considerations
Linear 8-Bit Quantization https://apple.github.io/coremltools/docs-guides/source/performance-impact.html#linear-8-bit-quantization
The key point for the A17 Pro is to quantize both weights and activations (W8A8), using per-tensor quantization for the activations.
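To make "per-tensor" concrete, here is a minimal sketch (plain Python, not the coremltools API) of symmetric int8 quantization where a single scale is shared by the whole tensor. This is the scheme, applied to both weights and activations, that the int8-int8 Neural Engine path relies on; the helper names and example values are mine, not from the docs.

```python
# Per-tensor symmetric int8 quantization sketch (illustrative, not coremltools).
# One scale covers the entire tensor, unlike per-channel quantization,
# which keeps a separate scale per output channel.

def quantize_per_tensor(values):
    """Map a float tensor to int8 using a single scale for the whole tensor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0  # symmetric range [-127, 127]
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes and the shared scale."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.02, 1.0]
activations = [3.2, 0.0, -6.4]

qw, sw = quantize_per_tensor(weights)      # int8 weight codes + one scale
qa, sa = quantize_per_tensor(activations)  # int8 activation codes + one scale
```

With both operands reduced to int8 plus a single scale each, the matmul can run entirely in int8 and the scales are folded back in afterward. In coremltools, weight quantization is covered by the linear 8-bit quantization page linked above; activation quantization is what the "Impact on Latency" section discusses for the A17 Pro.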
Topic:
Machine Learning & AI
SubTopic:
Core ML