Thanks for pointing out the issue. The reason I create a new random tensor on each loop iteration is to avoid potential "caching" of the same computation on the two variables, so that the benchmark reflects the real scenario of loading different data on every training / inference step.
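For reference, a minimal sketch of the benchmarking pattern I mean (NumPy stands in for the TensorFlow ops here, and `benchmark_matmul` is a hypothetical helper name, not from either of our scripts):

```python
import time
import numpy as np

def benchmark_matmul(size=1024, iters=50, seed=0):
    """Time repeated matmuls, generating fresh random operands every
    iteration so no result of the same calculation can be cached."""
    rng = np.random.default_rng(seed)
    start = time.perf_counter()
    for _ in range(iters):
        # New operands each pass -- mimics loading different data
        # per training / inference step in the real scenario.
        a = rng.standard_normal((size, size), dtype=np.float32)
        b = rng.standard_normal((size, size), dtype=np.float32)
        _ = a @ b
    elapsed = time.perf_counter() - start
    return iters / elapsed  # iterations per second

print(f"{benchmark_matmul(size=256, iters=10):.2f} it/s")
```

If the two operands were instead created once outside the loop, a framework could in principle reuse or cache work across iterations, inflating the measured it/s.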
I also ran the experiments with your attached code: the M1 Max scores 103.10 it/s and the RTX 3090 scores 234.82 it/s, i.e. the M1 Max reaches 43.9% of the RTX 3090's performance. But I think this is due to some internal caching; when I train actual deep learning models on the M1 Max (for example, the QA model from the Hugging Face TensorFlow examples), training throughput is roughly 1/6 of an RTX 3090, which is consistent with my original result.
The interesting part is the wattage. On an RTX 3090, GPU utilization and power draw are about the same for deep learning and gaming, but on the M1 Max they differ substantially (power draw is much lower for deep learning than for gaming), which suggests the M1 Max's GPU cores may not be fully utilized for deep learning workloads.
I hope you can find the issues and improve the performance of TensorFlow on the M1 chip.