Low performance for calculation of dense layers
Hi, I have run a new experiment that suggests a performance problem with the Dense layer on the M1 Max (this is a follow-up to my previous question).

    import tensorflow as tf
    from tensorflow.keras import Model, layers
    import numpy as np
    from tqdm import tqdm


    class NeuralNet(Model):
        # Set layers.
        def __init__(self):
            super(NeuralNet, self).__init__()
            # First fully-connected hidden layer.
            self.fc1 = layers.Dense(8192 * 8 * 2, activation=tf.nn.relu)

        # Set forward pass.
        def call(self, x):
            return self.fc1(x)


    # Build neural network model.
    neural_net = NeuralNet()

    batch_size = 1024
    x = np.random.rand(batch_size, 256)

    for _ in tqdm(range(10000000)):
        neural_net(x)

The code above runs at 17.06 it/s on the M1 Max and at 168.04 it/s on a Zotac RTX 3090. GPU utilisation is 100% on both machines. Power draw is 44.5 W on the M1 Max and 340 W on the RTX 3090.

The M1 Max therefore reaches only about 10% of the RTX 3090's throughput, which shouldn't be the case; it should be roughly 30%.

A detailed performance comparison of the RTX 3090 and the M1 Max at different batch sizes shows that the RTX 3090 is roughly 10 times faster than the M1 Max, and that the gap grows at larger batch sizes.

Note that the batch sizes used in these experiments are already large enough. Please try to reproduce these experiments and fix the problem. Thanks.
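As a rough sanity check, the iteration rates above can be converted into achieved FLOP/s for the forward pass. The sketch below is only an estimate under stated assumptions: it counts each multiply-add in the matrix product as 2 FLOPs, ignores the bias add and the ReLU, and assumes every iteration runs the full (1024, 256) x (256, 131072) product. The helper names `dense_forward_flops` and `achieved_tflops` are hypothetical and are not part of TensorFlow.

```python
# Rough throughput estimate for the Dense forward pass in the benchmark.
# Assumption: one multiply-add = 2 FLOPs; bias add and ReLU are ignored.


def dense_forward_flops(batch_size: int, in_features: int, out_features: int) -> int:
    """FLOPs for one (batch, in) x (in, out) matrix product."""
    return 2 * batch_size * in_features * out_features


def achieved_tflops(flops_per_iter: int, iters_per_sec: float) -> float:
    """Convert a measured iteration rate into TFLOP/s."""
    return flops_per_iter * iters_per_sec / 1e12


# Shapes taken from the benchmark: batch 1024, 256 inputs, 8192 * 8 * 2 outputs.
flops = dense_forward_flops(1024, 256, 8192 * 8 * 2)

# Iteration rates and power draw as reported in the post.
m1_tflops = achieved_tflops(flops, 17.06)
rtx_tflops = achieved_tflops(flops, 168.04)
m1_gflops_per_watt = m1_tflops * 1000 / 44.5
rtx_gflops_per_watt = rtx_tflops * 1000 / 340

print(f"FLOPs per iteration: {flops:,}")
print(f"M1 Max:   {m1_tflops:.2f} TFLOP/s, {m1_gflops_per_watt:.1f} GFLOP/s per W")
print(f"RTX 3090: {rtx_tflops:.2f} TFLOP/s, {rtx_gflops_per_watt:.1f} GFLOP/s per W")
print(f"M1 Max / RTX 3090 throughput: {m1_tflops / rtx_tflops:.1%}")
```

Under these assumptions the M1 Max achieves roughly 1.2 TFLOP/s and the RTX 3090 roughly 11.5 TFLOP/s, which matches the 10% ratio reported above. These figures also include any per-call framework overhead, so they are a lower bound on what the GPU kernels themselves achieve.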
Replies
2
Boosts
0
Views
681
Activity
Nov ’21