Low performance for calculation of dense layers

Hi, I ran a new experiment that may indicate a low-performance issue when using a Dense layer on the M1 Max (this is a follow-up to my previous question).

import tensorflow as tf
from tensorflow.keras import Model, layers
import numpy as np
from tqdm import tqdm

class NeuralNet(Model):
    # Set layers.
    def __init__(self):
        super(NeuralNet, self).__init__()
        # First fully-connected hidden layer.
        self.fc1 = layers.Dense(8192 * 8 * 2, activation=tf.nn.relu)

    # Set forward pass.
    def call(self, x):
        return self.fc1(x)

# Build neural network model.
neural_net = NeuralNet()
batch_size = 1024
x = np.random.rand(batch_size, 256)
for _ in tqdm(range(10000000)):
    neural_net(x)

The above code runs at 17.06 it/s on the M1 Max and 168.04 it/s on a Zotac RTX 3090. GPU utilisation is 100% on both machines; power draw is 44.5 W on the M1 Max versus 340 W on the RTX 3090. The M1 Max is much slower than it should be: it reaches only about 10% of the RTX 3090's performance, when roughly 30% would be expected.
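The "roughly 30%" expectation can be sanity-checked with a quick calculation. This sketch assumes the vendors' published peak FP32 figures (about 10.4 TFLOPS for the 32-core M1 Max GPU and about 35.6 TFLOPS for the RTX 3090); those numbers are not from this thread.

```python
# Back-of-envelope check of the expected performance gap, assuming
# peak FP32 throughput of ~10.4 TFLOPS (M1 Max, 32-core GPU) and
# ~35.6 TFLOPS (RTX 3090) from the vendors' public spec sheets.
m1_max_tflops = 10.4
rtx_3090_tflops = 35.6
expected_fraction = m1_max_tflops / rtx_3090_tflops   # ~0.29

# Ratio actually measured in the benchmark above.
measured_fraction = 17.06 / 168.04                     # ~0.10

print(f"expected M1 Max / RTX 3090 ratio: {expected_fraction:.2f}")
print(f"measured ratio:                   {measured_fraction:.2f}")
```

So on raw FP32 throughput alone the M1 Max should land near 29% of the RTX 3090, about three times the measured 10%.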

Here is a detailed performance comparison of the RTX 3090 and the M1 Max across different batch sizes, which shows the RTX 3090 is roughly 10 times faster than the M1 Max, and even faster at larger batch sizes:

Note that the batch sizes in the experiments above are already large. Please reproduce the experiments and fix the problem. Thanks.

Thank you for sharing your observations. We are investigating this. Please file a request through Feedback Assistant and post the ID here.

I submitted the issue; the Feedback ID is FB9803715. Thanks for investigating. I plan to redo the experiments using a plain matrix multiplication instead of a Dense layer, to see whether the issue is specific to Dense layers, and will update here.
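The planned follow-up could look something like the sketch below: benchmark the raw matmul that the Dense layer computes (256 → 131072), bypassing Keras entirely, to see whether the slowdown is specific to layers.Dense. The iteration count and timing approach here are assumptions, not details from the thread.

```python
import time
import numpy as np
import tensorflow as tf

batch_size = 1024
# Same shapes the Dense(8192 * 8 * 2) layer uses on a (1024, 256) input.
x = tf.constant(np.random.rand(batch_size, 256), dtype=tf.float32)
w = tf.constant(np.random.rand(256, 8192 * 8 * 2), dtype=tf.float32)

@tf.function
def matmul_step(a, b):
    # The same GEMM the Dense layer performs, minus bias add and ReLU.
    return tf.matmul(a, b)

iterations = 100
matmul_step(x, w)  # warm-up call so tracing/compilation isn't timed
start = time.time()
for _ in range(iterations):
    y = matmul_step(x, w)
_ = y.numpy()  # force execution to complete before stopping the clock
print(f"{iterations / (time.time() - start):.2f} it/s, output shape {y.shape}")
```

If this raw tf.matmul reaches a similar fraction of the RTX 3090's speed, the problem is in the GEMM path itself rather than in the Dense layer.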
