Post | Replies | Boosts | Views | Activity

URLSessionStreamTask not working in cellular mode
I'm trying to set up a TCP connection from a standalone Apple Watch app to a server using URLSessionStreamTask. It currently works perfectly over Wi-Fi, and over Bluetooth when the watch is connected to my phone, but it does not work in cellular mode. I'm using the same code on iPhone, where it does work in cellular mode. Does URLSessionStreamTask support cellular data on watchOS? Thanks!
Replies: 1 · Boosts: 0 · Views: 836 · Activity: Feb ’21
Low performance for calculation of dense layers
Hi, I ran a new experiment that may indicate a performance problem with the Dense layer on the M1 Max (this is a follow-up to my previous question).

    import tensorflow as tf
    from tensorflow.keras import Model, layers
    import numpy as np
    from tqdm import tqdm

    class NeuralNet(Model):
        # Set layers.
        def __init__(self):
            super(NeuralNet, self).__init__()
            # First fully-connected hidden layer.
            self.fc1 = layers.Dense(8192 * 8 * 2, activation=tf.nn.relu)

        # Set forward pass.
        def call(self, x):
            return self.fc1(x)

    # Build neural network model.
    neural_net = NeuralNet()

    batch_size = 1024
    x = np.random.rand(batch_size, 256)
    for _ in tqdm(range(10000000)):
        neural_net(x)

The above code runs at 17.06 it/s on the M1 Max and 168.04 it/s on the Zotac RTX 3090. GPU utilisation is 100% on both. Power draw is 44.5 W on the M1 Max and 340 W on the RTX 3090. The M1 Max is much slower than the RTX 3090: it reaches about 10% of the RTX 3090's performance, which shouldn't be the case; it should be roughly 30%. The detailed performance comparison of the RTX 3090 and the M1 Max for different batch sizes shows that the RTX 3090 is roughly 10 times faster than the M1 Max, and even faster at bigger batch sizes. Note that the batch size in the above experiments is already big enough. Please test the above experiments and fix the problems. Thanks.
Replies: 2 · Boosts: 0 · Views: 655 · Activity: Dec ’21
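
To put the figures quoted in the dense-layer post on a common scale, here is a rough back-of-the-envelope sketch. It only does arithmetic on the numbers from the post (17.06 it/s at 44.5 W, 168.04 it/s at 340 W, batch size 1024, 256 inputs, 8192 * 8 * 2 units). It assumes that the matrix multiply dominates the cost and counts two floating-point operations per weight per sample, ignoring the bias add and the ReLU. The helper names `matmul_flops` and `summarize` are made up for this illustration.

```python
# Figures quoted in the post: iterations per second and power draw in watts.
M1_MAX_ITS, M1_MAX_WATTS = 17.06, 44.5
RTX_3090_ITS, RTX_3090_WATTS = 168.04, 340.0

BATCH_SIZE = 1024
INPUT_DIM = 256
UNITS = 8192 * 8 * 2  # width of the Dense layer in the post


def matmul_flops(batch, in_dim, units):
    """Approximate FLOPs for one forward pass of a dense layer.

    Assumption: only the matrix multiply is counted (one multiply and one
    add per weight per sample); the bias add and ReLU are ignored.
    """
    return 2 * batch * in_dim * units


def summarize(name, its_per_s, watts):
    """Print and return (estimated TFLOP/s, iterations per joule)."""
    flops_per_iteration = matmul_flops(BATCH_SIZE, INPUT_DIM, UNITS)
    tflops = flops_per_iteration * its_per_s / 1e12
    its_per_joule = its_per_s / watts
    print(f"{name}: {tflops:.2f} TFLOP/s, {its_per_joule:.3f} it/J")
    return tflops, its_per_joule


m1_max = summarize("M1 Max", M1_MAX_ITS, M1_MAX_WATTS)
rtx_3090 = summarize("RTX 3090", RTX_3090_ITS, RTX_3090_WATTS)

# Raw throughput ratio, matching the "about 10%" figure in the post.
throughput_ratio = M1_MAX_ITS / RTX_3090_ITS
print(f"M1 Max / RTX 3090 throughput: {throughput_ratio:.1%}")
```

Expressing the result as estimated TFLOP/s and iterations per joule makes it easier to compare against whatever peak figures one considers relevant; the estimate says nothing about why the gap exists.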