Hi,
tensorflow-macos 2.9.2
tensorflow-metal 0.5.0
macOS Monterey 12.4 (patched and up to date)
Machine: iMac Retina 5K,
27-inch, 2020,
3.8 GHz 8-Core Intel Core i7,
128 GB 2667 MHz DDR4,
Graphics: AMD Radeon Pro 5500 XT 8 GB
Command to run (as per documentation)
python3 train.py -c config/stm32f415_tinyaes.json
When running on the GPU, the slowdown occurs at exactly the same epoch (19) every time. As a test, I disabled the GPU in a duplicate script; although training took considerably longer, it got past epoch 19. As you can see in the GPU-enabled log below, at epoch 19 the ETA has gone up to 122:06:17.
Command to run (CPU only, with a slight modification to the script, included below)
python3 train_cpu.py -c config/stm32f415_tinyaes.json
Script modification to disable the GPU (the first and last lines below are from the original script, left in so the placement can be identified; otherwise the script is identical):
from scaaml.utils import tf_cap_memory
try:
    # Disable all GPUS
    tf.config.set_visible_devices([], 'GPU')
    visible_devices = tf.config.get_visible_devices()
    for device in visible_devices:
        assert device.device_type != 'GPU'
except:
    # Invalid device or cannot modify virtual devices once initialized.
    pass
def train_model(config):
CPU ONLY
2048/2048 [==============================] - 5014s 2s/step - loss: 1.3966 - acc: 0.4811 - val_loss: 1.5574 - val_acc: 0.4297
Epoch 25/30
1502/2048 [=====================>........] - ETA: 22:02 - loss: 1.3701 - acc: 0.4919
GPU ENABLED
2022-07-05 14:43:20.822168: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 5 of 46). These functions will not be directly callable after loading.
2048/2048 [==============================] - 516s 252ms/step - loss: 1.9292 - acc: 0.3521 - val_loss: 1.9108 - val_acc: 0.3503
Epoch 18/30
2048/2048 [==============================] - ETA: 0s - loss: 1.8986 - acc: 0.3598
2022-07-05 14:52:39.447402: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-07-05 14:52:39.450685: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2048/2048 [==============================] - 546s 267ms/step - loss: 1.8986 - acc: 0.3598 - val_loss: 2.0514 - val_acc: 0.3303
Epoch 19/30
741/2048 [=========>....................] - ETA: 122:06:17 - loss: 1.8543 - acc: 0.3750
/Users/alan/.pyenv/versions/3.9.5/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker
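For a sense of scale, here is a rough back-of-the-envelope check of what that ETA implies, using only the numbers from the logs above. It is a sketch that assumes the progress bar's ETA is approximately the remaining steps multiplied by the average step time so far; the helper name eta_to_seconds is made up for this illustration and is not part of the training script.

```python
def eta_to_seconds(eta: str) -> int:
    """Convert a progress-bar ETA string ('H:MM:SS' or 'MM:SS') to seconds."""
    seconds = 0
    for part in eta.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# Values copied from the GPU log above.
steps_done, steps_total = 741, 2048
eta_seconds = eta_to_seconds("122:06:17")
normal_step_seconds = 0.267  # epoch 18 on GPU reported 267ms/step

# Assumption: ETA ~= remaining steps * average step time in this epoch.
remaining_steps = steps_total - steps_done
implied_step_seconds = eta_seconds / remaining_steps
slowdown = implied_step_seconds / normal_step_seconds

print(f"ETA in seconds: {eta_seconds}")
print(f"implied step time: {implied_step_seconds:.1f} s")
print(f"slowdown vs epoch 18: {slowdown:.0f}x")
```

Under that assumption, the average step time in epoch 19 works out to several minutes rather than a fraction of a second, i.e. more than a thousand times slower than epoch 18, which looks more like a stall than a gradual slowdown.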
I have run the code on an external Linux-based system with GPUs and it runs without problems. This is blocking my research project (MSc). While I can still use CPU mode, the idea is to compare and baseline against various platforms and functionalities (while also using my own traces), so it is important to be able to use all the features available on the host system (the GPU in this case).
Hope this helps and you can offer a solution.
Regards,
alz0r