This code reproduces the crash:
test.txt
Also, running without Metal (CPU only) is 4X faster with the 'SGD' optimizer. I can't compare the Adam optimizer since it crashes.
In [2]: import tensorflow as tf
...:
...: mnist = tf.keras.datasets.mnist
...:
...: (x_train, y_train), (x_test, y_test) = mnist.load_data()
...: x_train, x_test = x_train / 255.0, x_test / 255.0
...:
...: model = tf.keras.models.Sequential([
...:     tf.keras.layers.Flatten(input_shape=(28, 28)),
...:     tf.keras.layers.Dense(128, activation='relu'),
...:     tf.keras.layers.Dropout(0.2),
...:     tf.keras.layers.Dense(10)
...: ])
...:
...: predictions = model(x_train[:1]).numpy()
...: tf.nn.softmax(predictions).numpy()
...:
...: loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
...:
...: loss_fn(y_train[:1], predictions).numpy()
...:
...: model.compile(optimizer = 'adam', loss = loss_fn)
...: model.fit(x_train, y_train, epochs=100)
Epoch 1/100
2021-10-10 10:50:53.503460: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-10 10:50:53.527 python[25080:3485800] -[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x6000037975a0
zsh: segmentation fault ipython
tensorflow_metal (GPU):
% time python test.py
2021-10-10 11:34:34.602604: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Metal device set to: AMD Radeon Pro 5700 XT
systemMemory: 128.00 GB
maxCacheSize: 7.99 GB
2021-10-10 11:34:34.603850: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-10 11:34:34.604642: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
2021-10-10 11:34:35.779610: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/100
2021-10-10 11:34:35.929611: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
1875/1875 [==============================] - 7s 3ms/step - loss: 0.7213
Epoch 2/100
1875/1875 [==============================] - 6s 3ms/step - loss: 0.3865
...
Epoch 100/100
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0473
python test.py 721.48s user 375.56s system 173% cpu 10:31.28 total
(tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 ~ %
tensorflow (CPU):
% time python ~/test.py
2021-10-10 11:45:44.111971: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-10 11:45:44.487763: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/100
1875/1875 [==============================] - 1s 460us/step - loss: 0.7210
Epoch 2/100
1875/1875 [==============================] - 1s 459us/step - loss: 0.3874
Epoch 3/100
1875/1875 [==============================] - 1s 459us/step - loss: 0.3233
Epoch 4/100
1875/1875 [==============================] - 1s 460us/step - loss: 0.2884
Epoch 5/100
1875/1875 [==============================] - 1s 471us/step - loss: 0.2608
Epoch 6/100
1875/1875 [==============================] - 1s 462us/step - loss: 0.2400
Epoch 7/100
...
Epoch 99/100
1875/1875 [==============================] - 1s 468us/step - loss: 0.0455
Epoch 100/100
1875/1875 [==============================] - 1s 469us/step - loss: 0.0463
python ~/test.py 181.09s user 48.20s system 246% cpu 1:32.86 total
(ai) davidlaxer@x86_64-apple-darwin13 text %
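For reference, here is a rough sketch of how the two `time` summaries above compare. The numbers are copied from the zsh `time` lines of the two runs; the helper name `zsh_time_to_seconds` is just something made up for this illustration. It suggests the "4X" figure matches the user CPU time, while the wall-clock gap is larger.

```python
def zsh_time_to_seconds(field: str) -> float:
    """Convert a zsh `time` elapsed field such as '10:31.28' to seconds."""
    seconds = 0.0
    # Each ':'-separated part is one base-60 digit (hours, minutes, seconds).
    for part in field.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Figures copied from the two `time python test.py` lines in this report.
gpu = {"user": 721.48, "system": 375.56, "total": zsh_time_to_seconds("10:31.28")}
cpu = {"user": 181.09, "system": 48.20, "total": zsh_time_to_seconds("1:32.86")}

wall_ratio = gpu["total"] / cpu["total"]  # elapsed (wall-clock) time ratio
user_ratio = gpu["user"] / cpu["user"]    # user CPU time ratio

print(f"wall clock: {wall_ratio:.1f}x, user CPU: {user_ratio:.1f}x")
# prints: wall clock: 6.8x, user CPU: 4.0x
```

So for this small MNIST model the CPU-only run finishes about 6.8 times sooner in elapsed time, and uses about a quarter of the user CPU time.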