Post

Replies

Boosts

Views

Activity

tensorflow-metal 0.3.0 error when training model
I installed TensorFlow (version 2.7.0)as per the instructions here. And I ran an example Keras model (https://keras.io/examples/vision/3D_image_classification). Here is the full Colab code (https://colab.research.google.com/github/keras-team/keras-io/blob/master/examples/vision/ipynb/3D_image_classification.ipynb). The error occurs when I attempt model training: # Compile model. initial_learning_rate = 0.0001 lr_schedule = keras.optimizers.schedules.ExponentialDecay( initial_learning_rate, decay_steps=100000, decay_rate=0.96, staircase=True ) model.compile( loss="binary_crossentropy", optimizer=keras.optimizers.Adam(learning_rate=lr_schedule), metrics=["acc"], ) # Define callbacks. checkpoint_cb = keras.callbacks.ModelCheckpoint( "3d_image_classification.h5", save_best_only=True ) early_stopping_cb = keras.callbacks.EarlyStopping(monitor="val_acc", patience=15) # Train the model, doing validation at the end of each epoch epochs = 100 model.fit( train_dataset, validation_data=validation_dataset, epochs=epochs, shuffle=True, verbose=2, callbacks=[checkpoint_cb, early_stopping_cb], ) The error log is quite long, so I attached it as a txt file. The error occurs when I attempt to train the model using the GPU. The error does not happen when I specify with tf.device('/cpu:0'):, so CPU training appears to be fine, and returns similar results when cross-checked with the results obtained on Google Colab. This appears to be an issue specifically due to tensorflow-metal. My tensorflow-metal version is 0.3.0. error.txt
3
1
1.6k
Dec ’21