Hi, I am getting this error with the test script from the tensorflow-metal plugin page. I have a Mac with an M3 Pro chip on macOS 14.4 (the latest at this time). Unfortunately, I also created another thread: https://developer.apple.com/forums/thread/748413. Should I close that one?
tensorflow-metal was working great on my M3 Mac until Tuesday. Then my code started freezing. I ran the test script from https://developer.apple.com/metal/tensorflow-plugin/ and it now crashes. This used to work fine, but all of a sudden it does not. The results are shown below.
Were there ever any answers to the previous posts? Could this be a hardware problem?
The test script is just this:
import tensorflow as tf
cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()
model = tf.keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=64)
The errors I get are like the following:
Epoch 1/5
1/782 [..............................] - ETA: 51:53 - loss: 6.0044 - accuracy: 0.0312
Error: command buffer exited with error status.
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
<AGXG15XFamilyCommandBuffer: 0x1172515e0>
label = <none>
device = <AGXG15SDevice: 0x1588e6000>
name = Apple M3 Pro
commandQueue = <AGXG15XFamilyCommandQueue: 0x17427e400>
label = <none>
device = <AGXG15SDevice: 0x1588e6000>
name = Apple M3 Pro
retainedReferences = 1
Error: command buffer exited with error status.
The Metal Performance Shaders operations encoded on it may not have completed.
Error:
(null)
Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
<AGXG15XFamilyCommandBuffer: 0x117257b40>
label = <none>
device = <AGXG15SDevice: 0x1588e6000>
name = Apple M3 Pro
commandQueue = <AGXG15XFamilyCommandQueue: 0x17427e400>
label = <none>
device = <AGXG15SDevice: 0x1588e6000>
name = Apple M3 Pro
retainedReferences = 1
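To make a long log like the one above easier to compare across runs, here is a minimal sketch that counts each IOGPU error code and the distinct command buffers involved. It uses only the Python standard library. The two regular expressions and the function name `summarize_metal_log` are my own assumptions based on the lines shown in this post, not a documented Metal log format.

```python
import re
from collections import Counter

# Assumed pattern, based on lines like:
#   (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
ERROR_CODE = re.compile(r"\(([0-9A-Fa-f]{8}):(kIOGPU\w+)\)")

# Assumed pattern, based on lines like:
#   <AGXG15XFamilyCommandBuffer: 0x1172515e0>
COMMAND_BUFFER = re.compile(r"<\w+CommandBuffer: (0x[0-9a-fA-F]+)>")


def summarize_metal_log(text):
    """Return (Counter of (code, name) pairs, set of command buffer addresses)."""
    codes = Counter(ERROR_CODE.findall(text))
    buffers = set(COMMAND_BUFFER.findall(text))
    return codes, buffers


# Sample copied from the error output in this post.
sample = """\
Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
<AGXG15XFamilyCommandBuffer: 0x1172515e0>
commandQueue = <AGXG15XFamilyCommandQueue: 0x17427e400>
Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
<AGXG15XFamilyCommandBuffer: 0x117257b40>
commandQueue = <AGXG15XFamilyCommandQueue: 0x17427e400>
"""

codes, buffers = summarize_metal_log(sample)
for (code, name), count in codes.items():
    print(f"{name} (code {code}): {count} occurrence(s)")
print(f"distinct command buffers: {len(buffers)}")
```

In this sample, every failure carries the same code on a different command buffer, all on the same command queue. A summary like this only describes what the log contains; it does not by itself say whether the cause is hardware or software.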