Reply to tf.random is broken since Monterey 12.1
The workaround doesn't work inside a tf.function, which is a real problem. I tried other alternatives, such as:

    randomgen = tf.random.Generator.from_non_deterministic_state()
    #%%
    for _ in range(10):
        g2 = tf.random.get_global_generator()
        x = g2.uniform((10,),(1,2))
        y = g2.uniform((10,),(3,4))
        tf.print(x)
        tf.print(y)

But this fails with:

    NotFoundError: No registered 'RngReadAndSkip' OpKernel for 'GPU' devices compatible with node {{node RngReadAndSkip}} . Registered: device='CPU' [Op:RngReadAndSkip]

And obviously, calling this in a tf.function will always generate the same sequence, because the seed is fixed:

    tf.random.stateless_uniform((size,),(1,2),xmin,xmax,tf.float32)

This doesn't work either:

    randomgen = tf.random.Generator.from_non_deterministic_state()

    @tf.function
    def MandelbrotDataSet(size=1000, max_depth=100, xmin=-2.0, xmax=0.7, ymin=-1.3, ymax=1.3):
        global randomgen
        x = randomgen.uniform((size,),xmin,xmax,tf.float32)
        y = randomgen.uniform((size,),ymin,ymax,tf.float32)

Because of RngReadAndSkip again.
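One possible way around the fixed-seed problem, as a minimal sketch rather than a confirmed fix: keep tf.random.stateless_uniform, but derive the seed from a step counter passed in as a tensor, so each call draws a different sequence without retracing. The names sample_points and step are hypothetical, and whether the stateless ops run correctly on the Metal GPU is an assumption that would need to be checked on the affected machine.

```python
import tensorflow as tf


@tf.function
def sample_points(step, size=1000, xmin=-2.0, xmax=0.7, ymin=-1.3, ymax=1.3):
    # `step` is a scalar int32 tensor (hypothetical name). Passing it as a
    # tensor avoids retracing the function for every new value.
    seed_x = tf.stack([step, 0])  # shape-[2] seed for the x draw
    seed_y = tf.stack([step, 1])  # a different seed for the y draw
    x = tf.random.stateless_uniform((size,), seed_x, xmin, xmax, tf.float32)
    y = tf.random.stateless_uniform((size,), seed_y, ymin, ymax, tf.float32)
    return x, y


# Different steps give different draws; the same step reproduces its draw.
x0, y0 = sample_points(tf.constant(0, dtype=tf.int32))
x1, y1 = sample_points(tf.constant(1, dtype=tf.int32))
x0_again, _ = sample_points(tf.constant(0, dtype=tf.int32))
```

The trade-off is that the caller has to carry the counter around (for example the training step), which the stateful Generator would otherwise do on its own.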
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
It's a perfectly normal and harmless message on an M1. I get it too, and my model and code work just fine.

    2021-12-20 23:19:04.025952: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2021-12-20 23:19:04.026364: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
    Metal device set to: Apple M1
    systemMemory: 8.00 GB
    maxCacheSize: 2.67 GB
    __________________________________________________________________________________________________
    2021-12-20 23:19:04.413489: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
    Epoch 1/10
    2021-12-20 23:19:04.723827: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
    32/32 [==============================] - ETA: 0s - loss: 0.0256 - accuracy: 0.9605 - mae: 0.0933 - mse: 0.0256
    2021-12-20 23:19:24.073636: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
    32/32 [==============================] - 20s 608ms/step - loss: 0.0256 - accuracy: 0.9605 - mae: 0.0933 - mse: 0.0256 - val_loss: 0.0100 - val_accuracy: 0.9855 - val_mae: 0.0650 - val_mse: 0.0100
    Epoch 2/10
    32/32 [==============================] - 19s 585ms/step - loss: 0.0079 - accuracy: 0.9787 - mae: 0.0568 - mse: 0.0079 - val_loss: 0.0063 - val_accuracy: 0.9869 - val_mae: 0.0534 - val_mse: 0.0063
    Epoch 3/10
    32/32 [==============================] - 18s 575ms/step - loss: 0.0060 - accuracy: 0.9700 - mae: 0.0506 - mse: 0.0060 - val_loss: 0.0045 - val_accuracy: 0.9776 - val_mae: 0.0438 - val_mse: 0.0045
    Epoch 4/10
    ....
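If the message is only noise, one option is to raise TensorFlow's native log threshold. This is a small sketch using the standard TF_CPP_MIN_LOG_LEVEL environment variable; the NUMA line is logged at INFO level (the "I" after the timestamp), so a value of "1" should filter it. Whether the plugin's own "Metal device set to" lines are affected is something I have not verified.

```python
import os

# Must be set before TensorFlow is imported.
# "0" shows everything, "1" hides INFO, "2" also hides WARNING.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

import tensorflow as tf

# The GPU is still detected; only the INFO-level log lines are filtered.
gpus = tf.config.list_physical_devices("GPU")
print(gpus)
```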
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to Getting ModuleNotFoundError: No module named 'tensorflow.python.compiler.mlcompute' error
Reformatting your code:

    import tensorflow as tf
    from tensorflow.python.compiler.mlcompute import mlcompute
    tf.compat.v1.disable_eager_execution()
    mlcompute.set_mlc_device(device_name='gpu')
    print("is_apple_mlc_enabled %s" % mlcompute.is_apple_mlc_enabled())
    print("is_tf_compiled_with_apple_mlc %s" % mlcompute.is_tf_compiled_with_apple_mlc())
    print(f"eagerly? {tf.executing_eagerly()}")
    print(tf.config.list_logical_devices())

It looks like seriously old code. Just do this instead:

    import tensorflow as tf
    print(tf.__version__)
    physical_devices = tf.config.list_physical_devices('GPU')
    tf.print(physical_devices)

Example output:

    2.7.0
    Metal device set to: Apple M1
    systemMemory: 8.00 GB
    maxCacheSize: 2.67 GB
    2021-12-20 23:11:09.001976: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
    2021-12-20 23:11:09.002466: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
    [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to TensorFlow with Metal start giving wrong results after upgrading macOS from 12.0.1 to 12.1
I wrote a minimal test case. This used to generate two different series:

    import tensorflow as tf
    x = tf.random.uniform((10,))
    y = tf.random.uniform((10,))
    tf.print(x)
    tf.print(y)

Now it prints the same series twice:

    [0.178906798 0.8810848 0.384304762 ... 0.162458301 0.64780426 0.0123682022]
    [0.178906798 0.8810848 0.384304762 ... 0.162458301 0.64780426 0.0123682022]

It works fine on Colab. It also works fine if I disable the GPU with:

    tf.config.set_visible_devices([], 'GPU')

WORKAROUND:

    g = tf.random.Generator.from_non_deterministic_state()
    x = g.uniform((10,))
    y = g.uniform((10,))
    tf.print(x)
    tf.print(y)
Topic: Graphics & Games SubTopic: General Tags:
Dec ’21
Reply to TensorFlow with Metal start giving wrong results after upgrading macOS from 12.0.1 to 12.1
I'm still on epoch 5, on a MacBook Air M1 2020, but it looks fine to me so far. My other training runs work just fine too. It looks like you just had bad luck on this run? What about the other intermediate results? Do they all look bad?

Edit: I also get some very bad results sometimes, weird. Is there a problem with random number generation? I have a model that heavily uses random.uniform, I'll check.

Edit again: I need to double-check, but random is broken in some situations.
Topic: Graphics & Games SubTopic: General Tags:
Dec ’21
Reply to Cannot install Tensorflow on Mac m1
I've installed TensorFlow multiple times on a Mac M1 using this guide: https://developer.apple.com/metal/tensorflow-plugin/ Just follow it step by step, and don't skip the Miniforge3 installation: it is absolutely mandatory to install and use the one provided in the guide. Tested on Python 3.8 and 3.9. TensorFlow is not supported on 3.10 (yet).
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to Why is it so slow?
This isn't unexpected, on any platform with any device. Sometimes the CPU is faster than the GPU. Sometimes the M1 in my MacBook Air 13" is faster than my Nvidia Quadro or a Tesla K80. It depends on the workload. It's not specific to TensorFlow Metal. To be 100% sure, you can disable the GPU and compare:

    tf.config.set_visible_devices([], 'GPU')
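To compare devices on a given workload without restarting the process, a rough sketch like the following times the same operation on every device TensorFlow reports. The helper name time_matmul, the matrix size and the repeat count are all hypothetical choices for illustration; a real model may behave very differently from a single matmul.

```python
import time

import tensorflow as tf


def time_matmul(device_name, n=512, repeats=5):
    """Return the average seconds per (n x n) matmul on the given device."""
    with tf.device(device_name):
        a = tf.random.uniform((n, n))
        b = tf.random.uniform((n, n))
        tf.matmul(a, b).numpy()  # warm-up run, excluded from the timing
        start = time.perf_counter()
        for _ in range(repeats):
            result = tf.matmul(a, b)
        result.numpy()  # wait for the last result before stopping the clock
        return (time.perf_counter() - start) / repeats


# Only time devices that actually exist on this machine.
timings = {
    d.name: time_matmul(d.name)
    for d in tf.config.list_logical_devices()
    if d.device_type in ("CPU", "GPU")
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms per matmul")
```

Small sizes tend to favour the CPU because of dispatch overhead, so it is worth trying a few values of n close to what the real model uses.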
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to Odd CPU/GPU behaviour in TF-metal on M1 Pro
We really do lack documentation. I've had weird cases where the CPU was faster than the GPU too. ^^ I only have the M1 (non Pro/Max). To fully disable the GPU I use this:

    tf.config.set_visible_devices([], 'GPU')

Call it first, before doing anything else. You might also want to display which device is used for which operation:

    tf.debugging.set_log_device_placement(True)

It's very verbose, and the first step usually runs mostly on the CPU (function tracing). Also from my experience: don't use float16 (it's not faster) and don't use mixed_precision (it falls back to the CPU), at least on my M1. Give this option a try too:

    physical_devices = tf.config.list_physical_devices('GPU')
    tf.config.experimental.set_memory_growth(physical_devices[0],True)
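Since these settings only take effect before TensorFlow initializes its devices, the ordering matters. Here is a minimal sketch of a start-up block that puts them together; USE_GPU is a hypothetical switch, and the try/except is there because TensorFlow raises a RuntimeError when devices are reconfigured too late.

```python
import tensorflow as tf

USE_GPU = False  # hypothetical switch for this sketch

try:
    if USE_GPU:
        # Allocate GPU memory on demand instead of reserving it up front.
        for gpu in tf.config.list_physical_devices("GPU"):
            tf.config.experimental.set_memory_growth(gpu, True)
    else:
        # Hide every GPU so that all operations run on the CPU.
        tf.config.set_visible_devices([], "GPU")
except RuntimeError as err:
    # Raised if TensorFlow has already initialized its devices.
    print(f"Device configuration skipped: {err}")

# Log the device chosen for each operation (very verbose).
tf.debugging.set_log_device_placement(True)

a = tf.random.uniform((4, 4))
b = tf.matmul(a, a)
print(b.device)
```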
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to neural engine for model training?
From my understanding, and from information I've gathered here and there over time: the Neural Engine is inferior to the GPU in every respect for training a TF model and is ... kind of useless to us developers? If I extrapolate from the information I found, it's only useful for tiny models (by today's standards) like Apple's OCR (e.g. you can copy/paste text written in an image), speech recognition, touchpad gestures, etc.
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to tf.random is broken since Monterey 12.1
Still broken on 12.3... Hello, Apple?
Topic: Machine Learning & AI SubTopic: General Tags:
May ’22
Reply to Not able to install tensor flow-macos
How did you install it?
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to Error installing tensorflow-macos
You must use Miniforge3 as stated in the guide, not the regular conda. If pip install does not work, just install it with conda install instead.
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to Some resource has been exhausted. For example, this error might be raised if a per-user quota is exhausted, or perhaps the entire file system is out of space. @@__init__ 2 root error(s) found. (0) RESOURCE_EXHAUSTED: OOM when allocating
shape[114389,320]? Are you sure you're not doing something wrong here?
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to Can´t use tensorflow on Macbook Air M1
See this post, this should help, you have exactly the same problem : https://developer.apple.com/forums/thread/696693
Topic: Machine Learning & AI SubTopic: General Tags:
Dec ’21
Reply to TensorFlow with Metal start giving wrong results after upgrading macOS from 12.0.1 to 12.1
I upgraded to 12.1 today. I just launched a DCGAN training run, I'll let you know. BUT I have another model in training (an autoencoder) and haven't noticed any difference since yesterday.
Topic: Graphics & Games SubTopic: General Tags:
Dec ’21