dbl001’s Profile | Apple Developer Forums

Deep Learning Chapter 10: Advanced use of recurrent neural networks not Using GPU

I am running a recurrent neural network example on an iMac 27" with an AMD Radeon Pro 5700 XT on OS X 12.3. The code runs, the GPU was initialized the GPU is not active. Each epoch takes: 819/819 [==============================] - 32417s 40s/step - loss: 121.7538 - mae: 8.9641 - val_loss: 100.3145 - val_mae: 8.0313 % python chapter-10.py ['"Date Time"', '"p (mbar)"', '"T (degC)"', '"Tpot (K)"', '"Tdew (degC)"', '"rh (%)"', '"VPmax (mbar)"', '"VPact (mbar)"', '"VPdef (mbar)"', '"sh (g/kg)"', '"H2OC (mmol/mol)"', '"rho (g/m**3)"', '"wv (m/s)"', '"max. wv (m/s)"', '"wd (deg)"'] 420451 num_train_samples: 210225 num_val_samples: 105112 num_test_samples: 105114 2022-03-28 18:28:59.988516: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.2 AVX AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. Metal device set to: AMD Radeon Pro 5700 XT systemMemory: 128.00 GB maxCacheSize: 7.99 GB 2022-03-28 18:28:59.989242: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support. 2022-03-28 18:28:59.989616: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>) WARNING:tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. Epoch 1/50 2022-03-28 18:29:02.342296: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 819/819 [==============================] - ETA: 0s - loss: 121.7538 - mae: 8.9641 2022-03-29 03:21:02.092397: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled. 819/819 [==============================] - 32417s 40s/step - loss: 121.7538 - mae: 8.9641 - val_loss: 100.3145 - val_mae: 8.0313 Epoch 2/50 381/819 [============>.................] - ETA: 4:50:31 - loss: 93.7597 - mae: 7.6880 from tensorflow import keras from tensorflow.keras import layers num_features = 14 steps = 120 import os fname = os.path.join("jena_climate_2009_2016.csv") with open(fname) as f: data = f.read() lines = data.split("\n") header = lines[0].split(",") lines = lines[1:] print(header) print(len(lines)) import numpy as np temperature = np.zeros((len(lines),)) raw_data = np.zeros((len(lines), len(header) - 1)) for i, line in enumerate(lines): values = [float(x) for x in line.split(",")[1:]] temperature[i] = values[1] raw_data[i, :] = values[:] sampling_rate = 6 sequence_length = 120 delay = sampling_rate * (sequence_length + 24 - 1) batch_size = 256 num_train_samples = int(0.5 * len(raw_data)) num_val_samples = int(0.25 * len(raw_data)) num_test_samples = len(raw_data) - num_train_samples - num_val_samples print("num_train_samples:", num_train_samples) print("num_val_samples:", num_val_samples) print("num_test_samples:", num_test_samples) train_dataset = keras.utils.timeseries_dataset_from_array( raw_data[:-delay], targets=temperature[delay:], sampling_rate=sampling_rate, sequence_length=sequence_length, shuffle=True, batch_size=batch_size, start_index=0, end_index=num_train_samples) val_dataset = keras.utils.timeseries_dataset_from_array( raw_data[:-delay], targets=temperature[delay:], sampling_rate=sampling_rate, sequence_length=sequence_length, shuffle=True, batch_size=batch_size, start_index=num_train_samples, end_index=num_train_samples + num_val_samples) test_dataset = keras.utils.timeseries_dataset_from_array( raw_data[:-delay], targets=temperature[delay:], sampling_rate=sampling_rate, sequence_length=sequence_length, shuffle=True, batch_size=batch_size, start_index=num_train_samples + num_val_samples) inputs = keras.Input(shape=(steps, num_features)) x = layers.SimpleRNN(16, return_sequences=True)(inputs) x = layers.SimpleRNN(16, return_sequences=True)(x) outputs = layers.SimpleRNN(16)(x) inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1])) x = layers.LSTM(32, recurrent_dropout=0.25)(inputs) x = layers.Dropout(0.5)(x) outputs = layers.Dense(1)(x) model = keras.Model(inputs, outputs) callbacks = [ keras.callbacks.ModelCheckpoint("jena_lstm_dropout.keras", save_best_only=True) ] model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"]) history = model.fit(train_dataset, epochs=50, validation_data=val_dataset, callbacks=callbacks) Any idea why the GPU is not active? How does this code example run on an M1 Ultra?

Machine Learning & AI General tensorflow-metal

2

0

1.4k

Mar ’22

Training Top2vec Model Crashed OS X 12.3.1

Training Top2vec with embedding_batch_size=256 crashed OS X 12.3.1 tensorflow_macos 2.8.0, tensorflow_metal 0.4.0 Anaconda Python 3.8.5 % pip show tensorflow_macos WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages) Name: tensorflow-macos Version: 2.8.0 Summary: TensorFlow is an open source machine learning framework for everyone. Home-page: https://www.tensorflow.org/ Author: Google Inc. Author-email: packages@tensorflow.org License: Apache 2.0 Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages Requires: absl-py, astunparse, flatbuffers, gast, google-pasta, grpcio, h5py, keras, keras-preprocessing, libclang, numpy, opt-einsum, protobuf, setuptools, six, tensorboard, termcolor, tf-estimator-nightly, typing-extensions, wrapt Required-by: (tensorflow-metal) (base) davidlaxer@x86_64-apple-darwin13 top2vec % pip show tensorflow_metal WARNING: Ignoring invalid distribution -umpy (/Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages) Name: tensorflow-metal Version: 0.4.0 Summary: TensorFlow acceleration for Mac GPUs. Home-page: https://developer.apple.com/metal/tensorflow-plugin/ Author: Author-email: License: MIT License. Copyright © 2020-2021 Apple Inc. All rights reserved. Location: /Users/davidlaxer/tensorflow-metal/lib/python3.8/site-packages Requires: six, wheel Required-by: To train the model with embedding_model="universal-sentence-encoder", you'll have to download universal-sentence-encoder_4. top2vec_trained = Top2Vec(documents=papers_filtered_df.text.tolist(), split_documents=True, **embedding_batch_size=256,** embedding_model="universal-sentence-encoder", use_embedding_model_tokenizer=True, embedding_model_path="/Users/davidlaxer/Downloads/universal-sentence-encoder_4", workers=8) Here's the project: https://github.com/ddangelov/Top2Vec Here's the Jupyter notebook: https://github.com/ddangelov/Top2Vec/blob/master/notebooks/CORD-19_top2vec.ipynb You'll have to load the COVID-19 data set from Kaggle here: https://www.kaggle.com/datasets/allen-institute-for-ai/CORD-19-research-challenge I set filter size to 1,000: def filter_short(papers_df): papers_df["token_counts"] = papers_df["text"].str.split().map(len) papers_df = **papers_df[papers_df.token_counts>1000].reset_index(drop=True)** papers_df.drop('token_counts', axis=1, inplace=True) return papers_df Trace panic(cpu 8 caller 0xffffff8020d449ad): userspace watchdog timeout: no successful checkins from WindowServer in 120 seconds service: logd, total successful checkins since wake (127621 seconds ago): 12763, last successful checkin: 0 seconds ago service: WindowServer, total successful checkins since wake (127621 seconds ago): 12751, last successful checkin: 120 seconds ago service: remoted, total successful checkins since wake (127621 seconds ago): 12763, last successful checkin: 0 [Trace](https://developer.apple.com/forums/content/attachment/d17c2c9b-569b-4c1a-9c61-892ced7f785b)

Machine Learning & AI General tensorflow-metal

2

0

1.6k

Jun ’22

Will Metal Performance Shaders Framework Support Float64?

Will Metal Performance Shaders Framework ('MPS') support Float64? https://developer.apple.com/documentation/metalperformanceshaders

Graphics & Games General Metal Metal Performance Shaders tensorflow-metal

2

0

1.7k

Jul ’22

Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Exception Note: EXC_CORPSE_NOTIFY

[I am running Top2vec on Big Sur 11.6 with tensorflow-macos and tensorflow-metal. Python crashed ... linkText Crashed Thread: 0 Dispatch queue: metal gpu stream Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Exception Note: EXC_CORPSE_NOTIFY Application Specific Information: /System/Volumes/Data/SWE/macOS/BuildRoots/38cf1d983f/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MetalPerformanceShaders-124.6.1/MPSCore/Utility/MPSCommandBufferImageCache.mm:1386: failed assertion `Failed to allocate private MTLBuffer for size 421888000 Crash Log

Machine Learning & AI General tensorflow-metal

3

0

2.5k

Sep ’22

Introducing Accelerated PyTorch Training on Mac in v1.12

https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac Does this feature support AMD GPUs with Metal or only M1 support? Does v1.12 nightly build support the Apple Metal only with source?

Graphics & Games General Metal

3

0

4.5k

Jun ’23

[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x600000eede10

I am running tensorflow-macos and tensorflow-metal version 2.6 on Monterey Beta (21A5543b) on an iMac 27" 2021 with an AMD Radeon GPU. I got the following error training the model VariationalDeepSemanticHashing e.g. 2021-10-09 13:05:14.521286: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled. 2021-10-09 13:05:27.092823: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled. 2021-10-09 13:05:27.153 python[6315:1459657] -[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x600000eede10 [I 2021-10-09 13:05:28.157 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports kernel d25e6066-74f7-4b4a-b5e7-b2911e7501d9 restarted https://github.com/unsuthee/VariationalDeepSemanticHashing/blob/master/Run_Experiment_Unsupervised.ipynb Here's the repository: https://github.com/unsuthee/VariationalDeepSemanticHashing

Machine Learning & AI General tensorflow-metal

22

0

8.3k

Jan ’23

dbl001

Post

Replies

Boosts

Views

Activity