Hi,
I've found a memory leak issue when using the tensorFlow-metal plugin for running a deep learning model on a Mac with the M1 chip. Here are the details of my system:
System Information
MacOS version: 13.4
TensorFlow (macos) version: 2.12.0, 2.13.0-rc1, tf-nightly==2.14.0.dev20230616
TensorFlow-Metal Plugin Version: 0.8, 1.0.0, 1.0.1
Model Details
I've implemented a custom model architecture using TensorFlow's Keras API. The model has a dynamic Input, which I resize the images in a Resizing layer. Moreover, the data is passed to the model through a data generator class, using model.fit().
Problem Description
When I train this model using the GPU on M1 Mac, I observe a continuous increase in memory usage, leading to a memory leak. This memory increase is more prominent with larger image inputs. For smaller images or average sizes (1024x128), the increase is smaller, but continuous, leading to a memory leak after several epochs.
On the other hand, when I switch to using the CPU for training (tf.config.set_visible_devices([], 'GPU')), the memory leak issue is resolved, and I observe normal memory usage. In addition, I've tested the model with different sizes of images and various layer configurations. The memory leak appears to be present only when using the GPU, indeed.
I hope this information is helpful in identifying and resolving the issue. If you need any further details, please let me know. The project code is private, but I can try to provide it with pseudocode if necessary.
5
2
2.7k