Try increasing the batch_size parameter in the code above: 128 is typically too low to get decent performance out of a GPU (though this is platform dependent).
On a previous-generation MacBook Pro with an AMD Radeon Pro 5500M and batch_size 4096, I get 2 s/epoch on the GPU, compared to 8 s/epoch on the CPU.
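A rough way to see why this helps: each training step carries some fixed overhead (kernel launches, host/device synchronization), and a larger batch_size means fewer steps per epoch to pay that overhead on. The sketch below just counts steps per epoch for a few batch sizes. The 60,000-sample dataset size is a hypothetical assumption, not taken from the code above, and this is arithmetic only, not a benchmark.

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of training steps needed to see every sample once."""
    # The final batch may be partial, so round up.
    return math.ceil(num_samples / batch_size)

# Hypothetical dataset size (assumption, e.g. an MNIST-sized training set).
NUM_SAMPLES = 60_000

for bs in (128, 1024, 4096):
    print(f"batch_size={bs:5d} -> {steps_per_epoch(NUM_SAMPLES, bs)} steps/epoch")

# Speedup implied by the timings reported above (8 s CPU vs 2 s GPU per epoch).
cpu_s_per_epoch, gpu_s_per_epoch = 8.0, 2.0
print(f"GPU speedup: {cpu_s_per_epoch / gpu_s_per_epoch:.1f}x")
```

Going from 128 to 4096 cuts the step count from 469 to 15 per epoch under that assumption, so per-step overhead matters far less. Very large batches can affect convergence, though, so the learning rate may need retuning.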