I am trying to emulate double-precision floating-point arithmetic for HPC on Mac GPUs using Metal2, but I can't find any complete documentation of the 32-bit integer instructions that Metal2 exposes.
I'm interested in any operations that would aid double-precision emulation. For example, are there separate 32-bit multiply-high and multiply-low instructions? Is there any mechanism for adding multi-word numbers, such as an add-with-carry instruction? (Unfortunately, standard C has no direct equivalent of either, so they have to be emulated.)
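To make the question concrete, here is a minimal sketch in plain C of the operations I would otherwise have to emulate using nothing but 32-bit arithmetic. The function names (`mul_lo_u32`, `mul_hi_u32`, `add_carry_u32`) are my own placeholders, not Metal or C library calls, and this is only one way to write them, not a tuned implementation.

```c
#include <stdint.h>

/* Low 32 bits of a*b: ordinary unsigned multiplication already wraps. */
static uint32_t mul_lo_u32(uint32_t a, uint32_t b)
{
    return a * b;
}

/* High 32 bits of the 64-bit product a*b, built from 16-bit halves
   so that no intermediate value needs more than 32 bits. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t ll = a_lo * b_lo;   /* contributes to bits  0..31 */
    uint32_t lh = a_lo * b_hi;   /* contributes to bits 16..47 */
    uint32_t hl = a_hi * b_lo;   /* contributes to bits 16..47 */
    uint32_t hh = a_hi * b_hi;   /* contributes to bits 32..63 */

    /* Sum the column at bits 16..31; its overflow carries into bit 32. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

/* a + b + carry_in (carry_in must be 0 or 1); the carry out of bit 31
   is detected by checking whether each partial sum wrapped around. */
static uint32_t add_carry_u32(uint32_t a, uint32_t b,
                              uint32_t carry_in, uint32_t *carry_out)
{
    uint32_t s  = a + b;
    uint32_t c1 = (s < a);       /* first addition wrapped */
    uint32_t r  = s + carry_in;
    uint32_t c2 = (r < s);       /* adding the carry wrapped */
    *carry_out = c1 | c2;        /* at most one of the two can be set */
    return r;
}
```

The emulated multiply-high costs four multiplies plus several shifts and adds, and every emulated carry costs a compare, which is why native versions of these operations would matter so much for the performance target.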
I would be thrilled to find a way to implement double precision with only about a 4x performance penalty, so I'm going for integer instructions rather than a "double double" approach. If it ends up any slower than that, I would be better off just using AMX.
Any pointers to existing documentation would be greatly appreciated.
Thanks for the help.
Jeff