What I'm trying to do (simplifying a bit) is blend multiple color layers using my own blending functions into a "master" texture.
So what we have is a fragment shader taking in a bunch of inputs, among them the master texture for reading, and outputting to that same master texture as an output color attachment.
You can do your own blending in a fragment shader without this complexity. On iOS, a fragment shader can read the current value of the framebuffer ("color attachment") pixel directly, without binding that attachment as an input texture. In OpenGL ES, this is provided by the GL_EXT_shader_framebuffer_fetch extension (https://registry.khronos.org/OpenGL/extensions/EXT/EXT_shader_framebuffer_fetch.txt). In Metal, I believe you need to add the [[color(0)]] attribute to a fragment shader input; see table 5.5 of the spec at https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf. I've never tried this myself, though.
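
For the Metal case, an untested sketch of what I have in mind is below. The names VertexOut, customBlend, blendLayer, layer and layerSampler are all made up for illustration, and the "screen" blend is just a stand-in for whatever blending function you actually use. It assumes the master texture is bound as color attachment 0 of the render pass and that the GPU supports reading the attachment this way (Apple GPUs, as far as I know).

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical vertex output; only the texture coordinate is needed here.
struct VertexOut {
    float4 position [[position]];
    float2 uv;
};

// Stand-in for your own blend function: a "screen" blend of src over dst,
// weighted by the source alpha. Replace the body with your own math.
static half4 customBlend(half4 dst, half4 src) {
    half3 screened = 1.0h - (1.0h - dst.rgb) * (1.0h - src.rgb);
    half3 rgb = dst.rgb + (screened - dst.rgb) * src.a;
    return half4(rgb, dst.a);
}

// dst carries the value currently stored in color attachment 0 (the master
// texture) for this pixel, so the master texture is not bound as an input.
fragment half4 blendLayer(VertexOut in [[stage_in]],
                          half4 dst [[color(0)]],
                          texture2d<half> layer [[texture(0)]],
                          sampler layerSampler [[sampler(0)]])
{
    half4 src = layer.sample(layerSampler, in.uv);
    // The returned value is written back to color attachment 0.
    return customBlend(dst, src);
}
```

With something like this you would draw one full-screen quad per layer into the master texture, leave the pipeline's fixed-function blending disabled, and let the shader do the blend itself.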