Thanks @Vision Pro Engineer for the answer. I was hoping for something closer to "you need to call this function and it will be ok" ;).
My environment is simple and static - I have a physical object (standing on my desk) with a fiducial tag glued to it, and I want to overlay a 3D model (which I'll call the entity here) on this physical object. I'm sitting at my desk, the cameras are not obstructed, the light is good (and during the day there's also a lot of sunlight), I'm not using travel mode, and the object is 50-60 cm from me (and I do have Apple's enterprise blessings to access the cameras :).
I did another test today - I calculated the pose only once and just displayed the 3D model at that point. No continuous recalculation of its pose (the tracked object is not moving right now anyway). I tried pushing and pulling the entity along the Z axis a bit, so it's nearer to/farther from me, but that does not seem to change the effect (so it's not parallax). The closer the entity gets to the edge of my field of view, the more it is displaced from its correct position. If I place it perfectly in the center, the position is ok; then if I rotate my head to the right (so the entity gets close to the left edge of what I see in passthrough), the entity gets more and more displaced to the right. Same with left and up/down - the displacement follows my head movement.
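To put numbers on this for a bug report, the effect could be expressed as two angles per sample: how far off-center the entity is, and how far it appears from where it should be. Below is a minimal sketch in plain Python, not visionOS code. All function and parameter names are hypothetical, and it assumes the head position, head forward direction, expected position and observed position are already available in one common world frame (in meters), e.g. estimated by hand from a recording.

```python
import math

def angle_between_deg(a, b):
    """Angle in degrees between two 3D vectors given as (x, y, z) tuples."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))

def subtract(a, b):
    """Component-wise a - b for 3D tuples."""
    return tuple(x - y for x, y in zip(a, b))

def eccentricity_and_offset(head_pos, head_forward, expected_pos, observed_pos):
    """Return (eccentricity, offset), both in degrees.

    eccentricity: angle between the viewing direction and the expected position,
                  i.e. how far from the center of the view the entity should be.
    offset:       angle, as seen from the head, between the expected and the
                  observed position, i.e. the apparent displacement.
    """
    to_expected = subtract(expected_pos, head_pos)
    to_observed = subtract(observed_pos, head_pos)
    eccentricity = angle_between_deg(head_forward, to_expected)
    offset = angle_between_deg(to_expected, to_observed)
    return eccentricity, offset

# Hypothetical sample: object 0.5 m straight ahead, entity seen 1 cm to the right.
ecc, off = eccentricity_and_offset(
    head_pos=(0.0, 0.0, 0.0),
    head_forward=(0.0, 0.0, -1.0),
    expected_pos=(0.0, 0.0, -0.5),
    observed_pos=(0.01, 0.0, -0.5),
)
print(f"eccentricity: {ecc:.2f} deg, offset: {off:.2f} deg")
```

If the offset grows steadily with eccentricity across several head orientations, that would match the "worse toward the edges" behavior described above.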
I recorded it with the standard "record my view" feature, and the effect seems to be even stronger in the recording, so I feel it has something to do with the magic you do between the raw camera input and what the displays show. My guess: the magic is not applied to the recording, so the effect is stronger there, while in passthrough it gets corrected, but not enough to be perfect. So maybe it's not the entity that's drifting, but the image of the physical object in passthrough that gets displaced? Or I'm delusional, which is also possible :D.
I'll file a bug report with a video and post the number here; I just need to prepare it with something I can share.