However, if you would like to find the position on screen for use with something such as UIKit or ARView.ray(through:), further transformation is required.
The Vision request was performed on arView.session.currentFrame.capturedImage.
arView.session.currentFrame is an ARFrame.
From the documentation on ARFrame.displayTransform(for:viewportSize:):
Normalized image coordinates range from (0,0) in the upper left corner of the image to (1,1) in the lower right corner. This method creates an affine transform representing the rotation and aspect-fit crop operations necessary to adapt the camera image to the specified orientation and to the aspect ratio of the specified viewport. The affine transform does not scale to the viewport's pixel size. The capturedImage pixel buffer is the original image captured by the device camera, and thus not adjusted for device orientation or view aspect ratio.
So the image rendered on screen is a rotated, aspect-cropped version of the frame the camera captures, and a transformation is needed to go from AVFoundation coordinates to display (UIKit) coordinates.
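If the point you are converting comes straight from a Vision observation, note that Vision's normalized coordinates have their origin at the lower left, while the AVFoundation-style coordinates used below have their origin at the upper left, so flip the Y axis first. A minimal sketch (visionPoint is an illustrative name for a normalized point from a Vision result):

// Vision results use a lower-left origin; flip Y to get the upper-left-origin
// normalized coordinates that displayTransform(for:viewportSize:) expects.
let avFoundationPoint = CGPoint(x: visionPoint.x, y: 1 - visionPoint.y)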
Converting from AVFoundation coordinates to display (UIKit) coordinates:
public extension ARView {
    /// Convert from normalized AVFoundation coordinates
    /// ((0,0) top-left, (1,1) bottom-right) to screen-space coordinates.
    func convertAVFoundationToScreenSpace(_ point: CGPoint) -> CGPoint? {
        guard
            let arFrame = session.currentFrame,
            let interfaceOrientation = window?.windowScene?.interfaceOrientation
        else { return nil }

        let transform = arFrame.displayTransform(for: interfaceOrientation,
                                                 viewportSize: frame.size)
        let normalizedCenter = point.applying(transform)
        let center = normalizedCenter.applying(
            CGAffineTransform.identity.scaledBy(x: frame.width, y: frame.height))
        return center
    }
}
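For example, to find where the center of the captured image lands on screen (purely illustrative):

// The center of the captured image in normalized AVFoundation coordinates.
if let screenCenter = arView.convertAVFoundationToScreenSpace(CGPoint(x: 0.5, y: 0.5)) {
    // screenCenter is in UIKit points within the ARView's bounds.
    print(screenCenter)
}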
To go the opposite direction, from UIKit display coordinates to AVFoundation coordinates:
public extension ARView {
    /// Convert from screen-space UIKit coordinates to normalized
    /// AVFoundation coordinates ((0,0) top-left, (1,1) bottom-right).
    func convertScreenSpaceToAVFoundation(_ point: CGPoint) -> CGPoint? {
        guard
            let arFrame = session.currentFrame,
            let interfaceOrientation = window?.windowScene?.interfaceOrientation
        else { return nil }

        let inverseScaleTransform = CGAffineTransform.identity
            .scaledBy(x: frame.width, y: frame.height)
            .inverted()
        let invertedDisplayTransform = arFrame
            .displayTransform(for: interfaceOrientation, viewportSize: frame.size)
            .inverted()
        let unScaledPoint = point.applying(inverseScaleTransform)
        let normalizedCenter = unScaledPoint.applying(invertedDisplayTransform)
        return normalizedCenter
    }
}
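This inverse direction is what you need to sample capturedImage-aligned buffers, such as the depth map, at a point the user touched. A sketch, assuming tapLocation comes from a gesture recognizer attached to the ARView:

// tapLocation: a point in the ARView's UIKit coordinate space,
// e.g. from a UITapGestureRecognizer.
if let avFoundationPosition = arView.convertScreenSpaceToAVFoundation(tapLocation) {
    // avFoundationPosition can now index into buffers aligned with
    // capturedImage, such as sceneDepth's depthMap.
    print(avFoundationPosition)
}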
To get a world-space coordinate from a UIKit screen coordinate and a corresponding depth value:
/// Get the world-space position from a UIKit screen point and a depth value.
/// - Parameters:
///   - screenPosition: A CGPoint representing a point on screen in UIKit coordinates.
///   - depth: The depth at this coordinate, in meters.
/// - Returns: The position in world space of this coordinate at this depth.
private func worldPosition(screenPosition: CGPoint, depth: Float) -> simd_float3? {
    guard
        let rayResult = arView.ray(through: screenPosition)
    else { return nil }

    // rayResult.direction is a normalized (1 meter long) vector pointing in the
    // correct direction, and we want to go the length of depth along this vector.
    let worldOffset = rayResult.direction * depth
    let worldPosition = rayResult.origin + worldOffset
    return worldPosition
}
To set the position of an entity in world space for a given point on screen:
guard
    let currentFrame = arView.session.currentFrame,
    let sceneDepth = (currentFrame.smoothedSceneDepth ?? currentFrame.sceneDepth)?.depthMap,
    let depthAtPoint = sceneDepth.value(from: avFoundationPosition),
    let worldPosition = worldPosition(screenPosition: uiKitPosition, depth: depthAtPoint)
else { return }

trackedEntity.setPosition(worldPosition, relativeTo: nil)
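Note that value(from:) is not a CVPixelBuffer API; the snippet above assumes a small helper that reads the Float32 value at a normalized point. A minimal sketch, assuming a single-plane kCVPixelFormatType_DepthFloat32 buffer like ARKit's depthMap:

import CoreVideo

extension CVPixelBuffer {
    /// Read the Float32 value at a normalized point ((0,0) top-left, (1,1) bottom-right).
    /// Assumes a single-plane Float32 buffer such as ARKit's depthMap.
    func value(from point: CGPoint) -> Float? {
        let width = CVPixelBufferGetWidth(self)
        let height = CVPixelBufferGetHeight(self)
        let col = Int(point.x * CGFloat(width - 1))
        let row = Int(point.y * CGFloat(height - 1))
        guard col >= 0, col < width, row >= 0, row < height else { return nil }

        CVPixelBufferLockBaseAddress(self, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(self, .readOnly) }
        guard let baseAddress = CVPixelBufferGetBaseAddress(self) else { return nil }

        let bytesPerRow = CVPixelBufferGetBytesPerRow(self)
        let rowPointer = baseAddress + row * bytesPerRow
        return rowPointer.assumingMemoryBound(to: Float32.self)[col]
    }
}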
And don't forget to set the proper frameSemantics on your ARConfiguration:
func runNewConfig() {
    // Create a session configuration.
    let configuration = ARWorldTrackingConfiguration()

    // Goes with (currentFrame.smoothedSceneDepth ?? currentFrame.sceneDepth)?.depthMap
    let frameSemantics: ARConfiguration.FrameSemantics = [.smoothedSceneDepth, .sceneDepth]

    // Goes with currentFrame.estimatedDepthData
    //let frameSemantics: ARConfiguration.FrameSemantics = .personSegmentationWithDepth

    if ARWorldTrackingConfiguration.supportsFrameSemantics(frameSemantics) {
        configuration.frameSemantics.insert(frameSemantics)
    }

    // Run the view's session.
    session.run(configuration)
}