Background: I am prototyping with RealityKit on iOS 14.1, running on the latest 11-inch iPad Pro. My goal is to track a hand. When I used skeleton tracking, the skeleton scale did not appear to be adjusted correctly, so some of my samples were off by roughly 15 cm. I am therefore experimenting with Vision to detect the hand in the camera image and then project the detected point back into 3D space.
1> Run hand pose detection on ARFrame.capturedImage
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, orientation: .up, options: [:])
let handPoseRequest = VNDetectHumanHandPoseRequest()
....
try handler.perform([handPoseRequest])
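For context, the full detection step looks roughly like the sketch below (the joint name, confidence threshold, and hand count are just illustrative choices, not something I am committed to):

import ARKit
import Vision

// Sketch: detect one hand in the current ARFrame and pull out a single joint.
// Intended to be called from session(_:didUpdate:) on the ARSessionDelegate.
func detectHand(in frame: ARFrame) -> VNRecognizedPoint? {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 1

    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                        orientation: .up,
                                        options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Hand pose request failed: \(error)")
        return nil
    }

    guard let observation = request.results?.first as? VNHumanHandPoseObservation else {
        return nil
    }

    // Index fingertip chosen only as an example joint.
    guard let tip = try? observation.recognizedPoint(.indexTip),
          tip.confidence > 0.3 else {
        return nil
    }

    // tip.x / tip.y are normalized (0...1) image coordinates with a lower-left origin.
    return tip
}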
2> Convert the recognized point to a 3D world transform (this is where the problem is).
fileprivate func convertVNPointTo3D(_ point: VNRecognizedPoint,
                                    _ session: ARSession,
                                    _ frame: ARFrame,
                                    _ viewSize: CGSize) -> Transform?
{
    // Map the recognized point into view coordinates (this is the conversion I'm unsure about).
    let pointX = (point.x / Double(frame.camera.imageResolution.width)) * Double(viewSize.width)
    let pointY = (point.y / Double(frame.camera.imageResolution.height)) * Double(viewSize.height)

    // Raycast from that screen point against an estimated plane.
    let query = frame.raycastQuery(from: CGPoint(x: pointX, y: pointY),
                                   allowing: .estimatedPlane,
                                   alignment: .any)
    let results = session.raycast(query)
    if let first = results.first {
        return Transform(matrix: first.worldTransform)
    } else {
        return nil
    }
}
I wonder whether I am doing the right conversion. The issue is that the ARSession.raycast documentation (https://developer.apple.com/documentation/arkit/arsession/3132065-raycast) says the query converts a UI screen point into a 3D point, but I am not sure how ARFrame.capturedImage maps onto the UI screen.
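To make the question concrete: I suspect the mapping should involve ARFrame.displayTransform(for:viewportSize:), but I am not certain that is correct or complete. A rough sketch of what I mean (the orientation value and the Y flip are assumptions on my part):

import ARKit
import UIKit
import Vision

// Sketch only: one possible way to map a Vision point into view coordinates
// before raycasting. I am not sure this is the right mapping, which is what I'm asking.
fileprivate func viewPoint(for point: VNRecognizedPoint,
                           frame: ARFrame,
                           viewSize: CGSize) -> CGPoint {
    // Vision points are normalized with a lower-left origin; flip Y first (assumption).
    let normalized = CGPoint(x: point.x, y: 1 - point.y)

    // displayTransform converts normalized image coordinates into normalized
    // view coordinates for the given interface orientation and viewport size.
    let transform = frame.displayTransform(for: .portrait, viewportSize: viewSize)
    let normalizedView = normalized.applying(transform)

    // Scale up to actual view points for the raycast query.
    return CGPoint(x: normalizedView.x * viewSize.width,
                   y: normalizedView.y * viewSize.height)
}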
Thanks