Is it possible to live render CMTaggedBuffer / MV-HEVC frames in visionOS?

Hey all,

I'm working on a visionOS app that captures live frames from the left and right cameras of Apple Vision Pro using cameraFrame.sample(for: .left/.right).

Apple provides documentation on encoding side-by-side frames into MV-HEVC spatial video using CMTaggedBuffer:

Converting Side-by-Side 3D Video to MV-HEVC

My question: Is there any way to render tagged frames (e.g. CMTaggedBuffer with .stereoView(.leftEye/.rightEye)) live, directly to a surface in RealityKit or Metal, without saving them to a file? I’d like to create a true stereoscopic (spatial) live video preview, not just render two images side-by-side.

Any advice or insights would be greatly appreciated!

Hi @Luuis: Yes, you can live-render tagged buffers as you describe.

In addition to the sample you referenced, you might also consider Converting projected video to Apple Projected Media Profile. This sample demonstrates how to bisect frames from a stereo side-by-side input file and append tagged buffers to an output file.

Where the APMP sample uses AVAssetWriterInput.TaggedPixelBufferGroupReceiver, you could instead enqueue sample buffers to an AVSampleBufferVideoRenderer. Both VideoPlayerComponent and VideoMaterial can be used with AVSampleBufferVideoRenderer.
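
As a rough illustration, here is a minimal sketch of that renderer-driven approach. It assumes the VideoPlayerComponent(videoRenderer:) initializer introduced in visionOS 2, and makeTaggedSampleBuffer() is a hypothetical helper standing in for the sample buffer assembly shown in the snippet at the end of this reply:

import AVFoundation
import RealityKit

// Create a renderer and attach it to a RealityKit entity.
let renderer = AVSampleBufferVideoRenderer()
let videoEntity = Entity()
videoEntity.components.set(VideoPlayerComponent(videoRenderer: renderer))

// Enqueue sample buffers whenever the renderer is ready for more data.
renderer.requestMediaDataWhenReady(on: .main) {
    while renderer.isReadyForMoreMediaData {
        // makeTaggedSampleBuffer() is a hypothetical helper that assembles a
        // sample buffer from the left/right tagged buffers for the current frame.
        guard let sampleBuffer = makeTaggedSampleBuffer() else { return }
        renderer.enqueue(sampleBuffer)
    }
}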

I'll close with a snippet that demonstrates creation of a sample buffer from individual pixel buffers.

import CoreMedia
import CoreVideo

// 1a. Create the tagged buffer for the left eye
let tags: [CMTag] = [
    .mediaType(.video), // CMFormatDescription.MediaType
    .stereoView(.leftEye), // CMStereoViewComponents
    .videoLayerID(Int64(0)) // layer 0 for the left eye; use 1 for the right
]
let leftTaggedBuffer = CMTaggedDynamicBuffer(
    tags: tags,
    content: .pixelBuffer(CVReadOnlyPixelBuffer(leftPixelBuffer))
)

// 1b. Adapt the above & repeat for the right eye ...

// 2. Collect the tagged buffers, presentation timestamp, and duration
let taggedBuffers = [leftTaggedBuffer, rightTaggedBuffer]
let presentationTimeStamp: CMTime // derive from the input buffers
let duration: CMTime // derive from the input buffers

// 3. Assemble the sample buffer
let buffer = CMReadySampleBuffer(
    taggedBuffers: taggedBuffers,
    formatDescription: CMTaggedBufferGroupFormatDescription(taggedBuffers: taggedBuffers),
    presentationTimeStamp: presentationTimeStamp,
    duration: duration
)
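
The buffer assembled here is what you would then hand to the AVSampleBufferVideoRenderer from the earlier sketch, so the stereoscopic frames appear on your VideoPlayerComponent or VideoMaterial.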

Please let us know if you have questions, or need additional information.

Best, Steve
