Help decoding H264 with Video Toolbox

Question

Created Feb ’21

Replies 2

Boosts 0

Views 2.5k

Participants 2

I am trying to get an H264 streaming app working on various platforms using a combination of Apple Video Toolbox and OpenH264. There is one use-case that doesn't work and I can't find any solution. When the source uses video Toolbox on a 2011 iMac running MacOS High Sierra and the receiver is a MacBook pro running Big Sur.

On the receiver the decoded image is about 3/4 green. If I scale the image down to about 1/8 of original before encoding then it works fine. If I capture the frames on the MacBook and then run exactly the same decoding software in a test program on the iMac then it decodes fine. Doing the same on the Macbook (same image of test program) give 3/4 green again.

I have a similar problem when receiving from an OpenH264 encoder on a slower Windows machine. I suspect that this has something to do with temporal processing, but really don't understand H264 well enough to work it out. One thing that I did notice is that the decode call returns with no error code but a NULL pixel buffer about 70% of the time.

The "guts" of the decoding part looks like this (modified from a demo on GitHub)

Code Block Objective-cvoid didDecompress(void *decompressionOutputRefCon, void *sourceFrameRefCon, OSStatus status, VTDecodeInfoFlags infoFlags, CVImageBufferRef pixelBuffer, CMTime presentationTimeStamp, CMTime presentationDuration )
{
    CVPixelBufferRef *outputPixelBuffer = (CVPixelBufferRef *)sourceFrameRefCon;
    *outputPixelBuffer = CVPixelBufferRetain(pixelBuffer);
}
 void initVideoDecodeToolBox ()
    {
        if (!decodeSession)
        {
            const uint8_t* parameterSetPointers[2] = { mSPS, mPPS };
            const size_t parameterSetSizes[2] = { mSPSSize, mPPSSize };
            OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault,2, //param count
                                                                                  parameterSetPointers,
                                                                                  parameterSetSizes,
                                                                                  4, //nal start code size
                                                                                  &formatDescription);
            if(status == noErr)
            {
                CFDictionaryRef attrs = NULL;
                const void *keys[] = { kCVPixelBufferPixelFormatTypeKey, kVTDecompressionPropertyKey_RealTime };
                uint32_t v = kCVPixelFormatType_32BGRA;
                const void *values[] = { CFNumberCreate(NULL, kCFNumberSInt32Type, &v), kCFBooleanTrue };
                attrs = CFDictionaryCreate(NULL, keys, values, 2, NULL, NULL);
                VTDecompressionOutputCallbackRecord callBackRecord;
                callBackRecord.decompressionOutputCallback = didDecompress;
                callBackRecord.decompressionOutputRefCon = NULL;
                status = VTDecompressionSessionCreate(kCFAllocatorDefault, formatDescription, NULL, attrs, &callBackRecord, &decodeSession);
                CFRelease(attrs);
            }
            else
            {
                NSLog(@"IOS8VT: reset decoder session failed status=%d", status);
            }
        }
    }
CVPixelBufferRef decode ( const char *NALBuffer, size_t NALSize )
    {
        CVPixelBufferRef outputPixelBuffer = NULL;
        if (decodeSession && formatDescription )
        {
            // The NAL buffer has been stripped of the NAL length data, so this has to be put back in
            MemoryBlock buf ( NALSize + 4);
            memcpy ( (char*)buf.getData()+4, NALBuffer, NALSize );
            *((uint32*)buf.getData()) = CFSwapInt32HostToBig ((uint32)NALSize);
            
            CMBlockBufferRef blockBuffer = NULL;
            OSStatus status  = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, buf.getData(), NALSize+4,kCFAllocatorNull,NULL, 0, NALSize+4, 0, &blockBuffer);
            
            if(status == kCMBlockBufferNoErr)
            {
                CMSampleBufferRef sampleBuffer = NULL;
                const size_t sampleSizeArray[] = {NALSize + 4};
                status = CMSampleBufferCreateReady(kCFAllocatorDefault,blockBuffer,formatDescription,1, 0, NULL, 1, sampleSizeArray,&sampleBuffer);
                
                if (status == kCMBlockBufferNoErr && sampleBuffer)
                {
                    VTDecodeFrameFlags flags = 0;VTDecodeInfoFlags flagOut = 0;
                    
                    // The default is synchronous operation.
                    // Call didDecompress and call back after returning.
                    OSStatus decodeStatus = VTDecompressionSessionDecodeFrame ( decodeSession, sampleBuffer, flags, &outputPixelBuffer, &flagOut );
                    if(decodeStatus != noErr)
                    {
                        DBG ( "decode failed status=" + String ( decodeStatus) );
                    }
                    CFRelease(sampleBuffer);
                }
                CFRelease(blockBuffer);
            }
        }
        return outputPixelBuffer;
    }

Boost

Answer 1

Nelroy OP

Feb ’21

Well, I finally found the answer so I'm going to leave it here for posterity. It turns out that the Video Toolkit decode function expects the NAL blocks that all belong to the same frame to be copied into a single SampleBuffer. The older Mac is providing the app with single keyframes that are split into separate NAL blocks which the app then sends individually across the network.

Unfortunately this means that the first NAL block will be processed, in may case less than a quarter of the picture, and the rest will be discarded. What you need to do is work out which NALs are part of the same frame, and bundle them together. Unfortunately this requires you to partially parse the PPS and the frames themselves, which is not trivial.

Many thanks to the post here which put me on the right track.

1

Answer 2

TJ_Spective_Labs OP

Dec ’24

I am having a quite a time getting my implementation running as well. About 30 hours in and have watched WWDC2014 #513 many times 🫣. I think David mentioned the grouping of the NAL blocks in the session.

0