Just posting back here as I got all this working in the end.
In case it's useful, here are the stumbling blocks I encountered. They probably just reflect my own lack of understanding, but maybe they'll help someone.
If you need to construct the AV1 Codec Configuration Box (av1C) yourself, outside of FFmpeg etc., this describes the structure:
https://aomediacodec.github.io/av1-isobmff/#av1codecconfigurationbox-section
The information needed comes from parsing the Sequence Header OBU:
https://aomediacodec.github.io/av1-spec/#general-sequence-header-obu-syntax
If you're writing from scratch (i.e. not using FFmpeg or whatever), then you need to write or find code to parse the Sequence Header OBU.
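To give an idea of what that parsing involves, here is a rough Python sketch that reads only the first few fields of the Sequence Header OBU payload (the bytes after the OBU header and size field). The names `BitReader` and `parse_sequence_header_start` are made up for illustration, and it assumes the simple case where timing_info_present_flag is 0. It stops after the profile, level and tier; the bit depth and chroma fields live further down in color_config and would need the rest of the syntax from the spec linked above.

```python
class BitReader:
    """Reads big-endian bit fields, most significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_sequence_header_start(payload):
    """Return seq_profile, seq_level_idx_0 and seq_tier_0 from the payload.

    Sketch only: raises if timing info is present, because skipping
    timing_info() and decoder_model_info() is not implemented here.
    """
    r = BitReader(payload)
    seq_profile = r.read(3)
    r.read(1)                      # still_picture
    reduced = r.read(1)            # reduced_still_picture_header
    if reduced:
        level = r.read(5)          # seq_level_idx[0]
        tier = 0                   # seq_tier[0] is implied to be 0
    else:
        if r.read(1):              # timing_info_present_flag
            raise NotImplementedError("timing_info not handled in this sketch")
        r.read(1)                  # initial_display_delay_present_flag
        r.read(5)                  # operating_points_cnt_minus_1
        r.read(12)                 # operating_point_idc[0]
        level = r.read(5)          # seq_level_idx[0]
        tier = r.read(1) if level > 7 else 0
    return {"seq_profile": seq_profile,
            "seq_level_idx_0": level,
            "seq_tier_0": tier}
```

These three values are what go into the second byte and the top bit of the third byte of the configuration box.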
Once you've written the 4 bytes described in the AV1 Codec Configuration Box section linked above, you also need to append the Sequence Header OBU data to the end of the block (this is the configOBUs field of the box). If you don't, the decoder setup will fail.
The resulting block is then added to the extensions dictionary, along with all the other basic information needed to initialise the decoder (the Chrome references detail all this information).
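As a sketch of that layout (Python here purely to show the bit packing; `build_av1c` is a hypothetical helper name, and the field values are assumed to have already been parsed from the Sequence Header OBU), the 4 bytes plus the appended OBU look roughly like this. It assumes no initial presentation delay is signalled, so the fourth byte is left at zero.

```python
def build_av1c(seq_profile, seq_level_idx_0, seq_tier_0,
               high_bitdepth, twelve_bit, monochrome,
               chroma_subsampling_x, chroma_subsampling_y,
               chroma_sample_position, sequence_header_obu):
    """Pack the av1C payload: 4 fixed bytes followed by configOBUs."""
    # Byte 0: marker (1 bit, always 1) + version (7 bits, always 1).
    byte0 = 0x80 | 0x01
    # Byte 1: seq_profile (3 bits) + seq_level_idx_0 (5 bits).
    byte1 = ((seq_profile & 0x07) << 5) | (seq_level_idx_0 & 0x1F)
    # Byte 2: tier, bit depth flags, monochrome, chroma subsampling, sample position.
    byte2 = ((seq_tier_0 & 1) << 7) \
        | ((high_bitdepth & 1) << 6) \
        | ((twelve_bit & 1) << 5) \
        | ((monochrome & 1) << 4) \
        | ((chroma_subsampling_x & 1) << 3) \
        | ((chroma_subsampling_y & 1) << 2) \
        | (chroma_sample_position & 0x03)
    # Byte 3: reserved (3 bits) + initial_presentation_delay_present (1 bit)
    # + 4 more bits; all zero here (assumption: no presentation delay signalled).
    byte3 = 0x00
    # configOBUs: the complete Sequence Header OBU, header byte included.
    return bytes([byte0, byte1, byte2, byte3]) + bytes(sequence_header_obu)
```

For an 8-bit 4:2:0 Main profile stream at level index 8, the fixed part comes out as `81 08 0c 00`, followed by the Sequence Header OBU bytes.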
You then create the video format description using CMVideoFormatDescriptionCreate, passing in the extensions.
I then got caught out with a decode error because I didn't realise that I also had to pass in the Sequence Header OBU with the first frame data I attempted to decode. It wasn't enough that I had already given the same Sequence Header OBU when creating the video format description (via the extensions).
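A rough sketch of one way to guard against that is below: walk the OBUs in the first sample and, if no Sequence Header OBU is found, splice in the one you already have. `iter_obus` and `ensure_sequence_header` are hypothetical names, and the code assumes every OBU in the buffer has obu_has_size_field set to 1, which may not hold for your source.

```python
OBU_SEQUENCE_HEADER = 1
OBU_TEMPORAL_DELIMITER = 2


def read_leb128(buf, pos):
    """Decode an AV1 leb128 value at pos; return (value, next position)."""
    value = 0
    for i in range(8):
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("leb128 value longer than 8 bytes")


def iter_obus(buf):
    """Yield (obu_type, start, end) for each OBU in buf."""
    pos = 0
    while pos < len(buf):
        start = pos
        header = buf[pos]
        obu_type = (header >> 3) & 0x0F
        has_extension = bool(header & 0x04)
        has_size = bool(header & 0x02)
        if not has_size:
            raise ValueError("OBU without size field; not handled in this sketch")
        pos += 2 if has_extension else 1
        size, pos = read_leb128(buf, pos)
        pos += size
        if pos > len(buf):
            raise ValueError("truncated OBU")
        yield obu_type, start, pos


def ensure_sequence_header(sample, seq_header_obu):
    """Return sample with seq_header_obu added if it has no Sequence Header OBU."""
    obus = list(iter_obus(sample))
    if any(obu_type == OBU_SEQUENCE_HEADER for obu_type, _, _ in obus):
        return bytes(sample)
    # Keep a leading temporal delimiter first, if the sample has one.
    insert_at = 0
    if obus and obus[0][0] == OBU_TEMPORAL_DELIMITER:
        insert_at = obus[0][2]
    return bytes(sample[:insert_at]) + bytes(seq_header_obu) + bytes(sample[insert_at:])
```

You would only need to run this on the first sample handed to the decoder; later samples can be passed through untouched.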
After that it worked.
Decoding itself is slightly simpler than with HEVC, in that you don't need to parse the OBUs, you just pass the data straight to the decoder. With HEVC, you had to parse the NALUs and only pass in slice segments, while also doing some minor conversion of the way the NALU's length is presented to the decoder.
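For comparison, the HEVC handling I mean looks something like the sketch below. It assumes the input is an Annex B stream (start-code delimited) and that the decoder wants each NALU prefixed with a 4-byte big-endian length; both are assumptions about your setup, and the function names are made up for illustration.

```python
import struct

START_CODE = b"\x00\x00\x01"


def split_annex_b(stream):
    """Split an Annex B byte stream into NALUs (start codes removed)."""
    nalus = []
    i = stream.find(START_CODE)
    while i != -1:
        start = i + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Trailing zeros belong to the next (4-byte) start code or to padding.
        nalu = stream[start:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        i = nxt
    return nalus


def to_length_prefixed_slices(stream):
    """Keep only VCL NALUs (slice segments) and length-prefix each one."""
    out = bytearray()
    for nalu in split_annex_b(stream):
        nal_unit_type = (nalu[0] >> 1) & 0x3F
        if nal_unit_type < 32:  # types 0..31 are VCL; 32+ are VPS/SPS/PPS/SEI etc.
            out += struct.pack(">I", len(nalu)) + nalu
    return bytes(out)
```

None of this is needed on the AV1 side, where the OBU data goes to the decoder as it is.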
It would be helpful, Apple, if you could consider writing something like CMVideoFormatDescriptionCreateFromAV1SequenceHeaderOBU similar to the existing CMVideoFormatDescriptionCreateFromH264ParameterSets and CMVideoFormatDescriptionCreateFromHEVCParameterSets.
This would lower the bar a little for AV1 hardware decoding.