I may have come up with a solution for now. I looked closer into SFSpeechRecognitionResult -> SFSpeechRecognitionMetadata and saw that there is a property called 'speechDuration'.
Turns out that speechDuration reports how long the previous utterance was, and while speech is still coming in it defaults to nil. So with that, I created another published var, "accumulatedTranscript", and check whether speechDuration != nil; if so, I append whatever the current transcript is and then reset the transcript to an empty string (to clear out the UI's text).
For the UI I'm using a combined var of accumulatedTranscript + transcript to give the appearance of a continuous stream of text, and from my screenshots you can see it uses the last transcript/final result that comes in after the pause.
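Here's a minimal sketch of the idea. Only transcript and accumulatedTranscript come from what I described above; the class name SpeechManager, the handle(result:) hook, and the displayTranscript computed property are just placeholders for however you've already wired up your recognition task.

```swift
import Combine
import Speech

@MainActor
final class SpeechManager: ObservableObject {
    // Live text for the current utterance (iOS 18 resets this after each pause)
    @Published var transcript = ""
    // Everything recognized before the most recent pause
    @Published var accumulatedTranscript = ""

    // What the UI displays: accumulatedTranscript + transcript reads as one stream
    var displayTranscript: String {
        accumulatedTranscript.isEmpty ? transcript : accumulatedTranscript + " " + transcript
    }

    // Called from the SFSpeechRecognitionTask result handler
    func handle(result: SFSpeechRecognitionResult) {
        // speechRecognitionMetadata?.speechDuration stays nil while speech is
        // still coming in; it shows up once the previous utterance has ended.
        if result.speechRecognitionMetadata?.speechDuration != nil {
            // Pause detected: fold the final result into the accumulated text
            // and clear the live transcript so the UI doesn't show it twice.
            let finished = result.bestTranscription.formattedString
            accumulatedTranscript += (accumulatedTranscript.isEmpty ? "" : " ") + finished
            transcript = ""
        } else {
            transcript = result.bestTranscription.formattedString
        }
    }
}
```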
Some things worth noting:
I haven't seen iOS 17 report a non-nil speech duration, so this solution shouldn't affect how iOS 17 works, but there may be edge cases I'm not able to think of right now.
The newly appended transcript will begin with a capital letter; you'll want to deal with this however suits your app (for me, I'll just make everything past the first word lowercase since the pause timer is finicky; see the small helper sketch after these notes).
I haven't done a robust test of this solution yet; so far I've only tested on the iOS 18 simulator, a physical iOS 18 device, and the iOS 17 simulator.
I'm not sure how this workaround will interact with any changes Apple might make to address this, so keep that in mind.
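For the capitalization note above, this is roughly the crude normalization I mean; it's just my own helper, not anything from Apple's API, and it assumes you run it over the combined text before displaying it.

```swift
// Crude fix for stray capitals after a pause: keep the very first word as-is
// and lowercase every word after it in the combined text. Proper nouns get
// flattened too, but the pause timer is too finicky to be smarter about it.
func normalized(_ fullText: String) -> String {
    let words = fullText.split(separator: " ")
    guard let first = words.first else { return fullText }
    return ([String(first)] + words.dropFirst().map { $0.lowercased() })
        .joined(separator: " ")
}
```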