Activating a Container App from a Custom Keyboard Extension to Enable Continuous Voice Input While Preserving the Original Typing Context
Project Background

I am developing a third-party custom keyboard for iOS whose primary feature is real-time voice input. In my current design, responsibilities are split as follows:

1. The container (main) app is responsible for:
   - Audio recording
   - Speech recognition (ASR)
2. The keyboard extension is responsible for:
   - Providing the keyboard UI
   - Initiating the voice input workflow
   - Receiving transcription results via an App Group
   - Inserting recognized text into the active text field using textDocumentProxy.insertText(_:)

Intended User Flow

The intended workflow is:

1. The user is typing in a third-party app (for example, WeChat) using my custom keyboard.
2. The user taps a “Voice Input” button in the keyboard extension.
3. The keyboard extension activates the container app so that audio recording and ASR can begin.
4. After recording has started, control returns to the original app where the user was typing.
5. The container app continues running in the background, maintaining active audio recording and ASR.
6. Recognized text is continuously streamed back to the keyboard extension and inserted into the current cursor position in real time.

Observed Industry Behavior

Some popular third-party keyboards on iOS, such as WeChat Keyboard and Doubao Keyboard, appear to provide a similar user experience in which:

- Voice input can be initiated directly from the keyboard while typing in another app.
- The user remains (or returns) in the original typing context after voice input starts.
- Speech recognition continues and text is streamed into the active text field without interrupting the typing experience.

I would like to better understand how this type of workflow aligns with iOS platform capabilities and supported APIs.

My Questions

1. Is it supported by iOS public APIs for a custom keyboard extension to activate its container app to start audio recording and ASR, then return to the original host app while the container app continues recording and performing ASR in the background?
2. If this workflow is not supported, are there any Apple-recommended or supported alternative architectures for achieving a similar user experience, especially when audio recording and ASR logic are currently implemented in the container app rather than in the keyboard extension?

Goal

My goal is to design a solution that is fully compliant with iOS public APIs and platform constraints, while providing a real-time voice input experience comparable to existing third-party keyboards on the platform. Any guidance on supported APIs, recommended architectures, or relevant documentation would be greatly appreciated.
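
To make the last step of the intended flow concrete (streaming recognized text into the cursor position), here is a minimal sketch of how the keyboard extension might reconcile each new partial transcript with the text it has already inserted. It is illustrative only: `TranscriptEdit` and `transcriptEdit(from:to:)` are hypothetical names, and the sketch assumes that the ASR engine emits revised partial results and that one `deleteBackward()` call removes one `Character`. How the transcript reaches the extension through the App Group is not shown.

```swift
import Foundation

/// Hypothetical description of the edit needed to turn the text already
/// inserted into the latest partial transcript.
struct TranscriptEdit: Equatable {
    /// Number of trailing characters to remove (one deleteBackward() each,
    /// under the assumption stated above).
    let deleteCount: Int
    /// Text to pass to insertText(_:) after the deletions.
    let insertion: String
}

/// Computes a tail edit from `previous` (what the keyboard has already
/// inserted) to `latest` (the newest partial ASR result) by keeping their
/// common prefix and replacing everything after it.
func transcriptEdit(from previous: String, to latest: String) -> TranscriptEdit {
    let old = Array(previous)
    let new = Array(latest)

    // Length of the shared prefix, counted in Characters.
    var shared = 0
    while shared < old.count, shared < new.count, old[shared] == new[shared] {
        shared += 1
    }

    return TranscriptEdit(
        deleteCount: old.count - shared,
        insertion: String(new[shared...])
    )
}

// Inside the keyboard extension (UIKit, not runnable outside iOS), the edit
// would be applied roughly like this:
//
//   let edit = transcriptEdit(from: lastInserted, to: newPartial)
//   for _ in 0..<edit.deleteCount { textDocumentProxy.deleteBackward() }
//   textDocumentProxy.insertText(edit.insertion)
//   lastInserted = newPartial
```

Keeping the diff logic free of UIKit means it can be unit-tested outside the extension; only the final three calls touch `textDocumentProxy`.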