I have built a macOS machine-intelligence application that uses Apple Intelligence. Part of the application preprocesses text, and for longer content I have implemented chunking to stay within the token limit. However, performance is now bottlenecked by the fact that Apple Intelligence processes requests sequentially, which has a large impact on overall throughput.
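For context, here is a simplified sketch of the shape of my current sequential loop. It is not my actual code: it assumes the FoundationModels `LanguageModelSession` API, and `splitIntoChunks` is just an illustrative placeholder for my real chunker.

```swift
import FoundationModels

/// Placeholder chunker: naive character-based split, standing in for the real one.
func splitIntoChunks(_ text: String, maxCharacters: Int = 4_000) -> [String] {
    stride(from: 0, to: text.count, by: maxCharacters).map { offset in
        let start = text.index(text.startIndex, offsetBy: offset)
        let end = text.index(start, offsetBy: maxCharacters, limitedBy: text.endIndex) ?? text.endIndex
        return String(text[start..<end])
    }
}

/// Current approach: each chunk is sent to the model one at a time,
/// so total latency grows linearly with the number of chunks.
func preprocessSequentially(_ text: String) async throws -> [String] {
    var results: [String] = []
    for chunk in splitIntoChunks(text) {
        let session = LanguageModelSession()  // fresh session per chunk
        let response = try await session.respond(to: "Preprocess the following text:\n\(chunk)")
        results.append(response.content)
    }
    return results
}
```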
Is there any way to operate Apple Intelligence in parallel, or even through a streaming interface? Since Apple Intelligence can offload work to Private Cloud Compute, I was hoping to send multiple chunks in parallel, which would significantly improve performance.
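What I would like to do is fan the chunks out concurrently, roughly like the sketch below, with one session per chunk. I do not know whether the framework (or Private Cloud Compute behind it) would actually run these requests in parallel rather than serializing them internally, which is essentially my question.

```swift
import FoundationModels

/// Desired approach: process chunks concurrently, one session per chunk,
/// then reassemble the outputs in the original order.
func preprocessConcurrently(_ chunks: [String]) async throws -> [String] {
    try await withThrowingTaskGroup(of: (Int, String).self) { group in
        for (index, chunk) in chunks.enumerated() {
            group.addTask {
                // Separate session per chunk so the requests are independent.
                let session = LanguageModelSession()
                let response = try await session.respond(to: "Preprocess the following text:\n\(chunk)")
                return (index, response.content)
            }
        }
        // Collect results as they complete and restore the original chunk order.
        var results = [String?](repeating: nil, count: chunks.count)
        for try await (index, output) in group {
            results[index] = output
        }
        return results.compactMap { $0 }
    }
}
```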
Any suggestions would be welcome. This could also be considered a request for a future enhancement.