ClementTong’s Profile | Apple Developer Forums

ClementTong

From Hong Kong SAR, China

Topics

Apple Intelligence & Machine Learning

Foundational Model - Image as Input? Timeline

Hi all,

I am interested in unlocking unique applications with the new foundational models. I have a few questions regarding the availability of the following features:

Image Input: The June 2025 update mentions "image" 44 times (https://machinelearning.apple.com/research/apple-foundation-models-2025-updates), but I can't find any information about using images as the input/prompt for the foundational models. When will this be available? I understand that there are existing Vision ML APIs, but I want image input into a multimodal on-device LLM (VLM) instead, for image-understanding features such as "Which player is holding the ball in the image?"

Cloud Foundational Model: When will this be available?

Thanks!
Clement :)

Machine Learning & AI, Foundation Models, Vision, Machine Learning, Core ML, Apple Intelligence

Replies: 1
Boosts: 0
Views: 709
Activity: Sep ’25