During the keynote, we saw an amazing example of Siri using Visual Intelligence to identify items in the user's physical space and make inferences based on their size.
Can third-party apps perform the same or similar actions?
For example:
- The user loads a photo of an item or product and selects a button labeled 'Find Item In My Space'.
- Apple Intelligence then analyzes the user's surroundings and tells the user whether the item is present, along with some positional or physical context.
- The response appears in the user interface as text, for example: "This item is in your room, 1 meter to your right."
Goal:
For privacy reasons, developers currently cannot access the passthrough camera on Apple Vision Pro to run AI/ML vision models. If Apple Intelligence could look through the camera on the developer's behalf, inside an isolated, privacy-preserving black box that never exposes the image texture to the developer, users could benefit from Visual Intelligence features based on their physical surroundings without sacrificing their privacy.
Purpose:
Visual Intelligence is a key feature that exemplifies the benefits of spatial computing, and examples like the one shown in the keynote are a perfect use case for the medium. Now that Siri has this capability, users will come to expect that all apps across visionOS can perform the same kinds of actions. Developers generally don't want or need direct access to images of a user's surroundings, and a local, private method of processing these requests is ideal both for developers concerned with data privacy management and for users concerned about developers having too much access to their surroundings.
Wearable devices with cameras are a foundational accelerator for users adopting AI in useful ways in their daily lives. They offer the most natural way to communicate with AI about what is relevant to you at any given moment, remove the friction of manually capturing good input data for AI inference, and give people a reason to wear this class of device every day.
As these devices become more common and capable, data privacy becomes even more important. Users will need reassurance that the devices they choose to wear can observe their surroundings only when they allow it, while still offering the powerful features that make them worthwhile.
Accessibility:
Visual Intelligence is an extremely powerful accessibility tool (for example, for people with low vision) and can meaningfully improve quality of life. Beyond Siri, developers could design applications with very specific AI-powered inference capabilities. The future of visually intelligent apps should involve intentional, distinct purposes that users can choose to incorporate into their lives. This will not be a one-size-fits-all approach to Visual Intelligence; it will require specific design, training, and development to create meaningful capabilities.
If this is already possible, amazing! Any resources to learn more would be greatly appreciated. If this is not yet possible, please let us know what we can do to encourage Apple to consider it.
Thank you.