Apple Ferrets Out UI for AI

Ferret-UI may help Siri understand iOS app interfaces
Apple's Ferret LLM could allow Siri to understand the layout of apps on an iPhone's display, potentially expanding the capabilities of Apple's digital assistant.

This isn't the first we've heard of "Ferret," an AI project inside Apple, but the timing of a new paper released through Cornell yesterday suggests it could be tied to the AI announcements expected at WWDC in a few weeks.

The new Ferret-UI paper explains that, while multimodal large language models (MLLMs) have advanced considerably, they still "fall short in their ability to comprehend and interact effectively with user interface (UI) screens." Ferret-UI is described as a new MLLM tailored to understanding mobile UI screens, complete with "referring, grounding, and reasoning capabilities."
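In rough terms, "referring" means answering a question about a region you point at, while "grounding" means returning coordinates for an element you describe. The toy Python sketch below just illustrates the shape of those two query types; the prompt wording and the normalized-coordinate format are placeholders of my own, not the paper's actual templates.

```python
# Toy illustration of the two query types, not Ferret-UI's actual prompt
# format. Coordinates are normalized to the screenshot's width and height.

def referring_prompt(x0: float, y0: float, x1: float, y1: float) -> str:
    # Referring: the region is part of the *input*; the model describes it.
    return (f"In the region [{x0:.2f}, {y0:.2f}, {x1:.2f}, {y1:.2f}] of this "
            "screenshot, what kind of UI element is shown and what does it do?")

def grounding_prompt(description: str) -> str:
    # Grounding: the region is the *output*; the model must locate the element.
    return (f"Find the {description} in this screenshot and return its "
            "bounding box as [x0, y0, x1, y1].")

print(referring_prompt(0.08, 0.62, 0.31, 0.68))
print(grounding_prompt("Settings icon"))
```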

Part of the problem LLMs have in understanding a mobile interface is how the display gets used in the first place. Phone screens are typically held in a portrait orientation with an elongated aspect ratio, so icons and other details occupy only a small, compact portion of the image, making them difficult for a model to resolve.
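As the paper describes it, part of Ferret-UI's answer is to encode magnified sub-images of the screen alongside the full screenshot, split according to the screen's aspect ratio. Below is a minimal sketch of that splitting idea, assuming a simple two-way cut and a Pillow image; the exact grid and encoder details in Ferret-UI may differ.

```python
from PIL import Image

def split_screen(img: Image.Image) -> list[Image.Image]:
    """Cut a screenshot into two sub-images along its longer axis.

    Portrait screens become top/bottom halves, landscape screens become
    left/right halves, so small icons cover more of each encoded crop.
    """
    w, h = img.size
    if h >= w:
        # Portrait: top half and bottom half.
        return [img.crop((0, 0, w, h // 2)), img.crop((0, h // 2, w, h))]
    # Landscape: left half and right half.
    return [img.crop((0, 0, w // 2, h)), img.crop((w // 2, 0, w, h))]

# The full screenshot and both crops would all be encoded and handed to the
# model, giving it a closer look at compact UI details.
screenshot = Image.new("RGB", (1170, 2532))  # placeholder iPhone-sized image
sub_images = split_screen(screenshot)
```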

This may all sound fairly straightforward, but it's actually pretty complicated. Pager, a GV portfolio company, worked on some of these problems around parsing mobile UI from screenshots in the past. And if Apple can nail this, things get interesting:

While we don't know whether it will be incorporated into systems like Siri, Ferret-UI opens the door to advanced control over a device like an iPhone. A model that understands user interface elements could let Siri perform actions for users inside apps, selecting graphical elements within the app on its own.
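If something like this ever does power an assistant, the control loop is easy to imagine: describe the desired action, ask the model where the matching element is, then tap the center of the returned box. The sketch below is purely speculative; locate_element and tap_at are hypothetical stand-ins, not any real Siri or iOS API.

```python
# Purely illustrative agent step: ground a natural-language request to a UI
# element's bounding box, then act on the center of that box.

def locate_element(screenshot_path: str, description: str) -> tuple[float, float, float, float]:
    # A real implementation would ask a UI-grounding model for this box;
    # here we return a fixed placeholder so the sketch runs.
    return (0.40, 0.90, 0.60, 0.96)

def tap_at(x: float, y: float) -> None:
    # Stand-in for whatever mechanism would actually deliver the tap.
    print(f"tap at ({x:.2f}, {y:.2f})")

def perform_action(screenshot_path: str, description: str) -> None:
    x0, y0, x1, y1 = locate_element(screenshot_path, description)
    tap_at((x0 + x1) / 2, (y0 + y1) / 2)

perform_action("mail_app.png", "the Compose button")
```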

There are also useful applications for visually impaired users. Such a model could be more capable of explaining what is on screen in detail, and could potentially carry out actions on the user's behalf, with nothing required beyond asking for it to happen.

Imagine an AI that could do virtually anything a human can do on an iPhone. Just as you tap and swipe to interact with the UI on iOS, the AI could potentially do the same, digitally. That could mean developers don't have to do anything extra for their apps to be fully compatible with something like Shortcuts, or yes, Siri.