OpenAI's Digital Assistant Device
As it turns out, building hardware is hard. Even when you're a $500B startup. Even when you have Jony Ive and team on board. Here's Tim Bradshaw, Cristina Criddle, Michael Acton, & Ryan McMorrow reporting:
OpenAI and star designer Jony Ive are grappling with a series of technical issues with their secretive new artificial intelligence device, as they push to launch a blockbuster tech product next year.
The San Francisco-based start-up run by Sam Altman acquired the former Apple design chief’s company io for $6.5bn in May, but the pair have shared few details on the projects they are building.
Their aim is to create a palm-sized device without a screen that can take audio and visual cues from the physical environment and respond to users’ requests.
People familiar with their plans said OpenAI and Ive had yet to solve critical problems that could delay the device’s release.
A "palm-sized device without a screen that can take audio and visual cues from the physical environment and respond to users’ requests" sounds pretty much exactly like what I envisioned in May, when everyone was guessing what hardware the newly-announced partnership might yield. And while that may all sound obvious enough, a lot of people were assuming it would be some sort of wearable to compete with Meta and others in the market. But that didn't sound like what Altman and Ive were building, based simply on what they were saying, and how they were saying it:
Now, all of that *could* imply a wearable of some sort. But the key thing Altman seems focused on there is input, not necessarily the need to use it on the go. To me, this implies voice is the key to this device. Perhaps I'm biasing myself as I've been writing about this notion for years and years at this point. While the initial wave of devices leveraging Siri and Alexa got us closer to this type of computing, the truth is that the underlying tech wasn't good enough. Not nearly. We got sort of tricked because the voice recognition had finally gotten to the point where it worked well most of the time. But that actually wasn't the hard part, as it turns out. The hard part was the AI. OpenAI built the hard part first.
Importantly, that included the voice capabilities that rolled out alongside GPT-4o a year prior to the io deal, in May 2024. I wrote at the time how, after the false starts of true vocal computing with Siri and Alexa, ChatGPT had finally unveiled technology that was the prerequisite to making such devices sing, as it were. But I was also skeptical that you could just plug this new tech into existing devices, which is exactly why Amazon, Apple, and even Google have struggled to update their systems in this regard. It felt like there was an opening for a new type of device purpose-built from the ground up for this new technology.
Back to the problem with wearables, as I wrote in May:
The problem with a full-on wearable in this regard is that everyone focuses far too much on the whole wearable part. That is, the exterior of the device and how it will work on your body. And then: how can I get the technology to work on that? But I suspect that OpenAI/IO are focused on the opposite: what's the best device to use this technology? Why does it have to be wearable?
To be clear, I suspect that whatever the device is, it will look fantastic – this is an Ive/LoveFrom production, after all – but that's mainly because beautiful products bring a sense of delight to users and can spur usage. I suspect the key to the design here will be yes: how it works. And again, I suspect that will be largely based around voice, and perhaps augmented by a camera.
Again, this sounds like pretty much exactly what the FT's sources are saying today:
Multiple people familiar with the plans said OpenAI and Ive were working on a device roughly the size of a smartphone that users would communicate with through a camera, microphone and speaker. One person suggested it might have multiple cameras.
The gadget is designed to sit on a desk or table but can also be carried around by the user. The Wall Street Journal previously reported some of the specifications around the device.
One person said the device would be “always on” rather than triggered by a word or prompt. The device’s sensors would gather data throughout the day that would help to build its virtual assistant’s “memory”.
The notion of "always on" immediately triggers battery life concerns for me, but if it doesn't have a screen... well, that's always the element that draws the most power in a device. And so perhaps it shouldn't be a surprise that battery life isn't one of the main issues the team is grappling with, per this report. Those are:
These include deciding on the assistant’s “personality”, privacy issues and budgeting for the computing power needed to run OpenAI’s models on a mass consumer device.
“Compute is another huge factor for the delay,” said one person close to Ive. “Amazon has the compute for an Alexa, so does Google [for its Home device], but OpenAI is struggling to get enough compute for ChatGPT, let alone an AI device — they need to fix that first.”
And that also leads back to another big question I had around data:
Of course, none of that answers how such a device will connect to the cloud, which will clearly be required here. A WiFi device seemingly doesn't make sense since it would be a huge pain to set up over and over again if you did take it with you (perhaps especially without a screen!). Perhaps it tethers to your phone, but there will be trade-offs there. Or maybe it comes with its own connection – the old, original Amazon Kindle approach. With the right partners, this could work, though again, who's paying for that bandwidth?
If the team (rationally) assumes that everyone would still have their phone with them alongside this new device – and the lack of a screen on this device will all but necessitate that – the easiest thing would be to tether to the phone, obviously. But there are also issues there when you don't fully control the phone; just ask Mark Zuckerberg for his thoughts on that topic... Back to the FT:
The goal is to improve the “smart speakers” of the past decade, such as Amazon’s Echo speaker and its Alexa digital assistant, which are generally used for a limited set of functions such as listening to music and setting kitchen timers.
OpenAI and Ive are seeking to build a more powerful and useful machine. But two people familiar with the project said that settling on the device’s “voice” and its mannerisms were a challenge.
One issue is ensuring the device only chimes in when useful, preventing it from talking too much or not knowing when to finish the conversation — an ongoing issue with ChatGPT.
“The concept is that you should have a friend who’s a computer who isn’t your weird AI girlfriend...like [Apple’s digital voice assistant] Siri but better,” said one person who was briefed on the plans. OpenAI was looking for “ways for it to be accessible but not intrusive”.
While "Siri but better" seems like a sort of silly framing given that Apple hasn't been able to execute on this promise for the past 15 years, there's clearly a reason why they keep trying! The original notion of Siri, a digital personal assistant, is still a holy grail in tech. And while many others are trying it in different ways now with different hardware matched with AI, OpenAI has control of the most important element here: the AI! It's not as simple as "the hardware will sort itself out", of course. But given the team OpenAI has working on this...
Back to what I wrote in May:
You'll recall that Ive's last real dive into new product design at Apple was the Apple Watch. Their first true wearable, where they, yes, perhaps focused too much on the wearable part to start and less on what it should actually do.
Anyway, the IO device might be a bit like a newfangled tape recorder of sorts. Okay, I'm dating myself – a voice recorder. You know, the thing some journalists use to record subjects for interviews. Well, when they're not using their phones for that purpose, as they undoubtedly are 99% of the time these days. But it sounds sort of like that, only with, I suspect, some sort of camera. I doubt that's about recording as much as it's about the ability to have ChatGPT "look" at something and tell you about it. But these are just guesses.
And they're sounding like increasingly good guesses. Even if the device is going to take longer to come to market than originally envisioned. Just imagine a world in which Jony Ive introduces OpenAI's first hardware – a small, screen-less digital companion for your life – at the same time that Apple unveils their next hardware – a pair of glasses with cameras on your face and perhaps a screen in your eye. One runs on ChatGPT. The other on Siri. That would be quite the dichotomy.