A Matter of Trust
While this interview around the roll-out of Microsoft's new consumer-facing Copilot product starts a bit too in the clouds – "There is this new sort of design material that is actually about persistence, about relationship, about emotion." – as they drill down into specifics, it gets more interesting. But to start at that high level, here's Mustafa Suleyman on the topic of what he brought over (not physically – but also maybe!) from his last startup, Inflection:
What I've long believed in, even since before DeepMind days, is AI’s potential to provide support. Emotional support is actually one of the things I first worked on as a 19-year-old, when I started a telephone counseling service.
That's the beauty of this technological moment. To see what it feels like to engage with one of these experiences over a sustained period of time—this companion that really gets to know you. It’s coaching you, encouraging you, supporting you, teaching you. I think that isn't going to feel like a computer anymore.
Sort of funny/interesting that he pivots the question immediately to DeepMind instead, almost as if he doesn't want to talk about the other startup, whose "hackquisition" remains controversial (though regulatory bodies seem to be clearing it simply because Inflection was a non-player) and which had a product, Pi, that seemed awfully similar to this one...
But to his actual talking point, are we really ready for that level of AI – a trusted counselor – given how fast-moving and chaotic this all still is? Hallucinations aside, what kind of advice is this technology going to give to people?
On the "vision mode":
There's just so many little moments when you're sitting at your computer. It's phenomenal to have this AI companion see whatever you see, and talk to you in real time about what you're looking at. It sort of changes the route that you take through your digital life, because you don't have the burden of having to type something in.
That does seem compelling from a product perspective. Of course, it also sounded interesting when it was called 'Recall' under a different part of Microsoft's AI initiatives. And that, of course, didn't work out so well at first (they're now working on re-rolling it out). It requires a lot of trust in Microsoft, which... has been problematic to date.
This sounds like Recall, the controversial and now opt-in Windows feature that records a user's on-screen activity.
We don't save any of the material with Copilot Vision, so once you close the browser after your session, it all just disappears. It fully deletes. But I'm thinking about if and how to introduce it in the future, because a lot of people do want that experience. If you could just say, ‘What was that picture that I saw online the other day? What was that meme?’ I think we'll have to look into it one day.
At the moment, though, the Copilot Vision tool is ephemeral. We'll sort of have to experiment over time and see what makes sense on that front.
Obviously it's a safer approach to have it ephemeral. But also obviously in the ultimate state it doesn't stay that way. And if/when that's the case, how does it interact with Recall? Seems awkward at best, complicated at worst...
You’re also introducing Think Deeper, which will let Copilot tackle more difficult problems. This is based on OpenAI’s o1 model, aka Strawberry, right?
It's like Strawberry, yeah. There's an OpenAI model that we've tuned up for our more consumer purposes, and we've got it to act in a way that is more consistent with our AI companion theme.
What are the differences?
OpenAI’s is much more focused on pure math and scientific problem-solving. And what we've tried to do is have it focus on side-by-side comparisons and sort of consumer analysis, stuff like that.
Or when you get stuck on a hard problem or you want to reason through something, then it can really lay out a side-by-side comparison, or do an analysis at scale.
No surprise there given how they described the feature – but interesting that they're "tuning" it in different ways to be more consumer-focused... Wonder what OpenAI thinks about that augmentation...
People are going to remember Clippy, Microsoft’s last AI helper for Windows. Do people there see parallels?
Ha, well I saw Bill Gates the other day, and he was like, you do realize you've misnamed this whole AI thing? It should be called Clippy. I was like, dude!
But I mean, it just shows you how mind-blowing people like Bill are. People who see, not just, you know, two years ahead, but 20 years.
Har, har. But also, this new Copilot actually seems less like Clippy than the other, more enterprise-focused Copilot(s)? That said, it may have been fun to bust out the name again, but it just has too much baggage... I say let Steve Ballmer use it as a mascot down in Los Angeles instead.
Wait, you have an AI agent for Windows that can go off and buy things for you?
It's a way off, but yes, we've closed the loop, we've done transactions. The problem with this technology is that you can get it working 50, 60 percent of the time, but getting it to 90 percent reliability is a lot of effort. I've seen some stunning demos where it can independently go off and make a purchase, and so on. But I've also seen some car crash moments where it doesn't know what it's doing.
Tell me more about a car crash. Did it go off and buy a Lamborghini on Bill’s credit card?
If it used Bill's credit card, then I think it would be quite funny. But no, like I said, we're still figuring it out step by step. It's still deep in the doldrums of the labs. There's a long way to go with these things but you can count it in quarters, not years, I would say.
That's Suleyman saying that pretty robust "agentic" AI is perhaps just over a year away. It feels like the next big question in AI – well before the AGI/superintelligence one! – is whether getting from that 50 percent to 90 percent is a straight path, or whether it gets exponentially harder as you approach the upper bound and, as such, takes years to fully figure out.
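Suleyman doesn't spell out why that last stretch is so hard, but one plausible intuition is compounding: an agent chains many steps, and failures multiply. A toy back-of-envelope sketch (my illustration, not anything from the interview; the numbers are made up):

```python
# Toy model: if an agent chains n steps and each step succeeds
# independently with probability p, the whole task succeeds with
# probability p**n. Per-step reliability therefore has to climb
# steeply before end-to-end reliability looks "trustworthy."
def task_success_rate(per_step: float, steps: int) -> float:
    return per_step ** steps

# A 10-step task at 95% per-step reliability still fails ~40% of the time;
# even 99% per-step only gets the full task to ~90%.
print(round(task_success_rate(0.95, 10), 2))  # 0.6
print(round(task_success_rate(0.99, 10), 2))  # 0.9
```

If something like this holds, closing the gap from "stunning demo" to "trusted with your credit card" means driving per-step error rates down by an order of magnitude, which is why the quarters-versus-years question is genuinely open.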
If agents are rolled out and they simply don't work a decent percentage of the time – or require more work on your end to get them working – the whole thing is not going to work. You really have to "trust" these tools, as Suleyman puts it.
But he's implying a dual meaning in this interview: trusting a technology enough to tell it personal things about yourself and trusting it enough to get a task done. These are two very different things!