"Hello, Computer."
If the vocal computing category has a boy crying wolf, I may be it. I've been writing about the notion that operating our computers through voice is "right around the corner" for almost two decades. And long before that, I was an early adopter of many a PC microphone in the 1990s, and later Bluetooth earpieces in the 2000s, in an attempt to run all of my computing through my voice (and ears).1 As an industry, we've made great strides over that time span. But we're still not living aboard the USS Enterprise in Star Trek, blabbing away to our machines. Not yet.
Given my track record here, I feel a bit silly writing this, but I really do believe we're at some sort of inflection point for voice and computing. Why? AI, of course.
Yes, technically "AI" is why I thought we were closing in on this future 15 years ago when I was a reporter breaking the news that Siri integration would be the marquee feature of iOS 5 and what would become the iPhone 4S (perhaps for 'Siri'). Pushed by Steve Jobs, Apple was trying to jump ahead to the next paradigm in computing interaction after leveraging multitouch to revolutionize the world with the iPhone (not to mention on the Mac with the mouse, back in the day). Again, voice technology had been around for a long time, but the only place it really worked was in science fiction. Armed with Siri, a startup they had acquired the year before, Apple thought now was the time. "Now" being 2011.
It didn't exactly work out that way. To the point where Apple is actually the boy who cried wolf when it comes to Siri. After the buzzy launch in 2011, 2012 was going to be the year they made Siri work well. Then 2013. Then 2014. Then Amazon launched Alexa and, thanks to a better strategy around vocal computing at the time, started to eat Apple's lunch. Millions of Echo devices later, Google entered the space, and it looked like we were off to the races...
But it was all sort of a head fake. A hands-free way to set timers and play music. Maybe a few trivia games. And not much else. Amazon couldn't figure out how to get people to shop in a real way with voice. Google couldn't figure out the right ad format. Billions were burned.
All the while, Apple kept telling us that 2015 was the year of Siri. Then 2016. Then 2017. 2018. 2019... All the way up until WWDC 2024, when, this time, Apple meant it. Thanks to the latest breakthroughs in AI, Siri was finally going to get grandma home from that airport using some simple voice commands. It was coming that Fall. Then the following Spring. Then never. Is never good for you?
Fast forward to today, 2026. That functionality may now actually be coming this Spring. Something I obviously would never in a million years believe given Apple's track record here. Except that they've seemingly outsourced the key parts – the AI – to Google.
So... we'll see!
Regardless, AI was the key missing ingredient. We just didn't realize it because we thought we had that technology covered. Sure, it was early, but it would get better. But as it turns out, what powered Siri, and Alexa, and even Google's Home devices wasn't the right flavor of AI. Depending on the task, it could taste okay. But most tasks left you throwing up... your hands in frustration. By 2017, it was clear that the world was shifting again, as I wrote in an essay entitled "The Voice":
> And then there’s Siri. While Apple had the foresight to acquire Siri and make it a marquee feature of the iPhone — in 2011! — before their competitors knew what was happening, Apple has treated Siri like, well, an Apple product. That is, iterate secretly behind the scenes and focus on new, big functionality only when they deem it ready to ship, usually timed with a new version of iOS. That’s great, but I’m not sure it’s the right way forward for this new computing paradigm — things are changing far too quickly.
>
> This is where I insert buzzwords. AI. Machine Learning. AI. Machine Learning. AI. Machine Learning. AI. Machine Learning. AI. Machine Learning. AI. Machine Learning. AI. Machine Learning. AI. Machine Learning. AI. Machine Learning. AI. Machine Learning…
>
> But really: AI. Machine Learning.
In hindsight, all of this was correct. But even then, we didn't realize that "Machine Learning" – the specialty which brought John Giannandrea from Google to Apple – was closer, but still needed to evolve too. Into LLMs.
As that revolution – ushered in by OpenAI with ChatGPT, built on the back of insights shockingly discarded by Google – has washed over the entire tech industry and started to seep into the broader population, it seems the time may be at hand for voice to work, for real this time.
This is what I saw a glimpse of with OpenAI's GPT-4o launch a couple of years ago, and wrote about at the time in "OpenAI Changes the Vocal Computing Game!":
> Said another way, while this is undoubtedly a series of large breakthroughs in technology, it's just as big of a breakthrough in *presentation*. And this matters because 99% of the world are not technologists. They don't care how impressive and complicated the technology powering all this stuff may be, they just care that Siri can't understand what they're actually looking for and keeps telling them that in the most robotic, cold way possible. Which is perhaps even more infuriating.
Some of this got buried under the hoopla created when Sam Altman directly referenced the movie Her and got everyone up in arms about one of the voices that sounded perhaps a bit too much like that of Scarlett Johansson. But part of it was also that, while we kept inching closer, we still weren't quite there yet with regard to voice and computing.
The voice modes across all of the different services really are pretty incredible now – certainly when compared to old-school Siri, Alexa, and the like – but they're still not quite enough to make the AI sing, perhaps quite literally. Part of that is the underlying models, which for voice are slightly inferior to the text-based models – something OpenAI is actively working to address – but another part is simply a UI problem. While all the services keep moving voice mode around to spur usage, it remains very much secondary in most of them. Because they're chatbots. The old text-based paradigm is both a strength and a weakness. As I wrote:
> One side of that equation: the actual "smarts" of these assistants have been getting better by leaps and bounds over the past many months. The rise of LLMs has made the corpus of data that Siri, Alexa, and the like were drawing from feel like my daughter's bookshelf compared to the entirety of the world wide web. But again, that doesn't matter without an interface to match. And ChatGPT gave us that for the first time 18 months ago. But at the end of the day, it's still just a chatbot. Something you interact with via a textbox. That's fine but it's not the end state of this.
The past 18 months have seen a lot of reports about projects trying to break outside of that textbox. While the early attempts quickly – and in some cases spectacularly – failed, no doubt because they tried to be too ambitious and do too much, a new wave is now coming to tackle the problem. This is led by none other than OpenAI itself, which acquired the hardware startup co-founded by one Jony Ive, clearly to go after this space. To make an "anti-iPhone", as it were. A deceptively simple companion device powered by AI and driven by voice.
That's just a guess, of course. But it's undoubtedly a good one. And you can see all of the other startups coalescing around all of this as well. Hardware startups too! Pendants, and clips, and bracelets, and note-taking rings – not one, but two separate, similar projects – oh my. All of them clearly believe that voice is on the cusp of taking off, for real this time.
And right on cue, Alexa is back, after some fits and starts, resurrected as Alexa+ powered by LLMs. Google Home is on the verge of being reborn, powered by Gemini. Siri too! Maybe, hopefully, really for real this time!
2026 feels pretty key for all of this. The models have to be refined and perfected for voice. In some cases, perhaps even shrunk down to run in real time, on-device. Then we need to figure out the right form factors for said devices. Sure, the smartphone will remain key, and will probably serve as the connection point for most companion tech, but we're going to get a range of purpose-built AI hardware out in the wild, predominantly controlled via voice.
Smart glasses too, of course. Even Apple Watch. And AirPods should continue their morph into the tiny in-ear computers that they already are. Voice is the key to fully unlocking all of this.2 And, one day, the true next wave: robots. Are you going to text C-3PO what you want him to do?3 Of course not, you're going to tell him.
1 Yes, I was that guy.
2 With a special shout-out to Meta's wrist-input device (born directly out of our old GV investment in CTRL Labs!) as a wild card here...
3 And with that, I have successfully conflated Star Trek and Star Wars, you're welcome, Gandalf.