OpenAI's 'Operator' Shows Why They'll Build a Web Browser
To me, the most interesting aspect of OpenAI's new 'Operator' product – their first real agent – is not what it can do, but how it does it. Unlike the early iterations of similar products from Anthropic and Google, 'Operator' doesn't take over your computer, it outsources the work you want to do to OpenAI's computer in the cloud. And like so many of us these days, that computer really is just using one app: the web browser.
It's both clever and at the same time obvious. Why would you want to do this on your own machine, thus tying it up while you're trying to do something else – isn't that the point of an agent, freeing you up to do other things?1 But it's also a bit weird: watching a bot perform a task you've asked of it on some remote computer from afar.2 It's all a bit Kabuki theater. And that quickly made me realize another obvious element of all this: OpenAI needs to make their own web browser.
I've made this case before, in the context of the Department of Justice trying to make Google spin-off Chrome. In a post entitled, Forget the Fate of Chrome, Focus on the Fate of the Browser, I wrote the following (with the notion that what's most likely to supplant Google Search is some iteration of an AI product):
Something in this current jumble of chaos feels like it will be what eventually displaces Google Search. But none of those have anything to do with Chrome other than they mainly run inside of it, just as other web apps do. A move towards "agents" could change this dynamic, with some early entrants likely focused on using the browser as the hub from which such AI operates – that likely includes OpenAI...
Well, just a few months later, here we are. And interestingly enough, Operator runs on, what else? Chrome. It seems to be a customized version, which undoubtedly locks down the software more so than a regular web browser, but it's Chrome running an 'OpenAI Operator' extension (as my "taking over" and clicking around made clear). That makes sense to start; Chrome is the most popular browser in the world (which is why the DoJ wants to rip it away from Google). OpenAI had to train Operator on data for how others use the web, and most everyone does this on Chrome.3 And just as importantly, all sites and services work on Chrome, the same of which cannot be said of all browsers. So Chrome it is. For now.
Back to the Kabuki theater point, it's a little silly to have a machine browse the web for you. Again, it makes sense in these early days – and it's quite useful to see what it's doing from a trust perspective (and to be able to tell what it did wrong) – but it's hardly the most efficient way to perform a task. The cursor is a visual indicator of using a mouse, but why would a bot have to use a mouse? Ditto with a keyboard! A bot has no arms! We're forcing human constructs onto the AI.
To beat the dead bot: that undoubtedly helps ease us into the future, and has some tangible benefits for now, but it's all a bit silly.
When I say that OpenAI should build their own browser, I don't mean for you and me, specifically. I mean for 'Operator' to use!
That is to say, if and when they do this, it will need to be designed with the notion that the main user (or at least a main user) will be AI, not humans. If they can create a product that is designed for both, all the better. But perhaps it's a browser which switches between two modes depending on if you, the human, needs to take over and "drive".
What I'm describing is actually not far from what APIs do. In a way, they're sort of the computer version of doing something without the human-focused UI. And, of course, OpenAI has already said there will be Operator APIs as well. That's good and it will allow Operator to break out of the browser, but I still think that OpenAI building a web browser (undoubtedly using some of these APIs) is what they need to do to make an efficient human/machine system in these early agentic days.
If nothing else, they probably can't rely on Chrome because if and when Google fully bakes Gemini into their browser, that's going to cause all sorts of issues for OpenAI, obviously. I think it's fine to build it on Chromium – assuming that project isn't imperiled by the DoJ call to spin-off Chrome – but it should be a browser that OpenAI can fully control and build for their own purposes.
Back to my post from November:
I do think it makes sense for OpenAI to have a browser. They shouldn't buy Chrome, nor should they be allowed to for all the reasons discussed. But they should do what Google did back in the day and build it from the ground up to tailor it for their own vision of the future of the internet – complete with a new "Omnibox" that is truly "omni". And so the reports (and hires) suggesting that they're working on such a browser makes sense, of course.
Now, you might argue that their resources are better spent working on what's next after a web browser. And yes, they should be doing that too – and seemingly are. But an OpenAI web browser product right now reminds me not of Chrome, but of Google's old toolbar for Internet Explorer back in the day. One of the early projects which, much like Chrome itself, was led by none other than... current Google CEO Sundar Pichai. This was their way to get a toehold in the market at the time. And it worked.
You might say that the ChatGPT Chrome extension – pretty aggressively installed and implemented alongside the new Search product – is OpenAI's version of this. And it may be. But I think a re-imagined browser tailored around ChatGPT would be better placed to gain a toehold within our current tech environment. Until we get to the AI wearable product or whatever is actually next.
Imagine a new web browser where interaction with various AI agents is the primary input – including ones for browsing the web, of course. It still has a lot of the stuff you know and need, but adds other elements specific to AI, and removes a lot of cruft that is no longer needed. It's a browser you can use, but it's built just as much with your 'Operator' in mind. And it doesn't even necessarily need a screen to operate. That's how you break Chrome's dominance. And perhaps Google's.
And even now, during day one of Operator, if you squint, you can see it.
1 I mean, the actualy answer here is privacy, of course. Just ask Microsoft about this one. But this is day one of these services, and while OpenAI is seemingly saying all the right things to alleviate privacy concerns (they're not saving the data on the browsing they're doing for you, what you enter, etc), there still must be some level of trust, or at least, understanding is going to be required if you want to live on the cutting edge here.
2 It was especially weird when I asked it to browse this site and summarize the most recent articles. And then when I stepped in to take over, which unfortunately seemed to trip-up my Operator, which then proceeded to try to re-size my site in its browser window for the next 10 minutes. Does my 'Operator' need glasses? Early days!
3 Worth noting that Apple's "Ferett" project seemingly seeks to enable similar AI controls for apps by parsing app UI.