M.G. Siegler •

Jensen in the Arena

The 'Token King' enters the Age of Inference...

I honestly didn't mean to watch the entire NVIDIA GTC Keynote. I have another post I'm working on where a small sliver is relevant, but before I knew it, there I was, sitting through an entire two-and-a-half-hour presentation. Thank god for 1.5x speed. But honestly, I didn't stop because I didn't want to stop. It was compelling! Because Jensen Huang is so compelling as a showman. Others are good at such presentations – notably Snap's Evan Spiegel – but no one has commanded a stage like this since you-know-who. And arguably Jensen's command is more impressive, because his material is far more technical and his products far less tangible. And despite those technicalities, he's able to weave a narrative that is actually captivating. Perhaps not quite to the mainstream, but certainly to the masses. The attendance and interest in this event have grown in such a way that he can't help but poke fun at it:

"I just want to remind you, this is a tech conference."

The first hour was more or less a history lesson as to how NVIDIA, and the AI industry itself, got here. And a love letter to CUDA, NVIDIA's software layer celebrating 20 years of being an all-important moat for the company. GeForce to pixel shaders to RTX. DLSS 5 – the horribly generic name for a new AI upscaling technology for video games – looks amazing and naturally is already quite controversial. Why? AI, of course.

Speaking of, we finally got into the meat of the presentation from here. A massive slide behind Jensen highlighted a simple message: three key moments over the past few years. ChatGPT in 2023 ushered in the generative AI era. OpenAI's o1 model (and really, o3 model) brought us into the reasoning era in 2024. And last year, Claude Code gave us the agentic era. With that, AI is now able to do productive work, and that means a new era for NVIDIA and AI in general: "The inflection point of inference has arrived."

To hear Jensen tell it, they've been planning on this all along. And certainly there's some truth there, in that NVIDIA has been talking up inference for a few years now. But that's mainly been in a defensive manner, to insist to everyone that inference wasn't a potential vulnerability for the company, but actually a strength.

You could tell how much this criticism annoyed Jensen in the part of his presentation where he pretended to hold up a wrestling-style championship belt (virtually made for him by SemiAnalysis) on stage. NVIDIA has body slammed the competition! From the top rope! It wasn't even a close call for the "Token King", you see.

Except, NVIDIA's actual actions the past year suggest that it might have been getting a bit too close for comfort as agents and coding came fully into view. Or at the very least, an area of concern.

It was just a few short months ago that NVIDIA made their move to "hackquire" Groq, the AI chip startup focused on, what else? Inference. Why did a company with no concerns about their position in the inference market feel the need to make a $20B deal/no-deal – on Christmas Eve, no less?

Jensen actually addressed this on stage while showing off their (insanely fast) integration of Groq technology within NVIDIA's new stack. Basically, he acknowledged that the use of SRAM in Groq's purpose-built LPUs (Language Processing Units) gave a level of inference speed which NVIDIA couldn't hope to match with their HBM-based (high-bandwidth memory) systems.

But he was also quick to note that Groq couldn't go it alone with only that approach, that they needed the parameter sizes and context windows that NVIDIA's systems allowed for (because Groq's systems carry far smaller amounts of memory). In other words, the systems needed to be paired together, and NVIDIA figured out a way to do that with Dynamo (their "operating system for AI factories"). They can route inference workloads depending on the situation and need.

Two more interesting elements of the deal: 1) doing this as a "hackquire" clearly allowed them to do it fast, otherwise there's no chance they'd be ready to have Groq technology integrated so quickly. 2) Groq's chips are actually made by Samsung, giving NVIDIA some diversification away from TSMC (and HBM)...

All of this is in line with the various reports about why NVIDIA did the deal – and why Groq felt the need to do the deal. But it also points to Jensen's narratives which seem to shift with the benefit of hindsight. In a way, this has seemingly been the story of NVIDIA from the get-go. He built the company to be perfectly oriented to ride waves that he couldn't possibly foresee coming, only to later note that it was all so obvious. To me, the key to NVIDIA is malleability, not necessarily seeing the exact future. And this Groq situation is no different.

And that's no less impressive, by the way!

Hopefully that helps Jensen calm down a bit when it comes to the competition from AMD and more recently Google with their TPUs. Then again, that fire even when they're crushing – sorry, body slamming – the competition is undoubtedly part of what makes Jensen, Jensen. And what makes NVIDIA the most valuable company on Earth.

Speaking of, did you hear that Jensen now believes they're going to do $1T in sales (well, purchase orders) through next year? Of course you did, because he made that a focal point of the presentation and that, in turn, made the headlines easy. "DO YOU HEAR THAT, WALL STREET? ONE TRILLION." Love, Jensen.

Forget him holding up the wrestling belt, right now I'm envisioning Jensen as Maximus in Gladiator – two literal men in two literal arenas – "ARE YOU NOT ENTERTAINED?" We are Jensen! We are! We bow to thee.

Anyway, Jensen believes these past two years have seen computing demand increase one million times over. That's a pretty precise number for something so imprecise, but he has his own math to back it up: the amount of token generation has increased by ("roughly") 10,000x and the amount of usage has increased by ("probably") 100x. Multiply the two together and there you go: 1,000,000x.

He then reiterated the most important equation in all of AI at the moment: how they're getting the math to work for the data center build-outs. Basically: if companies get more AI generation capacity, they can generate more tokens, which means revenues go up, and more people are using it, which in turn makes the AI smarter. The "positive flywheel system", as Jensen calls it.

Again, basically all of AI is built around this notion at the moment. OpenAI most overtly, but other less forward-facing companies such as private credit players doing the debt financings for many of these data center deals need this system to keep going. No one wants to see what will happen if a "negative flywheel system" starts. Including NVIDIA.

Still, while there are a number of macro risks lingering out there that could puncture The AI Bubble, NVIDIA shows no sign of slowing. In fact, their revenue growth has been accelerating again in recent quarters, which is just astonishing at their current size. They also just posted the most profitable quarter ever – well, for any company aside from Saudi Aramco, in the one year when an oil crisis fueled it to new heights. An oil crisis, you say...

Never mind all that, at least for now. Jensen's message with all the above was loud and clear: all NVIDIA partners can make their infrastructure commitments with complete confidence. Please and thank you.

To that end, a large portion of the keynote was devoted to a Vera Rubin show-and-tell. Not only are the systems on track to ship in the second half of 2026 (yes, including the new Groq component) – "probably in the Q3 timeframe" – Microsoft has already installed one such rack in Azure. And what used to take two days to install now takes two hours.

As for the rest of the roadmap, 'Rubin Ultra' – the 'tock' to the 'Rubin' 'tick' – is already taping out according to Jensen and set for 2027. Then comes 'Feynman' in 2028 (after American physicist Richard Feynman). It will be paired with the 'Rosa' (after American physicist Rosalyn Sussman Yalow) CPU and an all-new LPU made with Groq technology, the 'LP40' (no fun scientific naming scheme here, it seems).

From here, Jensen hit on the whole data-centers-in-space thing. But the most interesting aspect may have been how quickly he moved through it – even noting as such. To hear him tell it, there's still a lot of work to do here, most notably with cooling: with no conduction or convection available in the vacuum of space, heat can only be shed by radiation. He says they have great engineers working on how to do it, but it's "very complicated". One suspects it will be pitched as decidedly less complicated during the SpaceX IPO roadshow...

Far more time was devoted to OpenClaw. Calling it the most successful open source project in history, and just as big a deal as HTML and Linux, Jensen framed it as the operating system for the "agentic computer". But then he got to the catch: security. Which, naturally, is what NVIDIA viewed as their opportunity here – in particular within the enterprise. And while 'NemoClaw' drops the 'Open' from the name, Jensen says it will still be open, and the fact that they worked with creator Peter Steinberger – who himself, of course, now works at OpenAI – would seem to lend that credence. Still, we're clearly about to see every large company come in with their own solution here – Perplexity already has, and Meta just did with Manus (as predicted).

Will "open" win here?

As an aside, this section also brought one of my favorite moments of the keynote as Jensen really had to stretch to turn the 'SaaS' acronym into 'AgaaS'. It took me a minute to figure out what he was saying with "A Gas" – but it's 'Agent-as-a-Service', which of course would more naturally brand itself as 'AaaS'. Butt, well...

We stayed on the topic of "open" as Jensen talked up NVIDIA's 'Nemotron' open models – version 4 is in the works – and partnerships with Black Forest Labs, Cursor, Mistral, Perplexity, Thinking Machines, and others. Honestly, the most interesting bit of this may have been the notion of future companies using token access as a recruiting tool. (Similar to part of Meta's recent MSL pitch for compute access.)

Jensen closed out the keynote with robots. He noted that NVIDIA had been working on self-driving technology for a long time – well before the current AI movement, of course – and now they seem to be right-place/right-time again, with much of the industry lining up with them in time for their own "ChatGPT moment".

Oh yes, and Olaf! A clear highlight for either anyone under 10, or with kids under 10, to see a walking/talking character from Frozen. For as good as Jensen is on stage, Olaf clearly needs some more reps as the interaction was a bit awkward. Still, of all the things to critique, a working, interactive robot on stage... I'll let it go.
