M.G. Siegler

Meta's 'MoE' Mistake

More details on how Llama's open source AI backfired...

A month ago, I wrote a post titled "Meta's Open Source AI Mistake" outlining how the company found itself in the position of needing to pay (or at least offer) individuals hundreds of millions of dollars to come help reboot its AI efforts. At a high level, their strategy for Llama not only wasn't working in terms of getting Meta to the cutting edge of AI, but the "open source" (read: open weight) ideals they were trying to adhere to may actually have backfired.

This new reporting on the matter goes more granular on those issues:

Llama 4's struggles can be traced back to January, when the sudden rise and ensuing popularity of the open-source R1 AI model by DeepSeek caught Meta off guard, leading to a reevaluation of Llama’s underlying architecture, the people said.

DeepSeek’s R1 is a so-called mixture-of-experts AI model, or MoE. R1 is similar to OpenAI’s o1 family of models that can be trained to excel at multistep tasks like solving math equations or writing code.

By contrast, Llama’s models — before their latest release — were dense AI models, which are generally simpler for most AI developers to fine-tune and incorporate into their own apps, the people said.
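For readers who want the distinction made concrete: here's a toy numpy sketch of the two architectures the reporting contrasts. It is not Llama's or DeepSeek's actual implementation — just an illustration that a dense layer runs every parameter for every token, while an MoE layer routes each token through only a few "expert" sub-networks (the dimensions and top-k value below are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Dense: one weight matrix, every parameter active for every token.
dense_w = rng.standard_normal((d_model, d_model))

def dense_layer(x):
    return x @ dense_w

# MoE: a small "router" scores the experts per token, and only the
# top-k experts actually run — so most parameters sit idle per token.
router_w = rng.standard_normal((d_model, n_experts))
expert_ws = rng.standard_normal((n_experts, d_model, d_model))

def moe_layer(x):
    logits = x @ router_w                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of chosen experts
    sel = np.take_along_axis(logits, top, axis=-1)
    # softmax over only the selected experts' scores
    gates = np.exp(sel - sel.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for i, token in enumerate(x):                  # route each token
        for gate, e in zip(gates[i], top[i]):
            out[i] += gate * (token @ expert_ws[e])
    return out

tokens = rng.standard_normal((3, d_model))
print(dense_layer(tokens).shape, moe_layer(tokens).shape)
```

The MoE model here holds four times the expert parameters of the dense one but does roughly half the per-token compute of running all four — which is the efficiency argument that reportedly swayed Meta.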

To add insult to this injury, DeepSeek R1 was largely distilled from Llama! Because it was open source! Essentially, DeepSeek used Meta's foundation to showcase a better way to build a better model (MoE), perhaps augmenting it with (decidedly not open source) work from OpenAI and others.

This was an "oh shit" moment for Meta internally, and so they seemingly scrambled to rebuild Llama 4 accordingly:

Suddenly, Meta executives thought they had a clearer picture into how to create their own efficient and possibly cheaper MoE models, potentially leapfrogging rivals like OpenAI, people familiar with the matter said.

Still, some staff members in Meta’s GenAI unit pushed for Llama 4 to remain a dense AI model, which though generally less efficient, is still powerful, and Meta originally planned on that architecture acting as the backbone supporting improved voice recognition capabilities, the people said.

Ultimately, Meta went with the MoE approach, due in part to DeepSeek’s innovations and the promise of pulling ahead of OpenAI, the people said. Meta released two small versions in April and said a “Behemoth” version would come at a later date.

But the new MoE architecture disappointed some developers, who were simply hoping Llama 4 would be a souped-up version of Llama 3, people familiar with the matter said. Llama 4 also failed to deliver a significant leap over competing open-source models from China, the people said.

To put it in terms that longtime Apple watchers may appreciate: the people wanted a better Apple II, not the Lisa.

Add to this that the "dense" version of Llama was proving to be extremely expensive for Meta to maintain, especially when others were just going to mooch off the work rather than contribute back to make Meta itself stronger. Zuckerberg passed the hat around to try to get some help with that financial burden, but got no takers because – why buy the Llama when you can get the model for free?

Executives at Meta as well as the Superintelligence Labs’ high-profile hires are now questioning the company’s current open-source AI strategy, and have considered skipping the release of Behemoth in favor of developing a more powerful proprietary AI model, the people said.

While "spokespeople" continue to downplay this notion, it seems pretty clear that Meta will go down this path with the new group. I'm guessing that they'll open source some of the models eventually – just as OpenAI itself is now gearing up to do – but the main work is likely to be behind closed doors, just as it is at OpenAI, Anthropic, and elsewhere.

The real question for Meta: can their newly formed band of pirates ship the Mac?
