M.G. Siegler • #ai • Mar 28, 2024

Hey Meta, Look At This

Lead image aside, this isn't a production shot for True Detective: Season 5. But it's almost a shot from Her 2:

In a sign that the tech industry keeps getting weirder, Meta soon plans to release a big update that transforms the Ray-Ban Meta, its camera glasses that shoot videos, into a gadget seen only in sci-fi movies.

Next month, the glasses will be able to use new artificial intelligence software to see the real world and describe what you’re looking at, similar to the A.I. assistant in the movie “Her.”

The glasses, which come in various frames starting at $300 and lenses starting at $17, have mostly been used for shooting photos and videos and listening to music. But with the new A.I. software, they can be used to scan famous landmarks, translate languages and identify animal breeds and exotic fruits, among other tasks.

To use the A.I. software, wearers just say, “Hey, Meta,” followed by a prompt, such as “Look and tell me what kind of dog this is.” The A.I. then responds in a computer-generated voice that plays through the glasses’ tiny speakers.

Brian X. Chen and Mike Isaac got to take the software for an early spin, and while their takeaway is pretty forgiving, the product reads as awfully rough for practical use. The software makes mistakes at the zoo, at the grocery store, at the museum. Small, silly things. But things that feel like awfully low-hanging fruit. And things that would actually be beyond annoying if you were really trying to use these day-to-day. The other obvious takeaway:

The strangest part of this experiment was speaking to an A.I. assistant around children and their parents. They pretended not to listen to the only solo adult at the park as I seemingly muttered to myself.

I also had a peculiar time grocery shopping. Being inside a Safeway and talking to myself was a bit embarrassing, so I tried to keep my voice low. I still got a few sideways looks.

While wearing, say, a giant face screen is more problematic in this regard, this is without question an issue. A lot of times in public it's just a better experience to interact with a computer silently by typing than to talk to yourself out loud. Maybe such stigmas fade with time, and certainly the voice assistants have helped with this a bit, but we're still not really close to this being a "normal" behavior. And that matters because the only way you're going to do this on a regular basis is if the experience is so great that it overcomes this impediment.

And yes, with the caveat, again, that this is pre-release software, it doesn't sound like we're particularly close there either...

When Meta’s A.I. worked, it was charming. I picked up a pack of strange-looking Oreos and asked it to look at the packaging and tell me if they were gluten-free. (They were not.) It answered questions like these correctly about half the time, though I can’t say it saved time compared with reading the label.

And:

But when I asked the A.I. to look at a handful of ingredients I had and come up with a recipe, it spat out rapid-fire instructions for an egg custard — not exactly helpful for following directions at my own pace.

And:

The A.I. was wrong the vast majority of the time, in part because many animals were caged off and farther away. It mistook a primate for a giraffe, a duck for a turtle and a meerkat for a giant panda, among other mix-ups. On the other hand, I was impressed when the A.I. correctly identified a specific breed of parrot known as the blue-and-gold macaw, as well as zebras.

And:

Other times were hit or miss. As I drove home from the city to my house in Oakland, I asked Meta what bridge I was on while looking out the window in front of me (both hands on the wheel, of course). The first response was the Golden Gate Bridge, which was wrong. On the second try, it figured out I was on the Bay Bridge, which made me wonder if it just needed a clearer shot of the newer portion’s tall, white suspension poles to be right.

Etc. Etc. Etc. The "Meta spokesman" is quoted so often noting that this is early days and fixes are coming and yadda yadda that they could have bylined this piece.

AI is obviously progressing very quickly and so it seems reasonable to believe that this will all improve rapidly. At the same time, I'm sort of surprised it's making such seemingly obvious errors. Images, let alone real-time moving images, are obviously a different beast than text, the favorite input of most LLMs at the moment. So perhaps it's one of those things that needs real-world scale to work well. But will such glasses achieve that? They're certainly more convenient, not to mention more accessible from a cost perspective, than the Vision Pro. But they're certainly not going to be more ubiquitous than a smartphone. And the iPhone seems on the verge of launching embedded AI training systems on a billion-plus devices soon.

Honestly, I'm just sort of surprised Meta let two NYT reporters do this story if the tech clearly wasn't ready for prime time. This stuff needs to be really good before it goes mainstream. And we're not there yet.
