Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
When a company releases a new AI video generator, it doesn’t take long for someone to use it to create a video of actor Will Smith eating spaghetti.
It’s become something of a meme as well as a benchmark: seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself parodied the trend in an Instagram post in February.
Google Veo 2 has done it.
Now we are finally eating spaghetti. pic.twitter.com/AZO81w8JC0
– Jerrod Lew (@jerrod_lew) December 17, 2024
Will Smith and pasta is just one of several strange “unofficial” benchmarks that took the AI community by storm in 2024. A 16-year-old developer built an app that gives AI control over Minecraft and tests its ability to design structures. Elsewhere, a British developer created a platform where AIs play games like Pictionary and Connect 4 against each other.
It’s not as if there’s a shortage of more formal tests of AI performance. So why did the weird ones blow up?
For one thing, most AI benchmarks don’t tell the average person very much. Companies often tout their AI’s ability to answer Math Olympiad questions or work out plausible solutions to PhD-level problems. Yet most people – yours truly included – use chatbots for things like responding to emails and basic research.
Crowdsourced industry benchmarks aren’t necessarily better or more informative.
Take, for example, Chatbot Arena, a public benchmark many AI enthusiasts and developers follow closely. Chatbot Arena lets anyone on the internet rate how well AI performs on specific tasks, such as creating a web app or generating an image. But the raters tend not to be representative (many come from the AI and technology sectors), and they cast votes based on personal preferences that are hard to pin down.
Ethan Mollick, a professor of management at Wharton, recently pointed out in a post on X another problem with many of the industry’s AI benchmarks: they don’t compare a system’s performance to that of an average person.
“The fact that there are not 30 different benchmarks from different organizations in medicine, in law, in advice quality, and so on is a real shame, as people are using systems for these things, regardless,” Mollick wrote.
Strange AI benchmarks like Connect 4, Minecraft, and Will Smith eating spaghetti are certainly not empirical, or even all that generalizable. Just because an AI nails the Will Smith test doesn’t mean it will generate, say, a burger well.
One expert I spoke to about AI benchmarks suggested that the AI community focus on the downstream impacts of AI rather than its capabilities in narrow domains. That’s sensible. But I have a feeling the strange benchmarks won’t go away anytime soon. Not only are they fun – who doesn’t love watching an AI build Minecraft houses? – but they are easy to understand. And as my colleague Max Zeff wrote about recently, the industry continues to struggle with distilling a technology as complex as AI into digestible marketing.
The only question in my mind is, which new benchmarks will go viral in 2025?