A well-known test for artificial general intelligence (AGI) is closer to being solved. But the test's creators say this points to flaws in the test's design, rather than a genuine research breakthrough.
In 2019, Francois Chollet, a leading figure in the AI world, introduced the ARC-AGI benchmark, short for "Abstraction and Reasoning Corpus for Artificial General Intelligence." Designed to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on, ARC-AGI, Chollet claims, remains the only AI test that measures progress toward general intelligence (although others have been proposed).
Until this year, the best-performing AI could solve just under a third of the tasks in ARC-AGI. Chollet blamed the industry's focus on large language models (LLMs), which he believes are not capable of actual "reasoning."
"LLMs struggle with generalization, due to being entirely reliant on memorization," he said in a series of posts on X in February. "They break down on anything that wasn't in their training data."
To Chollet's point, LLMs are statistical machines. Trained on a lot of examples, they learn patterns in those examples to make predictions, such as how "to whom" in an email typically precedes "it may concern."
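As a toy illustration of that statistical idea (a deliberately minimal sketch, nothing like a production LLM), the bigram model below just counts which word follows which in its training text and predicts the most frequent continuation. The corpus and function names are invented for this example.

from collections import Counter, defaultdict

# Toy bigram "language model": count which word follows which in the
# training text, then predict the most frequent continuation. This is
# pattern memorization in miniature, the behavior Chollet describes.
corpus = "to whom it may concern please find attached to whom it may concern".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    # Most frequently observed continuation, or None for unseen words.
    best = follows[word].most_common(1)
    return best[0][0] if best else None

print(predict_next("whom"))       # -> "it", because that pattern was in the data
print(predict_next("reasoning"))  # -> None: nothing memorized, nothing predicted

Ask it to continue a word it never saw in training and it has nothing to offer, which is the generalization failure Chollet is pointing at.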
Chollet asserts that while LLMs may be capable of memorizing "reasoning patterns," it is unlikely they can generate "new reasoning" based on novel situations. "If you need to be trained on many examples of a pattern, even if it's implicit, in order to learn a reusable representation for it, you're memorizing," Chollet argued in another post.
To incentivize research beyond LLMs, in June, Chollet and Zapier co-founder Mike Knoop launched a $1 million competition to build open source AI capable of beating ARC-AGI. Out of 17,789 submissions, the best scored 55.5%, ~20% higher than 2023's top score, although still short of the 85% "human-level" threshold required to win.
That doesn’t mean we’re ~20% closer to AGI, though, Knoop says.
Today we're announcing the winners of ARC Prize 2024. We're also publishing an extensive technical report on what we learned from the competition (link in the next tweet).
The state of the art went from 33% to 55.5%, the largest single-year increase we've seen since 2020.
– François Chollet (@fchollet) December 6, 2024
In a blog post, Knoop said that many of the submissions to ARC-AGI were able to "brute force" their way to a solution, suggesting that a "large fraction" of ARC-AGI tasks "[don't] carry much useful signal towards general intelligence."
ARC-AGI consists of puzzle-like problems where an AI has to, given a grid made up of differently colored squares, generate the correct "answer" grid. The problems were designed to force an AI to adapt to new problems it hasn't seen before. But it's not clear they're achieving this.
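To make both points concrete, the task layout and what "brute forcing" one can look like, here is a minimal Python sketch. The data structure mirrors the publicly documented ARC format (small integer grids, with "train" demonstration pairs and "test" inputs), but the three-function transformation library and the exhaustive search loop are illustrative assumptions, not any entrant's actual method.

# A tiny ARC-style task: the hidden rule is "rotate 90 degrees clockwise".
task = {
    "train": [
        {"input": [[1, 2], [3, 4]], "output": [[3, 1], [4, 2]]},
        {"input": [[5, 0], [0, 5]], "output": [[0, 5], [5, 0]]},
    ],
    "test": [{"input": [[7, 8], [9, 7]]}],
}

def identity(g):
    return g

def flip_h(g):
    # Mirror the grid left-to-right.
    return [row[::-1] for row in g]

def rotate90(g):
    # Rotate the grid 90 degrees clockwise.
    return [list(row) for row in zip(*g[::-1])]

CANDIDATES = [identity, flip_h, rotate90]

def solve(task):
    # Brute force: try each candidate transformation and keep the first
    # one that reproduces every demonstration pair exactly.
    for fn in CANDIDATES:
        if all(fn(p["input"]) == p["output"] for p in task["train"]):
            return [fn(t["input"]) for t in task["test"]]
    return None  # no candidate program fits the demonstrations

print(solve(task))  # rotate90 fits both demos -> [[[9, 7], [7, 8]]]

Real submissions search far larger program spaces, but the structure is the same: enumerate candidate programs until one reproduces the demonstrations. Knoop's point is that when a task falls to this kind of enumeration, solving it says little about general intelligence.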
"[ARC-AGI] has been unchanged since 2019 and is not perfect," Knoop conceded in his post.
Chollet and Knoop have also faced criticism for overselling ARC-AGI as a benchmark toward AGI, at a time when the very definition of AGI is being hotly contested. One OpenAI staff member recently claimed that AGI has "already" been achieved if one defines it as AI "better than most humans at most tasks."
Knoop and Chollet say they plan to release a second-generation ARC-AGI benchmark to address these issues, alongside a competition in 2025. "We will continue to direct the efforts of the research community toward what we see as the most important unsolved problems in AI, and accelerate the timeline to AGI," Chollet wrote in an X post.
Fixes likely won't come easy. If the first ARC-AGI test's shortcomings are any indication, defining intelligence for AI will be every bit as intractable and polarizing as it has been for human beings.