Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
An organization developing mathematics benchmarks for AI did not disclose until recently that it had received funding from OpenAI, according to some insiders in the AI community.
Epoch AI, a non-profit organization supported by Open Philanthropy, a research and grant-making foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test designed to measure AI’s mathematical abilities, was one of the benchmarks OpenAI used to showcase its upcoming AI model, o3.
In a post on the LessWrong forum, an Epoch AI contractor who goes by the name “Meemi” says that many contributors to the FrontierMath benchmark were not informed of OpenAI’s involvement until it was made public.
Communication on the issue had not been transparent, Meemi wrote, arguing that Epoch AI should have disclosed OpenAI’s funding and that contractors should have clear information about how their work might ultimately be used when choosing whether to work on a benchmark.
On social media, other users expressed concern that this secrecy could damage FrontierMath’s reputation as a benchmark. In addition to funding FrontierMath, OpenAI had access to many of the problems and solutions in the benchmark, a fact that Epoch AI did not reveal before December 20, when o3 was announced.
In response to Meemi’s post, Tamay Besiroglu, a co-founder of Epoch AI, said FrontierMath’s integrity had not been compromised, but acknowledged that Epoch AI “made a mistake” in not being more transparent.
Besiroglu wrote that Epoch AI was prevented from disclosing the deal until around the time o3 was launched and that, in retrospect, it should have negotiated harder to be transparent with the benchmark’s contributors sooner. The organization’s mathematicians should have been told who might gain access to their work, he wrote, and although Epoch AI was limited in what it could say, it should have made transparency with its contributors an integral part of its partnership with OpenAI.
Besiroglu added that while OpenAI has access to FrontierMath, it has a “verbal agreement” with Epoch AI not to use FrontierMath’s problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also maintains a separate holdout set that serves as an additional safeguard for the independent verification of FrontierMath benchmark results, Besiroglu said.
OpenAI has been fully supportive of Epoch AI’s decision to maintain a separate, unseen holdout set, Besiroglu wrote.
However, muddying the waters, Epoch AI lead mathematician Elliot Glazer wrote in a post on Reddit that Epoch AI has not been able to independently verify OpenAI’s FrontierMath o3 results.
“My opinion is that (OpenAI’s) score is valid (i.e., they didn’t train on the dataset), and that they have no incentive to lie about internal processes,” Glazer said. “However, we cannot confirm them until our own review is complete.”
The saga is another example of the challenges of creating objective benchmarks to evaluate AI, and of securing the resources needed to develop them without creating the perception of conflicts of interest.