Microsoft’s new rStar-Math method leverages small language models to match, and in some cases outperform, OpenAI’s o1-preview on math problems.




Microsoft is doubling down on the potential of small language models (SLMs) with the unveiling of rStar-Math, a new reasoning technique that can be applied to small models to improve their performance on math problems through step-by-step “deep thinking,” matching, and in some cases exceeding, the performance of OpenAI’s o1-preview model.

The technique is still in the research phase. As described in a paper posted to the pre-print site arXiv.org and credited to eight authors from Microsoft, Peking University, and Tsinghua University in China, it was applied to several small open-source models, including Microsoft’s Phi-3 mini, Alibaba’s Qwen-1.5B (a 1.5-billion-parameter model), and Qwen-7B (a 7-billion-parameter model). It improved performance on all of them, even surpassing OpenAI’s o1-preview model on the MATH benchmark, a third-party test of 12,500 word problems covering branches such as geometry and algebra at all levels of difficulty.

According to a post on Hugging Face, the researchers plan to publish their code and data on GitHub at https://github.com/microsoft/rStar. However, one of the paper’s authors, Li Lyna Zhang, wrote in a comment on the Hugging Face post that the team is “still reviewing internally for public release.” As a result, “the repository remains private for now. Please stay tuned!”

The community has expressed interest, calling the work “exciting” and praising the combination of Monte Carlo Tree Search (MCTS) with step-by-step reasoning. One commenter highlighted the simplicity and usefulness of using Q-values to score reasoning steps, while others speculated about future applications to geometric proofs and symbolic reasoning.

The announcement follows closely on the heels of the open-sourcing of Microsoft’s Phi-4 model, a 14-billion-parameter AI system now available on Hugging Face under the MIT license.

While the Phi-4 release broadens access to high-performing small models, rStar-Math showcases a more specialized approach: using small AI systems to achieve state-of-the-art results in mathematical reasoning.

rStar-Math works by combining multiple models and components to enable a small model to “self-evolve” into a stronger mathematical reasoner.

The key to rStar-Math is its use of Monte Carlo Tree Search (MCTS), a technique that mimics human “deep thinking” by iteratively refining step-by-step solutions to math problems.

The researchers used MCTS because it “breaks down complex math problems into simpler single-step generation tasks, reducing the difficulty” for smaller models.
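
To make the idea concrete, here is a minimal, self-contained sketch of MCTS-style step search in Python. Everything in it, the node layout, the placeholder step generator, and the random rollout reward, is an invented stand-in rather than the paper’s implementation; it only illustrates how Q-values propagate back through the tree to favor promising reasoning paths.

# Minimal sketch of MCTS-style step search for math reasoning (illustrative only;
# the node layout, rollout policy, and reward function are hypothetical, not rStar-Math's).
import math
import random


class Node:
    def __init__(self, steps, parent=None):
        self.steps = steps          # partial solution: list of reasoning steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.q_value = 0.0          # running average reward (the "Q-value")

    def ucb(self, c=1.4):
        # Upper-confidence bound balances exploitation (Q) and exploration.
        if self.visits == 0:
            return float("inf")
        return self.q_value + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def propose_steps(steps):
    # Placeholder for the small language model proposing candidate next steps.
    return [steps + [f"step {len(steps) + 1}, variant {i}"] for i in range(3)]


def rollout_reward(steps):
    # Placeholder reward: in rStar-Math this comes from verified rollouts;
    # here a random score stands in for illustration.
    return random.random()


def mcts(root, iterations=100):
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down the tree by highest UCB score.
        while node.children:
            node = max(node.children, key=lambda n: n.ucb())
        # 2. Expansion: let the model propose candidate next steps.
        if node.visits > 0:
            node.children = [Node(s, parent=node) for s in propose_steps(node.steps)]
            node = random.choice(node.children)
        # 3. Simulation: score the partial solution.
        reward = rollout_reward(node.steps)
        # 4. Backpropagation: update visit counts and Q-values up to the root.
        while node is not None:
            node.visits += 1
            node.q_value += (reward - node.q_value) / node.visits
            node = node.parent
    return max(root.children, key=lambda n: n.q_value) if root.children else root


best = mcts(Node(steps=[]))
print("Most promising first step:", best.steps)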

However, they did not apply MCTS the way other researchers have. In a clever twist, they had the model they trained always output its “chain-of-thought” reasoning as both natural-language descriptions and Python code.

They mandated that the natural-language reasoning be embedded as Python code comments, and only outputs whose Python code executed successfully were kept to train the model.
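
A rough sketch of that execution-based filtering, under the assumption that each candidate step is a small commented Python snippet, might look like the following. The candidate steps and the executes_cleanly helper are hypothetical; the actual pipeline also checks final-answer correctness and runs code in a controlled environment.

# Illustrative sketch (not the paper's code): filter candidate reasoning steps by
# executing their Python portion and keeping only the ones that run cleanly.

candidate_steps = [
    # Natural-language reasoning is embedded as comments; the math lives in code.
    "# The triangle's legs are 3 and 4, so apply the Pythagorean theorem.\n"
    "hypotenuse = (3**2 + 4**2) ** 0.5",
    "# A buggy candidate that should be rejected by execution.\n"
    "hypotenuse = (3**2 + 4**2) ** 0.5)",   # extra parenthesis on purpose
]


def executes_cleanly(code: str) -> bool:
    """Return True if the candidate step runs without raising an exception."""
    scope = {}
    try:
        exec(code, scope)          # isolated namespace; real systems sandbox this
        return True
    except Exception:
        return False


verified = [step for step in candidate_steps if executes_cleanly(step)]
print(f"kept {len(verified)} of {len(candidate_steps)} candidate steps")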

The researchers also trained a “policy model” to generate mathematical reasoning steps and a process preference model (PPM) to select the most reliable steps toward a solution, then ran four rounds of “self-evolution,” with each model improving the other.
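
As a loose illustration of how a process preference model can be used at inference time, the sketch below scores candidate steps and keeps the highest-ranked one. The hand-written score_step function and its features are invented placeholders; in the paper the PPM is a trained neural model whose preferences are learned from comparisons between steps, not hard-coded rules.

# Hypothetical sketch of a process preference model (PPM) used as a step scorer.
# The feature extraction and weights are invented for illustration only.
import math


def score_step(step: str) -> float:
    # Stand-in scorer: a real PPM would take the problem, the partial solution,
    # and the candidate step, and return a learned preference score.
    features = [len(step), step.count("="), step.count("?")]
    weights = [0.01, 1.0, -2.0]
    return sum(w * f for w, f in zip(weights, features))


def pairwise_preference(preferred: str, rejected: str) -> float:
    # Bradley-Terry style probability that the preferred step beats the rejected one;
    # training a PPM pushes this probability toward 1 on labeled pairs.
    margin = score_step(preferred) - score_step(rejected)
    return 1.0 / (1.0 + math.exp(-margin))


candidates = [
    "x = 12 / 4  # divide both sides by 4",
    "maybe try guessing? x could be anything",
]
best = max(candidates, key=score_step)
print("PPM-selected step:", best)
print("preference probability:", round(pairwise_preference(candidates[0], candidates[1]), 3))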

For their initial training data, the researchers said they used “747,000 math problems from publicly available resources,” but rather than relying on the problems’ existing solutions, they generated new step-by-step solutions with the two models described above.
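
At a high level, the self-evolution loop can be pictured as four passes over the problem set, with each round’s models generating the training data for the next. The function names below are hypothetical placeholders for the data-generation and retraining stages; only the overall flow reflects the description above.

# High-level sketch of the four-round self-evolution loop (function bodies are
# hypothetical placeholders, not the actual rStar-Math pipeline).

def generate_verified_trajectories(problems, policy, ppm):
    # Run MCTS with the current policy model and keep code-verified, PPM-preferred paths.
    return [{"problem": p, "solution": f"verified solution for {p}"} for p in problems]


def train_policy(trajectories):
    # Fine-tune the small policy model on the newly generated step-by-step solutions.
    return f"policy trained on {len(trajectories)} trajectories"


def train_ppm(trajectories):
    # Retrain the process preference model from the new rollouts' step rankings.
    return f"ppm trained on {len(trajectories)} trajectories"


problems = ["problem A", "problem B"]      # stands in for the ~747,000 public problems
policy, ppm = "base policy model", "base ppm"

for round_id in range(1, 5):               # four rounds of self-evolution
    trajectories = generate_verified_trajectories(problems, policy, ppm)
    policy = train_policy(trajectories)
    ppm = train_ppm(trajectories)
    print(f"round {round_id}: {policy}; {ppm}")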

Remarkable results

After four rounds of self-evolution, rStar-Math achieved notable milestones:

• On the MATH benchmark, the accuracy of the Qwen2.5-Math-7B model jumped from 58.8% to 90.0%, outperforming OpenAI’s o1-preview.

• On the American Invitational Mathematics Examination (AIME), it solved 53.3% of problems, placing it among the top 20% of high school competitors.

These results demonstrate the ability of SLMs to handle complex mathematical reasoning, an area traditionally dominated by much larger systems.

Smaller is better?

In recent years, progress in AI has been driven largely by scaling up language models, with ever-growing parameter counts treated as the path to better performance. However, the high costs associated with these large models, from computational resources to power consumption, have raised questions about how sustainable that scaling is.

Microsoft is charting an alternative, efficiency-focused path. The release of rStar-Math reinforces that direction by demonstrating how SLMs can rival, and in some cases surpass, the capabilities of their larger counterparts.

Microsoft’s dual release of Phi-4 and the rStar-Math paper suggests that compact, specialized models can provide powerful alternatives to the industry’s largest systems.

Moreover, by outperforming larger competitors on key benchmarks, these models challenge the notion that bigger is always better. They open the door for mid-sized organizations and academic researchers to access cutting-edge capabilities without the financial or environmental burden of massive models.



