Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
In a new case study, Hugging Face researchers have shown how small language models (SLMs) can be configured to outperform much larger models. Their findings show that a Llama 3 model with 3B parameters can outperform the 70B version of the model on complex math problems.
Hugging Face has documented the entire process in detail and provides a roadmap for enterprises that want to create their own customized reasoning models.
This work was inspired by OpenAI o1, which uses extra “thinking” to solve complex math, coding and reasoning problems.
A key idea behind models like o1 is scaling “test-time compute,” which means spending more compute during inference to test and verify different responses and reasoning paths before producing a final answer. Scaling test-time compute is especially useful when there is not enough memory to run a large model.
Since o1 is a private model and OpenAI has not disclosed its internals, researchers have been speculating about how it works and trying to reverse-engineer the process. There are already several open alternatives to o1.
Hugging Face’s work builds on a DeepMind study released in August, which investigates the tradeoffs between inference-time and pre-training compute. That study provides detailed guidelines on how to balance training and inference compute to get the best results on a fixed budget.
Beyond the extra inference-time compute, the success of this method depends on two key components: a reward model that evaluates the SLM’s answers, and a search algorithm that decides which of its candidate solutions to keep refining.
The simplest way to use test-time scaling is “majority voting,” in which the same prompt is sent to the model multiple times and the most frequent answer is selected. On simple problems, majority voting can be useful, but its gains quickly plateau on complex reasoning problems or on tasks where the same errors recur across generations.
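A minimal Python sketch of majority voting, assuming a hypothetical `generate` callable that returns only the extracted final answer from one sampled model response (a real setup would sample an SLM at nonzero temperature and parse its output). A scripted fake generator stands in for the model so the example is deterministic.

```python
import itertools
from collections import Counter
from typing import Callable, List


def majority_vote(generate: Callable[[str], str], prompt: str, n: int) -> str:
    """Send the same prompt n times and return the most frequent final answer."""
    answers: List[str] = [generate(prompt) for _ in range(n)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for a sampled model: cycles through canned answers.
fake_outputs = itertools.cycle(["12", "15", "12", "12", "9"])
print(majority_vote(lambda _prompt: next(fake_outputs), "What is 3 * 4?", n=5))  # → 12
```

If the model repeats the same mistake on most samples, the vote simply locks in the wrong answer, which is why the gains plateau on harder problems.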
A more advanced method is “Best-of-N.” In this method, the SLM generates multiple answers, but instead of majority voting, a reward model is used to evaluate the answers and select the best one. “Weighted Best-of-N,” a refinement of this method, also factors in consistency to select answers that are both high-scoring and more frequent than others.
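The difference between the two variants can be sketched in a few lines of Python. This assumes each sample has already been reduced to a `(final_answer, reward)` pair by some reward model; the pair format and the reward values are illustrative, not taken from the study. Weighted Best-of-N is shown here as summing the rewards of samples that reach the same answer, one common way to combine confidence and frequency.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each sample is (final_answer, reward); the reward is assumed to come from a
# reward model that has already scored the full solution text.
Scored = Tuple[str, float]


def best_of_n(samples: List[Scored]) -> str:
    """Best-of-N: return the answer of the single highest-reward sample."""
    return max(samples, key=lambda sample: sample[1])[0]


def weighted_best_of_n(samples: List[Scored]) -> str:
    """Weighted Best-of-N: add up the rewards of samples that reach the same
    final answer, so an answer wins by being both confident and frequent."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, reward in samples:
        totals[answer] += reward
    return max(totals, key=lambda answer: totals[answer])


samples = [("15", 0.9), ("12", 0.6), ("12", 0.7), ("9", 0.2)]
print(best_of_n(samples))           # → 15 (one confident outlier wins)
print(weighted_best_of_n(samples))  # → 12 (0.6 + 0.7 = 1.3 beats 0.9)
```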
The researchers used a “process reward model” (PRM) that scores the SLM’s response not only on the final answer but also on each of the intermediate steps it takes to reach it. Their experiments showed that Weighted Best-of-N and PRMs brought Llama-3.2 1B close to the level of Llama-3.1 8B on the difficult MATH-500 benchmark.
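A PRM produces one score per step, so those scores must be collapsed into a single number before solutions can be ranked. The Python sketch below compares three common aggregation choices (minimum, product, last step); which one the Hugging Face experiments used is not stated in this article. The step separator and `toy_prm` are hypothetical stand-ins for a real PRM.

```python
import math
from typing import Callable, List

# Hypothetical PRM interface: given the steps so far, return a score in [0, 1]
# for how likely the most recent step is to be correct.
StepScorer = Callable[[List[str]], float]


def split_steps(solution: str) -> List[str]:
    """Assumption for this sketch: reasoning steps are separated by blank lines."""
    return [step.strip() for step in solution.split("\n\n") if step.strip()]


def score_solution(solution: str, prm: StepScorer, aggregate: str = "min") -> float:
    """Score every step with the PRM, then collapse the step scores into one number."""
    steps = split_steps(solution)
    if not steps:
        raise ValueError("solution has no steps to score")
    step_scores = [prm(steps[: i + 1]) for i in range(len(steps))]
    if aggregate == "min":   # a solution is only as good as its weakest step
        return min(step_scores)
    if aggregate == "prod":  # treat step scores as independent probabilities
        return math.prod(step_scores)
    if aggregate == "last":  # trust the PRM's judgement at the final step
        return step_scores[-1]
    raise ValueError(f"unknown aggregate: {aggregate}")


def toy_prm(steps: List[str]) -> float:
    """Stand-in PRM that distrusts one known arithmetic slip."""
    return 0.2 if steps[-1] == "3 * 4 = 15" else 0.9


solution = "3 * 4 = 15\n\nAnswer: 15"
print(score_solution(solution, toy_prm, "min"))   # → 0.2
print(score_solution(solution, toy_prm, "last"))  # → 0.9
```

Scoring the intermediate steps is what lets the `min` aggregate catch the bad first step even though the final line looks plausible; a score on the final answer alone would not see it.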
To improve the model’s performance further, the researchers added search algorithms to the model’s reasoning process. Instead of generating the answer in one pass, they used “beam search,” an algorithm that guides the model’s solution process step by step.
At each step, the SLM generates multiple partial solutions. The search algorithm uses the reward model to evaluate them and keeps a subset that is worth exploring further. This process repeats until the model exhausts its inference budget or reaches a final answer. In this way, the inference budget is concentrated on the most promising solutions.
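The loop described above can be sketched as a step-level beam search in Python. This is an illustrative sketch, not the researchers’ code: `propose` (the SLM sampling candidate next steps) and `score` (the PRM rating a partial solution) are hypothetical callables, `max_steps` stands in for the inference budget, and the toy arithmetic problem replaces a real model.

```python
from typing import Callable, List

Path = List[str]
# Hypothetical interfaces: the SLM proposes candidate next steps, the PRM scores a partial path.
Proposer = Callable[[str, Path], List[str]]
Scorer = Callable[[str, Path], float]


def prm_beam_search(prompt: str, propose: Proposer, score: Scorer,
                    is_finished: Callable[[Path], bool],
                    beam_width: int = 2, max_steps: int = 4) -> Path:
    """Step-level beam search: expand every kept path, score, keep the top beam_width."""
    beams: List[Path] = [[]]
    for _ in range(max_steps):  # max_steps acts as the inference budget
        candidates: List[Path] = []
        for path in beams:
            if is_finished(path):
                candidates.append(path)  # completed solutions stay in the pool
            else:
                candidates.extend(path + [step] for step in propose(prompt, path))
        candidates.sort(key=lambda p: score(prompt, p), reverse=True)
        beams = candidates[:beam_width]
        if all(is_finished(p) for p in beams):
            break
    return max(beams, key=lambda p: score(prompt, p))


# --- Toy stand-ins (hypothetical) for "What is 3 * 4 + 1?" ---
CORRECT_STEPS = {"3 * 4 = 12", "12 + 1 = 13", "Answer: 13"}


def toy_propose(prompt: str, path: Path) -> List[str]:
    if not path:
        return ["3 * 4 = 12", "3 * 4 = 15"]
    if len(path) == 1:
        product = int(path[0].split("= ")[1])
        return [f"{product} + 1 = {product + 1}", f"{product} + 1 = {product + 2}"]
    return [f"Answer: {path[-1].split('= ')[1]}"]


def toy_score(prompt: str, path: Path) -> float:
    # Fraction of steps the toy "PRM" recognizes as correct.
    return sum(step in CORRECT_STEPS for step in path) / len(path) if path else 0.0


def toy_finished(path: Path) -> bool:
    return bool(path) and path[-1].startswith("Answer:")


print(prm_beam_search("What is 3 * 4 + 1?", toy_propose, toy_score, toy_finished))
# → ['3 * 4 = 12', '12 + 1 = 13', 'Answer: 13']
```

Raising `beam_width` or `max_steps` spends more inference compute in exchange for exploring more candidate paths.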
The researchers found that while beam search improves the model’s performance on complex problems, it tends to underperform other techniques on simple problems. To address this, they added two more elements to their inference strategy.
The first was Diverse Verifier Tree Search (DVTS), a variant of beam search that keeps the SLM from getting stuck in false reasoning paths and diversifies its response branches. The second was a “compute-optimal scaling strategy,” as suggested in the DeepMind paper, which dynamically chooses the best test-time scaling strategy based on the difficulty of the input problem.
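A simplified Python reading of DVTS is sketched below, under stated assumptions: the budget is split into `n_subtrees` independent subtrees, each rooted at a different first step and then expanded greedily with the PRM. The function names, parameters and toy problem are hypothetical illustrations, not the Hugging Face implementation, and the compute-optimal strategy selection is not modeled.

```python
from typing import Callable, List, Tuple

Path = List[str]
Proposer = Callable[[str, Path], List[str]]
Scorer = Callable[[str, Path], float]


def dvts(prompt: str, propose: Proposer, score: Scorer,
         is_finished: Callable[[Path], bool],
         n_subtrees: int = 2, max_steps: int = 4) -> Tuple[Path, List[Path]]:
    """Simplified Diverse Verifier Tree Search sketch (assumed structure).

    Each subtree starts from a different first step and is expanded greedily with
    the PRM. Subtrees never compete with each other during the search, so a
    low-scoring branch is not pruned early and the final pool stays diverse.
    """
    roots = propose(prompt, [])[:n_subtrees]
    finals: List[Path] = []
    for root in roots:
        path: Path = [root]
        for _ in range(max_steps - 1):
            if is_finished(path):
                break
            options = [path + [step] for step in propose(prompt, path)]
            # Greedy within this subtree: keep only the best-scoring continuation.
            path = max(options, key=lambda p: score(prompt, p))
        finals.append(path)
    best = max(finals, key=lambda p: score(prompt, p))
    return best, finals


# --- Toy stand-ins (hypothetical) for "What is 3 * 4 + 1?" ---
CORRECT_STEPS = {"3 * 4 = 12", "12 + 1 = 13", "Answer: 13"}


def toy_propose(prompt: str, path: Path) -> List[str]:
    if not path:
        return ["3 * 4 = 12", "3 * 4 = 15"]
    if len(path) == 1:
        product = int(path[0].split("= ")[1])
        return [f"{product} + 1 = {product + 1}", f"{product} + 1 = {product + 2}"]
    return [f"Answer: {path[-1].split('= ')[1]}"]


def toy_score(prompt: str, path: Path) -> float:
    return sum(step in CORRECT_STEPS for step in path) / len(path) if path else 0.0


def toy_finished(path: Path) -> bool:
    return bool(path) and path[-1].startswith("Answer:")


best, finals = dvts("What is 3 * 4 + 1?", toy_propose, toy_score, toy_finished)
print(best)         # → ['3 * 4 = 12', '12 + 1 = 13', 'Answer: 13']
print(len(finals))  # → 2 (each subtree finishes its own solution)
```

Unlike the shared beam, the subtree that starts from the wrong product still runs to completion, so the final comparison sees genuinely different solutions rather than near-duplicates of one early favorite.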
The combination of these techniques allowed Llama-3.2 1B to punch above its weight and outperform the 8B model by a significant margin. The researchers also found that the approach scales: when applied to Llama-3.2 3B, it allowed the model to outperform the much larger 70B model.
Scaling test-time compute changes the economics of model deployment. Enterprises can now choose where to allocate their compute resources. For example, if you are short on memory or can tolerate slower response times, you can use a smaller model and spend more inference-time compute to generate more accurate answers.
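A rough way to reason about this tradeoff is a back-of-the-envelope FLOP count. The Python sketch below uses the common rule of thumb that a transformer forward pass costs about 2 FLOPs per parameter per token, and asks how many small-model samples fit in the compute of one large-model sample. The nominal model sizes are illustrative, and the estimate assumes the verifier reads each finished sample once; step-by-step PRM scoring during search would cost more.

```python
def forward_flops(params: float, tokens: int) -> float:
    """Rule of thumb: a transformer forward pass costs about 2 FLOPs per parameter per token."""
    return 2.0 * params * tokens


def break_even_samples(large_params: float, small_params: float,
                       verifier_params: float = 0.0, tokens: int = 1_000) -> int:
    """Small-model samples (each also read once by an optional verifier) that fit in
    the compute of one large-model sample of the same length.

    Ignores attention cost, KV caching, batching, and memory bandwidth, so treat the
    result as an order-of-magnitude estimate only.
    """
    budget = forward_flops(large_params, tokens)
    per_sample = forward_flops(small_params, tokens) + forward_flops(verifier_params, tokens)
    return int(budget // per_sample)


# Nominal sizes: a 70B model vs. a 3B generator, with and without an 8B PRM.
print(break_even_samples(70e9, 3e9))       # → 23
print(break_even_samples(70e9, 3e9, 8e9))  # → 6
```

When sample lengths match, the token count cancels out; what the small model buys is mainly a lower memory footprint, while the verifier eats a large share of the compute budget.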
However, test-time scaling also has its limitations. For example, in the Hugging Face experiments, the researchers used a specially trained Llama-3.1-8B model as the PRM, which requires running two models in parallel (although this is still far more resource-efficient than the 70B model). The researchers acknowledge that the ideal form of test-time scaling is “self-verification,” where the original model verifies its own answer rather than relying on an external verifier. This is an open area of research.
The test-time scaling technique presented in this study is also limited to problems where the answer can be clearly evaluated, such as coding and math. Creating reward models and verifiers for subjective tasks such as creative writing and product design requires further research.
But what is clear is that test-time scaling has generated a lot of interest and activity, and we can expect more tools and techniques to emerge in the coming months. Enterprises would be wise to keep an eye on how the landscape develops.