Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
The end of 2024 has brought a reckoning for artificial intelligence, as industry insiders worried that progress toward more intelligent AI was slowing. But OpenAI’s o3 model, announced just last week, has sparked a fresh wave of excitement and debate, and suggests that major advances are still to come in 2025 and beyond.
The model, announced for safety testing among researchers but not yet released publicly, achieved an impressive score on the important ARC metric. The benchmark was created by François Chollet, a renowned AI researcher and creator of the Keras deep learning framework, and is designed to measure a model’s ability to handle novel tasks that require intelligence. As such, it provides a meaningful gauge of progress toward truly intelligent AI systems.
Notably, o3 scored 75.7% on the ARC benchmark under standard compute conditions and 87.5% using high compute, surpassing previous state-of-the-art results, such as the 53% scored by Claude 3.5.
This achievement represents remarkable progress, according to Chollet, who has been a critic of the ability of large language models (LLMs) to achieve this sort of intelligence. It highlights innovations that could accelerate progress toward superior intelligence, whether or not we call it artificial general intelligence (AGI).
AGI is a hyped and vaguely defined term, but it signals a goal: intelligence capable of adapting to novel challenges or questions in ways that surpass human abilities.
OpenAI’s o3 tackles specific hurdles in reasoning and adaptability that have long limited large language models. At the same time, it exposes challenges, including the high costs and efficiency bottlenecks that come with pushing these systems to their limits. This article explores five key innovations behind the o3 model, many of which are underpinned by advances in reinforcement learning (RL). It draws on insights from industry leaders, OpenAI’s claims and, above all, Chollet’s critical analysis to explain what these developments mean for the future of AI as we enter 2025.
OpenAI’s o3 model introduces a new capability called “program synthesis,” which enables it to combine things it learned during training (specific patterns, algorithms or methods) into new configurations. These elements might include mathematical operations, code snippets or logical procedures that the model encountered and generalized during its extensive training. Most significantly, program synthesis allows o3 to address tasks it has never directly seen in training, such as solving advanced coding challenges or tackling novel logic puzzles that require reasoning beyond the rote application of learned information. François Chollet describes program synthesis as a system’s ability to recombine known tools in new ways, much as a chef creates a unique dish from familiar ingredients. This marks a departure from earlier models, which primarily retrieve and apply pre-learned knowledge without reconfiguring it, and it is what Chollet advocated a few months ago as the only way to achieve better intelligence.
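To make the chef analogy concrete, here is a minimal sketch of classic enumerative program synthesis: given a small library of list primitives, it searches for the shortest composition that reproduces a handful of input/output examples, then applies that composition to an input it has never seen. This is a textbook technique chosen purely for illustration. OpenAI has not published how o3 performs program synthesis, and the primitive names below are hypothetical.

```python
from itertools import product
from typing import Callable, Optional

# Hypothetical toy "ingredients": each primitive is a known function on integer lists.
PRIMITIVES: dict[str, Callable[[list[int]], list[int]]] = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "increment": lambda xs: [x + 1 for x in xs],
    "drop_first": lambda xs: xs[1:],
}

Example = tuple[list[int], list[int]]  # (input, expected output)


def run(program: tuple[str, ...], xs: list[int]) -> list[int]:
    """Apply each primitive in the program, left to right."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def synthesize(examples: list[Example], max_depth: int = 3) -> Optional[tuple[str, ...]]:
    """Return the shortest composition of primitives consistent with every example."""
    for depth in range(1, max_depth + 1):
        # Enumerate every sequence of `depth` primitives (repetition allowed).
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None  # nothing in the search space fits the examples


examples = [([3, 1, 2], [6, 4, 2]), ([5, 4], [10, 8])]
program = synthesize(examples)
print(program)                   # ('sort', 'reverse', 'double')
print(run(program, [9, 7, 8]))   # [18, 16, 14]
```

No single primitive solves the task, but a new combination of familiar ones does, and it generalizes to the unseen input. Real systems replace brute-force enumeration with learned guidance, since the search space grows exponentially with depth.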
Central to o3’s adaptability is its use of Chains of Thought (CoTs) and a sophisticated search process that takes place during inference, when the model is actively generating answers in a real-world or deployed setting. These CoTs are step-by-step natural-language instructions that the model generates to explore possible solutions. Guided by an evaluator model, o3 generates multiple solution paths and evaluates them to determine the most promising one. This approach mirrors human problem-solving, in which we weigh different options before choosing the best fit. For example, in mathematical problem-solving, o3 generates and evaluates alternative approaches to arrive at accurate answers. Competitors such as Anthropic and Google have experimented with similar approaches, but OpenAI’s implementation sets a new standard.
During inference, o3 generates multiple candidate solutions and evaluates each one with the help of an integrated evaluator model to determine the most promising answer. By training the evaluator on expert-labeled data, OpenAI ensures that o3 develops a strong capacity to reason through complex, multi-step problems. This feature lets the model act as a judge of its own reasoning, moving large language models closer to being able to “think” rather than simply respond.
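The generate-then-judge loop described above resembles what is commonly called best-of-N sampling with a verifier. The sketch below is a minimal illustration under that assumption: the hypothetical `sample_chains` stands in for the model sampling reasoning chains, and the hypothetical `score_chain` stands in for the evaluator. Here the evaluator is a simple algebraic check rather than a model trained on expert data, and none of this reflects o3’s unpublished internals.

```python
import random
from dataclasses import dataclass
from typing import Callable

Coeffs = tuple[int, int, int]  # (a, b, c) for a*x^2 + b*x + c = 0


@dataclass
class Candidate:
    steps: list[str]  # the chain of thought, as natural-language steps
    answer: int       # the final answer this chain arrives at


def sample_chains(coeffs: Coeffs, n: int, seed: int = 0) -> list[Candidate]:
    """Hypothetical generator standing in for the model: each of the n chains
    guesses an integer root at random, so most chains are wrong."""
    a, b, c = coeffs
    rng = random.Random(seed)
    chains = []
    for _ in range(n):
        x = rng.randint(-10, 10)
        steps = [f"Guess x = {x}", f"Check {a}*({x})**2 + {b}*({x}) + {c}"]
        chains.append(Candidate(steps, x))
    return chains


def score_chain(coeffs: Coeffs, cand: Candidate) -> float:
    """Hypothetical evaluator: 0 is the best score (the answer satisfies the
    equation); larger residuals give more negative scores."""
    a, b, c = coeffs
    x = cand.answer
    return -abs(a * x * x + b * x + c)


def best_of_n(
    coeffs: Coeffs,
    n: int,
    generate: Callable[[Coeffs, int], list[Candidate]] = sample_chains,
    evaluate: Callable[[Coeffs, Candidate], float] = score_chain,
) -> Candidate:
    """Generate n candidate chains, score each with the evaluator, keep the best."""
    candidates = generate(coeffs, n)
    return max(candidates, key=lambda cand: evaluate(coeffs, cand))


best = best_of_n((1, -5, 6), n=500)  # x^2 - 5x + 6 = 0 has roots 2 and 3
print(best.answer, best.steps)
```

Because the evaluator alone decides which chain wins, its quality caps the quality of the final answer, which is why the data used to train a real evaluator matters so much.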
One of the most impressive features of o3 is its ability to use its own Chains of Thought (CoTs) as tools for adaptive problem-solving. Traditionally, CoTs have been used as step-by-step reasoning frameworks for solving specific problems. OpenAI’s o3 extends this concept by treating CoTs as reusable building blocks, allowing the model to approach novel challenges with greater adaptability. Over time, these CoTs become structured records of problem-solving strategies, much as people document and refine what they learn through experience. This ability shows how o3 is pushing the frontier of adaptive reasoning. According to OpenAI researcher Nat McAleese, o3’s performance on unseen programming challenges, such as achieving a CodeForces rating above 2700, demonstrates its innovative use of CoTs to rival top competitive programmers. A 2700 rating places the model at “Grandmaster” level, among the best competitive programmers worldwide.
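One way to picture CoTs as reusable building blocks is a library of past reasoning chains that can be retrieved and adapted when a similar problem appears. The sketch below is purely hypothetical: the `ChainLibrary` class, the feature tags and the Jaccard-similarity retrieval are illustrative choices, not a description of how o3 stores or reuses its chains.

```python
from dataclasses import dataclass, field
from typing import Optional


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap between two feature sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class ChainLibrary:
    """Hypothetical store of solved problems, each tagged with descriptive
    features and paired with the reasoning chain that solved it."""
    entries: list[tuple[frozenset[str], list[str]]] = field(default_factory=list)

    def record(self, features: set[str], chain: list[str]) -> None:
        """Save a successful chain so it can be reused later."""
        self.entries.append((frozenset(features), chain))

    def retrieve(self, features: set[str], min_similarity: float = 0.25) -> Optional[list[str]]:
        """Return the stored chain whose features best match the new problem,
        or None if nothing is similar enough to be a useful starting point."""
        query = frozenset(features)
        best_chain, best_score = None, 0.0
        for stored, chain in self.entries:
            score = jaccard(query, stored)
            if score > best_score:
                best_chain, best_score = chain, score
        return best_chain if best_score >= min_similarity else None


library = ChainLibrary()
library.record({"graph", "shortest-path", "weighted"},
               ["Model locations as nodes", "Run Dijkstra from the source", "Read off the distance"])
library.record({"array", "sorting", "pairs"},
               ["Sort the array", "Sweep inward with two pointers"])

# A new, unseen problem that shares some features with the first one.
starting_point = library.retrieve({"graph", "weighted", "negative-edges"})
print(starting_point)  # the Dijkstra chain, to be adapted (e.g. Bellman-Ford for negative edges)
```

The retrieved chain is only a starting point: because the new problem has negative edge weights, a reasoner would still need to revise the Dijkstra step, which is where search and evaluation come back in.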
The o3 model leverages a deep learning-driven program search during inference to evaluate and refine potential solutions to complex problems. This involves generating multiple solution paths and using patterns learned during training to judge their viability. François Chollet and other experts have noted that this reliance on “indirect evaluation,” where solutions are judged by internal metrics rather than tested in real-world scenarios, can limit the model’s robustness when applied to unpredictable or enterprise-specific contexts.
Additionally, o3’s dependence on expert-labeled datasets to train its evaluator model raises scalability concerns. While these datasets improve precision, they also require significant human oversight, which can limit the system’s adaptability and raise costs. Chollet notes that this trade-off illustrates the difficulty of scaling reasoning systems beyond controlled benchmarks such as ARC-AGI.
Ultimately, this approach demonstrates both the potential and the limitations of combining deep learning techniques with systematic problem-solving. While o3’s innovations mark real progress, they also underscore the persistent challenges of building truly generalizable AI systems.
OpenAI’s o3 model achieves impressive results, but at a steep computational cost, consuming millions of tokens per task, and this costly approach is the model’s biggest drawback. François Chollet, Nat McAleese and others have highlighted concerns about the economic feasibility of such models, emphasizing the need for innovations that balance performance with affordability.
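The cost concern follows from simple arithmetic: if a model samples and scores N reasoning chains per task, token usage, and therefore cost, grows linearly with N. The sketch below makes that explicit. The chain length and per-token price are invented round numbers for illustration, not OpenAI’s figures.

```python
def inference_cost(tasks: int, samples_per_task: int, tokens_per_sample: int,
                   usd_per_million_tokens: float) -> tuple[int, float]:
    """Return (total tokens, total cost in USD) for a best-of-N style workload.
    All inputs are hypothetical; nothing here reflects o3's real pricing."""
    total_tokens = tasks * samples_per_task * tokens_per_sample
    cost = total_tokens / 1_000_000 * usd_per_million_tokens
    return total_tokens, cost


# Same hypothetical workload, with more and more samples per task.
for n in (1, 10, 100):
    tokens, usd = inference_cost(tasks=100, samples_per_task=n,
                                 tokens_per_sample=10_000, usd_per_million_tokens=10.0)
    print(f"N={n}: {tokens:,} tokens, ${usd:,.2f}")
```

With these made-up inputs, going from 1 to 100 samples per task raises the bill from $10 to $1,000 for the same 100 tasks, which is why test-time search quickly becomes the dominant expense.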
The announcement of o3 has drawn attention across the AI community. Competitors such as Google, with Gemini 2, and Chinese firms such as DeepSeek, with DeepSeek 3, are also advancing, making direct comparisons difficult until these models are tested more broadly.
Opinions on o3 are divided: some praise its capabilities, while others cite its high costs and lack of transparency, suggesting that its true value will only become clear with broader testing. One of the biggest critiques came from Denny Zhou of Google DeepMind, who openly criticized the model’s reliance on reinforcement learning (RL) and search algorithms as a “dead end,” arguing instead that a model should be able to learn to reason through simpler fine-tuning processes.
Whether or not it represents the ideal path for further innovation, for enterprises the arrival of o3 signals that AI will continue to transform industries, from customer service to scientific research, in the years ahead.
Industry players will need time to digest what o3 has delivered here. For enterprises concerned about o3’s high computational costs, OpenAI’s upcoming smaller “o3-mini” model offers an alternative. Although it sacrifices some of the full model’s capabilities, o3-mini promises a more affordable option for businesses to experiment with, retaining much of the core innovation while reducing test-time compute requirements.
It may be some time before enterprises can get their hands on the o3 model. OpenAI says o3-mini is expected to launch in late January. The full o3 release will follow later, though timelines depend on the feedback and insights gained during safety testing. Enterprises would be well advised to test it out: they will want to ground the model with their own data and use cases and see how it performs.
But for now, they can use the many other capable models that are already available and well tested, including OpenAI’s flagship GPT-4o and competing models, many of which are already robust enough to build intelligent, tailored applications that deliver practical value.
Indeed, next year we will be operating in two gears. The first is achieving practical value from AI applications and fleshing out what models can do with AI assistants and other innovations that have already been delivered. The second is sitting back with popcorn and watching the intelligence race play out, and any progress there will just be icing on a cake that has already been served.
For more on o3’s innovations, watch the full YouTube discussion between Sam Witteveen and me below, and follow VentureBeat for ongoing coverage of AI advancements.