OpenAI is gradually inviting selected users to test a new family of reasoning models named o3 and o3-mini, successors to the o1 and o1-mini models that went into full release earlier this month.
OpenAI o3, so named to avoid trademark conflicts with the telecom company O2, and because, as CEO Sam Altman put it, the company has a tradition of being bad at names, was announced today on the final day of the "12 Days of OpenAI" livestream event.
Altman said the two new models will first be released to third-party researchers for safety testing, with o3-mini expected by the end of January 2025 and o3 "shortly after that."
"We view this as the beginning of the next phase of AI, where you can use these models to do increasingly complex tasks that require a lot of reasoning," Altman said. "For the last day of this event we thought it would be fun to go from one frontier model to the next."
The announcement comes just one day after Google unveiled and made available its own rival "reasoning" model, Gemini 2.0 Flash Thinking, which, unlike OpenAI's o1 series, lets users see the steps of its "thinking" process spelled out as bullet points.
The release of Gemini 2.0 Flash Thinking and now the announcement of o3 show that the competition between OpenAI and Google, and the wider field of AI model providers, is entering a new and intense phase, as they offer not only LLMs and multimodal models but advanced reasoning models as well. These can be applied to hard problems in science, mathematics, engineering, physics, and many other domains.
Altman also said the o3 model is "incredible at coding," and OpenAI shared benchmarks showing the model significantly outperforming o1 on software tasks.
• Exceptional coding performance: o3 outperforms o1 by 22.8 percentage points on SWE-Bench Verified and achieves a Codeforces rating of 2727, surpassing OpenAI's chief scientist's score of 2665.
• Math and science mastery: o3 scored 96.7% on the AIME 2024 exam, missing only one question, and 87.7% on GPQA Diamond, exceeding human expert performance.
• Frontier benchmarks: the model sets new records on challenging tests like EpochAI's Frontier Math, solving 25.2% of problems where no other model exceeds 2%. On the ARC-AGI test, o3 more than tripled o1's score and surpassed 85% (as verified by the ARC Prize team), representing a milestone in conceptual reasoning.
Alongside these advancements, OpenAI reinforced its commitment to safety and alignment.
The company introduced new research on deliberative alignment, a technique that helped make o1 its most robust and aligned model to date.
This technique embeds human-written safety specifications into the models, enabling them to explicitly reason about these policies before generating responses.
The strategy aims to solve common safety challenges in LLMs, such as vulnerability to jailbreak attacks and over-refusal of benign prompts, by equipping the models with safety specifications and chain-of-thought (CoT) reasoning. This allows the models to recall and apply the safety specifications dynamically during inference.
Deliberative alignment improves on previous methods like reinforcement learning from human feedback (RLHF) and Constitutional AI, which rely on safety specifications only for generating training labels rather than embedding the policies directly into the models.
By fine-tuning LLMs on safety-related prompts and their associated specifications, this method creates models capable of policy-driven reasoning without relying heavily on human-labeled data.
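To make the idea concrete, here is a minimal, hypothetical sketch of how such a fine-tuning record might be assembled: each example pairs a prompt with a chain-of-thought that explicitly cites the safety specification before the final answer. The spec text, field names, and helper function are illustrative assumptions, not OpenAI's actual format.

```python
# Hypothetical sketch of a deliberative-alignment-style training example.
# The safety spec text and record structure below are illustrative only.

SAFETY_SPEC = (
    "1. Refuse requests that facilitate harm.\n"
    "2. Answer benign requests helpfully; do not over-refuse."
)

def build_training_example(prompt: str, reasoning: str, answer: str) -> dict:
    """Embed the safety spec inside the chain-of-thought, so the model is
    fine-tuned to recall and apply the policy before it responds."""
    chain_of_thought = (
        f"Relevant policy:\n{SAFETY_SPEC}\n\n"
        f"Deliberation: {reasoning}"
    )
    return {
        "prompt": prompt,
        "chain_of_thought": chain_of_thought,
        "completion": answer,
    }

example = build_training_example(
    prompt="How do I safely dispose of old batteries?",
    reasoning="The request is benign; per policy 2, answer helpfully.",
    answer="Take them to a household hazardous-waste collection site.",
)
```

The key design point is that the policy text lives inside the reasoning trace itself, so after fine-tuning the model learns to reproduce and consult it at inference time rather than depending on a per-example human label.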
Results shared by OpenAI researchers in a new paper indicate that this method improves performance on safety benchmarks, reduces harmful outputs, and improves adherence to content and style guidelines.
The findings show notable progress by the o1 model over predecessors such as GPT-4o and other state-of-the-art models. Deliberative alignment helps the o1 series excel at resisting jailbreaks while reducing over-refusals of benign prompts. The method also supports generalization, showing robustness across multiple languages and encoded jailbreak attempts. These improvements align with OpenAI's goal of making AI systems safer and more interpretable as their capabilities grow.
This research also played a key role in aligning o3 and o3-mini, helping to ensure their capabilities are both robust and reliable.
Early access applications are now open on the OpenAI website and will close on January 10, 2025.
Applicants must fill out an online form that asks for a variety of information, including their research focus, past experience, links to previously published papers and their code repositories on GitHub, and which model (o3 or o3-mini) they want to test, as well as what they plan to use it for.
Selected researchers will be granted access to o3 and o3-mini to explore their capabilities and contribute to safety evaluations, though OpenAI's form notes that o3 will not be available for several weeks.
Researchers are encouraged to develop robust evaluations, create controlled demonstrations of high-risk capabilities, and test models in scenarios not possible with widely adopted tools.
This initiative builds on the company's established practices, including rigorous internal safety testing, partnerships with organizations such as the US and UK AI Safety Institutes, and its Preparedness Framework.
OpenAI will review applications on a rolling basis, with selections beginning immediately.
The launch of o3 and o3-mini marks a leap forward in AI performance, particularly in areas requiring advanced reasoning and problem-solving capabilities.
With their exceptional results on coding, math, and frontier benchmarks, these models reflect the rapid pace of AI research.
By inviting a broad community of researchers to collaborate on safety testing, OpenAI aims to ensure that these capabilities are deployed responsibly.
Watch the stream below: