In late December, OpenAI unveiled o3 (the name o2 was skipped, reportedly to avoid confusion with the telecom brand O2, o2.co.uk). The benchmarks presented indicate significant progress compared to o1 and to other current developments in AI. Dive into our newest article by our member Traffic Builders.
Three LLM benchmarks stand out for OpenAI o3
- The GPQA test, with multiple-choice questions that cannot be Googled, shows o3’s superior knowledge: 87% correct, compared to 34% for scientists outside their specialism and 81% within.
- FrontierMath, featuring extremely difficult math problems: no previous AI had scored above 2%, while o3 reached 25%.
- Finally, on the ARC-AGI test, which carries a $1 million prize, o3 beat both previous AIs and the basic human level, scoring 87.5%.

While there are caveats, these results suggest that AI barriers are falling faster than previously thought.

The introduction of models that think has shaken up the AI industry. Researchers speak with urgency about the arrival of superintelligent AI systems, a tidal wave of intelligence. Not in the distant future, but very soon.
People often talk about AGI, Artificial General Intelligence: systems that surpass human experts, although the definition is vague. The availability of this intelligence on demand will change society drastically and rapidly.
I had Gemini 2.0 sketch and narrate an epic sci-fi scenario about the possible rise of AGI in the form of OpenAI’s GPT-6. Got 30 minutes? It’s not that bad!
If you would like to know more about the newest trends in AI and how they translate into digital marketing, check out the Traffic Builders website!