News

OpenAI introduces FrontierScience, a gauge of AI's readiness for scientific research

Let the robots do the thinkin’.

OpenAI has introduced a new benchmark called FrontierScience that sets a higher bar for evaluating artificial intelligence in expert-level scientific research.

Developed to measure deep reasoning rather than simple fact recall, it challenges AI systems with original, difficult problems across physics, chemistry, and biology.


The benchmark arrives as the most capable models, such as GPT‑5, are already accelerating real scientific workflows, shortening tasks that once took weeks down to hours.

FrontierScience features two tracks: an Olympiad-style set of constrained reasoning problems, and a Research track that simulates real-world scientific subtasks.

In initial evaluations, GPT‑5.2 leads other models, scoring 77% on Olympiad questions but only 25% on open-ended Research problems.

This gap highlights AI's growing but incomplete capacity to partner in science: adept at structured reasoning, yet still dependent on human insight for problem framing and validation.

By providing a rigorous “north star,” FrontierScience aims to steer AI development toward genuinely augmenting human discovery.