Science Benchmark Test

Medical AI tools are growing, but are they being tested properly?

Artificial intelligence algorithms are being built into almost all aspects of health care. They’re integrated into breast cancer screenings, clinical note-taking, health insurance management and even ...

Mint

OpenAI introduces FrontierScience to test AI’s expert-level scientific reasoning across physics, chemistry, biology

OpenAI on December 16 announced FrontierScience, a new benchmark designed to evaluate artificial intelligence systems on expert-level scientific reasoning across physics, chemistry and biology, as AI ...

Science News

AI’s understanding and reasoning skills can’t be assessed by current tests

“Sparks of artificial general intelligence,” “near-human levels of comprehension,” “top-tier reasoning capacities.” All of these phrases have been used to describe large language models, which drive ...

insideHPC

MLCommons Releases MLPerf Inference v5.1 Benchmark Results

Today, MLCommons announced new results for its MLPerf Inference v5.1 benchmark suite, tracking the momentum of the AI community and its new capabilities, models, and hardware and software systems. To ...

TechRepublic

OpenAI’s o3: AI Benchmark Discrepancy Reveals Gaps in Performance Claims

The FrontierMath benchmark from Epoch AI tests generative models on difficult math problems. Find out ...

WLRN

Students' scores on Florida tests show benchmark improvements. National indicators aren't as promising

Florida students did better on their state benchmark tests this year. But one critic said these tests are not an accurate indicator of how students are — or aren't — improving. Students take Florida ...
