Programming Language Benchmarks

Artificial Analysis overhauls its AI Intelligence Index, replacing popular benchmarks with 'real-world' tests

Artificial Analysis overhauls its AI Intelligence Index, replacing saturated benchmarks with real-world tests measuring ...

Another Chinese quant fund joins DeepSeek in AI race with model rivalling GPT-5.1, Claude

Beijing-based Ubiquant launches code-focused systems claiming benchmark wins over US peers despite using far fewer parameters ...

10d

Z.ai Releases GLM-4.7 Designed for Real-World Development Environments, Cementing Itself as "China's OpenAI"

On December 22, Z.ai released GLM-4.7, the latest iteration of its GLM large language model family. Designed to handle ...

13d

Z.ai Open-Sources GLM-4.7, a New Generation Large Language Model Built for Real Development Workflows

Z.ai released GLM-4.7 ahead of Christmas, marking the latest iteration of its GLM large language model family. As open-source models move beyond chat-based applications and into production ...

13d

MiniMax Unveils M2.1 to Bring Multilingual Programming Gains to Open AI Models

Chinese AI startup’s release is a major update to its open-source model series, aimed at multi-language programming and ...

14d

MiniMax releases M2.1 AI model for multi-language programming versatility

MiniMax M2 was released in late October this year. The company stated that M2.1 demonstrated significant improvements in ...

EurekAlert!

MathEval: a comprehensive benchmark for evaluating large language models on mathematical reasoning capabilities

This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...

Morningstar

Logical Intelligence Achieves 76 Percent on Putnam Benchmark, Highlighting Shift Beyond Large Language Models to Language-free, Mathematically Grounded Models

Over the last decade, artificial intelligence (AI) has been built largely around large language models (LLMs). These systems operate on language, predicting words one after another in a chain of tokens.
