DeepSWE puts GPT-5.5 atop the AI coding leaderboard while raising new questions about Claude Opus, SWE-Bench Pro, and benchmark leakage.
AI coding agents boost code output by 180% but shipping rises only 30%, MIT finds. Why private data access beats benchmark ...
Z.ai has released its latest language model, GLM-5.2, which reportedly surpasses GPT-5.5 on various long-horizon coding ...
AI thrives on data but feeding it the right data is harder than it seems. As enterprises scale their AI initiatives, they face the challenge of managing diverse data pipelines, ensuring proximity to ...
Most AI coding benchmarks still ask the question: did the agent produce code that passes the current tests? This is a useful question, but it is too narrow. Software development is iterative.
Google has once again updated its “Android Bench” rankings for the best AI models for Android app development, with a bunch of new “open-weight” models as well as more details on the tokens used and ...
Z.ai’s GLM-5.2 is an open-source model aimed at long-context coding-agent workflows, with support for a one million-token ...
For Android app developers relying on AI to code, picking the right model can be tricky. Not all models are built the same, and many are not specifically trained for Android development workflows. To ...
By Daniel Lewis, CEO, LegalOn. Foundation models are improving quickly. One useful measure is software engineering: the ...
Tom Fenton benchmarks the Lenovo ThinkPad T1g Gen 8 across SPECworkstation 4, Geekbench AI and Ollama tests to assess its performance for office workloads, local AI and large language models.
Developers are navigating confusing gaps between expectation and reality. So are the rest of us. Depending who you ask, AI-powered coding is either giving software developers an unprecedented ...