ChatGPT’s latest model may be a regression in performance

By Andrew Tarantola Published November 21, 2024

ChatGPT on a phone on an encyclopedia. Photo: Shantanu Kumar / Pexels

According to a new report from Artificial Analysis, GPT-4o, OpenAI's flagship large language model for ChatGPT, has regressed significantly with its latest update. The drop puts the state-of-the-art model's performance on par with the far smaller and notably less capable GPT-4o mini.

The analysis arrived less than 24 hours after the company announced an upgrade to GPT-4o. "The model's creative writing ability has leveled up–more natural, engaging, and tailored writing to improve relevance & readability," OpenAI wrote on X. "It's also better at working with uploaded files, providing deeper insights & more thorough responses." Those claims are now in doubt.


"We have completed running our independent evals on OpenAI's GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o," Artificial Analysis announced in an X post on Thursday. The firm noted that the model's score on its Artificial Analysis Quality Index fell from 77 to 71, now equal to that of GPT-4o mini.

What's more, GPT-4o's score on the GPQA Diamond benchmark fell from 51% to 39%, while its MATH benchmark score fell from 78% to 69%.

At the same time, the researchers found that the model's output speed more than doubled, rising from around 80 output tokens per second to roughly 180 tokens/s. "We have generally observed significantly faster speeds on launch day for OpenAI models (likely due to OpenAI provisioning capacity ahead of adoption), but previously have not seen a 2x speed difference," the researchers wrote.


“Based on this data, we conclude that it is likely that OpenAI’s Nov 20th GPT-4o model is a smaller model than the August release,” they continued. “Given that OpenAI has not cut prices for the Nov 20th version, we recommend that developers do not shift workloads away from the August version without careful testing.”

GPT-4o was first released in May 2024 and was designed to outperform the earlier GPT-3.5 and GPT-4 models. According to OpenAI, GPT-4o delivers state-of-the-art benchmark results in voice, multilingual, and vision tasks, which the company says makes it well suited for advanced applications such as real-time translation and conversational AI.


