Thursday, June 25, 2026

Meta exec denies the company artificially boosted Llama 4’s benchmark scores


A Meta exec on Monday denied a rumor that the company trained its new AI models to score well on specific benchmarks while concealing the models’ weaknesses.

The executive, Ahmad Al-Dahle, VP of generative AI at Meta, said in a post on X that it’s “simply not true” that Meta trained its Llama 4 Maverick and Llama 4 Scout models on “test sets.” In AI benchmarks, test sets are collections of data used to evaluate the performance of a model after it’s been trained. Training on a test set could misleadingly inflate a model’s benchmark scores, making the model appear more capable than it actually is.

Over the weekend, an unsubstantiated rumor that Meta artificially boosted its new models’ benchmark results began circulating on X and Reddit. The rumor appears to have originated from a post on a Chinese social media site from a user claiming to have resigned from Meta in protest over the company’s benchmarking practices.

Reports that Maverick and Scout perform poorly on certain tasks fueled the rumor, as did Meta’s decision to use an experimental, unreleased version of Maverick to achieve better scores on the benchmark LM Arena. Researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena.

Al-Dahle acknowledged that some users are seeing “mixed quality” from Maverick and Scout across the different cloud providers hosting the models.

“Since we dropped the models as soon as they were ready, we expect it’ll take several days for all the public implementations to get dialed in,” Al-Dahle said. “We’ll keep working through our bug fixes and onboarding partners.”


