The questions the researchers asked required the LLM to take a number of steps to gather the information needed for an answer.

The LLMs didn't do very well.

By comparison, the average success rate for human respondents is 92%.

Yann LeCun, Vice President and Chief AI Scientist for Meta Platforms, testifies before the U.S. Senate Intelligence Committee at the Hart Senate Office Building on September 19, 2023 in Washington, DC. Photo: Kevin Dietsch (Getty Images)
