Artificial intelligence passes the Turing Test – ChatGPT closer to humans than ever?

Media around the world have been abuzz with reports that artificial intelligence has passed the famous Turing Test, which would suggest that machines are becoming indistinguishable from humans in conversation. The claims stem from a recent study by researchers at the University of California, San Diego, as reported by ScienceAlert.
Details of the experiment
In March 2025, cognitive scientists Cameron Jones and Benjamin Bergen published the results of their experiment as a preprint that has not yet undergone peer review. The study tested four systems: three large language models (LLMs), namely GPT-4o, LLaMa-3.1-405B, and GPT-4.5, plus ELIZA, a rule-based chatbot from the 1960s. The test took the form of simulated text conversations in which each participant chatted simultaneously for five minutes with two interlocutors, one human and one chatbot.
Which model performed best?
The aim of the test was to determine whether participants could tell a human from a machine based solely on conversation. The most convincing model was OpenAI's GPT-4.5, which participants mistook for the human in 73% of cases. LLaMa-3.1-405B was judged to be the human in 56% of cases, while the remaining two systems, ELIZA and GPT-4o, were far less successful, at 23% and 21% respectively.
What is the Turing Test?
The researchers' experiment was based on the classic Turing Test, proposed by Alan Turing in 1950 under the name "imitation game". Turing devised this method to examine whether a machine could imitate human behavior, particularly in linguistic communication. The test soon became one of the best-known and most controversial methods for evaluating machine intelligence.
Controversies surrounding the test
However, the test itself is the subject of considerable controversy. Critics point out that the ability to mimic human conversation does not necessarily indicate true understanding, only the ability to fake it convincingly. Intelligence, they argue, involves not just effective communication but also emotions, consciousness, intentions, and bodily experience. Furthermore, the conversations in Jones and Bergen's experiment lasted only five minutes, which may be too short to fully assess the capabilities of a language model. The models were also specifically instructed to adopt certain personalities, which could have influenced the results.
Conclusions
Although the results indicate that GPT-4.5 can imitate a human convincingly, the researchers do not consider this definitive proof that machine intelligence equals human intelligence. Despite its popularity, the Turing Test remains a tool that raises more questions than it answers about the true nature of artificial intelligence.