OpenAI’s GPT-4, a generative artificial intelligence (AI) model, has passed the Turing test, according to Ethereum co-founder Vitalik Buterin.
The Turing test is a nebulous benchmark for AI systems, purported to determine how human-like a conversational model is. It is named for famed mathematician Alan Turing, who proposed the test in 1950.
According to Turing, at the time, an AI system capable of generating text that fools humans into thinking they’re having a conversation with another human would demonstrate the capacity for “thought.”
Nearly 75 years later, the person largely credited with conceiving the world’s second most popular cryptocurrency has interpreted recent preprint research out of the University of California San Diego as indicating that a production model has finally passed the Turing test.
In their preprint, titled “People cannot distinguish GPT-4 from a human in a Turing test,” the researchers had approximately 500 human test subjects interact with both humans and AI models in a blind test to determine whether they could tell which was which.
According to the research, humans mistakenly determined that GPT-4 was a “human” 56% of the time. This means that a machine fooled humans into thinking it was one of them more often than not.
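As a rough back-of-the-envelope check (not from the paper itself), the sketch below asks how far a 56% "human" verdict sits from a pure coin flip, assuming roughly 500 independent judgments; the sample size and independence are simplifying assumptions for illustration only.

```python
import math

# Illustrative sketch: compare the reported 56% "human" verdict for GPT-4
# against a 50% coin-flip baseline, assuming ~500 independent judgments.
n = 500        # assumed number of judgments (approximate, per the article)
p_hat = 0.56   # fraction of judges who labeled GPT-4 as human
p0 = 0.50      # coin-flip baseline

# Standard error of a sample proportion under the coin-flip null hypothesis
se = math.sqrt(p0 * (1 - p0) / n)

# z-score: how many standard errors the observed rate lies above chance
z = (p_hat - p0) / se
print(f"z-score vs. coin flip: {z:.2f}")  # ~2.68 under these assumptions
```

Under these assumed numbers the deviation from chance is modest but detectable, which is consistent with Buterin's framing that telling human from bot is "basically a coin flip."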
Vitalik Buterin’s take
According to Buterin, an AI system capable of fooling more than half of the humans it interacts with qualifies as passing the Turing test.
Buterin added:
“It means people’s ability to tell if it’s a human or a bot is basically a coin flip!”
Buterin qualified his statement by saying, “Ok not quite, because humans get guessed as humans 66% of the time vs 54% for bots, but a 12% difference is tiny; in any real-world setting that basically counts as passing.”
He also later added, in response to commentary on his original cast, that the Turing test is “by far the single most famous socially accepted milestone for ‘AI is serious shit now’. So it’s good to remind ourselves that the milestone has now been crossed.”
The Turing test
Artificial general intelligence (AGI) and the Turing test are not necessarily related, though the two terms are often conflated. Turing, drawing on his mathematical acumen, envisioned a scenario in which an AI could fool humans into believing, through conversation, that it was one of them.
It bears mentioning that the Turing test is an informal construct with no rigorous benchmark or technical basis. There is no scientific consensus as to whether machines are capable of “thought” as living organisms are, or as to how such a feat would be measured. Simply put, AGI, or an AI’s ability to “think,” is not currently measurable or defined by the scientific or engineering communities.
Turing made his conceptual predictions long before the advent of token-based artificial intelligence systems and generative adversarial networks, the precursors to today’s generative AI systems.
Artificial general intelligence
Complicating matters further is the idea of AGI, which is often associated with the Turing test. In scientific parlance, a “general intelligence” is one that should be capable of any intelligence-based feat. This arguably excludes humans, as no person has demonstrated “general” capability across the spectrum of human intellectual endeavor. It follows that an “artificial general intelligence” would feature thought capabilities far beyond those of any known human.
That being said, it’s clear that GPT-4 doesn’t fit the bill of true “general intelligence” in the strictly scientific sense. However, that hasn’t stopped denizens of the AI community from using the term “AGI” to indicate any AI system capable of fooling a significant number of humans.
In the current culture, terms and phrases such as “AGI,” “humanlike,” and “passes the Turing test” are commonly used to describe any AI system that outputs content comparable to that produced by humans.
Related: ‘We’re just scratching the surface’ of crypto and AI — Microsoft exec