Tony Blair Institute says AI is good for UK because ChatGPT said so

The Tony Blair Institute used ChatGPT to generate study findings suggesting that the United Kingdom could save billions of pounds, starting almost immediately, if it invests billions in automation. But outside researchers have raised questions about the study's veracity.

The Tony Blair Institute for Global Change (TBI), a nonprofit think tank, recently published research indicating that artificial intelligence could streamline the United Kingdom’s workforce, reduce government costs by billions, and automate more than 40% of worker tasks. 

According to the research, however, these benefits would require the government “to invest in AI technology, upgrade its data systems, train its workforce to use the new tools and cover any redundancy costs associated with early exits from the workforce.”

This would cost approximately $4 billion per year for the next five years and $7 billion per year after that, the researchers write.

But the real problem with the research, according to outside researchers who’ve read the paper, is in its reliance on ChatGPT.

Oxford University’s Mohammad Amir Anwar opined on X that the Tony Blair Institute was “making shit up”; meanwhile, the University of Washington’s Emily Bender told 404 Media’s Emanuel Maiberg that the researchers “might as well be shaking a Magic 8 ball and writing down the answers it displays.”

Source: Aaron Bastani

The problem

TBI researchers set out to provide a high-level overview of the entire workforce so that they could then predict what potential impact automation could have on the market going forward.

They determined that AI could save the UK billions of pounds almost immediately. Per the research paper, weighing the investment costs against the potential savings “implies the net savings from fully utilising AI in the public sector to be nearly 1.3 per cent of GDP each year, equivalent to £37 billion a year in today’s terms.”

The researchers even go so far as to claim that “this equates to a benefit-cost ratio of 9:1 in aggregate” up front, and “after five years we estimate the programme could cumulatively save 0.5 per cent of annual GDP (or £15 billion in today’s terms), implying a benefit-cost ratio of 1.8:1 is possible if the technology is rolled out quickly.”
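For a sense of where the headline 9:1 figure might come from, here is a rough back-of-the-envelope check. Pairing the annual savings against the annual investment is our assumption about the derivation, not TBI’s published model, and the article quotes costs in dollars and savings in pounds, which we treat here as roughly comparable.

```python
# Back-of-the-envelope check of the headline figures quoted above.
# All inputs come from the article; how they are paired is our guess.
annual_savings = 37.0        # claimed net savings, billions (~1.3% of GDP)
annual_cost_years_1_5 = 4.0  # claimed yearly investment, first five years
annual_cost_after = 7.0      # claimed yearly investment thereafter

ratio = annual_savings / annual_cost_years_1_5
print(f"Implied benefit-cost ratio: {ratio:.1f}:1")  # ~9.3:1, close to the quoted 9:1

# The 1.8:1 five-year figure depends on rollout assumptions the
# article does not spell out, so no attempt is made to reproduce it.
```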

While those numbers are certainly exciting, it’s unclear if they have any actual meaning.

What’s in question is how the researchers came to their conclusions. Rather than conducting an exhaustive study with workers and employers to determine how automation would affect a given position, they used the O*NET data set to identify roughly 20,000 tasks performed by workers and fed that data to ChatGPT. The team then prompted the AI to determine which tasks were suitable for automation and which tools could be used to automate them.
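To make the critique concrete, here is a minimal sketch of the kind of pipeline described, assuming the OpenAI chat API: pull a task statement from O*NET and ask a model to rate its automation suitability. The prompt wording, model name and rating scale are illustrative assumptions; TBI has not published its exact prompts.

```python
# Minimal sketch of an LLM task-classification loop (illustrative only).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def rate_task(task_statement: str) -> str:
    """Ask the model to rate one O*NET task's suitability for automation."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper's model choice may differ
        messages=[
            {"role": "system",
             "content": "Rate how suitable the following job task is for "
                        "automation with current AI tools. Answer with one "
                        "word: HIGH, MEDIUM, or LOW."},
            {"role": "user", "content": task_statement},
        ],
    )
    return response.choices[0].message.content.strip()

# e.g. rate_task("Schedule appointments for patients.")
```

Nothing in such a loop verifies the label that comes back: whatever string the model emits gets written down, which is exactly the Magic 8 ball dynamic Bender describes.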

According to the researchers, using human experts to go over each task would have made their work “intractable,” meaning infeasible to carry out in practice.

This also means, ostensibly, that it would be “intractable” for the researchers to evaluate each of ChatGPT’s outputs: the team says it used the AI system to categorize nearly 20,000 tasks.

If we can assume the AI made mistakes (according to both the TBI research and ChatGPT maker OpenAI’s website, the models are prone to error), then we can also assume that the research contains faulty information and that peer review would be intractable as well.
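The scale of that problem is easy to illustrate. Applying some hypothetical per-task error rates (the true rate is unknown, which is precisely the critics’ point) to the roughly 20,000 classified tasks:

```python
# Hypothetical error rates applied to ~20,000 classified tasks.
n_tasks = 20_000
for error_rate in (0.01, 0.05, 0.10):
    mislabeled = round(n_tasks * error_rate)
    print(f"{error_rate:.0%} error rate -> ~{mislabeled:,} mislabeled tasks")
```

Even at 1%, roughly 200 task classifications would be wrong, with no practical way to know which ones.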

A message on the ChatGPT prompt page indicating that the AI “can make mistakes” and advising users to “check important info.” Source: OpenAI

Automation isn’t easy

So, what’s the real number? Technically speaking, ChatGPT cannot be expected to understand the nuances of automation on a task-by-task basis, because the necessary data is highly unlikely to exist in its training set; building that data by hand is precisely the “intractable” work the researchers sought to avoid.

When it comes to solving novel problems that an AI system hasn’t been trained on, generative systems tend to fail. 

For example, automatic coffee makers have existed for decades, but general automation, such as teaching an AI system to make coffee anywhere, in any room, remains an open problem in the fields of artificial intelligence and robotics.

Simply put, automation is difficult and requires a nuanced approach to each individual task.

Back in 2017, for example, as the AI frenzy began picking up steam, it was widely assumed that autonomous driving would be solved within a matter of years. Elon Musk even famously predicted that Tesla would operate 1 million robotaxis by the year 2020.

But, as of July 2024, the vast majority of automakers, startups and Big Tech companies that were working on self-driving cars in 2021 have shut down their respective programs. It turns out that 99% of driving can be automated, but so far, no engineering team has figured out how to safely handle the edge cases that make up the last 1%.

While it’s easy to imagine any simple task being automated, context matters. ChatGPT may output text claiming that any job can be automated if you throw enough money at the problem, but reality has so far contradicted those claims.

Related: Intuit lays off 10% of staff to focus on AI