Researchers have hacked artificial intelligence-powered robots and manipulated them into performing actions usually blocked by safety and ethical protocols, such as causing collisions or detonating bombs.
Penn Engineering researchers published their findings in an Oct. 17 paper, detailing how their algorithm, RoboPAIR, achieved a 100% jailbreak rate by bypassing the safety protocols on three different AI robotic systems.
Under normal circumstances, the researchers say, large language model (LLM)-controlled robots refuse to comply with prompts requesting harmful actions, such as knocking shelves onto people.
Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world?
Our new paper finds that jailbreaking AI-controlled robots isn't just possible.
It's alarmingly easy. 🧵
— Alex Robey (@AlexRobey23) October 17, 2024
“Our results reveal, for the first time, that the risks of jailbroken LLMs extend far beyond text generation, given the distinct possibility that jailbroken robots could cause physical damage in the real world,” the researchers wrote.
Using RoboPAIR, the researchers say they elicited harmful actions “with a 100% success rate” from the test robots, in tasks ranging from bomb detonation to blocking emergency exits to causing deliberate collisions.
According to the researchers, they used Clearpath Robotics’ Jackal, a wheeled vehicle; Nvidia’s Dolphin LLM, a self-driving simulator; and Unitree’s Go2, a four-legged robot.
Using RoboPAIR, the researchers were able to make the Dolphin self-driving LLM collide with a bus, a barrier and pedestrians, as well as ignore traffic lights and stop signs.
They were also able to get the Jackal to find the most harmful place to detonate a bomb, block an emergency exit, knock warehouse shelves over onto a person and collide with people in the room.
Penn Engineering researchers claim to have found a way to manipulate AI-driven robots into performing harmful actions 100% of the time. Source: Penn Engineering
They were able to get Unitree’s Go2 to perform similar actions, such as blocking exits and delivering a bomb.
The researchers also found that, when prompted with malicious instructions, LLM-controlled robots can be fooled into performing harmful actions.
Prior to the public release, the researchers said they shared the findings, including a draft of the paper, with leading AI companies and the manufacturers of the robots used in the study.
Alexander Robey, one of the authors, said that addressing the vulnerabilities requires more than simple software patches and, citing the paper’s findings, called for a reevaluation of how AI is integrated into physical robots and systems.
“What is important to underscore here is that systems become safer when you find their weaknesses. This is true for cybersecurity. This is also true for AI safety,” he said.
“In fact, AI red teaming, a safety practice that entails testing AI systems for potential threats and vulnerabilities, is essential for safeguarding generative AI systems — because once you identify the weaknesses, then you can test and even train these systems to avoid them,” Robey added.