Safety, like any other aptitude, must be built and trained into the artificial intelligence that animates robots. No one will tolerate robots that routinely smash into people, endanger passengers riding in autonomous vehicles, or order products online without their owners' authorization.
Controlled trial and error is how most robotics, edge computing, and self-driving vehicle solutions will acquire and evolve their AI smarts. As the brains behind autonomous devices, AI can help robots master their assigned tasks so well and perform them so inconspicuously that we never give them a second thought.
Training robotic AI for safe operation is not a pretty process. As a robot searches for the optimal sequence of actions to achieve its intended outcome, it will of necessity take far more counterproductive actions than optimal ones. Using reinforcement learning (RL) as a key training approach, robots can discover which automated actions protect humans and which can kill, sicken, or otherwise endanger them.
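The trial-and-error dynamic described above can be sketched with a toy example: tabular Q-learning on a tiny grid where one cell stands in for a physical hazard. The heavy penalty on the hazard cell is what steers the learned policy around it. Everything here (grid size, rewards, hyperparameters) is illustrative, not a real robotics training setup.

```python
# Toy sketch: Q-learning agent learns to route around a hazard cell.
# All values are illustrative; real robotic RL runs in simulation first.
import random

random.seed(0)

SIZE = 3
START, GOAL, HAZARD = (0, 0), (2, 2), (1, 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

def step(state, action):
    r = min(max(state[0] + action[0], 0), SIZE - 1)
    c = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (r, c)
    if nxt == HAZARD:
        return nxt, -10.0, True     # unsafe contact: big penalty, episode ends
    if nxt == GOAL:
        return nxt, +1.0, True      # task completed
    return nxt, -0.1, False         # small cost per move

Q = {(r, c, a): 0.0 for r in range(SIZE) for c in range(SIZE) for a in range(4)}

def best_a(s):
    return max(range(4), key=lambda a: Q[(s[0], s[1], a)])

alpha, gamma, eps = 0.5, 0.9, 0.2
for _ in range(3000):
    s = START
    for _ in range(30):
        a = random.randrange(4) if random.random() < eps else best_a(s)
        s2, rwd, done = step(s, ACTIONS[a])
        target = rwd if done else rwd + gamma * max(Q[(s2[0], s2[1], b)] for b in range(4))
        Q[(s[0], s[1], a)] += alpha * (target - Q[(s[0], s[1], a)])
        s = s2
        if done:
            break

# Roll out the greedy policy: it should reach the goal without touching the hazard.
path, s = [START], START
for _ in range(10):
    s, _, done = step(s, ACTIONS[best_a(s)])
    path.append(s)
    if done:
        break
```

Note that during training the agent does enter the hazard cell many times; only after enough penalized episodes does the greedy policy avoid it, which is exactly why this learning happens in simulation rather than around people.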
What robots need to learn
Developers must incorporate the following scenarios into their RL procedures before they release their AI-powered robots into the wider world:
Geospatial awareness: Real-world operating environments can be very tricky for general-purpose robots to navigate successfully. The right RL could have helped the AI algorithms in one widely reported security robot learn the range of locomotion challenges in the indoor and outdoor environments it was designed to patrol. Equipping the robot with a built-in video camera and thermal imaging wasn't enough. No amount of trained AI could salvage it after it had rolled into a public fountain.
Collision avoidance: Robots can be a hazard as much as a helper in many real-world environments. This is obvious with autonomous vehicles, but it's just as relevant for retail, office, residential, and other environments where people might let their guard down. There's every reason for society to expect that AI-driven safeguards will be built into everyday robots so that toddlers, the disabled, and the rest of us have no need to fear that they'll crash into us when we least expect it. Collision avoidance, a prime RL challenge, should be a standard, highly accurate capability in every robot. Very likely, lawmakers and regulators in most jurisdictions will demand this before long.
Contextual classification: Robots will be working at close range with humans in industrial collaborations of increasing complexity. Many of these collaborations will involve high-speed, high-throughput production work. To avert risks to life and limb, the AI that controls factory-floor robots will need the smarts to rapidly distinguish humans from the surrounding machinery and materials. These algorithmic classifications will rely on real-time correlation of 3D data coming from diverse cameras and sensors, and will drive automated risk mitigations such as stopping equipment or slowing it down so that human workers aren’t harmed. Given the nearly infinite range of combinatorial scenarios around which industrial robotic control will need to be trained, and the correspondingly vast range of potential accidents, the necessary AI will run on RL trained on data gathered both from live operations and from highly realistic laboratory simulations.
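The risk-mitigation step described above, slowing or stopping equipment once perception classifies a nearby object as human, can be sketched as a simple speed-gating function. The zone thresholds and label names below are hypothetical, not drawn from any real safety standard.

```python
# Hypothetical mitigation logic: map the nearest classified human detection
# to an equipment speed command. Thresholds are illustrative only.

STOP_ZONE_M, SLOW_ZONE_M = 0.5, 2.0

def speed_command(detections, full_speed=1.0):
    """detections: list of (label, distance_m) from fused camera/sensor data."""
    human_dists = [d for label, d in detections if label == "human"]
    if not human_dists:
        return full_speed
    nearest = min(human_dists)
    if nearest < STOP_ZONE_M:
        return 0.0                                  # human in danger zone: halt
    if nearest < SLOW_ZONE_M:
        return full_speed * nearest / SLOW_ZONE_M   # taper speed with proximity
    return full_speed

# Example: a pallet at 0.3 m is ignored; a human at 1.0 m halves the speed.
cmd = speed_command([("pallet", 0.3), ("human", 1.0)])
```

In a production system this gating would be one small deterministic layer downstream of the learned classifier, so that the safety response does not itself depend on exploratory behavior.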
Self-harm avoidance: Robots will almost never be programmed to destroy themselves or their environments. Nevertheless, robots trained through RL may explore a wide range of optional behaviors, some of which may cause self-harm. As an extension of its core training, an approach called "residual RL" may be used to prevent a robot from exploring self-destructive or environmentally destabilizing behaviors during the training process. Use of this self-protecting training procedure may become mainstream as robots become so flexible in grasping and otherwise manipulating their environments, including engaging with human operators, that they begin to put themselves and others in jeopardy unless trained not to do so.
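A minimal sketch of the residual RL idea: the learned policy contributes only a small correction on top of a hand-engineered base controller, and the combined command is clipped to actuator-safe bounds, so even wild exploration cannot command destructive extremes. The controller, limits, and numbers below are hypothetical.

```python
# Residual RL sketch: learned residual rides on top of a conventional
# controller, with hard clipping so exploration stays within safe bounds.
MAX_SAFE_TORQUE = 5.0   # hard actuator limit (illustrative)
MAX_RESIDUAL = 1.0      # exploration confined to a small correction

def base_controller(error):
    """Hand-engineered proportional controller supplied by the engineer."""
    return 2.0 * error

def safe_action(error, residual):
    # Clip the learned residual, then clip the combined command.
    residual = min(max(residual, -MAX_RESIDUAL), MAX_RESIDUAL)
    action = base_controller(error) + residual
    return min(max(action, -MAX_SAFE_TORQUE), MAX_SAFE_TORQUE)

# Even a wildly exploratory residual stays within the safe torque limit:
# base = 6.0, residual clipped to 1.0, sum clipped to 5.0.
a = safe_action(error=3.0, residual=100.0)
```

The design choice is that the base controller already behaves sensibly, so the RL component only has to learn a refinement, which sharply shrinks the space of dangerous actions it can ever try.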
Authenticated agency: Robots are increasingly becoming the physical manifestations of digital agents in every aspect of our lives. Smart speakers, for example, should be trained to refrain from placing unauthorized orders; in one widely reported incident, a device followed a voice-activated purchase request from a child who lacked parental authorization. Although this could have been handled through multifactor authentication rather than algorithmic training, it's clear that voice-activated robots in many scenarios may need to step through complex decision logic when choosing the multifactor methods to use for strong authentication and delegated permissioning. Conceivably, RL could help robots more rapidly identify the most appropriate authentication, authorization, and delegation procedures in environments where they serve as agents for many people performing a diverse, dynamic range of tasks.
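The kind of decision logic described above can be sketched as a small policy table: the assistant escalates the authentication factors it demands based on who appears to be speaking and what is being requested. Every profile name, factor name, and threshold below is invented for illustration.

```python
# Hypothetical authentication policy for a voice-activated agent.
# Profiles, factor names, and the $100 threshold are illustrative only.

def required_factors(speaker_profile, order_value):
    if speaker_profile == "unrecognized":
        return ["deny"]                              # no agency for strangers
    if speaker_profile == "child":
        return ["voice_match", "parental_approval"]  # delegated permissioning
    if order_value > 100.0:
        return ["voice_match", "pin"]                # high value: second factor
    return ["voice_match"]

# A child's purchase request always requires parental approval.
factors = required_factors("child", 20.0)
```

In practice such a table would be one input among many; the article's point is that RL might learn when to tighten or relax these requirements across many users and tasks.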
Defensive maneuvering: Robots must survive both deliberate and accidental assaults that other entities, including human beings, may inflict. In one reported incident, the AI algorithms in a driverless shuttle bus should have been trained to take some sort of evasive action, such as veering a few feet in the opposite direction, to avoid the semi that inadvertently backed into it. Defensive maneuvering will become critical for robots that are deployed in transportation, public safety, and military roles. It's also an essential capability for robotic devices to fend off the general mischief and vandalism they will certainly attract wherever they're deployed.
Collaborative orchestration: Robots are increasingly deployed as orchestrated ensembles rather than isolated assistants. The AI algorithms in warehouse robots should be trained to work harmoniously with each other and the many people employed in those environments. Given the huge range of potential interaction scenarios, this is a tough challenge for RL. But society will demand this essential capability from devices of all sorts, including the drones that patrol our skies, deliver our goods, and explore environments that are too dangerous for humans to enter.
Cultural sensitivity: Robots must respect people in keeping with the norms of civilized society. That includes making sure that robots' face-recognition algorithms don't make discriminatory, demeaning, or otherwise insensitive inferences about the human beings they encounter. This will become even more critical as we deploy robots into highly social settings where they must be trained not to offend people, for example, by using an inaccurate gender-based salutation to a transgender person. These kinds of distinctions can be tricky even for humans to make on the fly, but that only heightens the need for RL to train AI-driven entities to avoid committing an automated faux pas.
Ensuring compliance with safety requirements
In the near future, a video audit log of your RL process may be required to pass muster with stakeholders who demand certification that your creations meet all reasonable AI safety criteria. You may also be required to show conformance with constrained RL practices ensuring that your robots used "safe exploration," along the lines of a 2019 OpenAI research paper and a 2020 MIT study on the topic.
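Constrained RL of the kind those safe-exploration papers discuss is often implemented with a Lagrangian device: the agent maximizes reward minus a penalty multiplier times the safety cost, and the multiplier rises whenever measured violations exceed a budget. The sketch below shows only that multiplier update; the cost figures are invented for illustration.

```python
# Sketch of the dual (Lagrange multiplier) update used in constrained RL:
# lambda tightens the safety penalty while the cost budget is exceeded.
# COST_LIMIT, learning rate, and the cost sequence are illustrative only.

COST_LIMIT = 0.1   # allowed safety-violation rate per episode
LR_LAMBDA = 0.5

def update_multiplier(lam, measured_cost):
    """Dual ascent: raise lambda while the constraint is violated, floor at 0."""
    return max(0.0, lam + LR_LAMBDA * (measured_cost - COST_LIMIT))

lam = 0.0
for cost in [0.4, 0.3, 0.2, 0.1, 0.05]:  # violations shrink as training proceeds
    lam = update_multiplier(lam, cost)

def penalized_return(reward, cost, lam):
    # The policy is trained on this penalized objective instead of raw reward.
    return reward - lam * cost
```

The appeal of this scheme for certification is that the constraint (violations per episode stay under a stated budget) is an explicit, auditable number rather than a reward-shaping choice buried in the training code.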
Training a robot to operate safely can be a long, frustrating, and tedious process. Developers may need to refine their RL practices painstakingly until their robots can behave safely in ways that generalize to diverse scenarios.
During the next few years, these practices may very well become mandatory for AI professionals who deploy robotics into applications that put people’s lives at risk.