Will humanoid robots move beyond the confines of laboratories, factories, and warehouses to conquer the external world? A recent work by the University of California appears to have discovered, through an innovative reinforcement learning model, the key to this transition.

In the December 2017 issue of Science Robotics, within the editorial “Humanoid robotics – History, current state of the art, and challenges”, Toshio Fukuda, Professor of Intelligent Robotics and AI at Nagoya University in Japan, Paolo Dario, then Director of the Institute of BioRobotics at the Scuola Superiore Sant’Anna in Pisa, and Guang-Zhong Yang, Director of the Hamlyn Centre for Robotic Surgery at Imperial College London, characterised humanoid robots as “inherently interdisciplinary”.

They considered them emblematic of «one of the ultimate objectives of robotics»: the integration of all advancements from the diverse disciplines engaged in the research and development activities focused on these anthropomorphic machines. The authors underscored fields such as «advanced locomotion and manipulation, biomechanics, artificial intelligence, computer vision, perceptual studies, behavioural research, and cognitive learning».

Among these disciplines, the challenge of autonomous locomotion in complex external environments – «beyond the confined spaces of laboratories» – was identified nearly six and a half years ago as a domain still replete «with unresolved research issues concerning humanoid robots».

Despite numerous global academic endeavours to address these challenges, problems related to stability and dynamic control of humanoid robot locomotion across natural terrains persist.

Currently, techniques utilising Reinforcement Learning are at the forefront of addressing challenges that compromise the walking capabilities of humanoid robots interacting with the friction of natural surfaces such as concrete, grass, or rubber.
Researchers at the University of California, Berkeley, have devised a predictive control system for robotic locomotion that utilises reinforcement learning algorithms. This system was initially tested on a humanoid robot without onboard cameras in an indoor environment and subsequently in outdoor conditions.
Looking to the future, humanoid robots proficient in navigating and moving in challenging outdoor environments could collaborate with humans in external operations, such as emergency responses to fires and earthquakes, thereby safeguarding both people and settings.

Humanoid mobility in contact with natural ground friction: key challenges

We could enumerate humanoid robots that have gained fame over the years, endowed with capabilities ranging from cognitive and interactive skills to language, visual, auditory, and tactile perceptions, and even autonomous obstacle-avoidance walking. Starting with Japan’s ASIMO (Advanced Step in Innovative Mobility), introduced in 2000, followed by Italy’s iCub – launched in 2004 by the Italian Institute of Technology – and France’s Pepper from 2014, these are just a few of the most notable examples.

A common feature among these robots, and others like them, is their indoor operation: in laboratories, homes, factories, warehouses, restaurants, or stores. This is because «facilitating humanoid mobility on surfaces other than indoor floors, along with full-body control for trajectory tracking, poses a significant challenge. The difficulty stems from the need to leverage contact forces to manage leg locomotion, whilst adhering to the friction constraints of natural terrain» [source: “Versatile Locomotion Planning and Control for Humanoid Robots” – National Library of Medicine, 2021].

More recently, a study by the Massachusetts Institute of Technology (MIT) – “The MIT Humanoid Robot: Design, Motion Planning, and Control for Acrobatic Behaviors” (2021) – presents an acrobatic humanoid robot model capable of executing flips and somersaults thanks to innovative hardware design and motion planning involving two new proprioceptive actuators developed by MIT.

These actuators allow the machine to orient itself in space without visual aids. However, this only reached the stage of a dynamic and realistic simulation, far removed from practical applications for real humanoid robots in natural settings.

Simulation of a backward somersault performed by the humanoid robot model developed by MIT in 2021 (source: “The MIT Humanoid Robot: Design, Motion Planning, and Control For Acrobatic Behaviors” – https://arxiv.org/pdf/2104.09025).

Humanoid robot locomotion: learning-based methods

In recent years, despite progress in the stability of outdoor humanoid locomotion through hardware optimization and motion planning, it has been learning-based studies that have revitalized this area of research, focusing on the operational context of the robots.

Notably, in 2021, the Collaborative Robotics and Intelligent Systems Institute at Oregon State University conducted a study titled “Blind Bipedal Stair Traversal via Sim-to-Real Reinforcement Learning”. This research addressed a pivotal issue in robotic locomotion in natural environments, specifically the accurate estimation of space needed for navigation, by examining the behaviour of a bipedal robot ascending and descending stairs without any visual perception, as though it were blind. The technique employed was reinforcement learning (RL), a machine learning strategy that enables a machine to engage in reasoned decision-making to select precise actions needed to achieve specific objectives through interaction with its environment.

This approach, as the team elucidated, facilitated the development of a control mechanism through the learning of proprioceptive reflexes (enabling it to discern its spatial position without sight) and «strategies for navigating rough terrains following exposure to various disturbances during the RL model training».

Nonetheless, this research was restricted to small-sized bipedal robots (distinct from humanoid-like robots) and did not extend to the practical implementation of locomotion on natural terrains. The reinforcement learning method was more tested than applied in the real world, yet it set a new course for future exploration.
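To make the reinforcement learning idea in this section concrete, here is a minimal sketch of learning by interaction. It is not the Oregon State controller: it assumes a toy one-dimensional corridor in which an agent earns a reward only on reaching the last cell, and it uses tabular Q-learning, a textbook RL algorithm chosen here purely for illustration. The function names and parameter values are hypothetical.

```python
import random

def train_q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor (illustrative assumption).

    The agent starts in cell 0 and receives reward 1 on reaching the last cell.
    """
    rng = random.Random(seed)
    moves = (-1, +1)                      # action 0 = step left, action 1 = step right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):              # cap the episode length
            # Epsilon-greedy: explore sometimes (and on ties), otherwise exploit
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # The goal is terminal, so it contributes no future value
            future = 0.0 if next_state == goal else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
            if state == goal:
                break
    return q

def greedy_policy(q):
    """Best-known action for every non-terminal cell."""
    return ["left" if left > right else "right" for left, right in q[:-1]]

if __name__ == "__main__":
    print(greedy_policy(train_q_learning()))
```

The agent is never told that walking right is correct; the preference emerges from trial, error, and reward. Locomotion controllers rely on the same principle, with far richer observations and actions.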

Reinforcement learning propels outdoor humanoid robot locomotion

Researchers at the University of California, Berkeley, have recently moved beyond conventional reinforcement learning applications with basic bipedal robots, which are limited to autonomous walking within experimental confines.

Their paper, “Real-world Humanoid Locomotion with Reinforcement Learning”, published in Science Robotics on 17 April 2024, debuts a reinforcement learning model designed to govern humanoid locomotion outside controlled laboratory settings.

They call their control system a “causal transformer”, a term that denotes a straightforward yet effective neural network architecture. This architecture distinctively «employs a comprehensive dataset of environmental observations and proprioceptive actions acquired during training to predict future actions in real-world settings, thus justifying the adjective “causal”».

The authors indicate that the transformer was trained using a vast array of video data from numerous simulations of outdoor locomotion. A notable aspect of this study is its application of the reinforcement learning model on a full-sized humanoid robot – standing approximately 160 cm tall and weighing 45 kg – operating without an onboard vision system, in both indoor and outdoor settings, engaging directly with environmental elements.
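In transformer models generally, “causal” usually refers to attention masking: the output at a given time step may draw only on that step and earlier ones, never on later entries of the sequence. The sketch below illustrates that property with plain scaled dot-product attention in pure Python. It is a generic illustration under that assumption, not the Berkeley team's implementation; `causal_attention` and the toy sequences are hypothetical.

```python
import math

def causal_attention(queries, keys, values):
    """Scaled dot-product attention with a causal mask.

    Step t attends only to steps 0..t, so its output cannot depend on
    anything that comes later in the sequence.
    """
    dim = len(queries[0])
    outputs = []
    for t, query in enumerate(queries):
        # Score the current step against the past and present only
        scores = [
            sum(a * b for a, b in zip(query, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        # Numerically stable softmax over the visible steps
        peak = max(scores)
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of the visible value vectors
        outputs.append([
            sum(w * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs

if __name__ == "__main__":
    history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    altered = history[:2] + [[5.0, -3.0]]     # change only the final step
    before = causal_attention(history, history, history)
    after = causal_attention(altered, altered, altered)
    print(before[:2] == after[:2])            # earlier outputs are unaffected
```

Changing the last element of the sequence leaves every earlier output untouched, which is what allows such a model to run step by step on a robot, choosing each action from the history available so far.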

Indoor experiments

The aim of the lab tests on the humanoid robot was to specifically gauge its stability and robustness against various external forces, differing surfaces, and assorted loads with varying mass and shapes. Here’s an in-depth examination.

Laboratory tests on the stability of the humanoid robot controlled by the Reinforcement Learning model developed by the University of California, addressing: (A) external forces, (B) various surfaces for walking and (C) carrying loads of different mass and shape (credit: “Real-world humanoid locomotion with reinforcement learning” – https://www.science.org/doi/10.1126/scirobotics.adi9579).

Robustness to external forces

Researchers at the University of California tested whether their “blind” robot, managed by the reinforcement learning model, could withstand sudden external forces while ambulating.

This included impacts from a large ball, abrupt pushes with a wooden stick, and resisting motions that could hinder its path. The tests confirmed that the model’s control over the humanoid could maintain its stability across each artificially constructed scenario, enabling it to «react swiftly and adjust its actions to avert falls».

Robustness to terrain variability

This capability was crucial, particularly on uneven terrain. It was tested by overlaying the laboratory floor with rubber, fabric, cables, and packing materials to simulate different roughness levels; these posed tripping, slipping, or falling hazards to the robot, which had to maintain a constant forward speed of 0.15 m/s. «Despite these challenges, our controller enabled the robot to navigate all types of surfaces successfully. Moreover, we assessed its performance on two different slopes, with simulations featuring inclines of up to 8.7%. The results demonstrated that the robot could manage both, showing greater robustness at higher speeds (0.2 m/s) on steeper inclines», the researchers noted.
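For a sense of scale, an incline given as a percentage can be converted into an angle. The short sketch below assumes that the 8.7% figure is a grade, that is, rise divided by horizontal run; the source does not spell this out, and `grade_to_degrees` is a hypothetical helper.

```python
import math

def grade_to_degrees(grade_percent):
    """Convert a slope grade (rise over run, in percent) to an angle in degrees."""
    return math.degrees(math.atan(grade_percent / 100.0))

if __name__ == "__main__":
    print(round(grade_to_degrees(8.7), 2))
```

Under that assumption, an 8.7% incline corresponds to an angle of roughly 5 degrees.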

Robustness with various loads

The American team also tested the robot’s ability to maintain stability while carrying different types of loads as it moved forward. The robot was tasked with transporting an empty backpack, a full backpack, a fabric bag, a loaded garbage bag, and a paper bag, all of which it carried without any balance issues.

Particularly remarkable in this last test, as highlighted by the researchers, was the reinforcement learning-based controller’s ability to adapt to a full garbage bag attached to its arm, keeping it steady to prevent it from dropping, «despite the training involving oscillatory arm movements for body balance».

This is a prime example of behavioural adaptation to context without visual perception, encapsulating the core concept of Reinforcement Learning.

Field implementation

The outdoor trials involved the robot navigating a variety of settings such as squares, pedestrian crossings, sidewalks, running tracks, and grassy fields, traversing surfaces composed of diverse materials including concrete, tiles, rubber, grass, and wood, amid morning dampness and under the afternoon sun.

Outdoor implementation on various types of terrain by the humanoid robot controlled by the RL model from the University of California (credit: “Real-world humanoid locomotion with reinforcement learning” – https://www.science.org/doi/10.1126/scirobotics.adi9579).

The team observed that the control system in question was trained exclusively with video data from simulations of movement in external environments. Consequently, the humanoid robot’s first real-world interactions were unprecedented. As the researchers highlighted, «the terrain properties encountered outdoors were absent from the model’s training data. Nonetheless, the controller successfully maintained stability across all the terrains tested».

During a week of comprehensive day-long outdoor tests, the robot did not suffer any falls. However, as previously mentioned, the controller is a causal transformer, relying entirely on historical data of environmental observations and proprioceptive responses gathered during training, and predicting subsequent actions based solely on this record, without any visual input from cameras. This occasionally led to collisions with objects or entrapment in obstacles such as steps. Nevertheless, «it always effectively adjusted its movements to the spatial features to prevent falls».

Glimpses of Futures

In the future, a predictive control system such as the one described – rooted in reinforcement learning algorithms – capable of enabling fluid, human-like ambulation outdoors for a humanoid robot even without “eyes,” could refine the behaviours and movements of humanoid robots in any terrain, no matter how challenging. This technology promises significant potential for human interaction.

To anticipate possible future scenarios, let us employ the STEPS matrix to explore the impacts that the evolution of the causal transformer for controlling humanoid locomotion beyond laboratory environments could have across various domains.

S – SOCIAL: in a future vista, humanoid machines increasingly sophisticated in cognitive, interactive, and perceptual capabilities, and ever more proficient in manoeuvring arms, hands, and legs, in walking and running, in ascending and descending stairs, even in the most remote and harsh open areas, could collaborate with humans – such as with Civil Protection or Law Enforcement – or even replace them in perilous operations. These operations could include outdoor activities where intervention in hazardous situations is required to protect people, animals, and environments during extreme events like fires and earthquakes. A human-sized robot, capable of using hands, arms, and legs in unison, just as a human would, while also withstanding external forces and carrying significant weights, could offer strategic support. Examples of “police robots” and “firefighter robots” – both quadrupedal and bipedal – already exist in the United States, Singapore, our own country, and globally, aiding Police Forces and Fire Brigades in external operations, though these are not humanoid (thus lacking human-like legs) and are motor-driven.

T – TECHNOLOGICAL: the neural network devised by the University of California team, central to the humanoid control system presented, features a straightforward architecture, devoid of the recurrences and convolutions seen in deeper, more complex neural networks which demand extensive data for training. This transformer model is «more scalable with additional data and computation, and also accommodates new input modalities». However, the future may demand a deeper neural network and other artificial intelligence techniques such as deep learning, to further refine the precision of trajectories, symmetry in leg movements, and the accuracy of their speed tracking for outdoor humanoid locomotion.

E – ECONOMIC: looking forward, as technologies evolve to control the locomotion of humanoid robots even in natural settings – enabling them to support professions conducted outdoors – nations will need to address the contentious issue of machine-replaced human labour. The Forrester Job Forecast 2020-2040 paints a bleak picture, predicting the loss of 12 million jobs by 2040 in sectors like retail, catering, and hospitality in countries including the United Kingdom, Germany, France, Italy, and Spain. Beyond Europe, the New York State Assembly, in 2023, proposed a “robot tax” on companies substituting employees with machines. As of April 2024, this bill remains mired in committee amidst vigorous debate and entrepreneurial dissent. The concept, initially introduced by Bill Gates in 2017 as a provocative suggestion, has been discussed in the EU, including Italy, but debates about the tax’s structure and implementation remain largely academic.

P – POLITICAL: the advent of predictive control systems based on reinforcement learning algorithms, enabling humanoid mobility in external settings, has refocused attention on the safety of autonomous machines interacting with humans in workplaces. Notable incidents include the 2021 assault on a Tesla engineer in Texas by a humanoid robot and a fatal accident in a South Korean vegetable packing plant in November 2023, where a robot caused a worker’s death. Regulation 2023/1230, which replaces the Machinery Directive 2006/42/EC and is enforceable from January 20, 2027, imposes new safety standards and protection requirements for human-machine collaboration. It is imperative that companies not only adhere to these regulations but also manage their robotic workforce responsibly, to limit the legal liability that could arise if mismanagement leads to disastrous events.

S – SUSTAINABILITY: as reinforcement learning-based control systems evolve, humanoid robots are set to expand their presence outside labs, homes, factories, warehouses, restaurants, and shops. It is critical to consider the environmental impact of artificial intelligence, the core “brain” of these machines. The impact is twofold, as highlighted in “Artificial Intelligence for Carbon Emissions Using System of Systems Theory” (Ecological Informatics, September 2023), which illustrates the paradox of AI as both a potent tool against climate change and a contributor to carbon emissions: «one cannot deny the fact that artificial intelligence represents an effective tool in combating climate change. But neither can its role in contributing to carbon emissions be ignored». The emissions vary with the AI techniques employed, making it crucial to focus on sustainable AI practices that conserve power and reduce energy consumption throughout their operational life: «although deep learning may be necessary for some specific goals, simpler machine learning algorithms often lead to similar results, but with less power and computational energy».
