A major limitation on the scalability and generalisation of the Reinforcement Learning models used in complex decision-making processes lies in their fragility when faced with changes in the environment. In this context, MIT researchers have proposed a method that builds upon traditional RL approaches, aiming to address context-specific RL challenges.
Over the past decade, there has been a significant surge in the study and development of increasingly autonomous AI agents capable of making decisions across a range of tasks. Prominent examples of decision-making processes in AI systems include autonomous driving (navigating traffic or responding to road accidents), the medical field (primarily diagnostics), and the financial sector (investment recommendations or loan approvals), to name just a few.
AI is, therefore, increasingly stepping into the role of decision-maker. However, training machines to make decisions that are both accurate and safe – for users and the environments they operate in – is far from straightforward.
Consider, for instance, the use of artificial intelligence in urban traffic management. The potential benefits of balanced and reliable decisions by the system range from enabling quicker and smoother journeys for drivers to improving road safety and urban sustainability.
Yet, Reinforcement Learning (RL) models – the backbone of such decision-making processes – often struggle when faced with even minor variations in the tasks for which they were initially trained.
What does this mean in practical terms? Returning to the example of urban traffic management, an RL model might struggle to handle, say, an area with multiple intersections featuring varying speed limits or different numbers of lanes. Let’s delve into why this happens.
Reinforcement Learning and its role in decision-making processes
In the realm of AI research, Reinforcement Learning (RL) represents a machine learning technique that enables systems to “reason” and select specific actions to achieve desired objectives through interactions with their environment.
The paper “Decision-Making in Reinforcement Learning”, authored by researchers at the Department of Computer Science & Engineering of Gautam Buddha University, defines RL as the process of “learning what to do” in order «to maximise rewards and map situations to actions».
According to the authors, an RL agent does not inherently know which actions to perform. Instead, it discovers optimal actions by:
- exploration, testing different actions through trial and error, and
- exploitation, relying on its existing knowledge to make decisions based on past successful outcomes
While exploitation is suitable for maximising immediate, certain rewards, exploration provides an opportunity to identify potentially greater rewards in the long term.
This interplay between exploration and exploitation represents a core trade-off in Reinforcement Learning. For decades, mathematicians and researchers have sought ways to balance the two approaches, but, as the study group notes, no definitive solution has yet been found.
«In Reinforcement Learning, balancing exploration and exploitation is a critical challenge. An RL system must decide when to perform actions derived from present experiences, learning through trial and error, and when to rely on past experiences that have proven effective for achieving rewards».
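To make the trade-off concrete, here is a minimal sketch (in Python, not drawn from either paper) of an epsilon-greedy agent: with probability epsilon it explores a random action, otherwise it exploits the action with the highest estimated reward. All names and reward values are purely illustrative.

```python
import random

class EpsilonGreedyAgent:
    """Minimal illustration of the exploration/exploitation trade-off."""

    def __init__(self, n_actions, epsilon=0.1):
        self.epsilon = epsilon            # probability of exploring
        self.counts = [0] * n_actions     # how many times each action was tried
        self.values = [0.0] * n_actions   # running mean reward per action

    def select_action(self):
        if random.random() < self.epsilon:
            # Exploration: try a random action to gather new information.
            return random.randrange(len(self.values))
        # Exploitation: pick the action that has paid off best so far.
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incrementally update the mean reward estimate for the chosen action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

# Toy usage: three actions with different (hidden) average rewards.
true_means = [0.2, 0.5, 0.8]
agent = EpsilonGreedyAgent(n_actions=3, epsilon=0.1)
for _ in range(1000):
    action = agent.select_action()
    reward = random.gauss(true_means[action], 0.1)
    agent.update(action, reward)
print("Estimated action values:", [round(v, 2) for v in agent.values])
```

With epsilon close to zero the agent exploits almost exclusively and can lock onto a suboptimal action; with epsilon close to one it explores almost at random and wastes reward – which is exactly the tension the researchers describe.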
This balance is severely tested in tasks characterised by variability and instability, such as managing traffic intersections with varying speed limits or different numbers of lanes. In these cases:
- exploration becomes problematic due to the unpredictable nature of the environment, and
- exploitation falters as the system lacks standardised data for dynamically shifting contexts
As a result, the RL model struggles to make decisions that are both safe and reliable, either for users or for the broader environment.
Training machines to perform sequences of interrelated tasks
The recent research conducted by the Massachusetts Institute of Technology (MIT), detailed in the paper “Model-Based Transfer Learning for Contextual Reinforcement Learning” and set to be officially presented at NeurIPS 2024 (Conference on Neural Information Processing Systems, 10–15 December 2024, Vancouver), introduces an innovative algorithm for training Reinforcement Learning (RL) models. This algorithm is specifically designed to handle complex decision-making processes influenced by variable, unordered, and unpredictable elements.
«Reinforcement learning has made significant strides in addressing decision-making challenges across various domains», notes the MIT research team. «However, despite these advancements, RL algorithms often exhibit vulnerabilities when exposed to minor variations, such as changes in the number of road lanes, weather conditions within the same region, or differences in traffic flow parameters. These variations drastically limit their scalability and generalisation capabilities».
The new algorithm aims to train AI agents to manage a sequence of interrelated tasks within a broader framework. Each “task” is treated as part of a cohesive set of activities, enabling the agent to complete them correctly in sequence.
In the context of traffic management, each “task” might correspond to managing an individual intersection within a specific urban area. These intersections, characterised by distinct speed limits, collectively form a network encompassing the entire city.
«By focusing on a smaller subset of intersections that significantly contribute to the algorithm’s overall efficiency, this method maximises the decision-making agent’s performance,» explain the researchers.
What led the MIT team to develop this approach?
AI and complex decision-making: focus on Model-Based Transfer Learning
To train an RL system to control all the traffic lights at intersections within a specific urban area – and subsequently arrive at a balanced and safe decision – the research team examined two approaches: training an algorithm for each task independently, using only the data from that intersection; or training a larger algorithm using the data from all tasks and then applying it to each one.
Both approaches present challenges. Training a separate algorithm for each task involves a process that is time-, data-, and computation-intensive, while training a more robust algorithm for all tasks often results in below-average performance.
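In rough, illustrative terms, the two baselines could be sketched as follows; `train_policy` is a hypothetical placeholder for whatever RL training routine is used, not the authors’ code.

```python
# Two baseline strategies for a set of related tasks (e.g. the intersections
# of a city). `train_policy` is a hypothetical stand-in for any RL training
# routine; it is not the authors' API.

def train_independently(tasks, train_policy):
    """One specialised policy per task: strong locally, but costly in data,
    time and computation, since every task is learned from scratch."""
    return {task: train_policy([task]) for task in tasks}

def train_jointly(tasks, train_policy):
    """A single policy trained on the pooled data of all tasks: far cheaper,
    but its performance on each individual task is often below average."""
    shared_policy = train_policy(tasks)
    return {task: shared_policy for task in tasks}
```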
At this point, the team decided to select a subset of tasks and train an algorithm for each one independently. They added, «It is important to highlight that we strategically selected individual tasks with the greatest likelihood of improving the overall performance of the algorithm across all tasks».
As mentioned, to identify which tasks to select in order to maximise performance across all the others, the authors developed an algorithm called Model-Based Transfer Learning (MBTL), which consists of two parts: one models the performance of each algorithm as if it were trained independently on a specific task; the other models «how much each algorithm’s performance would deteriorate if transferred to each other task, a concept known as “generalization performance”».
It is this explicit modelling of generalization performance that allows the MBTL algorithm to estimate the value of training on a new task. «It performs this operation sequentially, first selecting the task that brings the greatest improvement in performance, then selecting additional tasks that provide the next marginal improvements».
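A minimal sketch of that sequential, greedy selection might look like the code below, assuming the two MBTL models are available as callables; the function and parameter names (`est_train_perf`, `est_gen_loss`) are ours, and the bookkeeping is deliberately simplified with respect to the authors’ implementation.

```python
def greedy_source_task_selection(tasks, est_train_perf, est_gen_loss, budget):
    """Sequentially pick the source tasks whose training is expected to yield
    the largest marginal improvement across ALL tasks.

    est_train_perf(task)   -> predicted performance when training on `task`
    est_gen_loss(src, tgt) -> predicted performance drop when a policy trained
                              on `src` is transferred to target task `tgt`
    """
    selected = []
    best_on = {t: 0.0 for t in tasks}   # best performance achieved so far on each target

    for _ in range(min(budget, len(tasks))):
        def total_gain(candidate):
            # Summed improvement over every target task if we also train on `candidate`.
            gain = 0.0
            for target in tasks:
                transferred = est_train_perf(candidate) - est_gen_loss(candidate, target)
                gain += max(0.0, transferred - best_on[target])
            return gain

        best = max((t for t in tasks if t not in selected), key=total_gain)
        selected.append(best)
        for target in tasks:            # update the per-target bookkeeping
            transferred = est_train_perf(best) - est_gen_loss(best, target)
            best_on[target] = max(best_on[target], transferred)

    return selected
```

In the traffic example, `tasks` would be the individual intersections, and the two estimators would be fitted from the training data gathered so far; the authors’ actual models are more sophisticated than this bookkeeping suggests.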
Glimpses of Futures
While it is true that AI systems today increasingly act as decision-makers across a wide range of activities, there is still no scalable and generalisable model capable of sustaining the performance of RL in decision-making processes within dynamic contexts. The example of urban traffic management is emblematic in this regard.
The work conducted by MIT moves in this direction, paving the way for further research into reliable methodologies based on contextual Reinforcement Learning models.
To anticipate possible future scenarios, we will now attempt to outline, using the STEPS matrix, the potential impact of the described framework’s evolution across various fronts.
S – SOCIAL: in the future, driven by advancements in Reinforcement Learning techniques applied to decision-making processes in dynamic contexts, Model-Based Transfer Learning algorithms could be extended to applications requiring AI agents to make decisions about more complex and larger-scale problems. Examples include high-variability activity spaces involving flows of people and vehicles, anticipated to define next-generation mobility systems and, more broadly, the smart cities of the future. Here, intelligent traffic management, air quality monitoring, and smart lighting control – enabled by increasingly autonomous AI systems making safe and reliable decisions – could converge to achieve more efficient reductions in emissions and energy consumption.
T – TECHNOLOGICAL: when the authors tested the new Reinforcement Learning technique for decision-making in dynamic contexts – simulating decisions related to traffic signal control and real-time speed advisory management – it proved to be five to fifty times more efficient than traditional RL techniques. This means that in the future, as the method evolves, it could achieve the same results by training models on significantly less data. For instance, the team explains, «with a fiftyfold increase in efficiency, the Model-Based Transfer Learning algorithm could be trained using data from just two tasks and achieve the same performance as a standard method requiring data from one hundred tasks». This reduction in computational resources is expected to streamline the procedures and timelines involved in training AI models – and not just in this area (see the final point).
E – ECONOMIC: on the economic impact of AI-driven decision-making, the perspective of the economist Daron Acemoğlu (professor of economics at MIT, winner of the Nobel Prize in Economics in 2024, and among the ten most cited economists globally) offers a refreshingly cautious and realistic counterpoint to overly enthusiastic narratives. In his article “Don’t Believe the Artificial Intelligence Hype” – published in the Financial Review in May 2024 – Acemoğlu estimates the macroeconomic effects of AI at no more than a 0.71% increase in total productivity over ten years. Moreover, he argues that even this estimate might be “overstated”, as it reflects performance on tasks that are relatively simple for machines to learn, “whereas some future effects will come from more challenging tasks characterised by multiple context-dependent factors that influence decision-making processes.”
P – POLITICAL: who is accountable for incorrect decisions made by artificial intelligence systems? Who bears responsibility for an accident caused by an autonomous vehicle or a poor investment suggested by an AI agent? These questions have been raised globally for several years. If the framework proposed by the MIT researchers evolves to the point where machines become increasingly adept at decision-making across all types of contexts, including the most complex and variable, they will become even more pressing. On this topic, it is worth recalling the European Commission’s proposal for the Artificial Intelligence Liability Directive (AILD) of 28 September 2022. Intended to complement the EU AI Act, the proposed directive updates the EU’s civil liability framework by introducing, for the first time, specific rules on damages caused by AI technologies. Under the proposal, individuals harmed by AI systems would be entitled to compensation from the system’s owner as if the damage had occurred under any other circumstances.
S – SUSTAINABILITY: training contextual Reinforcement Learning models powered by Model-Based Transfer Learning algorithms so that machines can make decisions in complex and dynamic environments requires vast amounts of data and computation. This means hours of training, with an inevitable impact on energy consumption and carbon footprint. This is the flipside of developing ever more advanced and efficient AI techniques for decision-making. For instance, a study by energy analysts of the future electricity load from the so-called “social robotification” (“Direct and Indirect Impacts of Robots on Future Electricity Load”) estimates that, in the United States, machine energy consumption could rise to 0.5–0.8% of the country’s total electricity demand by 2025.