An article by researchers from MIT and Boston University offers thought-provoking insights for a fresh debate on equity in patient care, focusing not only on AI-driven decision-support systems but also on the other tools used in medical practice.
Artificial intelligence systems that make “unethical” decisions – such as those discriminating based on ethnicity, socio-economic status, age group, or geographic origin – operate according to a framework tainted by bias. This bias stems from training datasets that themselves contain prejudices and distortions, whether intentional or not.
In healthcare, these biases infiltrate the AI algorithms used in clinical decision-making, often perpetuating disparities in the treatment and care provided to patients. This is what makes them among the most dangerous.
The issue of AI adhering to fundamental human rights and ethical principles – while being reliable and safe – is not new. Over the past decade, this topic has been central to a global debate, particularly in the Western world. This conversation has gradually broadened to include engineers, AI developers, technicians, governments, institutions, legal experts, and philosophers. As a result, recommendations, requirements, guidelines, and principles for risk-free AI have been proposed.
In the European Union, this extensive process culminated in March 2024 with the definitive approval of the EU AI Act or “Artificial Intelligence Law.” This landmark legislation establishes the world’s first legal framework for artificial intelligence. Among its provisions, it classifies AI systems used in healthcare as “high-risk” and mandates stringent compliance assessments to ensure their accuracy, robustness, and safety.
In contrast, countries such as the United States, where no comprehensive federal legislation on artificial intelligence yet exists, present a different scenario. The healthcare sector in the US operates on foundations distinct from those in Europe, adding further complexity to this evolving landscape.
Algorithmic discrimination in medical diagnostics: the case of the United States
References to the United States are frequent in the literature on algorithmic discrimination in medical diagnostics, not only because of the particular nature of its healthcare system – largely based on private insurance, which is often prohibitively expensive – but also because U.S. research has produced the largest body of work on this topic over the years.
For instance, a Stanford University study published in September 2020, “Geographic Distribution of US Cohorts Used to Train Deep Learning Algorithms”, reported that the datasets used to train the AI algorithms behind most American diagnostic tools were drawn from just three states: California, Massachusetts, and New York.
Thirty-four states were not represented at all, and the remaining thirteen contributed only limited data. Even four years ago, at the height of the pandemic, this raised serious concerns about the ability of such tools to perform diagnostic assessments equitably across the diverse U.S. population: patients differ in ethnicity, live in varied climates and natural environments, and face different levels of exposure to pollutants, all of which are critical variables for medical research.
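The kind of geographic imbalance the Stanford team documented is straightforward to audit once cohort metadata are available. Below is a minimal Python sketch of such an audit; the cohort, the field names, and the three-state concentration are invented for illustration and are not data from the study.

```python
from collections import Counter

# Hypothetical cohort metadata: the U.S. state in which each patient's
# data were collected. Values and the "state" field are illustrative.
cohort = [
    {"patient_id": i, "state": s}
    for i, s in enumerate(
        ["CA", "CA", "CA", "NY", "NY", "MA", "MA", "CA", "NY", "MA", "CA", "NY"]
    )
]

TOTAL_US_STATES = 50

def geographic_coverage(records: list[dict], state_key: str = "state") -> None:
    """Print how many states contribute data and how concentrated the cohort is."""
    counts = Counter(r[state_key] for r in records)
    top3_share = sum(n for _, n in counts.most_common(3)) / len(records)
    print(f"States represented: {len(counts)} of {TOTAL_US_STATES}")
    print(f"Share of patients from the top 3 states: {top3_share:.0%}")

geographic_coverage(cohort)
# In this toy cohort, 100% of patients come from CA, NY and MA --
# the same concentration pattern the Stanford study reported for real
# training datasets.
```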
AI algorithms automating bias embedded in training data
One and a half years later, an article titled “In medicine, how do we machine learn anything” (published in Patterns on 14 January 2022) by a researcher from the MIT Institute for Medical Engineering and Science and a colleague from Boston University, examined for the first time how AI algorithms automate the hidden biases present in the healthcare datasets used to train them. They also addressed the lack of awareness among healthcare professionals – whether clinicians, AI experts, or bioengineers – regarding the detection and management of these biases.
The authors specifically referred to «pervasive bias in clinical devices». Their focus was on clinical data, such as vital signs, blood values, symptoms, and diagnostic imaging, which are inherently variable. Adding to this complexity is the fact that these data are often collected using medical devices that are not equitably designed to accommodate characteristics like ethnicity, gender, age, or other patient traits. Crucial factors, such as skin colour, geographic origin, or sexual identity, are frequently ignored.
A notable example is pulse oximeters, which estimate blood oxygen saturation by shining beams of light through tissue and measuring how much is absorbed. Darker skin pigmentation has been found to affect light absorption, potentially leading to overestimated oxygen saturation readings in patients with darker skin.
This issue was also raised in 2021 by the UK’s National Health Service (NHS) and the Medicines and Healthcare products Regulatory Agency (MHRA) during the second wave of COVID-19. Both institutions highlighted that, for patients with darker skin, pulse oximeters often indicated higher oxygenation levels than were actually present. This discrepancy could delay the identification of early severe respiratory symptoms in patients experiencing lung failure due to the virus.
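To make the mechanism concrete, the following Python sketch simulates how a single calibration curve applied uniformly to all patients can overestimate saturation when skin pigmentation shifts the optical ratio the sensor measures. The linear relation SpO2 ≈ 110 - 25·R is a common textbook approximation of oximeter calibration, and the size and direction of the pigmentation offset are assumptions made purely for illustration, not measured optical properties.

```python
def spo2_from_ratio(r: float) -> float:
    """Estimate SpO2 (%) from the red/infrared absorbance ratio R.

    Uses the widely cited linear approximation SpO2 ~ 110 - 25 * R,
    standing in for a device's proprietary calibration curve.
    """
    return max(0.0, min(100.0, 110.0 - 25.0 * r))


def measured_ratio(true_ratio: float, pigment_offset: float = 0.0) -> float:
    """Toy model: extra light absorption by skin pigment slightly lowers
    the ratio the sensor sees. The offset is a made-up illustration, not
    a measured optical property."""
    return true_ratio - pigment_offset


true_r = 0.60  # corresponds to roughly 95% SpO2 under this calibration curve
for label, offset in [("lighter skin (no offset)", 0.00),
                      ("darker skin (assumed offset)", 0.08)]:
    reading = spo2_from_ratio(measured_ratio(true_r, offset))
    print(f"{label}: device reads {reading:.1f}% SpO2")

# With the assumed offset, the device reports ~97% instead of ~95%:
# a small gap, but enough to mask early hypoxaemia in a deteriorating patient.
```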
The recurring question is: how reliable and objective are the clinical data collected through devices such as these, which are subsequently used to train AI algorithms for diagnostic purposes?
AI detects patients’ racial identity from diagnostic images, beyond physicians’ control
In discussions about algorithmic discrimination in medical diagnostics, another branch of U.S.-based studies has focused on the ability of artificial intelligence systems to detect certain demographic factors – such as racial identity – from patient imaging data obtained via X-rays, CT scans, and MRIs.
A study published in May 2022, titled “AI recognition of patient race in medical imaging: a modelling study”, conducted by researchers at the Massachusetts Institute of Technology (MIT), including its Institute for Medical Engineering and Science (IMES) and the Abdul Latif Jameel Clinic for Machine Learning in Health, revealed that this capability operates independently of any explicit racial markers in the images. Moreover, it is entirely beyond the control of medical professionals.
In addition to raising privacy concerns, the study highlights potential risks for patients, particularly in a socio-economic context like the United States, where access to healthcare is deeply tied to these factors.
The researchers caution: «If, in diagnostic imaging, an artificial intelligence system that identifies patients’ racial identities and makes therapeutic decisions commits errors based on discriminatory healthcare practices tied to these identities, radiologists and imaging specialists – who typically do not have access to race and ethnicity information – would be unable to recognise and correct these biases. This poses a significant risk of inadvertently embedding such biases into broader healthcare decision-making processes».
This raises the spectre of racial discrimination being perpetuated in access to medical care.
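For readers wondering what “detecting racial identity from images” looks like in practice, the sketch below shows a generic image-classification pipeline of the kind such studies evaluate: a standard convolutional network trained to predict a self-reported demographic label from chest X-ray tensors. It is a minimal, synthetic-data illustration of the setup, not the MIT team’s actual model, labels, or data.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Synthetic stand-in for a chest X-ray dataset: 1-channel 224x224 images
# with a demographic label encoded as an integer. Real studies use
# de-identified hospital archives; this is random data for illustration.
images = torch.randn(64, 1, 224, 224)
labels = torch.randint(0, 3, (64,))          # 3 hypothetical label groups
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

model = resnet18(weights=None, num_classes=3)
# Adapt the first convolution to single-channel (greyscale) radiographs.
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(2):                       # a token training loop
    for x, y in loader:
        optimiser.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimiser.step()

# The studies' striking result is that, on real radiographs, a pipeline of
# this shape predicts the demographic label far above chance, even though
# radiologists cannot point to any explicit marker of it in the image.
```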
Algorithmic discrimination in medical practice: focus on decision support tools
Fast-forwarding to the present, researchers from MIT and Boston University have once again taken centre stage. In their article, “Settling the Score on Algorithmic Discrimination in Health Care” (New England Journal of Medicine AI, 2024), they highlight the shortcomings of U.S. regulations on AI models used in healthcare, as well as on other computational algorithms, calling for greater oversight by regulatory bodies.
This momentum coincided with the issuance of a new rule in May 2024 by the U.S. Department of Health and Human Services (HHS) Office for Civil Rights (OCR), under the Affordable Care Act (ACA), the landmark healthcare reform passed in 2010 during President Obama’s administration.
The new regulation expressly prohibits algorithmic discrimination in medical practice based on a patient’s age, gender, race, country of origin, or any disabilities. Specifically, it addresses “decision-support tools” used by healthcare providers in patient care. This term encompasses both AI-driven systems and non-automated tools commonly employed in medicine.
While the researchers commend the regulation as a significant step towards greater healthcare equity in the United States, they also argue that «It should enforce equity-driven improvements not only in AI-based algorithms but also in all non-AI systems used in clinical decision support across sub-specialties».
Here, “sub-specialties” refers to narrower areas of expertise within a medical field: electrophysiology, for example, is the cardiology sub-specialty concerned with the electrical disturbances of the heart that underlie arrhythmias. Let us delve deeper into the implications.
Software as a Medical Device (SaMD) and clinical risk calculation tools
The number of AI devices approved by the U.S. Food and Drug Administration (FDA) has risen significantly over the past decade: as of October 2024, nearly 1,000 such devices had been approved, many of them designed to support medical decision-making. However, the researchers stress that the tools covered by the new regulation against algorithmic discrimination in healthcare should also include those used to calculate clinical risk scores, as well as existing administrative applications. Many of these tools were developed specifically for clinical contexts but lack the preliminary validation required to confirm their reliability.
«Often, these tools, adopted in a critical domain like healthcare without thorough validation, perpetuate disparities among patients of different ethnicities or from specific geographic areas. This creates serious risks in terms of access to care and therapeutic treatments for non-American populations» the researchers note.
The FDA currently classifies clinical algorithms as Software as a Medical Device (SaMD), but it excludes from this category decision-support software whose recommendations the clinicians using it can independently review. This loophole allows certain tools to evade regulatory oversight.
«We argue that to ensure equity and safety in patient care, healthcare organisations must guarantee that all tools, including those related to clinical risk scoring, undergo comprehensive evaluation before being deployed» the authors emphasise.
Clinical risk refers to the likelihood that a patient will experience «unintentional harm attributable to healthcare, resulting in prolonged hospital stays, worsened health conditions, or, in extreme cases, death».
Physicians rely on such tools to quantify this risk as a numerical score. These scores are often considered less “opaque” than AI algorithms, as they typically involve only a few variables combined in a straightforward model.
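As an illustration of how simple these scoring models can be, and of how easily a single poorly validated input can skew them, here is a minimal Python sketch of a points-based risk calculator. The variables, weights, and the proxy adjustment are invented for the example and do not correspond to any published score.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    age: int
    systolic_bp: int                  # mmHg
    has_diabetes: bool
    lives_in_underserved_area: bool   # proxy variable, illustrative only

def risk_score(p: Patient) -> int:
    """Toy points-based clinical risk score: a handful of variables,
    each mapped to integer points and summed. Invented weights."""
    score = 0
    score += 2 if p.age >= 65 else 0
    score += 1 if p.systolic_bp >= 140 else 0
    score += 2 if p.has_diabetes else 0
    # A proxy variable such as postcode or area deprivation can quietly
    # encode socio-economic or ethnic disparities present in the historical
    # data the score was derived from -- exactly the kind of input the
    # authors argue should be validated before deployment.
    score += 1 if p.lives_in_underserved_area else 0
    return score

same_clinical_picture = dict(age=70, systolic_bp=150, has_diabetes=True)
a = Patient(**same_clinical_picture, lives_in_underserved_area=False)
b = Patient(**same_clinical_picture, lives_in_underserved_area=True)
print(risk_score(a), risk_score(b))   # 5 vs 6: same physiology, different score
```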
Despite their widespread use, the researchers underline that «there is no regulatory body overseeing the clinical risk scores produced by decision-support tools, even though most U.S. physicians (65%) use them on a monthly basis to determine the next steps in patient care».
This lack of oversight leaves a critical gap in ensuring equitable and accurate healthcare practices.
Glimpses of Futures
Regardless of the country in question – whether the USA or Europe – and the respective healthcare policies, the article by researchers from MIT and Boston University offers compelling insights into equity in patient care. It draws attention to all decision-support tools used by medical professionals, highlighting that these tools are fundamentally dependent on patient data. Such data may conceal patterns distorted by bias and discrimination, potentially leading to disparities in therapeutic treatments.
Using the STEPS matrix, let us explore possible future scenarios by analysing the social, technological, economic, political, and sustainability impacts of oversight for tools that utilise artificial intelligence techniques in healthcare, as well as those based on simpler algorithms for calculating clinical risk scores.
S – SOCIAL: in the social context under consideration – where there is no equivalent to the EU AI Act and healthcare is predominantly private – a future regulation addressing algorithmic discrimination at all levels (presuming it overcomes obstacles such as political resistance under the incoming Trump administration) could eventually ensure equity in diagnosis, care, and clinical risk scoring for all patients, regardless of their race, origin, socio-economic status, gender, or age. More specifically, the regulation of clinical risk scores presents a major challenge, as the authors point out. This is particularly crucial given the proliferation of decision-support tools embedded in U.S. electronic health records (EHRs), where ensuring transparency and the absence of discrimination becomes an essential priority.
T – TECHNOLOGICAL: from a technological perspective, overseeing tools that calculate clinical risk scores must focus on the healthcare data used to model the algorithms. In these cases, the primary issue is less about the quality of the data – which may be marred by biases and distortions – and more about its scarcity. Limited datasets fail to represent the entirety of the American population. In a future scenario, “supervisors” will need to work on expanding healthcare datasets used to train algorithms. This expansion must consider the broad spectrum of patient demographics, including age ranges, countries of origin, ethnicities, and socio-economic statuses, to ensure algorithms are fair and effective across the diverse U.S. population.
E – ECONOMIC: as highlighted by the authors of “In medicine, how do we machine learn anything”, the issue of AI algorithms automating biases hidden within training datasets is compounded by the lack of preparedness among healthcare professionals, AI specialists, and bioengineers. These individuals often struggle to identify such biases, increasing the risk that discriminatory practices may inadvertently be incorporated into healthcare decision-making processes. Looking ahead, this suggests the need to develop specialised professional roles equipped with the skills required to supervise all decision-support tools used by medical personnel in patient diagnosis and care. The aim would be to verify these tools’ suitability to guide equitable decisions, ensuring that all patients have access to the most appropriate therapeutic treatments without any form of discrimination.
P – POLITICAL: focusing on the United States, an additional consideration involves the incoming administration under President Donald Trump, elected on 5 November 2024. As is well known, Trump has previously emphasised deregulation in healthcare and expressed opposition to the Affordable Care Act introduced by Obama, as well as to broader non-discrimination policies championed by the former president. This political stance suggests that regulating tools for calculating clinical risk scores may face significant challenges in the coming years, as such efforts would likely conflict with the administration’s preference for minimal oversight and deregulation.
S – SUSTAINABILITY: achieving future global and comprehensive equity in patient care – eliminating the risk of algorithmic discrimination in medical practice – would require rigorous preliminary evaluation and ongoing oversight of all decision-support tools used by healthcare professionals. This includes tools designed to calculate clinical risk scores. Such measures would contribute to the social sustainability of any healthcare system by addressing and removing the underlying factors that perpetuate inequalities among patients. A fair and equitable system ensures that disparities in treatment outcomes are minimised, fostering trust and inclusivity within healthcare frameworks worldwide.