A recent Swedish study demonstrates the potential of artificial intelligence techniques like deep learning and transformer neural networks in revolutionising the computational prediction of chemical toxicity in water.
Pollution affecting rivers, lakes, seas, and groundwater aquifers is a particularly complex phenomenon, driven by interconnected causes and with diverse effects on ecosystems and global biodiversity. Some years ago, the Italian environmental NGO Legambiente brought to light, in its report “H₂O – The Chemistry Polluting Water”, a long-ignored issue concerning the chemical status of water bodies, which «have been utilized as dumping sites to dispose of industrial effluents, legally or otherwise, for many decades».
Moreover, the European Commission’s Joint Research Centre, in its 2020 documentation on the microbiological parameters of drinking water, identified chemical pollution of water as “one of the foremost global environmental challenges,” one that profoundly affects flora and fauna, human health, and the toxicity of soils contaminated by polluted inland waters.
Chemical contaminants in water bodies
In the four years since these studies, the situation has remained largely unchanged. Little has been done to tackle the core issue: «80% of global wastewater from all sectors is released into the environment untreated». To date, regulation of the disposal of industrial and pharmaceutical chemicals, the primary cause of water degradation, has been insufficient. There have been signs of progress over the last two years, particularly since the start of 2024, but considerable time has been lost [source: “Water Quality and Wastewater” – United Nations].
Specifically, among industrial discharges, the most hazardous chemical agents released into rivers and lakes include heavy metals (notably mercury, nickel, and lead), as well as nitrogen and phosphorus. At high concentrations, some of these can pose carcinogenic risks.
Furthermore, alongside these naturally occurring substances (typically found at low concentrations), synthetic chemicals have been developed so rapidly that regulators struggle to determine, on the basis of their toxicity, which ones should be restricted and to what extent [source: “What Are the Dangers of Chemical Pollution in European Waters?”, European Environment Agency].
Among laboratory-produced chemicals, a notable group is PFAS (per- and polyfluoroalkyl substances). This group includes compounds such as perfluorooctanoic acid (PFOA), perfluorooctane sulfonic acid (PFOS), perfluorohexane sulfonic acid (PFHxS), and perfluorononanoic acid (PFNA). These have been used since the 1940s to make items such as non-stick pans, stain-resistant fabrics, firefighting foams, and products resistant to oil, water, and grease. Known as “forever chemicals” because of their carbon-fluorine bonds, among the strongest in chemistry, PFAS do not easily break down in the environment [source: National Institute of Environmental Health Sciences].
Chemical pollution of water: PFAS concern for drinking water
Globally, PFAS have been detected both in groundwater, where they pose a direct threat to the food chain by potentially contaminating plants, crops, and livestock, and in drinking water.
In Italy, PFAS contamination has been found in water intended for human consumption in regions such as Veneto, Lombardy, and Tuscany. Notably, a Greenpeace Italy survey conducted across twelve Lombard provinces from May 12 to 18, 2023, found that eleven of thirty-one drinking water samples (tested by an independent laboratory) were contaminated with these “forever chemicals”.
Furthermore, in October 2023, Greenpeace Italy, in what it described as an act of “transparency”, disclosed the previously unpublished results of water supply monitoring in Milan carried out by MM Spa and the Health Protection Agency of the Metropolitan City of Milan from 2021 to early 2023.
The report indicated that «out of 462 drinking water samples tested, PFAS were found in 132 samples, representing one in four». Although the detected levels were below the Italian Ministry of Health’s 2014 thresholds (300 nanograms per liter for PFOS, 500 for PFOA, and 500 for other PFAS), «in more than one in four cases, the concentrations exceeded the more stringent health precautionary standards set by US regulations».
Chemical pollution of drinking water: regulations against PFAS
In 2019, Denmark set a precedent as the first European country to decisively tackle the issue of chemical pollution in drinking water by prohibiting the use of PFAS chemicals in the production of food packaging and non-stick coatings.
Elsewhere in Europe, there is still no specific regulation targeting PFAS. Instead, the broader EU Drinking Water Directive, recast in 2020, applies. From January 12, 2026, this directive will impose a maximum of 100 nanograms per litre for the sum of 20 PFAS compounds in drinking water.
The regulatory landscape continues to evolve: on April 24, 2024, the EU Parliament reached a provisional agreement with the European Council on new rules for packaging, particularly plastic packaging. The rules aim to reduce packaging waste by 5% by 2030, 10% by 2035, and 15% by 2040, and include restrictions on PFAS in food-contact packaging above certain limits.
Meanwhile, in the United States, the gradual phase-out of PFOA and PFOS in production and general use began in January 2023. In a notable development on April 10, 2024, the Environmental Protection Agency set enforceable drinking water limits of 4 parts per trillion for PFOA and PFOS, down from the 70 parts per trillion of its earlier, non-binding health advisory.
Technologies supporting the prediction of chemical toxicity levels
What criteria determine whether the concentrations of chemical agents in water are “safe” or “risky” for the organisms living in it, for humans, and for the broader environment? Conventionally, the approach is experimental: specific aquatic species, such as algae or certain fish, are exposed to various chemicals under controlled laboratory conditions. This allows scientists to pinpoint the concentration at which specific adverse effects appear in these organisms and then to assign a corresponding “safety factor”, from low to high [source: “Predicting exposure concentrations of chemicals with a wide range of volatility and hydrophobicity in different multi-well plate set-ups” – Scientific Reports].
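To make the experimental step concrete, the sketch below estimates an effect concentration (such as the EC50 or EC10, the concentrations causing a 50% or 10% effect) from a single hypothetical concentration-response test. It is a minimal illustration, assuming simple linear interpolation on a logarithmic concentration scale; standard test protocols usually fit a full dose-response model instead, and the function name `interpolate_ec` and all data are hypothetical.

```python
import math


def interpolate_ec(concentrations, effects, level=50.0):
    """Estimate the concentration causing `level` % effect (e.g. EC50, EC10).

    Linear interpolation on log10(concentration) between the two tested
    concentrations whose observed effects bracket `level`. Assumes the
    concentrations are ascending and the effect rises with concentration.
    """
    points = list(zip(concentrations, effects))
    for (c_lo, e_lo), (c_hi, e_hi) in zip(points, points[1:]):
        if e_lo <= level <= e_hi and e_hi > e_lo:
            # How far `level` lies between the two observed effects (0..1)
            fraction = (level - e_lo) / (e_hi - e_lo)
            log_c = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("effect level is not bracketed by the tested concentrations")


# Hypothetical algal growth-inhibition test: concentrations in mg/L, effects in %
concentrations = [0.1, 1.0, 10.0, 100.0]
inhibition = [5.0, 20.0, 80.0, 95.0]
ec50 = interpolate_ec(concentrations, inhibition, 50.0)  # about 3.16 mg/L
ec10 = interpolate_ec(concentrations, inhibition, 10.0)  # about 0.22 mg/L
```

Interpolating on the log scale reflects the fact that test concentrations are usually spaced geometrically (here, tenfold steps).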
In search of a quicker and more cost-effective alternative, computational methods have gained prominence over the past decade as predictive tools for chemical pollution of water. Among these is the “Quantitative Structure-Activity Relationship” (QSAR), which uses computational power to «predict the potential biological effects of a chemical compound based on extensive data about its structure and interactions with its aquatic receptor» [source: “Quantitative Structure-Activity Relationship” – ScienceDirect].
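In its simplest form, a QSAR is a regression from a structural descriptor to a toxicity value. The sketch below fits one such relationship by ordinary least squares, assuming the octanol-water partition coefficient (log Kow) as the only descriptor and log10(1/EC50) as the response. This single-descriptor form is a textbook simplification rather than one of the models discussed in the study, and the data points and function names are hypothetical.

```python
def fit_linear_qsar(descriptor, activity):
    """Ordinary least-squares fit of activity = slope * descriptor + intercept."""
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_activity(slope, intercept, descriptor_value):
    """Apply the fitted QSAR equation to a new chemical's descriptor."""
    return slope * descriptor_value + intercept


# Hypothetical training set: log Kow and log10(1/EC50), with EC50 in mol/L
log_kow = [1.0, 2.0, 3.0, 4.0]
log_inv_ec50 = [1.5, 2.3, 3.1, 3.9]
slope, intercept = fit_linear_qsar(log_kow, log_inv_ec50)  # 0.8 and 0.7
predicted = predict_activity(slope, intercept, 5.0)        # 4.7
ec50_mol_per_l = 10 ** -predicted                          # about 2e-5 mol/L
```

A model like this is only trustworthy for chemicals similar to those it was fitted on, which is why QSAR tools are built per chemical class.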
Additionally, predictive models that employ machine learning techniques have proved effective, enabling the correlation of structural variations in chemical agents with the resultant toxicological changes in organisms.
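As a deliberately simple stand-in for the machine learning models mentioned above, the sketch below predicts toxicity from the most structurally similar training chemicals (k-nearest neighbours). It assumes each chemical is encoded as a structural fingerprint, represented here as a set of “on” bit indices, and compares fingerprints with Tanimoto similarity; the fingerprints, toxicity values, and function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(query, training, k=3):
    """Similarity-weighted mean toxicity of the k most similar training chemicals.

    training: list of (fingerprint_bits, log_toxicity) pairs.
    """
    neighbours = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    weights = [tanimoto(query, fp) for fp, _ in neighbours]
    total = sum(weights)
    if total == 0:
        raise ValueError("query shares no structural features with the training set")
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / total


# Hypothetical fingerprints (sets of on-bit indices) and log10(EC50) values
training = [({1, 2, 3}, 1.0), ({1, 2, 4}, 2.0), ({7, 8, 9}, 5.0)]
estimate = knn_predict({1, 2, 3}, training, k=2)  # (1.0*1.0 + 0.5*2.0) / 1.5
```

Weighting by similarity lets close structural analogues dominate the prediction, which mirrors the idea of linking structural variation to toxicological change.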
However, when dealing with chemicals from multiple classes – for example, perfluorooctane sulfonic acid (PFOS), perfluorohexane sulfonic acid (PFHxS), and perfluorononanoic acid (PFNA) – multiple QSAR models are often required. This need is highlighted by researchers from the Department of Mathematical Sciences at Chalmers University of Technology and the Department of Biology and Environmental Sciences at the University of Gothenburg in Sweden. Their study, “Transformers enable accurate prediction of acute and chronic chemical toxicity in aquatic organisms” (Science Advances, March 2024), also critiques the current limitations of machine learning approaches:
«… while it has been applied to predict various biological activities, including the toxicity of water on certain organisms, the existing methods still lack the required precision and breadth of applicability crucial for regulatory applications».
The computational approach based on transformer neural networks
Regarding chemical pollution of water and the expertise required to measure how toxic certain substances are to organisms, the Swedish team contends that existing computational methods, such as QSAR and machine learning, can replace only a small fraction of the data derived from conventional experimental procedures.
There is an urgent need for novel approaches, especially to aid authoritative bodies in analyzing the vast array of newly synthesized chemical agents, aiming for their quicker and more precise classification to regulate their use more effectively.
The researchers have identified transformer networks as a potentially pivotal element for computational predictions in toxicology. “Transformers” are a specific type of neural network with a relatively straightforward architecture, originally developed for natural language processing. As the authors point out, however, the deep learning techniques they rely on have recently also proved capable of extracting information from biological and chemical structures.
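The article does not say how chemical structures are fed to the network. One common assumption for transformer models in chemistry is that each compound is written as a SMILES string and split into tokens (atoms, bonds, brackets, ring-closure digits), much as a sentence is split into words. The sketch below shows such a tokenizer using a regular expression of the kind widely used in SMILES-based models; the pattern, the function name `tokenize_smiles`, and the PFOA example string are illustrative assumptions, not the authors’ pipeline.

```python
import re

# Hypothetical SMILES tokenization pattern: bracket atoms, two-letter halogens,
# organic-subset atoms, bonds, branches, and ring-closure digits.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list:
    """Split a SMILES string into tokens, failing loudly on unrecognized characters."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


# PFOA (C8HF15O2): a carboxylic acid head followed by a fully fluorinated C7 tail
pfoa_smiles = "OC(=O)" + "C(F)(F)" * 6 + "C(F)(F)F"
pfoa_tokens = tokenize_smiles(pfoa_smiles)
```

Each token would then be mapped to a learned embedding vector before entering the attention layers.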
«The transformers employ self-attention, a mechanism that discerns complex dependencies directly from the data, highlighting portions of the chemical structure that are particularly rich in information. This enables the identification of the most critical structural features of the compound, crucial for accurately predicting its chemical toxicity», the research group elaborates.

Specifically, the transformer neural network they developed was trained on a substantial dataset of laboratory test results (147,864 measurements), drawn both from standardized tests and from studies conducted by the international scientific community, covering 6,473 chemical structures and their toxicological effects on the three regulatory groups of aquatic organisms: algae, invertebrates, and fish.
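The self-attention mechanism described in the quote can be illustrated with a toy, single-head version of scaled dot-product attention: each token (for instance, a fragment of a chemical structure) is projected to a query, a key, and a value, and the softmax of query-key similarities decides how much every other token contributes to its output. This is a minimal sketch in plain Python, assuming tiny hand-picked embeddings and projection matrices; the real model learns these weights from data and stacks many heads and layers, and all names here are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: n_tokens x d_model embeddings; w_q, w_k, w_v: d_model x d_k projections.
    Returns (outputs, attention_weights).
    """
    q = matmul(x, w_q)  # queries
    k = matmul(x, w_k)  # keys
    v = matmul(x, w_v)  # values
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose keys to d_k x n_tokens
    scores = matmul(q, k_t)               # similarity of every token pair
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Hypothetical 3-token input with identity projections
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens, identity, identity, identity)
```

Each row of `attn` sums to 1; a row concentrated on a few tokens loosely corresponds to what the authors describe as highlighting the information-rich portions of a structure.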
Comparative tests with QSAR predictive models
A principal distinction between the proposed transformer model and traditional QSAR methods is the more comprehensive numerical representation of chemical structures by the former. During the testing phase, the transformer neural network for predicting chemical toxicity values in algae, invertebrates, and fish exhibited significantly expanded applicability domains.
This allows predictions across a wider and more varied range of chemical substances, including those not present in the training data, which is a limitation of tools based on the Quantitative Structure-Activity Relationship.
For example, in the QSAR model Ecosar, which the team used for comparative tests, the numerical representation of chemical structures encompasses 263 applicability domains covering 111 chemical classes, specifically for predicting toxicity values in fish, at the expense of the other two groups of aquatic organisms (algae and invertebrates).
![Chart showing the percentage of chemical structures for which the transformer neural network and the QSAR models considered (Ecosar, Vega and T.E.S.T.) can predict toxicity to algae, aquatic invertebrates and fish, reported both for chemicals that fall within the normal applicability domain (inside AD) and for substances for which a prediction is nonetheless possible (extended AD). [(A) and (B)] fish EC50 (n = 3542) and fish EC10 (n = 2321); [(C) and (D)] aquatic invertebrates EC50 (n = 3741) and aquatic invertebrates EC10 (n = 2647); [(E) and (F)] algae EC50 (n = 2843) and algae EC10 (n = 2756) [source: “Transformers enable accurate prediction of acute and chronic chemical toxicity in aquatic organisms” - Science Advances - https://www.science.org/doi/10.1126/sciadv.adk6669].](https://tech4future.info/wp-content/uploads/2024/05/inquinamento-chimico-delle-acque-schema-comparativo-1024x1024.png)
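The applicability domain (AD) idea behind the chart can be sketched with one of the simplest definitions used for QSAR models: a chemical is “inside AD” if each of its descriptors falls within the range spanned by the training chemicals. The sketch below, assuming two hypothetical descriptors (log Kow and molecular weight) and made-up data, computes the share of query chemicals inside that domain, analogous to the coverage percentages in the chart; the tools compared in the study use their own, more elaborate domain definitions.

```python
def fit_range_domain(training_descriptors):
    """Per-descriptor (min, max) bounds spanned by the training chemicals."""
    return [(min(column), max(column)) for column in zip(*training_descriptors)]


def inside_domain(bounds, descriptors):
    """True if every descriptor of a query chemical lies within the training range."""
    return all(lo <= x <= hi for (lo, hi), x in zip(bounds, descriptors))


def coverage(bounds, queries):
    """Fraction of query chemicals that fall inside the applicability domain."""
    return sum(inside_domain(bounds, q) for q in queries) / len(queries)


# Hypothetical descriptors per chemical: [log Kow, molecular weight in g/mol]
training = [[1.0, 100.0], [3.0, 300.0], [2.0, 250.0]]
bounds = fit_range_domain(training)  # [(1.0, 3.0), (100.0, 300.0)]
queries = [[2.0, 200.0], [4.0, 200.0], [1.5, 350.0], [3.0, 100.0]]
share_inside = coverage(bounds, queries)  # 0.5
```

A model with a wider domain simply leaves fewer query chemicals outside these bounds, which is what the higher bars for the transformer express.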
The research group comments:
«The transformer model demonstrated superior predictive performance for all tested organism groups – algae, invertebrates, and fish – and offers a wider scope of applicability and a notably lower error rate compared to conventional QSAR methods.»

The graph also shows that when the transformer network is trained on data covering multiple effect concentrations (EC50/EC10), its performance improves further. As for prediction errors, they range «between a factor of 2.00 and 3.50 relative to the experimentally measured effect concentrations».
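An error “by a factor of” 2.00 to 3.50 is a multiplicative (fold) error: a factor of 2 means the prediction is either twice or half the measured effect concentration. The sketch below computes fold errors on a log10 scale and summarizes them with the median; the aggregate statistic, the function name `fold_error`, and the data are assumptions for illustration, since the exact error metric the authors used is not detailed here.

```python
import math
from statistics import median


def fold_error(predicted, observed):
    """How many times larger or smaller the prediction is than the measurement (>= 1)."""
    return 10 ** abs(math.log10(predicted) - math.log10(observed))


# Hypothetical predicted vs. measured effect concentrations (mg/L)
predicted = [2.0, 1.0, 10.0]
observed = [1.0, 3.0, 10.0]
errors = [fold_error(p, o) for p, o in zip(predicted, observed)]  # about [2.0, 3.0, 1.0]
typical_error = median(errors)                                    # about 2.0
```

Working on the log scale treats over- and under-prediction symmetrically, so predicting 0.5 mg/L for a measured 1 mg/L counts as the same factor-of-2 error as predicting 2 mg/L.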
Glimpses of Futures
Global water pollution has escalated to the point where it now affects even underground and drinking water sources, which are increasingly contaminated by hazardous perfluoroalkyl and polyfluoroalkyl substances (PFAS). Alarmingly, the problem has come “close” to us and to the food chain on which we depend. Given its vast scope and the many stakeholders involved, managing it effectively is nearly impossible without rigorous policies for wastewater control and management, stringent regulations, and robust penalties for non-compliance.
In this context, technology can play a pivotal role by swiftly assessing the toxicity of the hundreds or thousands of chemicals that are widely used across industries worldwide despite lacking a toxicological profile.
With the goal of anticipating potential future scenarios, let us now use the STEPS matrix to examine the impacts that the evolution of transformer-based models for predicting chemical toxicity in water might have on various fronts.
S – SOCIAL: the authors of the referenced study point out that over 100,000 chemicals are currently used industrially on the global market, yet only a small fraction has a documented toxicity profile that describes their level of toxicity to humans and the ecosystem. The majority are awaiting analysis and evaluation. The advancement of a computational method using artificial intelligence techniques, such as the one described, aimed at predicting chemical toxicity in water, could potentially eliminate this waiting period in the future. This would not only expedite the entire analysis process but also enhance its accuracy through reliance on objective data and mathematical calculations rather than solely on observations from laboratory experiments, including animal testing. Furthermore, by screening vast quantities of data concerning the technical specifications of hundreds of new chemical agents and subsequently cross-referencing this data, AI techniques would also aid in developing more sustainable chemical substances and identifying less harmful alternatives to replace more toxic ones.
T – TECHNOLOGICAL: the transformer neural network’s performance in comparative tests conducted by the team, particularly those predictive tests of chemical toxicity values that extended beyond the typical domain of applicability (extended AD), indicates that the training data were sufficiently rich and varied. Nonetheless, as the predictive model described evolves, the challenge of data will intensify, necessitating not only large volumes of data but also data of high quality. In the coming years, substantial efforts will be required to gather toxicological analysis data from various independent sources, aiming to aggregate diverse and substantial datasets. A further challenge remains the longstanding absence of organized data on chemical toxicity, which explains why the data sets used to train the transformer were hybrid, consisting of both standardized laboratory test data and experiments conducted by the scientific community. In a future scenario, therefore, considerable data science efforts will be essential to ensure that these data and metadata are comprehensive, consistent, and accessible.
E – ECONOMIC: in the future, adopting AI methods based on transformer neural networks for predicting water toxicity, which are more cost-effective than traditional methods, could significantly reduce the laboratory costs of running direct experiments on aquatic organisms exposed to specific chemicals. Moreover, by speeding up the toxicological evaluation of thousands of chemical agents used by industries and discharged into the sea without prior testing, this approach would help decrease the chemical burden in waters. Indirectly, it would also reduce the costs of measures to mitigate chemical pollution in rivers, lakes, seas, and groundwater. These savings should increasingly be reinvested in using wastewater as a resource, as underscored by “The United Nations world water development report, 2017: Wastewater: an untapped resource; executive summary”, edited by UNESCO, which notes that «safely managed wastewater represents a sustainable opportunity, providing a source of water, energy, and other recoverable materials, whilst also fostering new business opportunities and creating green jobs».
P – POLITICAL: the possibility of reducing the chemical load in waters through increasingly sophisticated predictive models for assessing chemical toxicity in aquatic organisms might, in the future, be integrated into a broader industrial wastewater control policy. This could act as a catalyst for comprehensive agreements and regulations on wastewater management. Given data from the UN Department of Economic and Social Affairs projecting that 68% of the global population will live in urban areas by 2050, there is an urgent need to develop «adequate infrastructure to manage wastewater efficiently and sustainably». On February 20, 2024, the White House announced an allocation of nearly $6 billion for infrastructure supporting clean drinking water and wastewater treatment. The European Union followed in April 2024, ratifying an agreement reached in January 2024 aimed at “zero pollution for Europe”, «setting the stage for higher standards in the treatment and monitoring of urban wastewater to avoid releasing harmful substances like microplastics and PFAS compounds into the environment».
S – SUSTAINABILITY: the advancement of transformer predictive models for estimating chemical toxicity levels in water could significantly benefit environmental sustainability by reducing the global chemical load in waters. It could also enhance social sustainability, particularly in economically and socially vulnerable countries in Africa, Asia, and Latin America. According to United Nations Environment Programme (UNEP) data on global water quality, water pollution has worsened in nearly all rivers since the 1990s, severely affecting the availability of drinking water. The annual United Nations World Water Development Report (WWDR) highlights that the largest increases in pollutant exposure have occurred in low and lower-middle income countries, mainly due to population growth (especially in Africa) and inadequate wastewater management policies.