Data-driven research in materials science is revolutionising the sector, accelerating the discovery of new materials and optimising development processes.
Innovation is a constant in materials science, as is the research guiding the discovery of new composites crucial for technological and industrial progress. These advancements aim to enhance human well-being, with significant economic and environmental benefits.
Innovative materials such as smart materials, biodegradable materials, advanced composites, and nanomaterials are drawing increasing attention from both research centres and major industrial groups, targeting sectors such as energy, medicine, optics, electronics, and the Internet of Things (IoT).
But what impact can these materials have on the economy and society? How can data science and artificial intelligence accelerate and optimise the discovery and development of these innovative materials? Moreover, how can we foster greater collaboration among academia, policy makers, and industry to facilitate the transition from research to large-scale production of innovative materials within the European Union?
These were key topics discussed over the two days of the TEHA Technology Forum 2024, the main event hosted by The European House – Ambrosetti dedicated to the latest technological innovations. Held recently, the forum examined the current state of research and its future prospects, featuring a noteworthy analysis by Gianaurelio Cuniberti of the Technische Universität Dresden on the role of AI in the search for new materials. This role is so pivotal that it has given rise to the term “data-driven materials”.
TAKEAWAYS
How the research paradigm is changing
We are witnessing a significant shift in the research paradigm. Traditionally, it could take up to 20 years from the discovery of a material to its practical and industrial-scale implementation.
«Just think of materials like Teflon or Velcro – explains Cuniberti – which were discovered almost by chance and then subjected to simulations, tests, and manipulations until the final evaluation of the material itself».
This approach relied, and to some extent still relies, heavily on serendipity: the fortunate discovery of one thing while searching for something else.
«Today, however, we are seeing a much shorter path from discovery to implementation. New paradigms that accelerate the discovery phases are emerging, leading to what is referred to as “AI-assisted discovery”. Machine learning technologies are being used in every stage of simulation, making the entire process much faster and more efficient».
Data-driven materials: AI and HPC to accelerate research
A concrete example of this approach is a study conducted by a group of researchers from the Physical and Computational Sciences Directorate at the Pacific Northwest National Laboratory in Richland, Washington, aiming to develop better solid-state batteries. The goal was to identify solid electrolytes suitable for this application.
«This is a very broad field of research, considering that we started with over 32 million materials, precisely 32,598,079. In a process that took less than 80 hours in total, the number of materials was reduced to 18, which were then subjected to tests and experiments. This demonstrates how the search for new materials through high-speed calculations can significantly speed up the design and discovery process».
In this instance, the researchers combined advanced artificial intelligence (AI) models with traditional physics-based models, utilising high-performance computing resources in the cloud.
By employing around a thousand virtual machines in the cloud, the process was completed, as mentioned, in less than 80 hours. The researchers synthesised and experimentally characterised the main candidates, demonstrating the potential of these compounds as solid electrolytes.
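The figures quoted above lend themselves to a quick back-of-the-envelope check. The sketch below assumes exactly 1,000 virtual machines kept busy for the full 80 hours; since the text speaks of "around a thousand" machines and "less than 80 hours", the results are rough orders of magnitude, not numbers reported by the researchers.

```python
# Back-of-the-envelope throughput for the screening campaign described above.
# Assumption: exactly 1,000 VMs busy for the full 80 hours. The real run used
# "around a thousand" VMs and took "less than 80 hours", so these are rough bounds.
CANDIDATES = 32_598_079
SHORTLIST = 18
VMS = 1_000
HOURS = 80

vm_hours = VMS * HOURS                   # total compute budget in VM-hours
per_hour = CANDIDATES / HOURS            # candidates processed per wall-clock hour
per_vm_hour = CANDIDATES / vm_hours      # candidates processed per VM-hour
reduction = CANDIDATES / SHORTLIST       # how strongly the funnel narrows the pool

print(f"{vm_hours:,} VM-hours")
print(f"{per_hour:,.0f} candidates per hour")
print(f"{per_vm_hour:,.1f} candidates per VM-hour")
print(f"reduction factor ~{reduction:,.0f}x")
```

Under these assumptions each virtual machine would have had to handle roughly 400 candidates per hour, or about nine seconds per material, which seems plausible only if fast approximate models discard the vast majority of candidates before any expensive physics-based calculation is run.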
Further candidate materials are currently under experimental investigation, providing additional examples of computational discovery of new solid electrolyte phases.
This particular approach, which integrates AI models and HPC in the cloud, not only accelerates the discovery of new materials or materials suitable for specific applications but also demonstrates the effectiveness of AI-driven experimentation in achieving scientific discoveries with practical applications.
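Combining fast AI models with slower physics-based models is often organised as a screening funnel: a cheap model ranks the whole pool, and only the best candidates move on to more expensive and more accurate stages. The minimal Python sketch below illustrates that idea; the stage names, cut-offs and toy scoring functions are hypothetical placeholders and do not reproduce the researchers' actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Stage:
    name: str
    score: Callable[[int], float]  # higher is better; cost grows stage by stage
    keep: int                      # number of top-scoring candidates passed on


def screen(candidates: Iterable[int], stages: List[Stage]) -> List[int]:
    """Tiered screening: each stage ranks the surviving pool and keeps the top `keep`."""
    pool = list(candidates)
    for stage in stages:
        pool = sorted(pool, key=stage.score, reverse=True)[: stage.keep]
        print(f"{stage.name:<22} -> {len(pool):>6} candidates")
    return pool


# Hypothetical toy scores standing in for real models (assumption, not the study's models).
def ml_surrogate(c: int) -> float:       # stands in for a cheap, approximate AI model
    return (c * 7919) % 1000 / 1000


def physics_model(c: int) -> float:      # stands in for an expensive physics-based model
    return ((c * 104729) % 997) / 997


def stability_check(c: int) -> float:    # stands in for a final, most selective filter
    return ((c * 31) % 101) / 101


shortlist = screen(
    range(100_000),
    [
        Stage("ML surrogate", ml_surrogate, 5_000),
        Stage("physics-based model", physics_model, 200),
        Stage("stability check", stability_check, 18),
    ],
)
```

Replacing each toy function with a real predictor keeps the structure unchanged; the design choice is simply to spend expensive computation only on the candidates that survive the cheaper stages.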
It is important to note, however, that this approach is not without its limitations, particularly concerning the availability of large-scale computational resources.
Organa: Generative AI creates a collaborative laboratory
A noteworthy example, particularly relevant to the field of chemistry, comes from the University of Toronto with their project Organa. This initiative exemplifies the practical application of what generative AI promises in theory: not merely serving as an assistant but acting as a true colleague. The system developed by the university’s researchers employs generative AI to establish a fully automated chemistry laboratory.
Chemical experimentation often demands extensive resources and labour. Despite the advantages brought by advanced laboratory equipment in terms of time and effort, many tasks still require manual intervention by chemists. Traditional automation infrastructures struggle to flexibly adapt to the myriad of experiments conducted in labs.
Organa was created to overcome this significant bottleneck. It is a flexible and user-friendly robotic system that automates a wide range of chemical experiments.
Organa communicates with the chemists in the lab and in the research group in natural language, thanks to Large Language Models (LLMs). It keeps researchers updated on the work done, providing timely reports with statistical analyses and engaging with users for clarifications or problem resolution.
Organa is adept at interpreting user inputs to define experimental objectives and plan extensive sequences of robotic tasks and actions, also utilising visual feedback from its environment. It supports the programming and parallel execution of multiple experiments, managing resources, and coordinating several robots and experimental stations.
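Running several experiments in parallel while coordinating robots and stations is, at its core, a dependency-scheduling problem. The sketch below is a simplification and not Organa's actual planner: it uses Python's standard `graphlib.TopologicalSorter` to group a hypothetical experiment plan into "waves" of steps whose prerequisites are complete and which could therefore run at the same time.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical experiment plan: each step maps to the set of steps it depends on.
plan: Dict[str, Set[str]] = {
    "prepare_solution_A": set(),
    "prepare_solution_B": set(),
    "calibrate_pH_probe": set(),
    "measure_pH_A": {"prepare_solution_A", "calibrate_pH_probe"},
    "measure_pH_B": {"prepare_solution_B", "calibrate_pH_probe"},
    "run_voltammetry_A": {"measure_pH_A"},
    "write_report": {"measure_pH_B", "run_voltammetry_A"},
}


def schedule_waves(plan: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves; steps in the same wave have no pending dependencies."""
    ts = TopologicalSorter(plan)
    ts.prepare()                      # also raises CycleError if the plan is circular
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # every step whose prerequisites are done
        waves.append(ready)
        ts.done(*ready)               # mark the whole wave as finished
    return waves


for i, wave in enumerate(schedule_waves(plan), 1):
    print(f"wave {i}: {', '.join(wave)}")
```

A real system would also have to respect resource limits (for instance, a single pH probe shared by several steps), which this sketch deliberately ignores.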
The system has proven its effectiveness in various chemical experiments, including solubility assessment, pH measurement, recrystallisation, and electrochemistry experiments. Notably, in electrochemistry experiments, Organa executed a detailed plan comprising 19 parallel phases to characterise the electrochemical properties of quinone derivatives used in rechargeable flow batteries.
From the users’ perspective, a dedicated user study found that Organa significantly improves the user experience by reducing chemists’ physical workload, showing how the integration of AI and advanced automation can transform work in chemical laboratories.
AlphaFold 3: the game changer in data-driven materials research
A third example, still linked to chemistry, extends into the realms of biology and pharmaceutical research.
In this case, the study focused on peptide crystallisation. Peptides are short chains of amino acids, essentially small proteins, that fold into three-dimensional structures. Traditionally, determining these structures required substantial investments of time and money.
However, with the arrival of AlphaFold 3 in May this year, the situation has changed dramatically. Before the advent of artificial intelligence, protein structure prediction relied on experimental methods such as X-ray crystallography, NMR spectroscopy, and complex computational methods like homology modelling. These methodologies were certainly costly and burdensome, posing significant obstacles in the drug discovery and development processes. For years, scientists have sought to integrate advanced AI models to accelerate and improve the accuracy of these processes.
AlphaFold, an AI tool developed by DeepMind, a subsidiary of Google, began to change this picture. The first version of the technology was released in 2018, but it was AlphaFold 2, in 2020, that made headlines by winning CASP14.
CASP (Critical Assessment of protein Structure Prediction) is a global experiment in protein structure prediction that has been conducted every two years since 1994. It allows over 100 research groups to objectively test their structure prediction methods through an independent assessment of the state-of-the-art in protein structure modelling.
AlphaFold 2 effectively outperformed other methods in predicting the 3D structures of proteins from their amino acid sequences, using a deep learning architecture called Evoformer. This technology could revolutionise research into protein structures and folding mechanisms, leading to significant advances in drug discovery and vaccine development.
AlphaFold 3, the latest version, represents a further improvement: it replaces the Evoformer with a simpler module, the Pairformer, which focuses on the pair representation. It also uses a diffusion module, similar in principle to those employed in AI image-generation tools, which significantly increases prediction accuracy. Beyond predicting protein structures, AlphaFold 3 can now model interactions between proteins and other biological molecules such as DNA, RNA, and ligands, greatly expanding its application scope. This capability is crucial for understanding fundamental biological processes and identifying potential drug candidates.
The importance of open access to the programs on which new discoveries are based
AlphaFold 3, while already utilised commercially through collaboration with Isomorphic Labs, offers a free server for non-commercial purposes, making this technology accessible to researchers worldwide.
For the sake of completeness, it should be noted that while the presentation of AlphaFold 3 has excited the scientific community for its potential to significantly improve protein structure predictions and facilitate new drug discoveries, both DeepMind and the journal Nature have faced criticism for the limited access to the program and for not releasing the underlying code.
In an open letter, more than 650 researchers expressed disappointment over the lack of code and resources accompanying the publication, accusing the journal of not adhering to its own rules on code availability.
To understand the practical applications of this research, Cuniberti humorously suggests thinking of an acronym: GLP. «While the most notable acronym of the past three years has been GPT, as in ChatGPT, in the world of chemistry and biology attention is focused on GLP, which stands for glucagon-like peptide». Glucagon is a hormone that counteracts insulin, raising blood sugar by prompting the liver to release stored glucose. Glucagon-like peptide-1 (GLP-1), by contrast, stimulates insulin release after meals, slows gastric emptying, and reduces the feeling of hunger. This molecule is becoming the focal point of new research methodologies in the pharmaceutical sector, particularly in treating metabolic diseases such as diabetes and obesity. Leading companies in the sector, such as Novo Nordisk and AstraZeneca, are concentrating their efforts on GLP-1-based drugs, which aim to regulate patients’ metabolism, offering new hope for weight control and diabetes management.
Accelerating research in nanomaterials
Another example comes from the work underway in the laboratories of the Technische Universität Dresden, focusing on materials for micro- and nanoelectronics.
These developments are shaped by current and planned investments in microprocessor production in Germany, and they cover a wide range of aspects, from sand and silicon to increasingly thin wafers.
This research culminated three years ago with the introduction of nanoelectronic materials for e-noses (electronic noses) and advancements in graphene-based sensors.
«Thanks to the use of artificial intelligence, research has evolved and accelerated», explains Cuniberti, pointing to the emergence of startups such as SmartNanotubes. This company has developed the first multi-channel electronic nose gas-detection chip for the mass market. The chip is sensitive, energy-efficient, and compact, with sensing elements built from finely tuned nanomaterials. Thanks to pattern-recognition technology, the electronic nose can detect various gases, odours, and volatile organic compounds (VOCs).
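Pattern recognition is what turns a handful of sensor channels into a gas identifier: each substance produces a characteristic response profile across the channels, and a classifier matches a new reading to the closest known profile. The sketch below uses a simple nearest-centroid classifier on made-up four-channel readings; the channel count, values and labels are hypothetical, and the algorithm SmartNanotubes actually uses is not described in the source.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalise(v: Vector) -> List[float]:
    """Scale a response vector to unit length so the pattern, not the intensity, matters."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def train_centroids(samples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the normalised training responses for each gas label."""
    centroids = {}
    for label, vectors in samples.items():
        normed = [normalise(v) for v in vectors]
        centroids[label] = [sum(col) / len(normed) for col in zip(*normed)]
    return centroids


def classify(reading: Vector, profiles: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest (Euclidean) to the normalised reading."""
    r = normalise(reading)
    return min(profiles, key=lambda label: math.dist(r, profiles[label]))


# Made-up 4-channel responses (hypothetical values, for illustration only).
training = {
    "ethanol": [[0.9, 0.2, 0.1, 0.4], [1.8, 0.5, 0.2, 0.7]],
    "ammonia": [[0.1, 0.8, 0.6, 0.1], [0.2, 1.5, 1.1, 0.3]],
    "acetone": [[0.5, 0.1, 0.9, 0.8], [0.4, 0.2, 1.0, 0.9]],
}
centroids = train_centroids(training)
print(classify([1.2, 0.3, 0.15, 0.5], centroids))  # ethanol
```

Normalising each reading makes the classifier compare the shape of the response rather than its magnitude, so a higher concentration of the same gas still maps to the same label; production devices may well rely on more sophisticated models.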
«Today – Cuniberti concludes – we face many challenges. Challenges that require innovation. Innovation has always been driven by materials research. But today, unlike in the past, with a ‘minimal approach’, we can achieve these materials differently, thanks to the support of generative artificial intelligence. We can even think about building materials atom by atom».
Glimpses of Futures
The application of AI and generative AI in new material research paves the way for a new era of innovation, which could have disruptive effects not only scientifically but also economically and socially.
Aiming to anticipate possible future scenarios, we use the STEPS matrix to provide an overview of the impacts this methodology could have from social, technological, economic, political, and sustainability perspectives.
S – SOCIAL: the general public has an indirect interest in materials science. This field of research accelerates innovations that can enhance quality of life through new products and technologies. Health, energy, green mobility, and industrial production are all areas where materials science can instigate transformative changes and innovations. These advancements address significant societal issues related to demographic shifts, urbanisation, the climate crisis, and digital transformation.
T – TECHNOLOGICAL: data-driven research is considered a new paradigm in materials science. In this field, data is a valuable resource, and knowledge is derived from material datasets that are too large or complex for traditional human reasoning. To discover new materials or enhance the properties of existing ones, the concept of “open science” and advancements in information technology come into play. Advanced analytics, machine learning, AI, and high-performance computing are now integral to materials research methodologies. However, several challenges hinder progress in data-driven materials science: data veracity, integration of experimental and computational data, data longevity, standardisation, and the gap between industrial interests and academic efforts.
E – ECONOMIC: the various stakeholders involved in research activities, from academia to industry and from government to the public at large, attribute different meanings and expectations to a data-driven (and AI-driven) approach to materials science. Most research is carried out within universities and in the research and development departments of industry. It is in these settings that materials data are generated, and it is also here that the data are processed and integrated into new value chains. Policy makers and public or private funding agencies, on the other hand, may be interested in promoting open science data, supporting scientific development through their policies and funding decisions. Then there is the public interest already mentioned. Together, these actors form a mutually beneficial ecosystem whose vitality is crucial for the success and longevity of this new approach to materials science.
P – POLITICAL: on these issues, it is important to understand what Europe is doing, precisely because innovation enabled by new materials is crucial for the overall competitiveness of the EU. Chemical compounds and advanced materials, including nanomaterials, are essential across all sectors, such as health, electronics, energy, mobility, and construction, and in industrial and consumer products ranging from batteries to packaging, from cleaning products to cosmetics, and from pharmaceuticals to construction materials. Research must aim to develop materials that perform better but are also safe, sustainable, and circular. For this reason, in recent years the European Union has launched several policy initiatives that must be taken into account.

The European Green Deal mentions advanced materials and critical raw materials for a zero-emission industry and fair trade; the Zero Pollution Action Plan aims to reduce pollution of air, water, and soil; the Chemicals Strategy for Sustainability sets out safety and sustainability criteria for chemicals and a strategic plan for research and innovation; the Sustainable Products Initiative seeks to make every aspect of product design, production, use, and sale greener and more circular, also addressing the presence of harmful chemicals; the Critical Raw Materials Act recognises the importance of advanced materials for material efficiency and circularity and announces a coordinated plan with EU countries; finally, the Net Zero Industry Act aims to strengthen manufacturing capacity for clean technologies within the EU, including innovative advanced materials.

On 27 February this year, the Commission adopted a Communication on “Advanced Materials for Industrial Leadership”, which sets out a series of actions, from research to commercialisation, to promote the design, development, and use of advanced materials in Europe.
One of the planned actions is the proposal for a co-programmed partnership with industry: Innovative Materials 4 EU (IM4EU).
S – SUSTAINABILITY: there is an urgent need to place research and development of innovative materials at the forefront of current sustainability efforts. On one hand, new materials are required that utilise renewable energy and raw materials, advanced chemistry, physics, synthetic biology, and artificial intelligence to support initiatives aiming for net zero emissions and the promotion of a circular economy. On the other hand, it is important to acknowledge that materials science touches every aspect of human life, from renewable energy to energy efficiency, from nanotechnology to health, construction, transportation, manufacturing processes, recycling, and much more. This is where the contribution of this field of research is crucial both for environmental protection and for ensuring a better future for generations to come. Innovative and advanced materials and devices are needed to facilitate the transition towards greener technologies and contribute to a sustainable future, encompassing all aspects involved in their production and application: from design to fabrication, from characterisation to testing, from scalability to life cycle analysis.