A more comprehensive version of the global catalogue of human genetic mutations, together with a tool able to analyze it thoroughly, has made it possible to identify specific regions of the genome whose alteration may cause severe diseases.

Diseases of genetic origin are numerous and frequently rare, rendering an accurate estimation of their total number challenging. As of 2016, for instance, the number of diseases arising from single-gene alterations exceeded ten thousand worldwide [source: “Genetic Disorders” – Genetic Alliance UK]. For several decades, the international scientific community has dedicated its focus to these diseases, particularly examining their association with genetic mutations.

Even a minor deviation of the DNA sequence from its normal structure can trigger a genetic disease. The change (mutation) can affect a single gene or multiple genes, or it can damage chromosomes, the structures that carry the genes themselves. Among the better-known genetic diseases are Down syndrome, cystic fibrosis, Duchenne muscular dystrophy, and sickle cell anemia. Yet many remain unknown and undiagnosed. The National Human Genome Research Institute has highlighted,

“… as we study and unveil the secrets of the human genome, we learn that nearly all diseases have a genetic component. Some are caused by inherited mutations from parents and present at birth, while others are acquired throughout life. These latter are not inherited but occur randomly or due to environmental exposure, such as to tobacco smoke or toxic substances, leading to various types of cancers, as well as certain forms of neurofibromatosis.”

‘Acquired genetic mutations’ are a more common cause of cancer than ‘inherited mutations.’ Even when changes in some genes do not directly cause the disease, they may still increase the likelihood of developing cancer. For example, certain genetic alterations reduce the body's capacity to break down toxins released by cigarette smoke, meaning that, among smokers, individuals with this type of DNA variation are at higher risk of developing lung cancer or other smoking-related disorders [source: “Gene changes and cancer” – American Cancer Society].


Over the past two years, studies of genetic mutations have increasingly highlighted the importance of seemingly functionless regions of the human genome in the search for the causes of as-yet-unnamed diseases.
Variations affecting the non-coding portions of the genome have recently been included in the largest public database of human genetic mutations, and in a new scoring method that determines which genes are more likely to be related to a given disease.
Both the mutation catalogue and the metric that quantifies intolerance to genetic variation focus on non-coding DNA. Together they mark out the terrain of future diagnostics, which we envisage as prompt and aimed at slowing the course of diseases and, in the longer term, at curing and defeating them.

Diseases and Genetic Mutations: Focus on Non-Coding DNA

The work of two researchers from AstraZeneca’s Discovery Sciences, one based at the US office in Massachusetts and the other at the UK office in Cambridge, emphasizes the importance of a factor that is increasingly considered central to studying the links between diseases and genetic mutations, although fully understanding its functions is still a long way off.

This factor is non-coding DNA, which the two scientists discuss in the article “An expanded genomic database for identifying disease-related variants”, published in Nature on December 6, 2023. “Non-coding DNA” refers to the portion of the human genome that does not instruct the production of proteins and thus does not “encode” proteins. It represents a significant portion, as only about one or two percent of DNA consists of coding genes [source: “What is noncoding DNA?” – National Library of Medicine].

Until a few years ago, this was even termed “junk DNA” by geneticists, considered purposeless in the broader context of human genetic activity. It was only in 2021 that research began to establish a precise link between mutations of non-coding DNA and disease states. More specifically, the study “Non-coding deletions identify Maenli lncRNA as a limb-specific En1 regulator”, published on February 10, 2021, in Nature, demonstrated for the first time how a specific chromosomal variation within a non-coding region is linked to severe congenital limb anomalies.

This represented a significant advancement, although the research team acknowledged that “interpreting non-coding DNA is still a major challenge, precisely because we know so little about it.”

What are the Mutations that Determine Health Status?

In discussing diseases and genetic mutations, the two AstraZeneca Discovery Sciences scientists start from a specific observation: in the non-coding portions of DNA, the distinction between potentially harmful variations and those that are “neutral” for human health is still unclear.

In essence, not all mutations of non-coding DNA are hazardous. The authors refer in particular to a recent study, “A genomic mutational constraint map using variation in 76,156 human genomes”, by a team at the Broad Institute, which brings together researchers from diverse disciplines and institutions around the world, including the Massachusetts Institute of Technology, Harvard University, and affiliated hospitals. They also refer to the Genome Aggregation Database (gnomAD), a public global catalog of human genetic mutations, first released in 2020 by an international research consortium coordinated by the same Broad Institute.

Initially, the database contained data on the coding DNA sequence of 125,748 individuals (all anonymous, naturally) and data related to the whole genome of 15,708 subjects. Since then – the AstraZeneca researchers emphasize – this vast trove of genomic big data has expanded, such that today, thanks to the work of the Broad Institute, it includes whole-genome sequences of 76,156 individuals from diverse geographical origins and ethnicities, providing a much more comprehensive picture of human genetic mutations, including those in non-coding regions.

“Most mutations are clinically insignificant,” they comment. “The expanded and more varied collection of whole genomes in this latest version of gnomAD empowers geneticists to swiftly pinpoint those DNA variations that are genuinely rare, and therefore more reasonably linked to disease, even in non-coding genome portions.”
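To make the idea of a "genuinely rare" variation concrete, here is a minimal Python sketch of how rarity could be judged from a catalog of this kind. It assumes each variant comes with an allele count and an allele number (the number of chromosomes successfully genotyped at that position). The `Variant` class, the example identifiers, and the 0.0001 cutoff are hypothetical choices made for illustration; they are not taken from gnomAD or from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str      # hypothetical identifier, e.g. "chr1:12345:A>G"
    allele_count: int    # how many sampled chromosomes carry the variant
    allele_number: int   # how many chromosomes were genotyped at this site


def allele_frequency(variant: Variant) -> float:
    """Fraction of genotyped chromosomes that carry the variant."""
    if variant.allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return variant.allele_count / variant.allele_number


def rare_variants(variants, max_frequency=1e-4):
    """Keep only variants at or below the (illustrative) frequency cutoff."""
    return [v for v in variants if allele_frequency(v) <= max_frequency]


# 76,156 genomes correspond to at most 2 * 76,156 = 152,312 autosomal chromosomes.
catalog = [
    Variant("chr1:12345:A>G", allele_count=1, allele_number=152_312),
    Variant("chr2:67890:C>T", allele_count=30_000, allele_number=152_312),
]
candidates = rare_variants(catalog)
```

With these made-up numbers, only the first variant survives the filter. In practice the cutoff would depend on the disease under study and on how common it is in the population.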

Diseases and Genetic Mutations: A New Metric “Measures” Non-Coding DNA Variations

Concerning diseases and genetic mutations, the authors observe that a notable strength of the rich and diverse Genome Aggregation Database is that it supports the development of what are termed ‘intolerance metrics.’

These metrics are designed to identify genes that exhibit intolerance to genetic variation. In recent years, research has increasingly incorporated non-coding regions of the human genome into the parameters for measuring genetic intolerance. In this context, the aforementioned Broad Institute team has recently developed a metric named “Gnocchi,” which also assigns intolerance scores to mutations affecting non-coding genome portions.

They explain,

“… while protein-coding genes have well-defined boundaries, non-coding regions are not divided into functional units. To circumvent this issue, we divided them into ‘windows,’ calculating for each the degree of intolerance to variations.”

This tool, an advance on its predecessors, calculates the ‘theoretical’ (expected) number of mutations for each non-coding window, using an algorithm trained on various genomic features, notably ‘local DNA sequences.’ The predicted number is then compared with the number of variants actually observed in the same window in the gnomAD catalog. Preliminary laboratory tests indicated that non-coding sections exhibiting significantly fewer mutations than anticipated were assigned higher Gnocchi scores.

This means that such portions are more intolerant to mutations and, thus, are more likely to be relevant to the onset of genetic diseases.
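The comparison between expected and observed mutation counts can be sketched in a few lines of Python. The article does not give the scoring formula, so this example assumes a simple z-score of the form (expected - observed) / sqrt(expected), which rises as a window shows fewer variants than predicted. The 1,000-base window size, the function names, and the example counts are likewise assumptions made for illustration. In the real method the expected count comes from a trained model; here it is supplied by hand.

```python
import math


def split_into_windows(start: int, end: int, size: int = 1000):
    """Yield half-open (window_start, window_end) intervals covering [start, end)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for window_start in range(start, end, size):
        yield window_start, min(window_start + size, end)


def constraint_score(observed: int, expected: float) -> float:
    """Illustrative z-score: positive when fewer variants are seen than expected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (expected - observed) / math.sqrt(expected)


# Hypothetical counts for three windows: (observed variants, expected variants).
counts = {
    (0, 1000): (60, 100.0),      # depleted: far fewer variants than expected
    (1000, 2000): (100, 100.0),  # matches expectation
    (2000, 2500): (70, 49.0),    # more variants than expected
}

scores = {
    window: constraint_score(*counts[window])
    for window in split_into_windows(0, 2500)
}
most_constrained = max(scores, key=scores.get)
```

In this toy example the first window has the highest score and would be treated as the least tolerant of mutation.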

Glimpses of futures

Research on both inherited and acquired genetic mutations is the most powerful weapon science has for naming the still-unknown diseases that arise from them and for diagnosing them promptly enough to slow their progression. The ultimate objective, starting from early diagnosis, is effective treatment. This is the most ambitious scenario we envision when discussing tools such as those described here: a global catalog of all human genetic mutations (in both coding and non-coding regions) and a metric that quantifies intolerance to variation, even in non-coding portions of the genome, flagging the regions to be monitored because they are recognized as potentially harmful to health.

Certainly, much work remains. The study of non-coding DNA is in many respects still an enigmatic field, where the boundary between neutral and harmful variation has yet to be clearly drawn. In the future, the gnomAD database will continue to grow. The two AstraZeneca Discovery Sciences researchers state that “the priority is to continuously expand the catalog, so it can be even more representative of the global population. Only in this way can it provide scientists with more means to unveil the hidden secrets of our genome.”

The “Gnocchi” metric, too, will be refined in the coming years to support physicians analyzing the DNA of patients suspected of having an unknown genetic disease. In diagnostics, it will help them sift through all the variants and set aside the common, insignificant ones in order to focus on those more likely to cause the condition.

With the intent of anticipating potential future scenarios, we shall employ the STEPS matrix to outline the impacts, positive or negative, that the described solutions might have on multiple fronts.

S – SOCIAL: Early diagnosis of unknown genetic diseases with the tools described, even in the neonatal period, would in the future allow patients' families and caregivers to take charge of their care sooner, and therefore in a more informed and less traumatic way.

T – TECHNOLOGY: A database such as that of human genetic mutations, destined to become ever larger and more complex, might in the long run benefit from artificial intelligence techniques (including machine learning) for faster analysis and cross-referencing of the large quantities of data it contains.

E – ECONOMY: In a future scenario, applying the gnomAD database and the Gnocchi metric in support of early diagnosis of genetic diseases would have a positive economic impact: an overall reduction in the costs of diagnostic delays borne by families and by the health and welfare systems.

P – POLITICAL: Ethical concerns arise from the fact that, to date, just under fifty percent of the information in the global genome database comes from people of non-European origin. To make this medical tool properly equitable in the future, the relevant authorities will have to monitor how balanced and heterogeneous the data are in terms of ethnicity and geographic origin.

S – SUSTAINABILITY: From a social sustainability perspective, prompt diagnosis of genetic diseases must be accessible to all communities and all families, in keeping with everyone’s right to health and necessary care. The greatest risks concern socio-economically vulnerable people, especially where, as in Italy from January 1, 2024 [source: Rare Disease Observatory], the national health system does not provide free genetics-related services throughout the territory.
