Thanks to a computational tool, a recent Australian study has identified mutations in traditionally overlooked regions of the human genome. This could potentially lead to a new approach in treating certain oncological diseases.

Within the human genome, there is a region that does not provide instructions for protein production, known as “non-coding DNA.” Recalling how this region was previously termed “junk DNA” by geneticists, Shurjo K. Sen, a researcher and Program Director at the Office of Genomic Data Science within the National Human Genome Research Institute (Maryland, USA), notes that «the human genome is a vast expanse of nucleotides, nearly 3.3 billion. In truth, only a very small fraction of this expanse, about 2%, encodes what we know to be proteins. In light of this, the question is not about the size but rather the activity of the remaining 98%. Is it simply doing nothing, or does it have its own function?».

Over the past decade, the scientist continues, «we have started to realise that what we consider ‘non-coding’ might actually have a more subtle way of conveying information. It may not encode in the classical sense, but there is still a wealth of crucial data contained within this part of the genome» [source: “Non-coding DNA” – National Human Genome Research Institute, 13 July 2024].

Evidence of this came from a 2021 study published in Nature (“Non-coding deletions identify Maenli lncRNA as a limb-specific En1 regulator”), which was the first to link a specific chromosomal variant in non-coding DNA to severe congenital limb anomalies. That was just the beginning.


Researchers from the Australian Garvan Institute of Medical Research have studied genetic mutations affecting the binding sites of the CTCF protein, which plays a key role in genome organisation and gene expression. These sites are located within non-coding DNA, historically referred to as “junk DNA.”
The initial hypothesis focused on the possibility that genetic mutations might cause defects in the “anchoring” of CTCF binding sites. This could disrupt the 3D organisation of non-coding DNA, potentially creating an environment conducive to the formation of cancerous cells.
The discovery of persistent genetic mutations in CTCF binding sites across various cancer types suggests the future potential for developing effective methods and techniques to combat multiple cancers, including prostate, breast, and colorectal cancers. This approach would marry genetic-based oncology therapies with more “universal” treatment criteria.

CTCF protein binding sites in non-coding DNA

Two years after the 2021 Nature paper, a study by researchers from AstraZeneca’s Discovery Sciences, also published in Nature (“An expanded genomic database for identifying disease-related variants”, December 2023), examined the role of non-coding DNA in analysing the correlations between diseases and genetic mutations.

The authors explain that understanding this role is challenging due to the unclear distinction between potentially harmful and “neutral” variations within non-coding DNA [for more details, refer to our article “Genetic Mutations: A New Approach Identifies Potentially Dangerous Ones Even in Non-coding DNA”].

Building on this topic, a research team from the Epigenetics Laboratory at the Garvan Institute of Medical Research in Sydney published an article (“Machine learning enables pan-cancer identification of mutational hotspots at persistent CTCF binding sites” – Nucleic Acids Research, 2 July 2024) focusing on genetic mutations specifically affecting CTCF binding sites within non-coding DNA.

What is the CTCF protein? The acronym stands for “CCCTC-binding factor,” a protein that mediates genome organisation and gene expression. Mapping of CTCF binding sites across different species over recent years has revealed that these sites are prevalent throughout the genome [source: “CTCF as a multifunctional protein in genome regulation and gene expression” – Experimental & Molecular Medicine, 2015].

Non-coding DNA and “anchors” within the genome

It is not the first time that the Australian team has tackled the subject of CTCF protein binding sites in non-coding DNA. In a study from 2020 (“Constitutively bound CTCF sites maintain 3D chromatin architecture and long-range epigenetically regulated domains” – Nature Communications), they described how these sites can bring distant parts of non-coding DNA together, «forming 3D structures that control which genes are turned on or off». The researchers explained:

«In particular, we identified a subset of CTCF binding sites that we termed ‘persistent,’ as they act like ‘anchors’ within the genome»

The hypothesis behind the most recent work (July 2024) by the Garvan Institute of Medical Research team on genetic mutations affecting CTCF binding sites is that a defect in anchoring could disrupt the 3D organisation of non-coding DNA. The CTCF protein would then lose control over gene expression, with significant consequences for the formation of cancer cells (the specific impacts are discussed later).

To investigate this hypothesis, the authors used a machine learning system they named CTCF-INSITE (IN-Silico Investigation of persisTEnt binding). It predicts which CTCF binding sites in non-coding DNA are likely to act as “anchors” (and thus prove persistent) across twelve different cancer types, where they may contribute to tumour formation and progression.

Two ML models to predict anchoring functions across twelve cancer types

Specifically, the computational tool employs two machine learning models trained on a set of genetic and epigenetic characteristics of simulated persistent CTCF binding sites (P-CTCF).

The fifteen genetic and epigenetic characteristics were selected to capture the persistence of protein binding and the likelihood that a site acts as an “anchor” in non-coding DNA.

After training, the two AI models were tested through the analysis of over 3,000 tumour cell samples from patients diagnosed with twelve types of cancer (including prostate, breast, and colorectal cancers) available in the International Cancer Genome Consortium (ICGC) database.
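To make the train-then-test workflow concrete, here is a minimal sketch of a binary classifier that labels a binding site as persistent or not from fifteen numeric features. It is not the CTCF-INSITE implementation: the article does not say which model types the authors used, so plain logistic regression is swapped in as a stand-in, and the feature values, class centres and sample sizes are all synthetic assumptions.

```python
import math
import random

N_FEATURES = 15  # the article mentions fifteen characteristics; which ones is not stated here


def make_synthetic_sites(n_per_class, rng):
    """Return (features, labels) for made-up sites; label 1 = persistent."""
    X, y = [], []
    for label, centre in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(N_FEATURES)])
            y.append(label)
    return X, y


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, row):
    """Estimated probability that a site is persistent."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent."""
    weights, bias = [0.0] * N_FEATURES, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * N_FEATURES, 0.0
        for row, label in zip(X, y):
            err = predict_proba(weights, bias, row) - label
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(weights, bias, X, y):
    """Fraction of sites whose thresholded prediction matches the label."""
    hits = sum((predict_proba(weights, bias, row) >= 0.5) == bool(label)
               for row, label in zip(X, y))
    return hits / len(X)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
X_train, y_train = make_synthetic_sites(100, rng)
X_test, y_test = make_synthetic_sites(50, rng)  # held-out sites, never seen in training
weights, bias = train_logistic(X_train, y_train)
test_accuracy = accuracy(weights, bias, X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The point of the held-out set is the same as in the study's design: a model is only informative if it performs well on sites it was not trained on.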

The analysis of this vast dataset led to some significant findings and considerations. Firstly, persistent CTCF binding sites were identified in non-coding DNA across all twelve cancer types. Moreover, each tumour cell sample exhibited at least one mutation in such a binding site, meaning that no sample was free of anchor mutations. The research team noted: «This research confirms that in tumours, persistent CTCF binding sites act as ‘mutational hotspots’».
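The check behind a statement such as "each sample exhibited at least one mutation in a binding site" can be sketched as a simple interval-overlap count. The coordinates, sample names and half-open interval convention below are hypothetical and chosen only for illustration; they are not taken from the study or from ICGC data.

```python
from collections import defaultdict

# Hypothetical persistent sites as (chromosome, start, end), half-open intervals.
PERSISTENT_SITES = [
    ("chr1", 1000, 1020),
    ("chr1", 5000, 5020),
    ("chr2", 300, 320),
    ("chr2", 900, 920),
]

# Hypothetical somatic mutations as (sample_id, chromosome, position).
MUTATIONS = [
    ("S1", "chr1", 1005),
    ("S1", "chr3", 42),
    ("S2", "chr2", 319),
    ("S3", "chr1", 4999),
    ("S3", "chr1", 5001),
]


def samples_hitting_each_site(sites, mutations):
    """Map each site to the set of samples with a mutation inside it."""
    sites_by_chrom = defaultdict(list)
    for site in sites:
        sites_by_chrom[site[0]].append(site)

    hits = {site: set() for site in sites}
    for sample, chrom, pos in mutations:
        for site in sites_by_chrom.get(chrom, []):
            _, start, end = site
            if start <= pos < end:
                hits[site].add(sample)
    return hits


hits = samples_hitting_each_site(PERSISTENT_SITES, MUTATIONS)
all_samples = {sample for sample, _, _ in MUTATIONS}
samples_with_hit = set().union(*hits.values())

# Per-sample view: does every sample carry at least one mutation in some site?
every_sample_hit = samples_with_hit == all_samples
# Per-site view: which sites carry no mutation in any sample?
unmutated_sites = [site for site, samples in hits.items() if not samples]

print(every_sample_hit, unmutated_sites)
```

The sketch reports both a per-sample and a per-site view because they answer different questions: in this toy data every sample has a hit, yet one site remains unmutated.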

The second finding, verified through laboratory tests, supported the earlier hypothesis that a defect in anchoring – specifically due to cancer mutations – indeed reduces the binding properties of persistent CTCF sites.

What does this mean in practical terms? It means that these mutations in non-coding DNA are linked to the disruption of the binding factor, and this disruption can play a role in destabilising the 3D structures of DNA segments, potentially promoting the formation and progression of cancer cells. The researchers emphasised:

«Disrupted CTCF binding is associated with the loss of anchoring. The fact that cancer mutations are ‘enriched’ by this loss suggests that the breakage of the anchor provides them with a selective advantage. Overall, this opens up future hypotheses that mutations in persistent CTCF binding sites may promote programs based on the alteration of anchoring in cancer-related genes»

Glimpses of Futures

Although new studies are needed to validate the results obtained from the described research, they point towards a completely new approach to treating various types of cancer.

Now, attempting to anticipate possible future scenarios, let us use the STEPS matrix to analyse the potential impacts that further research on the functions of non-coding DNA, along the lines of the study described, might have on multiple fronts.

S – SOCIAL: the new generation of cancer therapies is moving towards personalisation, with targeted treatments tailored to the individual patient, matched to their cells’ repair response to DNA damage and defined by specific genetic mutations. However, this involves long waiting times for the patient. The Garvan Institute of Medical Research’s discovery of persistent CTCF binding site mutations common to different types of cancer suggests that it may become possible to develop effective methods and techniques against multiple cancers, adopting genetic-based but “universal” criteria for oncological therapy and leading to quicker interventions. Moreover, the anchors at CTCF binding sites within non-coding DNA are essential for maintaining the genome’s architectural homeostasis, so mutations within them can upset that balance in cancer cells. This points to future scenarios in which further research identifies key genes that could serve as markers for early cancer diagnosis or as targets for new treatments.

T – TECHNOLOGICAL: in the future, research into non-coding DNA mutations associated with the disruption of binding factors, and thus with the breakdown of the equilibrium of 3D genome structures, could benefit from the CRISPR-Cas9 genome editing system. It could be used to study how such mutations, by breaking the anchorage, contribute to tumour cell formation. CRISPR-Cas9 is a tool capable of identifying and cutting target DNA sequences within the genome of plant, animal, and human cells, so that they can be removed and replaced with others. In this specific case, its use would support experiments aimed at fully understanding the carcinogenesis mechanisms linked to the disruption of the 3D organisation of non-coding DNA.

E – ECONOMIC: current targeted cancer therapies, in order to match the genetic profile of the individual patient as closely as possible, require preliminary genetic tests and molecular diagnostics, which are quite costly. By contrast, the potential universal therapies that the research described hopes for, developed to address persistent CTCF binding site mutations common to multiple oncological diseases such as prostate, breast, and colorectal cancer, would help reduce the oncology expenditure borne by the national health system. In this regard, it is estimated that the overall economic impact of oncological diseases in the EU exceeds 100 billion euros per year.

P – POLITICAL: an approach to cancer treatment like that envisioned by the Australian scientists, focused not on the unique genomic characteristics of the individual tumour but on genetic mutations common to different types of cancer, will still require precise and attentive policies that back it with concrete programmes. Europe’s Beating Cancer Plan, presented by the European Commission in 2021, moves in this direction, promoting, in addition to sustainable prevention and more effective early diagnosis, equal access to cutting-edge diagnostics and genetic-based treatments.

S – SUSTAINABILITY: as with personalised cancer therapies, future genetic-based but universal therapies, if validated and approved in the coming years, will also face the challenge of social sustainability, which depends on equitable access to them and on the reduction of health inequalities. In the specific case of our country, Italy, these issues arise with the law on the differentiated autonomy of the Regions, approved as a draft law on 23 January 2024 and published in the Official Gazette on 26 June 2024 (Law 26 June 2024, no. 86). It establishes “health regionalism,” with the risk that diagnostic tests and treatments for cancer patients will not be guaranteed uniformly across all regions.
