Artificial intelligence is revolutionising speech therapy, offering innovative solutions that enhance the accessibility and effectiveness of treatments.

Speech disorders, such as dysarthria and aphasia, pose significant hurdles for many individuals, severely impairing their ability to communicate effectively and express thoughts, emotions, and needs. These conditions often stem from neurological damage caused by events like strokes, brain injuries, or degenerative diseases, which affect the brain areas responsible for speech production and comprehension.

The partial or total loss of the ability to speak not only impacts the functional aspects of communication but also deeply affects the psychological and social well-being of the person, who may suddenly feel isolated and frustrated by their inability to express themselves.

Traditionally, intensive speech therapy has aimed to reduce the impact of these disorders, but technological advances, particularly in artificial intelligence (AI) and machine learning, are transforming how these issues are managed and treated.

AI technologies can recognise and interpret unclear or confused speech, translating it into language that others can understand. Advanced systems, developed in specialised research labs, are facilitating communication, improving patients’ quality of life, and offering new solutions for those who struggle to engage in verbal interaction. This innovation not only helps to overcome linguistic barriers but also brings new hope to individuals who, for years, have struggled to express themselves due to speech disorders.


Artificial intelligence is transforming the treatment of speech disorders such as dysarthria and aphasia. Through advanced technologies, computers can now interpret unclear speech and convert it into comprehensible dialogue, significantly improving communication and the quality of life for those with verbal difficulties.

AI-supported Augmentative and Alternative Communication (AAC) technologies are providing new approaches to assist children with phonological disorders. Voice synthesis devices and visual tools enhance phonemic awareness, while AI personalises exercises, making the communication development process more effective and engaging.

Research led by Helen Meng at the Chinese University of Hong Kong is utilising AI to improve the recognition of dysarthric speech and monitor neurological conditions such as dementia. These technologies not only enhance speech comprehension but also serve as accessible biomarkers for diagnosing and tracking neurological diseases like Alzheimer’s.

Speech disorders and their types

Speech disorders are conditions that impair a person’s ability to articulate words correctly, often due to damage to muscles, nerves, or the vocal apparatus itself. These are complex disorders that can manifest in various forms.

For example, dysarthria is a motor speech disorder caused by weakness or impaired control of the muscles used for speaking, while aphasia, usually resulting from brain injury, impairs the capacity to understand or produce language, as well as to read or write.

Dysphonia is another type of disorder, which alters the vocal tone, affecting the quality, volume, and effort required to speak.

Parkinson’s disease can lead to a specific form of dysarthria known as hypokinetic dysarthria, characterised by a monotonous, low-pitched voice and difficulty articulating words.

Apraxia of speech, on the other hand, is a neurological disorder that prevents patients from performing the movements needed to produce sounds, even though they know what they want to say.

Another common speech disorder is stuttering, which involves issues with fluency, causing repetitions or prolongations of sounds. Finally, phonological disorders, such as “sigmatism,” make it difficult for individuals to pronounce certain sounds, even when there is no apparent physical cause.

These disorders can have a significant impact on the quality of life of those affected, making it essential to seek specialist therapies to improve communication abilities.

The website Speech and Language Disabilities provides an insightful taxonomy that helps clarify the types of disorders discussed in this article and how they manifest.

Diagram showing a taxonomy of speech and language disorders and how they manifest (source: “Speech and Language Disorders” – https://speechandlanguagedisabilities.weebly.com).
In the literature, “speech and language disorders” fall under the umbrella of communication disorders, alongside hearing disorders and deafness, and physical disabilities affecting speech.

Approach to speech disorders

Traditionally, the ideal support for patients with speech disorders has been speech therapy, a specialised discipline aimed at helping individuals overcome language and communication difficulties. Human interaction has always played a crucial role in this field, not only to correct pronunciation but, more importantly, to empower people to express themselves, connect with others, and fully participate in society.

As technology rapidly advances, this approach is now being complemented by new tools that can enhance traditional speech therapy and address some of its limitations.

There is no doubt that speech therapy remains an effective solution, capable of bringing about significant improvements in people’s lives, whether it’s a child struggling with articulation, an adult recovering from a stroke, or someone living with a chronic speech disorder.

However, it remains a time-intensive therapeutic approach, often requiring multiple in-person sessions each week, which can be a barrier for those with busy schedules or who live far from therapy providers. It can also be costly, particularly when not covered by national health services.

From the patient’s perspective, it can be challenging to transfer the skills learned in therapy to everyday life, and traditional methods may sometimes lack engagement, leading to reduced active participation.

From traditional speech therapy to technology-enhanced treatment

As previously mentioned, speech therapy can be approached either through traditional methods, where therapists work directly with patients using personalised exercises, or through technology-enhanced therapy, which employs digital tools and interactive activities.

In essence, traditional speech therapy is based on direct interaction between patients and speech therapists, offering tailored attention by assessing individual needs and providing focused guidance. It involves the use of physical tools and tactile techniques to improve articulation, pronunciation, and fluency, as well as personalised vocal and language exercises to strengthen the patient’s communication abilities. Progress is monitored in real time, allowing therapists to make immediate adjustments that enhance the effectiveness of the therapy.

Technology-enhanced speech therapy marks a radical shift, combining traditional methods with advanced technologies. This enhanced approach leverages a range of benefits offered by technology, starting with greater personalisation of treatment. Machine learning algorithms analyse data to identify specific speech issues and adjust exercises accordingly.
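
As a concrete illustration of this kind of adaptive personalisation, the sketch below keeps a rolling accuracy profile per phoneme and prioritises the weakest sounds for the next exercises. It assumes per-attempt correctness judgements are already produced upstream (e.g., by a speech recogniser); the class name, update rule, and parameters are illustrative, not taken from any specific product.

```python
# Minimal sketch of adaptive exercise selection, assuming per-phoneme
# correctness judgements arrive from an upstream recognition step.
# All names and constants here are illustrative.

from dataclasses import dataclass, field

@dataclass
class PhonemeProfile:
    scores: dict = field(default_factory=dict)  # phoneme -> rolling accuracy

    def update(self, phoneme: str, correct: bool, alpha: float = 0.3) -> None:
        # Exponential moving average keeps the profile responsive to recent attempts
        prev = self.scores.get(phoneme, 0.5)
        self.scores[phoneme] = (1 - alpha) * prev + alpha * (1.0 if correct else 0.0)

    def next_targets(self, k: int = 3) -> list:
        # Prioritise the phonemes with the lowest rolling accuracy
        return sorted(self.scores, key=self.scores.get)[:k]

profile = PhonemeProfile()
for phoneme, correct in [("s", False), ("r", True), ("s", False), ("th", True)]:
    profile.update(phoneme, correct)

print(profile.next_targets())  # ['s', 'r', 'th'] -- weakest sound first
```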

Other advantages include improved patient engagement, increased accessibility and flexibility, instant feedback, and meticulous progress tracking, which facilitates the evaluation and adjustment of therapy goals. Additionally, technology can help reduce the cost of treatment, making speech therapy more accessible to a wider audience.

Key technologies in speech therapy

Let’s explore the technologies that are already transforming speech therapy, improving patient outcomes and the overall quality of services, and consider their future development prospects.

Among the most established technologies, dating back to the 1950s and refined in the 1980s, are speech recognition software programs. These allow patients to practise language exercises on devices like smartphones or computers, receiving immediate feedback.

Recent advancements in telemedicine have also improved access to speech therapy, especially for those living in remote areas or with mobility issues. By introducing greater flexibility, telemedicine enables sessions to be tailored to individual patient needs, ensuring more continuity and consistency in care.

In terms of engagement, significant improvements have come from apps that make speech therapy enjoyable by turning it into a fun, game-like experience. These apps address specific needs such as improving pronunciation, language development, and supporting Augmentative and Alternative Communication (AAC) for non-verbal individuals.

In recent years, virtual reality (VR) has provided further valuable support, offering immersive experiences that allow patients to practise in simulated, controlled environments with real-time feedback.

However, the most significant advancements are expected to come from artificial intelligence and data analysis. These technologies hold the promise of delivering highly personalised treatment plans and therapies, along with tangible benefits for patients, including those in early childhood.

The role of artificial intelligence and machine learning in treating speech disorders in children

Artificial intelligence (AI) offers new possibilities for addressing complex disorders, such as Speech Sound Disorder (SSD) in children. Technologies like Augmentative and Alternative Communication (AAC) have already shown promising results in supporting individuals with SSD, but AI-driven methods have the potential to further enhance both diagnosis and therapy. These innovations enable healthcare professionals to improve communication outcomes for children with speech disorders, promoting early diagnosis and personalised treatment.

The synergy between AI, machine learning (ML), and AAC not only opens up new pathways for speech therapy but also encourages greater parental involvement in the therapeutic process. Speech therapy typically begins between the ages of 2 and 4, capitalising on a child’s ability to learn language through listening and interaction.

In this process, parents play a crucial role as the child’s first educators. However, when parental support alone is not enough, conventional speech therapy is employed, offering personalised exercises designed by therapists.

While effective, this approach demands intensive one-on-one interaction, making it difficult to scale. This is where AI can make a significant impact by delivering customised interventions to a larger audience without sacrificing the personalised care each child requires.

Augmentative and Alternative Communication and AI

As mentioned throughout this article, Augmentative and Alternative Communication (AAC) refers to all non-verbal communication methods that assist individuals facing speech or language difficulties. “Augmentative” tools supplement existing verbal communication, while “alternative” methods replace speech entirely. AAC includes a wide range of systems designed to facilitate the transmission and understanding of messages.

AAC encompasses tools such as communication boards with images, speech-generating devices, tangible objects, facial expressions, and finger spelling. These tools allow individuals to express their thoughts, needs, emotions, and opinions independently.

For children with Speech Sound Disorder, advancements in AAC are expanding therapeutic possibilities. Speech-generating devices provide an alternative means of communication for those who struggle to produce clear speech, while visual aids like communication boards with images enhance phonemic awareness, making sound production easier. AI-powered technologies go a step further by transforming indistinct sounds into intelligible speech and customising exercises to improve communication skills.

AI-integrated devices use natural language processing algorithms to understand speech and offer real-time feedback, fostering expressive dialogue. Additionally, AI-based interactive games create immersive virtual environments that stimulate social interaction and language acquisition, adapting to each user’s therapeutic preferences. AI-powered screening tools enable rapid and accurate assessments of language skills, allowing for early diagnosis and timely interventions. These technologies represent a revolutionary advancement in the treatment of children with speech disorders, enhancing language development, social skills, and overall well-being.
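
To make the real-time feedback step tangible, here is a minimal sketch that compares a recognised utterance against the target sentence and flags the words that differ, using only Python’s standard library. The recognised text is passed in as a plain string, standing in for the output of an upstream speech recogniser, and the example sentences are invented.

```python
# Minimal sketch of instant feedback on a recognised utterance. Only the
# comparison step is shown; transcription is assumed to happen upstream.

import difflib

def feedback(target: str, recognised: str) -> list[str]:
    """Return simple hints for the words that differ from the target."""
    hints = []
    matcher = difflib.SequenceMatcher(None, target.split(), recognised.split())
    for op, t1, t2, r1, r2 in matcher.get_opcodes():
        if op != "equal":
            expected = " ".join(target.split()[t1:t2]) or "(nothing)"
            heard = " ".join(recognised.split()[r1:r2]) or "(nothing)"
            hints.append(f"Expected '{expected}', heard '{heard}'")
    return hints

print(feedback("the red ball rolls", "the wed ball woll"))
# ["Expected 'red', heard 'wed'", "Expected 'rolls', heard 'woll'"]
```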

A new frontier in dysarthria

AI is proving to be a powerful tool for patients with speech disorders, including those affected by dysarthria, a condition where neural or muscular damage impairs the ability to speak clearly, making speech slow, distorted, and difficult to understand.

Recent research highlights AI’s effectiveness in assisting dysarthric patients. «With the advanced tools developed in our research lab, computers can recognise the unclear speech of patients with dysarthria and convert it into intelligible language, facilitating and enhancing communication», explains Professor Helen Meng, an engineering systems expert at the Chinese University of Hong Kong.

At this university, Meng founded the Human-Computer Communications Laboratory, which focuses on speech and language processing and has evolved into the Centre for Perceptual and Interactive Intelligence (InnoHK Centre), where she now serves as director.

One of the lab’s main projects uses machine learning techniques to decode and reconstruct spoken language, with a special focus on non-English languages like Chinese, a tonal language with various dialects. According to Meng, developing robust algorithms that can adapt to multiple languages is essential to ensure technological advancements are accessible to all.

Meng’s team has developed an advanced Dysarthric Speech Reconstruction (DSR) technology for both English and Cantonese speakers. In this system, dysarthric speech is processed through an AI-driven “encoder,” which extracts features from the raw speech and translates them into latent data. This data is then analysed by additional algorithms to generate clearer speech via a synthesiser.
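
The following PyTorch sketch illustrates that general encoder-to-synthesiser flow: acoustic frames go in, the encoder produces a latent sequence, and a decoder projects it back to cleaner frames. The layer choices, dimensions, and class names are assumptions for illustration only and do not reproduce the lab’s actual DSR architecture.

```python
# Conceptual sketch of the encoder -> latent -> synthesiser flow, in PyTorch.
# Layers and dimensions are illustrative assumptions, not the real system.

import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Maps acoustic frames (e.g., a mel spectrogram) to a latent sequence."""
    def __init__(self, n_mels: int = 80, latent_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, latent_dim, batch_first=True)

    def forward(self, mels: torch.Tensor) -> torch.Tensor:
        latent, _ = self.rnn(mels)   # (batch, frames, latent_dim)
        return latent

class SpeechSynthesiser(nn.Module):
    """Decodes the latent sequence back into (cleaner) acoustic frames."""
    def __init__(self, latent_dim: int = 256, n_mels: int = 80):
        super().__init__()
        self.proj = nn.Linear(latent_dim, n_mels)

    def forward(self, latent: torch.Tensor) -> torch.Tensor:
        return self.proj(latent)     # (batch, frames, n_mels)

dysarthric_mels = torch.randn(1, 120, 80)    # one utterance, 120 frames
latent = SpeechEncoder()(dysarthric_mels)    # extract latent features
cleaner_mels = SpeechSynthesiser()(latent)   # regenerate clearer frames
print(cleaner_mels.shape)                    # torch.Size([1, 120, 80])
```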

Between NED and GAN

As previously mentioned, dysarthric speech reconstruction (DSR) systems aim to automatically transform the speech of individuals with dysarthria into more normal-sounding, clearer, and more comprehensible speech. Several technological approaches exist for achieving this.

In the past, one of the most commonly used methods was based on GANs (Generative Adversarial Networks). However, more recent systems, built on NEDs (Neural Encoder-Decoder), have been shown to significantly improve the clarity of the reconstructed speech compared to GAN-based systems.

Yet, even NED systems have their limitations: the training process is inefficient and complex, as NED systems involve a complicated pipeline and include additional tasks, which can compromise the final quality of the reconstructed speech.

To address these issues, a new system called “Unit-DSR” has been proposed. This system draws on an advanced method known as self-supervised speech representation learning and uses discrete speech units (i.e., simplified, distinct representations of speech sounds) to make the restoration of dysarthric speech content more straightforward and accurate.

The Unit-DSR system is far simpler than NED systems. It consists of only two main components: a speech unit normaliser (which standardises how words are pronounced) and a vocoder (a tool that converts these unit representations into audible speech) called “Unit HiFi-GAN”.
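
A minimal sketch of the discrete-unit idea follows: continuous frame-level features are snapped to their nearest entry in a codebook, yielding a sequence of unit IDs that a vocoder would then turn into audio. The random codebook and features below are stand-ins; a real system would learn the codebook from self-supervised features (e.g., via k-means over HuBERT-style representations), and Unit HiFi-GAN would handle the final synthesis step, which is omitted here.

```python
# Sketch of discretising speech features into unit IDs. The codebook and
# features are random placeholders; the vocoder step is intentionally omitted.

import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(100, 256))      # 100 discrete units, 256-dim each
features = rng.normal(size=(120, 256))      # 120 frames of speech features

# Assign each frame to its nearest codebook entry (its "discrete unit")
dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
units = dists.argmin(axis=1)                # shape (120,), values in [0, 100)

# Collapse consecutive duplicates, as unit sequences are typically deduplicated
deduped = [u for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
print(deduped[:10])                         # first few unit IDs for the vocoder
```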

Tests conducted using a dysarthric speech database (the UASpeech corpus) demonstrated that this new system is more effective at improving speech quality, reducing word errors by 28.2% compared to the original dysarthric speech. Furthermore, the system has proven to be robust, meaning it performs well even when the speech rate varies or when there is background noise.
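
For context on figures like that 28.2% reduction, word error rate (WER) is the word-level edit distance between a reference transcript and a recogniser’s hypothesis, divided by the number of reference words. The sketch below shows the standard calculation; the sentences are invented rather than drawn from the UASpeech corpus.

```python
# Standard word error rate calculation via dynamic programming.
# Example sentences are invented, not taken from UASpeech.

def wer(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words, divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[-1][-1] / len(ref)

print(wer("please open the window", "pease opn the wind"))  # 0.75
```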

The results achieved and the opportunities beyond dysarthria

One of the major challenges the team faced was the lack of dysarthric speech data in languages other than English. To address this, Meng is building the Chinese University Dysarthria Corpus (CUDYS), a collection of vocal data from Cantonese speakers with dysarthria. This corpus is crucial for training AI models to reconstruct Cantonese speech.

So far, the system has achieved a marked reduction in error rates for dysarthric speech recognition compared to the original recordings, significantly improving both the intelligibility and naturalness of the reconstructed speech. To enhance the system further, the team is employing self-supervised learning techniques and discrete speech units, which are helping to reduce error rates even more.

Crucially, the team has also focused on identifying distinctive vocal features, such as reduced muscular control, that can be analysed to improve speech recognition systems and develop more effective therapies.

Meng envisions additional applications for this vocal analysis technology, particularly in monitoring neurological conditions like dementia. «The voice is an inexpensive and easily accessible biomarker that can be used to detect and monitor neurological diseases», explains Meng.

Her team is working on an AI-driven platform that extracts indicators of neurological conditions, such as Alzheimer’s, by analysing speech characteristics like lack of fluency, hesitations, and filler pauses.
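
One such speech characteristic can be computed quite directly: the share of an utterance spent in pauses. The sketch below estimates it from frame-level RMS energy on a synthetic signal; the silence threshold, frame length, and signal are assumptions for demonstration, not the team’s actual pipeline.

```python
# Illustrative pause-ratio biomarker from a frame energy envelope.
# Threshold, frame length, and the synthetic signal are assumptions.

import numpy as np

def pause_ratio(samples: np.ndarray, rate: int = 16000,
                frame_ms: int = 25, threshold: float = 0.02) -> float:
    """Fraction of frames whose RMS energy falls below a silence threshold."""
    frame_len = rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return float((rms < threshold).mean())

# Synthetic example: 1 s of speech-like noise with a silent middle section
rng = np.random.default_rng(1)
signal = rng.normal(0, 0.1, 16000)
signal[6000:10000] = 0.0                     # inserted pause
print(round(pause_ratio(signal), 2))         # roughly 0.25
```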

Glimpses of Futures

Now, let’s turn our attention to the future prospects of artificial intelligence for speech disorders, using the STEPS matrix to explore possible future scenarios for these technologies.

S – SOCIAL: human beings are inherently social creatures, with a natural inclination towards engagement and interaction. Verbal language, in particular, is a uniquely human trait, playing a crucial role in our ability to express thoughts, concerns, and perspectives to others. However, individuals with speech disorders face significant challenges—academically, psychologically, and socially—when attempting to interact with their community. The number of people living with disabilities is steadily rising: according to the World Health Organization, approximately 1.3 billion people globally require assistive technologies, a figure that could rise to 2 billion by 2030. The United Nations Convention on the Rights of Persons with Disabilities (UNCRPD) has reaffirmed access to assistive technologies as a fundamental human right.

T – TECHNOLOGICAL: the speech therapy software market has seen significant growth in recent years, with promising prospects ahead. Educational institutions are among the sectors driving this expansion, reflecting the growing recognition and adoption of speech therapy software in these settings. With ongoing technological advancements, these solutions are poised to play an increasingly central role in supporting individuals with speech difficulties. As AI, machine learning, and other innovative technologies continue to evolve, the capabilities of speech therapy software will become more sophisticated, personalised, and effective, promising to improve the quality of life of those with speech disorders and to transform the landscape of communication support, therapy, and interventions.

E – ECONOMIC: the use of artificial intelligence (AI) and augmentative technologies in the treatment of speech disorders provides significant economic, as well as social, value. Speech-generating devices and educational apps for language learning, such as augmentative and alternative communication (AAC) tools, enable individuals with communication difficulties to engage more fully in society, reducing their isolation and enhancing their productivity. The integration of these technologies into educational systems and the workforce not only improves language skill acquisition but also promotes greater workforce diversity, thereby making businesses more competitive.

P – POLITICAL: the political and regulatory implications surrounding the use of artificial intelligence (AI) in human communication require careful consideration of various factors. Firstly, regulations must be established to ensure data privacy and security, as language processing often involves the collection of personal information, such as voice recordings. It is essential that this data is safeguarded to prevent misuse or breaches. Additionally, ethical standards must be set to counter the improper use of AI, for example, in the creation of false or manipulative content. Furthermore, AI’s ability to grasp linguistic subtleties must be addressed to avoid misinterpretations that could negatively impact communication in critical sectors like healthcare.

S – SUSTAINABILITY: the application of technology to speech disorders offers substantial benefits from a sustainability perspective. The use of artificial intelligence (AI) and Augmentative and Alternative Communication (AAC) technologies primarily reduces the need for travel to access therapies, thanks to digital platforms and telemedicine tools. This makes treatments more accessible, especially for those living in remote areas or with mobility challenges. Moreover, these solutions promote inclusion, enabling individuals with speech disorders to participate more actively in social and professional life. Finally, the use of advanced technologies reduces the costs of traditional therapies, providing continuous and personalised treatments, which has a positive impact on both patients and healthcare systems.

Written by:

Maria Teresa Della Mura

Journalist