Using Brain-Computer Interfaces to Restore Speech

Post by Kelly Kadlec

What is a speech brain-computer interface?

A brain-computer interface (BCI) translates signals recorded from the brain into the control of an external device that can either restore or enhance a natural function. In March, we explored how this technology is being used to help people with paralysis regain independence by connecting motor areas of the brain to computer and robotic outputs that can replace the function of the arms or legs. In certain cases of paralysis, motor impairment can also result in the loss of speech. In the most extreme cases, such as after a brainstem stroke or in advanced amyotrophic lateral sclerosis (ALS), this can lead to locked-in syndrome. In locked-in syndrome, an individual’s cognitive functions remain intact, but their severely restricted motor function forces them to rely on adaptive technology like eye tracking or BCIs for communication.

Eye-tracking devices and the first iterations of BCIs for communication had users spell out the words they wanted to say letter by letter. Although this restores some communication, it does not replace speech. In the last five years, an alternative BCI for communication has advanced to the forefront of this technology - one that directly decodes the movements involved in speech itself.

How is speech decoded?

Speech BCIs follow the same general pipeline as other motor BCIs: a neural signal is acquired and preprocessed, the parts of the signal most informative to the decoder are selected with feature extraction, a decoder maps these features onto a set of possible outputs, and an external device - in this case, a computer - produces that output. Most research on speech BCIs has focused on acquiring signals from the motor cortex corresponding to the motor commands sent to the mouth, vocal cords, and diaphragm to produce speech. These command signals can then be translated into phonemes, the building blocks of words. The most popular choice of algorithm for decoding phonemes has been the recurrent neural network. Finally, decoded phonemes are either vocalized by a computer-generated voice or assembled into words and displayed as text on a screen.
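
To make this pipeline concrete, here is a minimal sketch in Python (using PyTorch). The channel count, feature choice, phoneme set, and network size are illustrative assumptions rather than the configuration of any published system; real decoders are trained on many hours of a participant's attempted speech.

```python
# Minimal sketch of the speech-BCI decoding pipeline described above.
# All names, dimensions, and the phoneme set are illustrative assumptions,
# not the architecture of any particular study.
import torch
import torch.nn as nn

N_CHANNELS = 128  # recording channels (assumed)
PHONEMES = ["AA", "AE", "B", "D", "IY", "K", "S", "T", "SIL"]  # toy subset

def extract_features(raw: torch.Tensor) -> torch.Tensor:
    """Stand-in for preprocessing/feature extraction (e.g. binned spike counts
    or band power); here we simply z-score each channel over time."""
    return (raw - raw.mean(dim=1, keepdim=True)) / (raw.std(dim=1, keepdim=True) + 1e-6)

class PhonemeRNN(nn.Module):
    """Recurrent network mapping neural feature sequences to phoneme probabilities."""
    def __init__(self, n_features: int, n_phonemes: int, hidden: int = 256):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, n_phonemes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time_bins, n_features) -> (batch, time_bins, n_phonemes)
        hidden_states, _ = self.rnn(feats)
        return self.readout(hidden_states).log_softmax(dim=-1)

# One simulated trial: 200 time bins of activity on N_CHANNELS channels.
raw_signal = torch.randn(1, 200, N_CHANNELS)
decoder = PhonemeRNN(N_CHANNELS, len(PHONEMES))
phoneme_logprobs = decoder(extract_features(raw_signal))
decoded = [PHONEMES[i] for i in phoneme_logprobs.argmax(dim=-1).squeeze(0).tolist()]
print(decoded[:10])  # per-bin phoneme labels, later assembled into words or audio
```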

What’s the history?

The first attempts at speech BCIs were developed using electroencephalography (EEG). EEG measures the activity of large populations of neurons from electrodes on the scalp. While it has certain advantages as adaptive technology, including being non-invasive, its signal-to-noise ratio proved too poor to decode the phonemes of speech as described above.

In 2009, a team of researchers from several institutions collaborated to implant a single electrode in the cortex of a patient living with locked-in syndrome. The electrode was placed in a speech area of the motor cortex, identified by having the participant attempt to say the names of pictures during functional magnetic resonance imaging (fMRI). They demonstrated that these signals were viable for decoding phonemes, but a single channel limited how much could be decoded at a time.

Where are speech BCIs today?

The promising results from initial attempts helped to motivate the implanting of high-density electrode arrays for speech BCIs, both with electrocorticography (ECoG) and microelectrode arrays. ECoG is an intracranial form of EEG that records local field potentials from electrodes resting on the surface of the cortex. Microelectrode arrays penetrate the cortex a few millimeters and record action potentials from individual neurons, sometimes up to several hundred single neurons at a time.
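
As a rough illustration of what feature extraction looks like for each signal type, the sketch below computes high-gamma band power from a simulated ECoG window and threshold-crossing (putative spike) counts from a simulated microelectrode window. The sampling rate, frequency band, window length, and threshold are assumptions for illustration, not the parameters used in the studies discussed here.

```python
# Hedged sketch of feature extraction for the two signal types described above.
# Filter settings, window sizes, and thresholds are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1000  # sampling rate in Hz (assumed)

def ecog_high_gamma_power(lfp: np.ndarray, fs: int = FS) -> np.ndarray:
    """ECoG-style feature: power in the high-gamma band (~70-150 Hz) per channel."""
    b, a = butter(4, [70, 150], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, lfp, axis=0)
    return (filtered ** 2).mean(axis=0)  # mean band power per channel

def spike_counts(voltage: np.ndarray, threshold: float = -3.5) -> np.ndarray:
    """Microelectrode-style feature: threshold crossings (putative spikes) per channel."""
    z = (voltage - voltage.mean(axis=0)) / voltage.std(axis=0)
    crossings = (z[1:] < threshold) & (z[:-1] >= threshold)  # downward crossings
    return crossings.sum(axis=0)

# Simulated 100 ms windows, shaped (samples, channels).
lfp_window = np.random.randn(100, 64)    # 64 ECoG channels (assumed)
spike_window = np.random.randn(100, 96)  # 96 microelectrodes (assumed)
print(ecog_high_gamma_power(lfp_window).shape)  # (64,) one feature per ECoG channel
print(spike_counts(spike_window).shape)         # (96,) one count per microelectrode
```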

Over the past several years, research insights have shed light on the power of BCIs in restoring speech. In 2018, researchers at UCSF implanted a high-density ECoG array in a patient with severe paralysis who was unable to speak and relied on head-tracking adaptive technology for communication. The researchers were able to decode attempted articulatory movements from neural signals to “synthesize” speech with an anatomically accurate computerized mouth that “spoke” the words the participant attempted to say.

Around the same time, a group at Stanford implanted intracortical microelectrode arrays in the cortex of a participant who was unable to speak due to ALS and developed a “speech-to-text” interface, which works much like voice-to-text but without the need for vocalization. This study decoded the largest vocabulary to date (125,000 words) with an error rate as low as 25% - the same rate previously achieved in the ECoG study above, which used a word bank of only 1,024 words.
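
The error rate reported in these studies is the word error rate familiar from speech recognition: the word-level edit distance between the decoded sentence and the sentence the participant intended, divided by the number of intended words. A minimal sketch, with made-up sentences:

```python
# Word error rate (WER): word-level edit distance (substitutions, insertions,
# deletions) divided by reference length. The example sentences are made up.
def word_error_rate(reference: str, decoded: str) -> float:
    ref, hyp = reference.split(), decoded.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("i would like some water please",
                      "i would like sun water"))  # 2 errors / 6 words ~ 0.33
```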

A recent collaboration between the University of California, Davis and the Stanford team that developed the speech-to-text interface achieved 95% word accuracy on a 125,000-word vocabulary while decoding attempted speech from an ALS patient implanted with microelectrode arrays. In this study, researchers took advantage of recent advances in large language models to improve decoding speed and accuracy. Much as computer vision leverages the natural statistics of images, these models exploit the highly structured nature of language to make more accurate predictions. The researchers also used machine learning trained on earlier recordings of the participant speaking to create a computer-generated voice that sounded like their own, allowing them to hear decoded speech in their actual voice.
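
To show the underlying idea in miniature: candidate sentences from the neural decoder can be rescored by adding a language-model score, so that linguistically implausible word sequences are penalized even when the neural decoder slightly prefers them. The candidates, scores, and tiny bigram "language model" below are invented for illustration; the study above used far larger vocabularies and language models.

```python
# Toy illustration of language-model rescoring of neural decoder outputs.
# All scores and the bigram table are made up for demonstration.

# Log-probabilities assigned by a hypothetical neural decoder to candidate sentences.
candidates = {
    "i want to drink water": -4.0,
    "i want to drain water": -3.8,   # slightly preferred by the neural decoder alone
    "eye want two drink water": -4.1,
}

# Tiny bigram language model: log P(next word | previous word).
bigram_logp = {
    ("i", "want"): -0.5, ("want", "to"): -0.3, ("to", "drink"): -1.0,
    ("to", "drain"): -4.0, ("drink", "water"): -0.8, ("drain", "water"): -2.5,
    ("eye", "want"): -6.0, ("want", "two"): -5.0, ("two", "drink"): -6.0,
}

def lm_score(sentence: str, unseen_logp: float = -8.0) -> float:
    """Sum bigram log-probabilities; unseen word pairs get a low default score."""
    words = sentence.split()
    return sum(bigram_logp.get(pair, unseen_logp) for pair in zip(words, words[1:]))

def rescore(neural_logp: float, sentence: str, lm_weight: float = 0.5) -> float:
    """Combine neural decoder confidence with language-model plausibility."""
    return neural_logp + lm_weight * lm_score(sentence)

best = max(candidates, key=lambda s: rescore(candidates[s], s))
print(best)  # "i want to drink water": the language model overrules the raw decoder
```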

All of the work discussed so far has involved decoding speech from a person actively speaking, attempting to speak, or miming speech. An alternative that has only recently been considered is internal speech - our unspoken internal dialogue, in which a word is said in the mind without any attempt to vocalize it. Researchers at Caltech decoded internal speech from microelectrode arrays in the parietal cortex of a tetraplegic participant. The ability to decode different modes of speech from varying brain areas will be critical to developing speech BCIs personalized to the needs of an individual. If the recent years of progress have demonstrated anything, it is that the possibilities for speech BCIs have only begun to be explored.

What is the future of speech BCIs?

Speech BCIs have the potential to be one of the first implementations of motor BCIs to be widely available as a therapeutic device. Unlike the advanced robotics needed to fully replace more complex motor functions, speech output requires only a computer. Results from speech BCIs also show advantages over other currently available adaptive technologies. Still, for speech BCIs to recreate natural conversation, improvements are needed. Perhaps most important is increasing the speed of decoded speech: natural speech runs at about 150 words per minute for native English speakers, while current speech BCIs decode at under 80 words per minute.

Another important consideration is the aspects of speech that carry meaning beyond the words themselves. For example, tone of voice is lost in text, and currently available computer-generated voices are monotone. Body language and gestures are also personal components of communication that can be lost in translation. This highlights the need to eventually combine BCIs for speech with BCIs for other motor functions to restore full communication for locked-in individuals.

The takeaway

The loss of speech can have a devastating impact on quality of life. Speech BCIs may be able to restore communication for individuals who are unable to speak by translating neural activity into computer-generated vocalized speech. These recent advancements, together with their profound potential for positive impact, may make speech BCIs among the first therapeutic BCI applications to be approved and widely available.

References

Anumanchipalli, G. K., Chartier, J. & Chang, E. F. Speech synthesis from neural decoding of spoken sentences. Nature 568, 493–498 (2019).

Card, N. S. et al. An accurate and rapidly calibrating speech neuroprosthesis. Preprint at https://doi.org/10.1101/2023.12.26.23300110 (2024).

Guenther, F. H. et al. A Wireless Brain-Machine Interface for Real-Time Speech Synthesis. PLOS ONE 4, e8218 (2009).

Lopez-Bernal, D., Balderas, D., Ponce, P. & Molina, A. A State-of-the-Art Review of EEG-Based Imagined Speech Decoding. Front. Hum. Neurosci. 16, (2022).

Metzger, S. L. et al. A high-performance neuroprosthesis for speech decoding and avatar control. Nature 620, 1037–1046 (2023).

Wandelt, S. K. et al. Decoding grasp and speech signals from the cortical grasp circuit in a tetraplegic human. Neuron 110, 1777-1787.e3 (2022).

Wandelt, S. K. et al. Representation of internal speech by single neurons in human supramarginal gyrus. Nat. Hum. Behav. 1–14 (2024) doi:10.1038/s41562-024-01867-y.

Willett, F. R. et al. A high-performance speech neuroprosthesis. Nature 620, 1031–1036 (2023).