As seen on Psychology Today.

One of the most common diagnostic tools used by clinicians is pattern recognition. When a clinician repeatedly sees patients with identical clusters of signs and symptoms, and these patients are repeatedly being diagnosed with the same illness, it is reasonable to assume that a new patient exhibiting the same cluster of signs and symptoms has the same illness as previous patients. The clinician recognizes a pattern and draws a conclusion.

Though humans are extremely good at recognizing patterns, new technologies are often far more thorough. Precision instruments can be designed to identify anomalies that might not be perceptible to the human eye or ear but can still indicate that something is wrong with a patient. In some cases, these anomalies may even serve as what is known as a biomarker. To use the definition created by a National Institutes of Health working group in 2001, a biomarker is “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes or pharmacologic responses to a therapeutic intervention.”

Biomarkers are already used in several areas of medicine (e.g., checking fasting plasma glucose to test for diabetes), but neurology and psychiatry still rely heavily on self-reports from patients. When a patient is feeling depressed and anxious or struggling with memory problems, they describe these symptoms. There has been no objective measure a clinician can consult to confirm what the patient has conveyed.

But that might be changing.

Vocal Biomarkers and Psychiatry

New technologies that make use of increasingly granular voice analysis software have revealed that patients’ voices may contain multiple biomarkers that go well beyond the content of a patient’s speech. This should not come as a surprise. When a loved one is not feeling well, you can usually tell, even if you’re speaking over the phone. In fact, as far back as the 1920s, researchers recognized that patients with depression tended to speak more slowly, more monotonously, and at a lower pitch than healthy controls. Meanwhile, patients who are more agitated or experiencing a manic or hypomanic episode tend to be more frenetic in speech—they speak breathlessly and often at high volumes.

Metrics like pitch, rate, and loudness can all be analyzed via a smartphone to assess the presence of depressive symptoms, as well as their severity. Meanwhile, voice breaks while talking, throat clearing, increases in hoarseness, and decreases in pitch have also been found to correlate with increases in stress hormones like cortisol, which can be indicative of increases in anxiety or symptoms associated with post-traumatic stress disorder.

What’s perhaps most exciting isn’t just that technology has advanced enough to detect changes in the voice and associate them with changes in one’s mental health. Rather, it’s the level of precision that these new tools are capable of achieving. In some cases, they can recognize changes in patients’ voices that are so subtle that no human could notice them. Moreover, these subtle changes tend to occur before symptoms become fully manifest or crises emerge, thereby allowing for early intervention.

For patients with severe and persistent mental illnesses like bipolar disorder or schizophrenia, this could be a life-changing technology.

Vocal Biomarkers and Neurology

Similar technologies appear to be applicable in the field of neurology, as patients with conditions like Parkinson’s disease, Alzheimer’s disease, and even multiple sclerosis tend to have altered speech patterns. As a relatively obvious example, individuals who are developing mild neurocognitive impairment or entering the beginning stages of Alzheimer’s disease tend to exhibit word-finding difficulty, but researchers have also found that these individuals use more filler words (e.g., um, uh, er), simplify their grammar and word choices, and speak more slowly.

In the case of Parkinson’s disease, irregularities like pitch variations and imprecise articulation occur in an estimated 78% of patients in the early stages of disease progression, often before other symptoms become apparent. With MS, dysarthria (weakness in the muscles used for speech) has been found to be a common and early manifestation of the disease, suggesting that quantitative acoustic assessments could be used to monitor disease progression.

Clinical Applications and Limitations

As AI becomes more commonplace in the clinic, and as more patients rely on wellness apps, it seems likely that technology that monitors for vocal biomarkers will become more common, as well. Rather than providing written responses during therapy intake, new patients may instead reply verbally using an app that then analyzes their speech. This can be done in the waiting room or in the comfort of their home. The results of this analysis can then be passed to the clinician.

As another example, a therapist may have a patient with bipolar disorder who is not aware they are experiencing a manic prodrome—changes in behavior, affect, or cognition that occur before the full onset of a manic episode—even if they are engaging in increasingly risky behaviors. If their therapist could be alerted to the development of these symptoms through vocal analysis after a phone call, a session, or even a daily check-in with an app, the therapist could potentially intervene and stop the patient from doing something extremely reckless like harming themselves or others, blowing through their bank account, or running afoul of the law. Moreover, the therapist could do so without having to closely monitor the patient’s behavior and potentially violate their privacy.

It’s important to remember that this kind of technology and these kinds of apps are only tools, and that tools are only effective when used properly. They can certainly help us as clinicians as we strive to make the quickest and most accurate diagnosis for every patient. However, they cannot, nor should they ever, replace the judgment and expertise of an experienced clinician.

At the present time, these technologies are far from mainstream clinical adoption, and their integration will require acceptance by both patients and clinicians.