May 6, 2019. – What I just read in The Verge frankly surprised me. Numerous companies claim that, by analyzing your voice, they can tell who you are and predict what you might do in the future. All of this takes place in the context of our everyday conversations with virtual assistants on mobile phones and smart speakers. I recommend reading the following summary.
Voice is not only ubiquitous; it’s highly personal, hard to fake — think about the incredulity surrounding the falsely deep voice of former Theranos CEO Elizabeth Holmes — and present in some of our most intimate environments. People speak to Alexa (which has erroneously recorded conversations) in their homes, and digital voice assistants are increasingly used in hospitals. Voice journal apps like Maslo rely on the user speaking frankly about their issues. By now, many people know that tweets and Instagram posts are going to be monitored, but fewer think about our voices as yet another form of data that can tell us about ourselves and also give us away to others. All of this has led to exciting research about how this information can enrich our lives, as well as privacy concerns about how accurate such insights are and how they will be used.
The key to voice analysis research is not what someone says, but how they say it: the tones, the speed, the emphases, the pauses. The trick is machine learning. Take labeled samples from two groups — say, people with anxiety versus people without — and feed that data to an algorithm. The algorithm then learns to pick up the subtle speaking signs that might indicate whether someone is part of Group A or Group B, and it can do the same on new samples in the future.
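The two-group idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by any of the researchers or companies mentioned here: it assumes each recording has already been reduced to two made-up features (speech rate and mean pause length), uses invented numbers, and swaps in a very simple nearest-centroid classifier in place of whatever models real systems use.

```python
from statistics import mean

# Hypothetical feature vectors: (speech rate in syllables/sec, mean pause in sec).
# The values are invented for illustration and carry no clinical meaning.
GROUP_A = [(3.1, 0.90), (2.8, 1.10), (3.0, 1.00), (2.6, 1.20)]
GROUP_B = [(4.6, 0.40), (4.9, 0.35), (4.4, 0.50), (5.1, 0.30)]


def centroid(samples):
    """Average each feature over all samples in one group."""
    return tuple(mean(dim) for dim in zip(*samples))


def train(labeled):
    """'Learning' here is just storing one average feature vector per group."""
    return {label: centroid(samples) for label, samples in labeled.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(model, features):
    """Assign a new sample to the group whose centroid is closest."""
    return min(model, key=lambda label: squared_distance(model[label], features))


model = train({"A": GROUP_A, "B": GROUP_B})
print(predict(model, (2.9, 1.05)))  # slow speech, long pauses
print(predict(model, (4.8, 0.45)))  # fast speech, short pauses
```

A real system would work with far more features, far more recordings, and some form of feature scaling and validation, which is exactly where the concerns about accuracy raised later in the article come in.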
The results can sometimes be counterintuitive, says Louis-Philippe Morency, a computer scientist at Carnegie Mellon University who built a project called SimSensei that can help detect depression using voice. In some early research that tried to match vocal features with the likelihood of attempting suicide again, Morency’s team found that people with a soft, breathy voice, not those with tense or angry voices, were more likely to reattempt, he says. That research is preliminary, though, and the links are usually not so simple. Typically, the giveaway is a complex set of features and speaking patterns that only algorithms can pick up on.
Still, researchers have already built algorithms that use the voice to help identify everything from Parkinson’s disease to post-traumatic stress disorder. For many, the greatest promise of this technology sits at the intersection of voice analysis and mental health and the hope of creating an easy way to monitor and help those at risk of relapse.
These are all plausible applications, says Ghosh, the MIT scientist. Nothing jumps out as a red flag for him. But as with any predictive technology, it’s easy to overgeneralize if the analysis is not done well. “In general, until I see proof that something was validated on X number of people and this diversity of population, I would be very hard-pressed to take somebody’s claim for granted,” he says. “Voice characteristics can vary quite a bit unless you’ve sampled enough, which is why we stay away from making very strong claims.”
For his part, Degani says that the Voicesense speech-processing algorithm measures over 200 parameters every second and can be accurate across many different languages, including tonal languages like Mandarin. The program is still in the pilot stage, but the company is in touch with large banks and other investors, he says. “Everybody is fascinated by the potential of such technology.”
Customer service is one thing, but Robert D’Ovidio, a criminology professor at Drexel University, is concerned that some of the applications that Voicesense envisions could be discriminatory. Imagine calling up a mortgage company, he says, and they use your voice to determine that you’re at higher risk for heart disease, and then you’re deemed a higher risk because you might not be around for a long time. “I really think we’re going to have consumer protection legislation created to protect against the collection of these,” D’Ovidio adds.
Some consumer protections like this exist, points out Ryan Calo, a professor at the University of Washington School of Law. Voice is considered a biometric measure, and a few states, like Illinois, already have laws that guarantee biometric security. Calo adds that the problem of biases that correlate to sensitive categories like race or gender is endemic to machine learning techniques, whether those techniques are used in voice analysis or in screening resumes. But people feel viscerally upset when those machine learning methods are used for facial or voice recognition, in part because those characteristics are so personal. And while anti-discrimination laws do exist, many of the issues surrounding voice analysis run into broader questions of when it’s okay to use information and what constitutes discrimination, which are concepts that we as a society have not adequately grappled with.