The Road Ahead for Speech Recognition Technology


Speech recognition technology has had its place in the enterprise tech stack for years, but the onset of COVID-19 has proven its worth even further. Our recent annual Trends and Predictions for Voice Technology in 2021 report found that 2020 saw a marked increase in voice technology adoption among enterprises, with 68% of respondents reporting their company has a voice technology strategy, an increase of 18% since last year.

There are a number of reasons for this: voice technology can increase efficiency across organizations, give them better access to data from conversations, and even accommodate our wish for contact-free interaction during the pandemic. Given that the number of organizations adopting speech technology is set to increase as its capabilities grow, providers need to focus their attention on the barriers to adoption and ensure that user concerns are addressed. Only then will the technology’s true value be recognized.

Speech Recognition Technology in the Now

Over the years, usage of speech recognition technology has increased, as have its capabilities. Voice and speech now serve many use cases, from ones consumers may interact with daily, like smart speakers and virtual assistants, to ones less familiar to the average consumer, like machine interfaces in contact centers and regulatory compliance for businesses. As voice technology becomes a core component of many businesses’ strategies, we expect that over the next 5 years organizations will start to fully integrate it into their workflows and tech stacks. According to the innovation adoption lifecycle, voice technology is currently being used by the early adopters. Those putting in the resources and utilizing these capabilities now will have an early advantage when this technology’s potential is realized by the larger majority.

The Hurdles to Jump and Barriers to Adoption 

Despite the measurable impact of speech recognition technology, there are still hurdles to jump and barriers to adoption for many enterprises. According to our recent report, some of the biggest barriers to voice technology adoption include accuracy (73%) and accent or dialect-related issues (51%). Only 28% of respondents noted cost as a barrier to adoption. But to increase access and prove value to those organizations that have not yet adopted the tech, speech recognition providers must improve the accuracy of their speech-to-text capabilities. 

Speech-to-text accuracy is most often measured by word error rate (WER): the number of word-level mistakes in a transcript relative to the number of words actually spoken. Yet other factors must also be accounted for, such as speaker changes, punctuation, context-specific words, and homophones, many of which are unique to each use case. One major area for improvement lies in accent or dialect-related issues. Two possible solutions exist to address this issue.

The first option is to build a speech recognition engine around accent-specific language models. For example, this means creating a separate language pack for Mexican Spanish, European Spanish, Peruvian Spanish, and so on. The second option is to build an any-context speech recognition engine that understands Spanish regardless of region, accent, or dialect, like Speechmatics’ Global Spanish Pack. The second approach makes for frictionless user and customer experiences, as no one needs to change their voice to suit the engine or worry that their accent won’t be recognized.

The Adobe Voice Survey 2020 confirmed that better accuracy is the most desired improvement: 57% of users say improvements in accuracy would cause them to use voice technology more often or for more purposes. To offer the best value to the enterprise customers and consumers using voice technology, this is an area that we must prioritize.
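
For readers who want to see what the word error rate mentioned above looks like in practice, here is a minimal sketch in Python. It uses the standard textbook definition (substitutions, deletions, and insertions divided by the number of words in the reference transcript), not any particular vendor’s scoring tool. The function name word_error_rate and the sample sentences are our own illustrations, and the sketch assumes simple whitespace tokenization with no normalization of case or punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative sketch: words are split on whitespace and compared exactly,
    so case and punctuation differences count as errors.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, word level).
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (word missed)
                curr[j - 1] + 1,                  # insertion (extra word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


# A single homophone mistake in a four-word sentence gives a WER of 25%.
print(word_error_rate("their accent is fine", "there accent is fine"))  # 0.25
```

Note that WER counts a homophone error the same as any other wrong word, and it says nothing about punctuation or speaker changes, which is why it is only one part of the accuracy picture.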

Speech Recognition Technology Moving Forward in the New Normal 

Considering COVID-19’s impact on the space, the Adobe Voice Survey 2020 revealed that 31% of users cite sanitation and not touching high-traffic surfaces as a reason to use voice technology. Moreover, 86% of respondents noted that voice technology could make public interactions, like visiting a business or attending an event, feel more sanitary and safer. Respondents also expressed interest in using voice tech to replace actions such as choosing a floor on the elevator (55%), opening a door (56%), or using a vending machine (49%). Actions that were once seen as everyday norms will be revolutionized as consumers adjust their expectations. As such, voice technology is a vital tool to make everyone feel safer as we approach a new normal.

For those who have not yet adopted or included voice technology in their strategies, 60% reported it’s something they’ll consider in the next 5 years. And as voice is the easiest form of communication, we expect there will continue to be a major shift in communication within organizations in 2021 and beyond, and speech technology will lead the way for these changes.

