Microsoft has been focused on voice for years, but this deal should clearly position it as one of the true technical leaders in the space. There is quite simply no other asset like Nuance: the company has been working on voice for three decades, and in fact the tech behind Siri was born at Nuance. Dragon (or Dragon Dictation) has been Nuance's mainstay voice product for years, with particular dominance in the medical space.
Why were medicine, and especially doctors, early adopters of voice? Precisely because of its primary advantage: we can speak about three times as fast as we can type on a keyboard and six times as fast as on a mobile device. Doctors' time is incredibly valuable, so they began using "analog voice tech" 40 years ago. I remember my mother, a pediatrician in the 1980s, calling into a number to record her dictations about patients, which were then transcribed by human beings into chart notes. Over 20 years ago, Nuance began replacing these human transcribers with voice recognition technology, and over the last decade it has made huge strides in adapting this tech to other industries by using AI.
Nuance was unique in the industry in that its technology was so good it could charge a premium for its voice dictation product. This afforded it a business model available to no one else in the space, and let it reinvest huge amounts into R&D. The results have been incredible: for example, an error rate of just one word out of every 150 in this 2018 test. Competitors like Microsoft's Windows Speech Recognition and Google Docs' voice typing had error rates two to three times that, and the Windows Dictate product's was seven times higher(!). It's easy to see why Microsoft felt compelled to act.
At WillowTree, we often compare the state of voice today to the internet of the late 90s, when it was called the "world wide wait": tantalizing for sure, but frustrating. We could all see where it was going to go "if it just worked." We think Nuance will help Microsoft make voice just work: by rolling Nuance's incredible voice tech advantage into all of Microsoft's products, and by combining voice-based data entry and commands with screen-based, multimodal results. (No one wants to listen to Alexa read back the descriptions of the pizzas they just ordered via voice; they want to see them on a screen, in an app or a text.) If Microsoft can pull all this together, voice will transform the world the same way mobile did 15 years ago, and the internet did in the 90s.