Great post from Benedict Evans on the state of voice computing in 2017. On wider answer domains and creating the uncanny valley:
This tends to point to the conclusion that for most companies, for voice to work really well you need a narrow and predictable domain. You need to know what the user might ask and the user needs to know what they can ask.
This has been the persistent annoyance with voice UIs. For me, Siri was the first commonplace voice interface I tried for day-to-day things. The dissonance between “you can say a few things” and “ask me anything” has been Siri’s core issue: Apple set false expectations for the technology, which ends up creating a letdown. Evans makes a good point about combining the right problem with a narrowed domain:
This was the structural problem with Siri - no matter how well the voice recognition part worked, there were still only 20 things that you could ask, yet Apple managed to give people the impression that you could ask anything, so you were bound to ask something that wasn’t on the list and get a computerized shrug. Conversely, Amazon’s Alexa seems to have done a much better job at communicating what you can and cannot ask. Other narrow domains (hotel rooms, music, maps) also seem to work well, again, because you know what you can ask. You have to pick a field where it doesn’t matter that you can’t scale.
With the expansion of this tech in Google Now, Alexa, Siri and others, the problem becomes “what can I ask?” rather than the technical conversion of speech to text and text to command. “Ask me anything” is a non-starter, because right now the failure rate on any given question is high. This is what happened with Siri for many users; it only takes a few failures on questions we perceive as simple to switch us off entirely. I gave up on Siri years ago, and I wonder how hard it will be for Apple to reframe perceptions of the technology and restore that confidence.