User experience is beyond visual and must also cover vocal

Note: This post is older than two years. It may still be totally valid, but things change and technology moves fast. Code based posts may be especially prone to changes...


Vocal user interfaces came to my attention when I was playing around with my phone’s voice assistant. I treat the screen-reader as a vocal user experience as well. The two are not closely related though, and that came as a surprise to me. But voice assistants have a giant impact on everybody, not only from an accessibility perspective but in general when thinking about humans interacting with computers.

Most of us think of user experience as a very visually oriented experience. That goes especially for the majority of people in tech. It comes from our own habits as users of graphical user interfaces.

But let’s think for a second about assistive technology. You may be using it while reading this blog post without even being aware of it, because glasses, for example, are so common that they may not be treated as assistive technology. As we all age (and that is a good thing) our perception may change. It is very common that our vision gets worse with age, and then we soon need assistive technology just to be able to read. We buy glasses or lenses and they become almost a part of us.

Sometimes, and for some people, glasses may not be enough, and we need more advanced assistive technologies, for example a screen-reader. Screen-readers are amazing pieces of software that can be installed on the majority of operating systems and devices, even smartphones. With the majority of the world’s population on mobile phones, it is amazing that with the help of screen-readers even blind people can be included.

So when we think of user experience, we must reach beyond visual user experience and also cover vocal user experience. I think of vocal user experience from two perspectives: there are voice assistants, which are being used more and more often, and there are screen-readers, which allow all sorts of people to interact online. Not only blind people but also, for example, people who have difficulties with reading, or people who simply want to hear the text instead of reading it. So vocal user interfaces are not only for people with disabilities, and some may even argue that vocal user interfaces are much more user-friendly than visual ones.

Voice assistants are more and more popular, and proof that UX needs to define vocal interactions as well

I’ve never been a big fan of voice assistants because I am not a native English speaker and because I tried them too soon. Yes, trying them when they first became available on my not-so-smart phone (Ericsson T28 for example) was a really bad experience. But when we fast forward to 2021 and check the state of the market, we can see that their popularity and stability are quite amazing. Amazon, Apple, Google and others are doing pretty well with the help of humongous sets of voice samples and machine learning. Voice assistants are really getting better and better, and the more people use them, the better they get.

So, getting back to my thoughts: the user interface is by far not only visual, and user experience must take that into consideration. One of the most important aspects is how screen-reader users interact with webpages and mobile applications. There we have quite a good starting point when we understand the Web Content Accessibility Guidelines and their impact on accessibility. But we must reach beyond them and also think about semantics that help users interact via voice if we also want to cover voice assistants.

Not only UX, developers must also think about vocal interfaces

I’ve never really worked with Speech Synthesis Markup Language (SSML) and I am just starting with it, but it really has a lot of potential, and I like to think of it as HTML for voice. It lets developers define how text will be converted to speech.
What an amazing possibility: defining speech with code. Here we can again quickly draw some mental parallels with HTML. In my opinion user experience and user interface designers must understand the possibilities of HTML before they can make really good user experiences and interface designs.
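
To make the idea of "defining speech with code" a bit more concrete, here is a minimal sketch in Python that builds a small SSML snippet. The helper name `build_ssml`, the example sentences and the chosen pause and speaking rate are my own assumptions for illustration. The elements used (`speak`, `s`, `break`, `prosody`, `emphasis`) come from the SSML specification, but each voice assistant platform supports its own subset, and a standalone SSML document may also need `version` and namespace attributes on `speak`, so please check the provider documentation before relying on it.

```python
import xml.etree.ElementTree as ET


def build_ssml(greeting: str, important: str, pause_ms: int = 500) -> str:
    """Hypothetical helper: return a small SSML snippet as a string."""
    # <speak> is the root element of every SSML document.
    speak = ET.Element("speak")

    # <s> marks a sentence, a bit like a paragraph tag in HTML.
    sentence = ET.SubElement(speak, "s")
    sentence.text = greeting

    # <break> inserts a pause; there is no visual equivalent for this.
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})

    # <prosody> slows the speech down, <emphasis> stresses the content.
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow"})
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = important

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Welcome back.", "Your order has shipped."))
```

Just like with HTML, the markup describes structure and intent (a sentence, a pause, an emphasised phrase), and the speech engine decides how exactly to render it.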

Knowing HTML basics when making websites and web applications means understanding the possibilities and limitations of the whole platform. Sure, it is possible to invent, but it is much more cost-efficient and practical to use existing patterns to make products. Innovation can introduce some huge risks, for example excluding some users because accessibility was not integrated correctly or maybe not integrated at all.

The same goes for SSML – developers and designers must first understand what it brings to the table.

SSML does not have much to do with screen-readers but is used for voice assistants

I was surprised when I learned about SSML, because it felt like the perfect tool to make a better screen-reader experience. But let me be clear: SSML is not supported by screen-readers. SSML is more of a markup language for voice assistants that improves the vocal user experience. So knowing SSML is currently not related to accessibility with regard to screen-readers. But at the same time, voice assistants can also be used for better accessibility, and making vocal user experiences better is important for everybody. So SSML is in a way also important for accessibility when we think of users who use voice assistants.

There is always room for improvement

So when thinking about vocal user interfaces we can really think in at least two directions:

  1. screen-reader user experience, where voice is very much unidirectional (the user uses the keyboard for input and gets voice from the screen-reader as output),
  2. voice assistant user experience, where voice is bi-directional (the user says commands to the voice assistant and the voice assistant says the results back to the user).

When we use HTML correctly, sometimes adding ARIA to fill in the missing parts of the experience, we can make good screen-reader user experiences. Then we also have to consider voice assistants and improve the experience for them. As designers and developers we must therefore know the components and possibilities of the platform, and given their wide adoption we should invest some time in learning more about voice assistants as well. They are thought to be the most human user interfaces we have, moving cognitive load from users to computers, so we can say that their reach includes assistive technology but goes well beyond it.

I will not predict the future, but after playing with the voice assistant on my phone and then trying to make a simple voice assistant interaction with the providers’ tools, I felt that this has a huge impact on everybody, including people with disabilities.

Author: Bogdan Cerovac

I am IAAP certified Web Accessibility Specialist (from 2020) and was Google certified Mobile Web Specialist.

I work as a digital agency co-owner, web developer and accessibility lead.

Sole entrepreneur behind IDEA-lab Cerovac (Inclusion, Diversity, Equity and Accessibility lab) after work. Check out my Accessibility Services if you want me to help you with digital accessibility.

Also head of the expert council at Institute for Digital Accessibility A11Y.si (in Slovenian).

Living and working in Norway (🇳🇴), originally from Slovenia (🇸🇮), loves exploring the globe (🌐).

Nurturing the web from 1999, this blog from 2019.
