In the Metaverse we will mostly be speaking to each other instead of typing our messages, so why not do the same when we create content? Yes, we use Whiteboards and other objects we can manipulate with our digital twin hands (aka our avatar's hands), but we also need to create text-based content. Why wouldn't we speak to both people and AIs in the Metaverse? In this sense, dictating a letter is speaking to an AI. We have seen this in various sci-fi movies, and people are already doing it in our world. It is another piece of futuristic sci-fi that is already here, and part of the Metaverse.
We can already use voice as the user interface for our devices: especially on mobile phones (Google, Apple Siri), but also on some computers (Windows Cortana) and other devices (Siri, Google, Amazon), with varying levels of functionality. In Microsoft Teams, Cortana can help you out. Sending a text or social message by voice can lead to interesting results, and if you are up for it, you can use voice to dictate content into a Word document.
I would assume dictation has been used much more in English-speaking countries, but language support has been expanding every year. Just look at Word's dictation support and capabilities: even Finnish is already supported in preview. It would indeed be fun to live that sci-fi movie scene where you dictate a message or a chapter flawlessly.
For many, typing is still much less frustrating and also much faster. Dictating messages and replies is becoming more common on mobile devices, especially while driving, but for me at least there are still several blockers: the phone should switch between languages seamlessly, and it should understand my intention more easily. So I don't dictate or type messages while driving, and when I am not driving, I type.
You could say I am not quite the target audience for voice UI. But I am writing this article because I know that will change for me too in the future. As AI advances and its understanding of languages improves, along with automatic translation and other smart features, using voice UI will get easier and easier. Teams also seems to be getting better with embedded Cortana, and you can drop voice messages into a chat if you like.
When you are in the immersive Metaverse wearing a headset, your ways to type are very limited. A virtual keyboard will not be fast: just try entering a secure Wi-Fi password or other credentials into your smart TV's app and you'll see how much fun that is. It pains me to think of using a controller to write a message longer than three short words. With HoloLens you can type on a holographic keyboard, but that is not very fast either.
This is where voice UI comes in handy. Speech-to-text will be faster than entering content in any other way. You do it hands-free and can see the result the whole time, which also makes editing easier than, for example, dictating a message in a car without looking at it.
Voice UI is also an accessibility feature. When we wear a VR/XR headset, we have a situational need for accessibility (unless you are a really good touch typist). I can also see read aloud being another feature we may enjoy using in the Metaverse, perhaps just for relaxation, or when we notice our eyes don't like reading in a headset because of lower-than-ideal graphics (not all of us can have a state-of-the-art headset).
In the Microsoft Metaverse we will see various elements that let us work together with people both inside and outside the immersive space. I wouldn't be surprised to see a Teams client in the Metaverse too, so we could see what's happening in Teams (letting us multitask... bad me!) and act on messages. Perhaps we want to open our calendar, join another Metaverse space or meeting, and so on. But mostly we would use voice UI because we want to co-create content with others. Writing on Whiteboard with inking is slow (and my handwriting is really poor), but dictating text into a text box or a Loop component would be quick. The same goes for opening a Word document for editing, seeing the changes others make and using dictation to add my own text, working on a PowerPoint presentation together, or adding new notes or comments to a Power BI report.
It just would not make sense to take off the headset for a bit of typing and then put it back on to continue co-authoring, especially if you are in a situation room adding comments, remarks or tasks, or in a town hall meeting and want to jot down some notes for yourself. That is when voice would be useful. And when you are in dictation mode taking private notes, I hope others won't hear what you are saying (unless you allow it).
We won't see all this on day one, but eventually I expect voice UI to get more use through the Metaverse. With that, we will start using it more on our phones and computers as well, because it will feel more normal. And it will evolve to understand you much better than dictation often does today.
If you are worried about the complexity of using a headset to experience the immersive Metaverse, you don't have to be. Mesh for Teams will make it possible to join in a 2D view using Teams on desktop, web or mobile. I don't think most people will have VR/XR headsets, and that is how Teams' 270 million monthly active users will be able to enjoy it, so everyone is included.