The Metaverse is already disrupting the world in various ways. In my article No one can be told what the Metaverse is, I touched on the subject of content creation by artificial intelligence. After a while I thought this was a good topic to keep writing about.
- Text to Image
- Text to videos and speech
- There are even more text to something tools
- Text and Speech to Anything
Text to Image
As a recap, we are already seeing a lot of text to image content creation. I have been using it recently to create visuals for my presentations and articles, which means I get more or less unique pieces of art and graphics. Earlier I was using Unsplash and other services for visuals, but I have to admit it is more fun to create your own with the help of AI. Creating images with AI takes some learning before you get good-looking visuals, and it also teaches you that these images have limits. You can’t use them everywhere, but as the tools become available to more and more people (Microsoft Designer and Edge will have built-in support for DALL-E 2) and the AI keeps on learning, we will see these images used more and more. I also like that I can generate graphics for exactly the setting I need; I am not limited to what someone else has created. Only my imagination limits what kind of variants I can create with these tools.
Of course, this means it will disrupt people who work with art and photos. But until artificial intelligence learns more about creativity, there is still a lot of content creation where the human mind will be superior. AI can only learn from something that has already been created, by people or by AI. When a human being creates a piece of art, or sets up and takes great-looking stock photos, it takes a lot of time and effort to get it right. AI can generate 10 different variations in a matter of minutes to hours. Not all variants will be usable, but sometimes several images from the same batch can be used. For example, I created this one with Stable Diffusion using the prompt “Matrix Style Realistic Image of spaceship and green fire in city”:

And another one from the same batch

Those pictures are very different in look and style. To me, both look really nice and futuristic. I can also mix various styles and see what the AI comes up with.
There are of course limits, too. People, and especially faces, are very difficult for the AI. When people look stern the result is often better, but smiling and laughing can produce very distorted results.
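Mixing styles is easy to automate before the prompts are handed to an image generator. The snippet below is only a small sketch of that idea: the helper name `build_prompts`, the prompt template and the extra style words are my own assumptions for illustration, and it only builds the prompt texts. It does not call Stable Diffusion or any other service.

```python
from itertools import product


def build_prompts(subject, styles, qualities):
    """Return one prompt text per (style, quality) combination.

    The template "<style> <quality> image of <subject>" is just an
    illustrative assumption, not a format any tool requires.
    """
    return [
        f"{style} {quality} image of {subject}"
        for style, quality in product(styles, qualities)
    ]


# The subject comes from the prompt used in this article;
# the additional style words are made up for the example.
prompts = build_prompts(
    "spaceship and green fire in city",
    styles=["Matrix style", "Watercolor"],
    qualities=["realistic", "cartoon"],
)

for prompt in prompts:
    print(prompt)
```

Each printed line can then be used as the prompt for one image, which makes it easier to compare how different style combinations turn out within the same batch.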


There are a lot of text to image services you can use to get started. There are many more out there, but I thought I would list just a few for now. Of these I have mostly been using Stable Diffusion, since you can set it up to run on your own computer, and thus you don’t need to pay for generation.
And it is not just text to image, but also image to image generation, where a certain image is used as a base and further images are generated from that image and the prompt.
I have already started to use text to image a lot in my articles, presentations and sessions. This doesn’t mean I won’t use stock photos or Unsplash now and then, but it feels great to be able to make quite unique pictures to power up my decks.
Text to videos and speech
This was another topic I wrote about in my previous post. Using a service such as D-ID you can generate videos from text. These services also come with an API you could use to automate video creation. I can see text to video technology reducing the need to record videos, especially for training and learning materials, onboarding and welcome messages. Information in the Metaverse and on the web can be kept up to date more easily and quickly. The quality of the result is also consistent and does not depend on having the same setup every time. This of course requires that these services are used responsibly, because some services can produce deep-fake results very easily. Perhaps for some of us it is good that the video has a few imperfections, to remind us that it was indeed created by AI.
For example, here is my face speaking the text in both English and Spanish. Since I don’t know Spanish, I had to use machine translation for the text and then just tell the video generator that the text is in fact Spanish. I can also use the same tool to generate videos in a lot of other languages, including Finnish.
There are also a lot of other services out there. I have written about Synthesia, which you can use to create presentations, for example.
With Microsoft Azure you can use the various AI services it provides. Text to Speech can be used with pre-made voice models to create custom audio. Or you can train the service on your own voice and use that to generate realistic text to speech in your own voice. This way you could generate the content automatically, for example to include a voiceover in a presentation, without having to set up a recording studio.
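Text to speech services, Azure's included, usually accept their input as SSML (Speech Synthesis Markup Language), a W3C standard for describing what should be spoken and with which voice. As a rough sketch of what automating a voiceover could start from, the snippet below builds such a document with Python's standard library. The helper name `build_ssml` and the voice name `en-US-JennyNeural` are assumptions for illustration, so check the voice list of the service you use. Sending the document to the service is left out here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
# Namespace of the built-in "xml:" prefix, needed for xml:lang.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, lang="en-US"):
    """Return an SSML document asking `voice_name` to speak `text`.

    `voice_name` is whatever voice identifier the speech service
    expects; the value used below is only an assumed example.
    """
    # Serialize the SSML namespace as the default (unprefixed) one.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
    voice.text = text  # special characters such as & are escaped on output
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml(
    "Welcome to the onboarding session.",
    voice_name="en-US-JennyNeural",  # assumed example voice
)
print(ssml)
```

Because the text is the only thing that changes, the same function could be run over every slide note in a deck to produce one SSML document per voiceover.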
There are even more text to something tools
For presentation creation you should be looking forward to the upcoming Microsoft Designer and Create. A lot of people I know use Designer in PowerPoint to refresh the visuals in their decks. These new tools will combine design assistance, image creation and other features. Microsoft Create is mostly just a hub for content creation and the use of various templates, but it will integrate with Microsoft Designer.


Microsoft Azure AI can summarize a conversation or document with its Cognitive Services. The feature is in preview today. Summarizing text content is important because the amount of content is growing all the time. You can already save your Word document as a PowerPoint presentation, but I would not be surprised to see this advance with the help of AI.
How about creating 3D models by describing them? Text to 3D! There are tools available such as DreamFusion Text-To-3D, and the upcoming Magic3D. Next we will be seeing tools that can create 3D spaces or scenes from text. How about animating your models?
It is easy to envision this path moving forward at a fast pace, as artificial intelligences are trained to understand more and more kinds of content.
Text and Speech to Anything
We will be seeing a lot of growth in this area. New tools will pop up all the time, although some of them use the same base technology. New text to anything tools will be invented. In the future we want to automate various processes, and content creation is one of those where the use of AI will grow heavily.
Text to Anything is one example of how Metaverse technology (or just technology evolution) is disrupting current ways of creating content. While this won’t replace artists and human creativity, it certainly changes how they do business, because a lot of content will be created with the help of AI. It will speed up producing materials and also mean we see those same stock photos everywhere less often.
This won’t make creative minds obsolete. Instead we will value them even more, as innovative people can come up with new ways to create new and fresh content. They will use AI to help their creative process as well. In the end, we can see our imagination materialize into images, 3D models and scenes, and into virtual spaces and worlds, described by text and generated by AI. Or actually it will be Text and Speech to Worlds, where we can tell the AI what we want and it will generate for us a great space for that meeting where we need to be innovative.
