Artificial Intelligence is transforming cameras into multipurpose devices

AI is everywhere. And it is going to be even more everywhere and help us out in various scenarios. I have written about AI and its uses in multiple posts – and more will be coming. This time, the rumors regarding Apple’s MR/VR headset inspired me to write down some of my thoughts on how AI is changing what we think about cameras and sensors, not just how it is changing our work with text generation. AI is going to be a multimodal and multipurpose tool. GPT-4 adds even more capabilities; there is a chapter about that at the end of this post.

The strong part of the rumors regarding Apple’s headset is that it includes plenty of cameras. Those cameras are used to read facial expressions (just like the Meta Quest Pro does), to track hands, and also to read your body movement. Instead of attaching separate sensors to your legs, using cameras is going to make this easier. Apple’s headset, assuming the rumors are true, can then see your leg positions and movements and replicate them properly to your digital twin – to your avatar. This opens interesting possibilities to bring well-working legs and body movement to your avatar in a Metaverse meeting. While Apple’s headset is far from cheap (the rumored price is $3,000), the idea of using inexpensive cameras and AI as sensors is excellent. It is just like what we have been waiting (and still wait) for with Microsoft Teams Avatars: that they could “read” your expressions and arm/hand movements using the webcam and then relay them automatically to the avatar. And this would happen without the need to buy extra hardware, which can be potentially expensive.

This idea could be expanded to your webcam easily. Microsoft could also give avatars legs and use the webcam to read those movements when the person is standing. Even if you were using a VR headset, perhaps you could use your PC’s webcam as a secondary camera to read arm, leg, and body movement if the headset doesn’t support it. This way even lower-end headsets could be used for Metaverse meetings with a better experience.

Or perhaps you could just use another webcam or two to add to the accuracy. Or place your phone in a spot where it can view you fully. When analyzing the image, think about those tools (like Teams meetings) that can extract the person from the background. It is not perfect, but the AI only needs to identify arms and legs and their movement. Getting close enough might be all the accuracy avatars need.
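
To make this concrete, here is a minimal sketch of using an ordinary webcam as a body-movement sensor with OpenCV and Google’s MediaPipe pose model in Python. The confidence thresholds are just example values, and printing ankle coordinates stands in for whatever an avatar rig would actually consume – this is an illustration, not how Apple or Microsoft do it.

```python
# A minimal sketch: reading body landmarks from an ordinary webcam
# with OpenCV and MediaPipe Pose. Threshold values are illustrative.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

cap = cv2.VideoCapture(0)  # the default webcam
with mp_pose.Pose(min_detection_confidence=0.5,
                  min_tracking_confidence=0.5) as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV captures BGR
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            lm = results.pose_landmarks.landmark
            left = lm[mp_pose.PoseLandmark.LEFT_ANKLE]
            right = lm[mp_pose.PoseLandmark.RIGHT_ANKLE]
            # Normalized (0..1) coordinates an avatar rig could consume
            print(f"ankles: L({left.x:.2f}, {left.y:.2f}) "
                  f"R({right.x:.2f}, {right.y:.2f})")
cap.release()
```

Mapping those landmarks onto an avatar rig is the harder part, but the point stands: the only sensor involved is the webcam.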

Microsoft Dynamics 365 Connected Spaces uses cameras to identify traffic inside a space: where people are, how many there are, where they move, whether they are waiting in a line, and so forth. Azure AI and edge technologies can be used for image and video recognition to identify tools, equipment, and even people if necessary. Dynamics 365 Guides can recognize equipment and use it as a spatial anchor for the guide. AI and cameras are used to catch speeding drivers by reading registration plates, robotic vision can be used for defect detection, and so on. This technology already exists and is in use. The question is what the next steps will be, where we start using ordinary cameras (especially web and mobile phone cameras) as sensors that can read even more details from various targets. This opens a new world where video image recognition is used instead of potentially expensive specialized sensors.
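
As a taste of what this looks like in practice, here is a short sketch that asks Azure’s Computer Vision service to detect objects in a single image. The endpoint, key, and image URL are placeholders you would replace with your own.

```python
# A sketch of object detection with the Azure Computer Vision SDK.
# Endpoint, key and image URL are placeholders, not real values.
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

client = ComputerVisionClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    CognitiveServicesCredentials("<your-key>"))

# Ask the service for detected objects in one still frame
analysis = client.analyze_image(
    "https://example.com/shop-floor-camera.jpg",
    visual_features=[VisualFeatureTypes.objects])

for obj in analysis.objects:
    r = obj.rectangle
    print(f"{obj.object_property}: x={r.x} y={r.y} w={r.w} h={r.h}")
```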

Of course you can’t use just any camera in hostile, hazardous, or sterile environments (think industrial and healthcare use), but those locations also have potential use cases where you could replace sensors with cameras. AI could determine movement and speed from the video image, if you don’t need very fine accuracy. Perhaps physiotherapy after an accident could benefit from reading leg positions and movement using a camera – AI could analyze the movement and provide data for the therapist and doctors. This way, setting up a remote therapy site would be easier and possible without attaching specific sensors to the legs. I am not a healthcare professional, so this is just an assumption and an example of what could be a viable scenario. But I do think that professionals in various industries should start thinking about how to develop their field with the use of AI and video analysis.
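
To make the physiotherapy example a bit more concrete: once a pose model (like the MediaPipe sketch above) gives you hip, knee, and ankle positions, tracking something like a knee angle over time is simple geometry. This is my own illustration, assuming normalized (x, y) landmark coordinates:

```python
# A sketch: estimating a knee angle from three pose landmarks.
# Coordinates are assumed to be (x, y) pairs from a pose model.
import math

def joint_angle(a, b, c):
    """Angle at point b (the knee), in degrees, formed by a-b-c."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) -
        math.atan2(a[1] - b[1], a[0] - b[0]))
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

# Example values: hip, knee and ankle positions from one video frame
hip, knee, ankle = (0.50, 0.55), (0.52, 0.70), (0.51, 0.85)
print(f"Knee angle: {joint_angle(hip, knee, ankle):.1f} degrees")
```

A therapist could then follow that angle across sessions and see whether the range of motion is improving – without any sensor strapped to the leg.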

Using cameras for various needs is not new. It has been done for ages – I recall how in the early 90s “machine vision” was used in the paper process industry to detect and automatically sort out faulty items. What is changing is that we are getting those kinds of capabilities into lots of cameras – and extending the basic purpose of the webcam (our video feed) to other needs. AI is making that possible and transforming cameras to have more purposes than one. And GPT-4 is going to take a leap forward when it comes to understanding the picture.

GPT-4 opens new doors

GPT-4 is a good jump forward from GPT-3 (or GPT-3.5) in many ways. For example, the number of parameters in GPT-4 is reportedly vastly larger than in GPT-3. But what does this have to do with cameras? GPT-4 is a multimodal model, which means that it can understand not just text but also images. This goes beyond image recognition – it is going to understand what the picture is about, not just what is in it. The same contextual understanding we see in GPT-4 with text is happening with images. This can take AI capabilities in cameras way forward when we think about how cameras could be used with AI.
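
Image input is not generally available yet, so the exact API shape may still change, but based on OpenAI’s chat API a multimodal request could look roughly like the sketch below. The model name and the image message format are my assumptions, not confirmed details.

```python
# A speculative sketch of asking a multimodal GPT model about a camera frame.
# Model name and image message format are assumptions, not a confirmed API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # hypothetical preview model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What is happening in this picture? Is anyone "
                     "waiting in the marked queue area?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/lobby-camera.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```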

Think of the various scenarios where we use AI-enabled cameras today. GPT-4, and what comes after it, is going to open new doors and paths for how we can utilize cameras and their information. In my earlier blog post How AI is Enabling the Metaverse I wrote about how AI can speed up the materialization of the Metaverse with generative content. Understanding what is in the physical world, and what is relevant for the context, could then be generated by the AI into virtual worlds – creating and maintaining digital twins that replicate reality in the Metaverse. How accurate the GPT-4 model really is, and whether it can do that, remains to be seen. But there will be a GPT-5 – and it is rumored to arrive at the end of this year.

AI is transforming everything – and the pace is much faster than anyone could have thought. The future is today.
