Computer Vision is the field of study that develops techniques to help computers ‘see’ and understand the content of digital images, photographs and videos. While the idea of Computer Vision may sound simple, in reality it has been one of the most complex problems AI engineers have had to solve. This is because we still have a relatively limited understanding of biological vision. We understand how the eye functions, but the complexity of visual perception makes seeing and understanding sometimes quite subjective.
As computing power increased significantly in the 1990s and the internet became mainstream, large sets of images suddenly became available online for analysis. This was when Computer Vision and its most famous application, facial recognition, really began to come into their own. Today, Computer Vision is undergoing something of a renaissance, and it’s down to the convergence of four specific factors:
- Built-in cameras in mobile technology have resulted in a digital world saturated with photographs and videos
- Computing power has become more accessible for substantially less money
- Hardware specifically designed for Computer Vision and analysis is more widely available
- Newer algorithms like convolutional neural networks can take fuller advantage of the hardware and software capabilities we now have
It has become relatively easy to index and search text, but in order to index and search images, algorithms need to understand what an image contains. For a long time, the content of images and videos has been opaque to computers and best described using meta descriptions typed in by the person who uploaded them. To really get value out of image data, you need computers not only to see an image but to understand its content.
Much like human sight and comprehension, we would like a computer to be able to do the following:
- Describe the content of a photograph it has ‘seen’
- Provide a basic summary of a video it has ‘seen’
- Recognise and identify a face it has ‘seen’
This would then allow us to categorise all of these elements for search more accurately (among other things).
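To make the search idea concrete, here is a minimal sketch in Python. It assumes some vision model has already produced a list of labels for each image; the file names, the labels and the helper names `build_label_index` and `search` are all hypothetical and only there for illustration.

```python
from collections import defaultdict


def build_label_index(predicted_labels):
    """Map each label to the set of image ids that carry it."""
    index = defaultdict(set)
    for image_id, labels in predicted_labels.items():
        for label in labels:
            index[label.lower()].add(image_id)
    return index


def search(index, *terms):
    """Return the sorted image ids whose labels include every search term."""
    if not terms:
        return []
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches))


# Hypothetical output of a vision model: image id -> labels it 'saw'.
predicted_labels = {
    "img_001.jpg": ["dog", "beach"],
    "img_002.jpg": ["dog", "sofa"],
    "img_003.jpg": ["face", "beach"],
}

index = build_label_index(predicted_labels)
print(search(index, "dog"))
print(search(index, "dog", "beach"))
```

Once images carry machine-generated labels like these, searching them is no harder than searching text, which is why the understanding step matters so much.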
How does Computer Vision work?
Computer Vision is a multidisciplinary subfield of AI that draws heavily on ML, and more specifically Deep Learning. At a high level, it works in a three-step process.
The first step is image acquisition, which can happen in real time through videos, photos or 3D technology. Images are then categorised for analysis. The second step is processing the image. This generally involves automated Deep Learning models, which more often than not have been trained by being fed thousands of labelled and pre-identified images. The third step is crucial and is what makes Computer Vision distinct from image processing: understanding the image, which means the computer interprets what the image contains so that it can be properly identified and classified.
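The three steps can be sketched as a toy pipeline in Python. This is only an illustration under heavy assumptions: the "camera" generates tiny synthetic grayscale grids, and a simple nearest-centroid classifier stands in for the Deep Learning model a real system would use. All function names here are hypothetical.

```python
from statistics import mean


def acquire_image(kind, size=8, level=128):
    """Step 1 (acquisition): stand-in for a camera, returns a grayscale grid (0-255)."""
    if kind == "stripes":
        return [[255 if x % 2 == 0 else 0 for x in range(size)] for _ in range(size)]
    if kind == "flat":
        return [[level] * size for _ in range(size)]
    raise ValueError(f"unknown image kind: {kind}")


def extract_features(image):
    """Step 2 (processing): reduce pixels to (mean brightness, mean horizontal edge strength)."""
    brightness = mean(pixel for row in image for pixel in row)
    edges = mean(abs(row[x + 1] - row[x]) for row in image for x in range(len(row) - 1))
    return (brightness, edges)


def train(labelled_images):
    """Learn one average feature vector (centroid) per label from labelled examples."""
    features_by_label = {}
    for image, label in labelled_images:
        features_by_label.setdefault(label, []).append(extract_features(image))
    return {
        label: tuple(mean(column) for column in zip(*features))
        for label, features in features_by_label.items()
    }


def classify(image, centroids):
    """Step 3 (understanding): assign the label whose centroid is closest to the image."""
    features = extract_features(image)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Labelled training examples (synthetic, for illustration only).
training_set = [
    (acquire_image("stripes", size=8), "stripes"),
    (acquire_image("stripes", size=6), "stripes"),
    (acquire_image("flat", level=100), "flat"),
    (acquire_image("flat", level=160), "flat"),
]
centroids = train(training_set)

# Classify images the model has not seen during training.
print(classify(acquire_image("flat", level=200), centroids))
print(classify(acquire_image("stripes", size=10), centroids))
```

A production system would swap the hand-written features and centroids for a trained neural network, but the shape of the pipeline (acquire, process, interpret) stays the same.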
Examples of Computer Vision in industry
Many businesses have begun to add Computer Vision to their shopping and brand experiences, with Amazon effectively leading the charge. Let’s take a look at CV solutions in action.
Amazon has refined the shopping process down to an art through the application of Computer Vision and Machine Learning. Through the collaboration of app and store (meaning you’ll need the app to get into the store), Amazon Go uses Computer Vision to keep track of stock, maintenance and every customer in store to ensure effectiveness and security. Cameras and sensors in a brick-and-mortar store connect each customer to their Amazon account while also keeping an accurate count of each item in the customer’s trolley. As soon as you’re done shopping, the bill is automatically charged to your Amazon account, without you having to deal with a cashier to confirm your purchase.
Pinterest Lens takes connecting you with your interests to the next level. To use the feature, all you need to do is snap a picture of an object, and Pinterest Lens finds you similar items from its directory of images and articles. Pinterest is little more than an enormous catalogue of images and articles, and through a comprehensive Deep Learning CV backlog, the algorithms working in the background help feed the data you are shown. In fact, in June of this year they added a shop tab to Pinterest Lens that allows you to find shoppable products you like based on the picture you’ve uploaded. Pretty neat, huh?
This particular product from Amazon is dedicated to fashion, and it combines a voice-activated camera, styling advice for the outfits you select and detailed photography for capturing those awesome snaps you take in your new ensemble. This application not only analyses your outfits to find a photogenic likeness; you can also use it to accessorise, as it suggests options you can buy from Amazon to add to that particular look. With its one-day delivery program you can be all kitted out for a big night on the town faster than you can blink.
Even COVID-19 makes an interesting appearance in the field of Computer Vision with Uber. The ride-share app has created a feature that helps detect whether both the driver and the passengers are wearing masks. This kind of application has helped get Uber drivers back on the road safely, an absolute win for those who were losing income over business shutdowns during the pandemic.
CV can also help you with internal processes, especially for production companies that carry large amounts of stock. It can help with everything from quality checks to warehouse supply tracking to counting deliveries in the shipping process, which makes it handy for many types of business.
While not new, Computer Vision technology is advancing at a meteoric rate, which makes it one of the most popular frontiers of Artificial Intelligence. Over time, the expectation is that CV will make it much easier for consumers to engage with your business and products through better shopping experiences. The way we see it, Computer Vision is definitely a technology worth keeping an eye on!