Computer Vision Explained
Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs — and take actions or make recommendations based on that information. If AI enables computers to think, computer vision enables them to see, observe and understand.
Computer vision works much the same as human vision, except humans have a head start. Human sight has the advantage of a lifetime of context in which to learn how to tell objects apart, how far away they are, whether they are moving and whether there is something wrong in an image.
Computer vision trains machines to perform these functions, but it has to do it in much less time with cameras, data and algorithms rather than retinas, optic nerves and a visual cortex. Because a system trained to inspect products or watch a production asset can analyze thousands of products or processes a minute, noticing imperceptible defects or issues, it can quickly surpass human capabilities.
Computer vision is used in industries ranging from energy and utilities to manufacturing and automotive, and the market is continuing to grow; it is expected to reach USD 48.6 billion by 2022.
How Does Computer Vision Work?
Computer vision programs use a combination of techniques to process raw images and turn them into usable data and insights.
The basis for much computer vision work is 2D images. While images may seem like a complex input, they can be decomposed into raw numbers: an image is really just a grid of individual pixels, and each pixel can be represented by a single number (grayscale) or a combination of numbers such as (255, 0, 0) in RGB.
Images on computers are typically stored as large grids of pixels. Each pixel's color is stored as a combination of the three additive primary colors, red, green and blue (RGB), mixed at varying intensities to represent different colors.
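This pixel representation can be made concrete with a few lines of Python. The image below is a made-up 2x2 example, and the grayscale conversion uses one common channel weighting (an illustrative choice, not the only one):

```python
# A tiny 2x2 RGB image as a grid of pixels.
# Each pixel is a (red, green, blue) triple; each channel ranges 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],  # blue pixel, white pixel
]

def to_grayscale(rgb):
    """Collapse an RGB triple into one number per pixel.

    The weights approximate how bright each channel appears to the eye."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

gray = [[to_grayscale(px) for px in row] for row in image]
print(gray)
```

Running this shows that a pure white pixel maps to 255 while the primaries map to different intermediate brightnesses, which is why a grayscale image still conveys structure.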
Let’s consider a simple algorithm to track a bright orange football on a football field. For that, we’ll take the RGB value of the ball’s centermost pixel. With that value saved, we can give a computer program an image and ask it to find the pixel with the closest color match. The algorithm checks one pixel at a time, calculating the difference from the target color; having looked at every pixel, the best match is likely a pixel from the orange ball. We can run this algorithm on every frame of a video and track the ball over time. But if one of the teams is wearing orange jerseys, the algorithm will get confused. More fundamentally, this approach cannot capture features larger than a single pixel, such as the edges of objects, which are made up of many pixels.
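The single-pixel color matcher described above can be sketched in a few lines of Python. This is a toy illustration: the 3x3 "frame" and its colors are invented, and the distance metric (sum of squared channel differences) is one simple choice among several:

```python
def closest_pixel(image, target):
    """Return the (row, col) of the pixel whose RGB value is closest
    to the target color, using squared channel differences as distance."""
    best, best_dist = None, float("inf")
    tr, tg, tb = target
    for r, row in enumerate(image):
        for c, (pr, pg, pb) in enumerate(row):
            dist = (pr - tr) ** 2 + (pg - tg) ** 2 + (pb - tb) ** 2
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best

# Toy 3x3 frame: mostly grass-green, one orange pixel (the ball).
frame = [
    [(34, 139, 34)] * 3,
    [(34, 139, 34), (255, 140, 0), (34, 139, 34)],
    [(34, 139, 34)] * 3,
]
print(closest_pixel(frame, (255, 140, 0)))  # finds the orange pixel at (1, 1)
```

Running the same search on every frame of a video gives the naive tracker from the text, along with its weakness: any orange jersey pixel can win the match.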
To identify such features, computer vision algorithms consider small regions of pixels, called patches. Consider, for example, an algorithm that finds vertical edges in a scene to help a drone navigate safely through a field of obstacles. This operation uses a mathematical construct called a kernel, or filter: a small grid of values that are multiplied pixel-wise against a patch of the image, with the sum of the products saved into the patch’s center pixel.
An earlier algorithm, Viola-Jones face detection, combined multiple hand-designed kernels to detect the features of faces. Today, the state of the art is convolutional neural networks (CNNs), which learn their kernels from data instead of relying on hand-designed filters.
Without machines that are able to see, it will be difficult to teach machines to think. That is how Fei-Fei Li, of the Stanford Vision Lab, describes the role of computer vision technology.
The difficulty is that computers see only digital representations of images. Humans can understand the semantic meaning of an image; machines merely detect pixels. This semantic gap is the central challenge in computer vision. The human brain, a natural neural network, distinguishes the components of an image and analyzes them in a certain sequence, with each neuron responsible for a particular element.
That is why building an artificial solution as superb as the human brain took decades of research and prototyping. And artificial neural networks became the greatest breakthrough in machine learning.
Image classification has always been a fundamental task in computer vision. Thanks to deep learning, computers can automatically generate and learn features, the distinctive characteristics and properties of an image, and based on those features predict what is in the image along with a level of probability for the prediction.
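The final step, turning learned feature responses into "what is in the image and with what probability," is commonly done with a softmax function over the classifier's raw scores. The class labels and score values below are hypothetical, purely to illustrate the mechanism:

```python
import math

def softmax(scores):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical final-layer scores for three candidate classes.
labels = ["cat", "dog", "football"]
scores = [2.0, 1.0, 4.5]

probs = softmax(scores)
prediction = labels[probs.index(max(probs))]
print(prediction, round(max(probs), 3))
```

The output is a full probability distribution, so the system can report not just its best guess but also how confident it is, which is the "level of probability" mentioned above.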
Where Is Computer Vision Used?
1. Healthcare
Computer vision is widely used in healthcare. Medical diagnostics relies heavily on the study of images, scans and photographs. The analysis of ultrasound images, MRI and CT scans is part of the standard repertoire of modern medicine, and computer vision technologies promise not only to simplify this process but also to prevent false diagnoses and reduce treatment costs. Computer vision isn’t intended to replace medical professionals but to facilitate their work and support them in making decisions. Image segmentation, for instance, aids diagnostics by identifying relevant areas on 2D or 3D scans and colorizing them to make black-and-white images easier to study.
2. Automotive Industry
Self-driving cars are among the artificial intelligence use cases that have received the most media attention in recent years. This is probably explained more by the futuristic appeal of autonomous driving than by the actual consequences of the technology. Several machine learning problems are packed into it, and computer vision is a core element of their solution: the algorithm controlling the car (the so-called “agent”) must be aware of the car’s environment at all times.
The agent needs to know where the road leads, where other vehicles are in the vicinity, the distance to potential obstacles and objects, and how fast those objects are moving, so that it can continually adapt to the changing environment. For this purpose, autonomous vehicles are equipped with an array of cameras that film their surroundings over a wide area. The resulting footage is analyzed in real time by an image recognition algorithm, which must be able to locate and classify relevant objects not only in static images but in a continuous stream of frames.