When you turn on your phone's camera to take a selfie, it detects your face and focuses on it. Detecting and tracking objects like this is a task of computer vision (CV), a field that deals with far more than faces.

CV can be used to classify images, read meters, track vehicle movements, recognize text, and determine the position of the ball on a football field. As a scientific discipline, CV covers the theory and technology of building artificial systems that extract information from images.

Over the past few years, computer vision has proven itself in practice. Fairly accurate systems have appeared for intruder detection, visitor counting, smoke and fire detection, customer identification, and many other tasks.

With the right algorithms and the right cameras, CV solutions can significantly speed up and simplify many different processes.

For the machine learning algorithms that solve detection, tracking, or classification problems, success depends on expertise, resources, and the choice of contractor. Cameras can be ordered together with the algorithm, but you can also buy them yourself, for example when building your own solution or testing a hypothesis.

But what characteristics should a camera have to do a good job? Let’s try to figure it out.

1D, 2D and 3D cameras

1D cameras, also called line-scan (or simply line) cameras, are a type of CV camera that differs from the familiar kind because the image is formed by scanning the subject. Such a camera has a special sensor, most often a single line of pixels, and can produce an image of practically unlimited length.

That is why line cameras are indispensable on conveyors: they can capture fast-moving objects without blur. Line scanning also removes the need to overlap successive frames and stitch them together in software, as with matrix (area-scan) cameras, because the image is built up continuously in the line camera's onboard memory buffer.
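The following Python sketch shows how a line camera builds its picture. It uses a hypothetical `read_line()` callback in place of the camera. Each call returns one row of pixel values per trigger, for example one row per encoder pulse as the conveyor moves. Each new row is appended to a buffer, so the image keeps growing while the object moves and no stitching is needed.

```python
from typing import Callable, List

Row = List[int]

def assemble_line_scan(read_line: Callable[[], Row], num_lines: int) -> List[Row]:
    """Build a 2D image by stacking successive 1D rows from a line sensor.

    `read_line` is a hypothetical stand-in for the camera: each call returns
    one row of pixel intensities captured at the current conveyor position.
    """
    image: List[Row] = []
    width = None
    for _ in range(num_lines):
        row = read_line()
        if width is None:
            width = len(row)
        elif len(row) != width:
            raise ValueError("every row from a line sensor must have the same width")
        image.append(list(row))  # rows are appended; no overlap or stitching needed
    return image

def make_fake_sensor() -> Callable[[], Row]:
    """Simulated 4-pixel line sensor watching a dark object pass by."""
    rows = iter([
        [255, 255, 255, 255],
        [255, 0, 0, 255],
        [255, 0, 0, 255],
        [255, 255, 255, 255],
    ])
    return lambda: next(rows)

img = assemble_line_scan(make_fake_sensor(), 4)
print(len(img), "rows x", len(img[0]), "pixels")
```

In a real system, the trigger rate must match the conveyor speed. Otherwise the object appears stretched or squashed along the direction of travel.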

2D cameras are the familiar kind: they create images in two dimensions, width and height. They are by far the most common type of camera in use.

3D cameras are used when you need to analyze the volume of objects, their shape, or their position in three-dimensional space. Some 3D cameras use two or more lenses to record multiple viewpoints, while others have a single lens that shifts its position.
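Cameras with two lenses (stereo pairs) commonly recover depth by triangulation. The sketch below applies the standard pinhole stereo relation Z = f·B/d. It assumes rectified images, a focal length f in pixels, a baseline B (the distance between the lens centres) in metres, and a disparity d (the horizontal shift of a point between the two images) in pixels. The numbers are illustrative and do not describe any particular camera.

```python
def stereo_depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d.

    focal_px: focal length expressed in pixels
    baseline_m: distance between the two lens centres, in metres
    disparity_px: horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means infinitely far)")
    return focal_px * baseline_m / disparity_px

# Illustrative values only: 700 px focal length, 12 cm baseline.
for d in (40, 20, 10):
    print(f"disparity {d:>2} px -> depth {stereo_depth_m(700, 0.12, d):.1f} m")
```

Halving the disparity doubles the estimated depth, so depth precision falls quickly with distance. A wider baseline helps with far-away objects.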

Analog and digital cameras

There are two main groups of cameras that differ in the way they process data and transmit video signals: analog and digital.

In analog cameras, the image leaves the matrix in analog form. It is digitized for processing and then converted back into an analog signal for transmission. The video signal travels over a coaxial cable to the monitor and to the DVR, which digitizes, encodes, and compresses it for recording.

Analog cameras are well suited to video surveillance networks because they are cheap, easy to install and use, tamper-resistant, and transmit data without delay. With newer high-resolution standards such as HDCVI, HD-TVI, or AHD, you can also get good image quality, even of moving and distant objects.

This makes such cameras a good fit for systems that track athletes' activity. They can determine posture and speed of movement, monitor compliance with the rules of the game, and detect goals. They can also handle security tasks, such as perimeter protection or detecting unattended items at train stations and airports.

In a digital camera, the signal is not converted back to analog for transmission. It is sent to the recorder in digital form. Before transmission it may be encoded and compressed, as in IP cameras, or sent uncompressed, as in HD-SDI cameras.

Digital camera systems are easy to scale, upgrade, and optimize. They are more expensive, but they suit virtually any computer vision task.

Machine vision cameras

One more type of camera deserves mention: machine vision cameras.

These cameras capture high-resolution images and send them to the computer uncompressed. Their images may therefore look less polished than those from conventional cameras, which compress and smooth them. In exchange, they retain full quality and fine detail.

That is why machine vision cameras are usually used in industrial automation and medicine, where every detail matters.

Machine vision cameras range from VGA up to 86 megapixels for area-scan models, or up to 4K pixels per line for line-scan models, and can shoot at up to 200 frames per second. Such specifications make them quite expensive.

Matrix type

Video cameras use two types of matrix (sensor): CCD and CMOS. They differ both in design and in operating principle. To choose between them, first define the range of tasks. For example, infrared imaging places its own demands on the sensor, and heat maps of objects require a dedicated thermal sensor rather than an ordinary CCD or CMOS matrix.

CCDs used to be considered higher quality, but they were also more expensive and more power-hungry. In a CCD, the charge collected by all pixels is read out in analog form and only then digitized.

The advantages of CCD over CMOS include high light sensitivity, better color rendering, low noise, and wide dynamic range. The disadvantages are a complex readout scheme, high power consumption, and expensive production.

The CMOS (complementary metal-oxide-semiconductor) matrix initially won on power consumption and cost but was inferior in image quality. Unlike a CCD, a CMOS sensor has readout circuitry in every pixel and digitizes the signal on the chip itself.

Its advantages are high speed, low power consumption, and cheaper, simpler production. Its traditional disadvantages are lower light sensitivity, a lower pixel fill factor, narrower dynamic range, and higher noise.

Matrix size

When choosing a camera for computer vision, pay attention to the size of the matrix and of its pixels: other things being equal, larger is better. Each pixel is a photosensor that converts the part of the image projected onto it into an electrical signal. The pixel size therefore sets the photosensitive area, for example 0.005 to 0.006 mm (5 to 6 µm) per pixel.

The larger the pixel, the larger its area and the amount of light it collects, which means the higher its light sensitivity and the better the signal-to-noise ratio.

The number of pixels matters too: the more pixels, the higher the resolution and, accordingly, the level of detail. A face can then be identified at a greater distance, and small details become visible.
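To show how pixel count affects identification distance, you can estimate how many pixels land on each metre of the scene at a given distance. The sketch below uses a simple pinhole model. The 60-degree horizontal angle of view and the threshold of 250 pixels per metre are assumptions chosen for the example. The 250 px/m figure is a commonly quoted guideline for identifying people; it does not come from this article.

```python
import math

def pixels_per_metre(h_pixels: int, hfov_deg: float, distance_m: float) -> float:
    """Horizontal pixel density on a target at `distance_m` (pinhole model)."""
    scene_width_m = 2 * distance_m * math.tan(math.radians(hfov_deg) / 2)
    return h_pixels / scene_width_m

def max_distance_m(h_pixels: int, hfov_deg: float, required_px_per_m: float) -> float:
    """Farthest distance at which the density still reaches `required_px_per_m`."""
    return h_pixels / (required_px_per_m * 2 * math.tan(math.radians(hfov_deg) / 2))

# Same assumed 60-degree angle of view; 720p vs 1080p width; assumed 250 px/m threshold.
for width in (1280, 1920):
    print(width, "px wide:", round(max_distance_m(width, 60, 250), 1), "m")
```

The distance scales linearly with the horizontal pixel count. Going from 1280 to 1920 pixels across therefore extends the identification range by 50% with the same lens.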

Focal length and angle of view

Focal length is the distance from the optical center of the lens to the sensor when the lens is focused at infinity. It determines how far and how wide your camera can see.

The shorter the focal length, the wider the angle of view and the less detail is visible on distant objects. The longer the focal length, the narrower the angle of view and the more detail you can see.

A commonly used lens has a focal length of 3.6 mm, which is often said to roughly match the angle of view of the human eye. Cameras with such lenses are used for surveillance in small rooms, for example to count visitors to a store or cafe.
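You can quantify the trade-off between focal length and angle of view with the standard thin-lens approximation for a distant subject: horizontal angle of view = 2·arctan(w / 2f), where w is the sensor width and f the focal length. The sketch below assumes a sensor width of 4.8 mm, the nominal value for a 1/3-inch sensor. The article does not say which sensor the 3.6 mm lens is paired with.

```python
import math

def angle_of_view_deg(sensor_width_mm: float, focal_length_mm: float) -> float:
    """Horizontal angle of view for a lens focused at infinity (thin-lens model)."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

# Assumed 1/3-inch sensor, nominally 4.8 mm wide.
for f in (2.8, 3.6, 8.0, 12.0):
    print(f"{f:>4} mm -> {angle_of_view_deg(4.8, f):.0f} deg")
```

Under these assumptions, a 3.6 mm lens gives a horizontal view of about 67 degrees. The same lens on a smaller sensor gives a narrower view.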

Machine vision lens type

There are many types of machine vision lenses: normal, wide-angle, telephoto, super long-focus and others. Their main difference from each other is the angle of view, which depends on the focal length. The main types are shown in the picture below.

Fisheye cameras should also be mentioned. They are also called panoramic cameras because they can cover a 360-degree field of view.

There are also cameras that use a tiny pinhole lens instead of a standard one. These pinhole cameras are used for covert video surveillance.

Aperture

The aperture, also called the relative aperture, is the opening through which light passes into the lens. It determines how well the camera can produce a high-quality image in low light. The aperture is a property of the lens and is usually written as an f-number, such as F/1.4 or F/2.8, which is the focal length divided by the diameter of the opening.

For the camera to produce a clear image in low light, the aperture must be as wide open as possible, which means the f-number must be small: F/1.4 lets in more light than F/2.8.
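The f-number N equals the focal length divided by the diameter of the aperture opening, and the light reaching the sensor scales roughly as 1/N². The sketch below compares apertures on that basis and ignores lens transmission losses, which is an assumption. The example values are illustrative.

```python
def light_ratio(f_number_a: float, f_number_b: float) -> float:
    """How many times more light aperture A admits than aperture B (~ 1/N^2)."""
    return (f_number_b / f_number_a) ** 2

def f_number(focal_length_mm: float, aperture_diameter_mm: float) -> float:
    """N = f / D."""
    return focal_length_mm / aperture_diameter_mm

print(light_ratio(1.4, 2.8))  # about 4: F/1.4 admits roughly four times the light of F/2.8
print(f_number(4.0, 2.0))    # a 4 mm lens with a 2 mm opening is F/2
```

Each full stop (F/1.4, F/2, F/2.8, ...) multiplies N by about √2 and halves the light. F/1.4 is two stops brighter than F/2.8.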

Aperture adjustment

The diaphragm (iris) is the part of the camera lens that controls how much light reaches the matrix.

Cameras are available with either a fixed or an adjustable aperture. Devices with no diaphragm at all belong to the first group.

There are three ways to adjust the aperture on video cameras:

  • Manually.
  • Automatically by the camera itself, which drives the diaphragm with a DC control signal (DC-iris). This automatic diaphragm adjustment (ARD) works like the pupil of the human eye: the more light comes in, the narrower the opening becomes.
  • Automatically by a dedicated module built into the lens that monitors the light passing through the aperture.

Electronic shutter

This parameter sets how long the shutter stays open during a shot, that is, how long light falls on the sensor. While the shutter is open, light falls on the matrix. The more light it receives, the higher the exposure and the brighter the final image.

An electronic shutter matches the sensor's effective light sensitivity to the light level of the scene. Automatic diaphragm control changes the amount of light reaching the matrix. An electronic shutter instead changes how long electric charge accumulates on the matrix.

One of the most useful features of the electronic shutter is manual shutter speed control. In low light, the camera automatically selects slow shutter speeds (long exposures), which blur moving objects. Setting the shutter speed manually avoids this.

Note, however, that an electronic shutter handles a narrower range of lighting than automatic diaphragm control. Outdoors, where the light varies from dusk to bright sunlight, cameras with ARD are the better choice. Indoors, where the lighting changes only slightly, cameras with an electronic shutter will do.
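A rough estimate shows why long exposures blur moving objects. During an exposure of t seconds, an object moving at v m/s travels v·t metres. At a pixel density of p pixels per metre, that smears the object across v·t·p pixels. In the sketch below, the car speed, the pixel density, and the one-pixel blur budget are illustrative assumptions.

```python
def blur_pixels(speed_m_s: float, exposure_s: float, px_per_m: float) -> float:
    """Approximate motion blur, in pixels, accumulated during one exposure."""
    return speed_m_s * exposure_s * px_per_m

def max_exposure_s(speed_m_s: float, px_per_m: float, max_blur_px: float = 1.0) -> float:
    """Longest exposure that keeps motion blur within `max_blur_px` pixels."""
    return max_blur_px / (speed_m_s * px_per_m)

# A car at 50 km/h (about 13.9 m/s), imaged at an assumed 100 px/m.
speed = 50 / 3.6
print(f"blur at 1/50 s: {blur_pixels(speed, 1 / 50, 100):.1f} px")
print(f"shutter needed for <= 1 px blur: 1/{1 / max_exposure_s(speed, 100):.0f} s")
```

Such a short exposure collects far less light per frame. This is why a fast shutter for moving scenes usually goes together with a wide aperture or additional lighting.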

Resolution

The camera's resolution is the size of the resulting image: the higher it is, the more detail the video shows. It is measured in television lines (TVL) or in pixels.

Resolution in TVL is the number of brightness transitions, in other words, the alternating vertical lines that can be distinguished across the image horizontally. It indicates how much detail the output image can carry. TVL values are given as plain numbers, such as 380, 420, or 480.

The video resolution in pixels is the horizontal and vertical size of the picture, for example, 1600 by 1200, or the total number of pixels, for example, 8 megapixels. To find out the resolution of an image in megapixels, knowing its size, you need to multiply the horizontal value by the vertical value and divide by 1,000,000: 1600 × 1200 / 1,000,000 = 1.92 megapixels.
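Here is the same arithmetic as a small Python helper. It uses the 1600 × 1200 example from the text and a Full HD (1920 × 1080) frame for comparison.

```python
def megapixels(width_px: int, height_px: int) -> float:
    """Total pixel count in millions (1 MP = 1,000,000 pixels)."""
    return width_px * height_px / 1_000_000

print(megapixels(1600, 1200))  # 1.92
print(megapixels(1920, 1080))  # 2.0736
```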

When manufacturers state the resolution in the documentation, they may mean the number of pixels on the matrix rather than the size of the final image. So also check the “effective number of pixels” specification.

Effective pixels are the pixels that actually form the final image. Often this indicator corresponds to the real resolution of the resulting image, but not always.

Infrared illumination

Cameras with infrared (IR) illumination have special LEDs that emit infrared light, which their sensitive matrices can detect. This allows shooting in complete darkness.

When the illumination of the scene drops below a certain minimum, the camera automatically switches to infrared mode and turns on the IR illumination. The resulting image is always black and white, whether the camera is color or monochrome.

Some cameras also have a “smart IR” function that adjusts the power of the infrared light to the distance to the object. This keeps objects close to the camera from being overexposed.

Infrared filter

If a camera has infrared illumination for night shooting, its daytime images may show distorted colors.

This happens because the matrices in such cameras are sensitive to infrared and capture it during the day as well as at night.

To prevent this, manufacturers build in a switchable mechanical infrared filter, the IR-cut filter (ICR). During the day it covers the matrix and blocks infrared light. At night it moves aside and lets IR through.

An ICR filter can also be installed in cameras without infrared illumination. It blocks infrared light during the day, which improves the color rendition of the video.

Light sensitivity

Light sensitivity, or minimum illumination, is the lowest illumination at which the camera can produce a clear picture with a negligible amount of noise. It is measured in lux (lx), and the lower the value, the more sensitive the camera.

If a camera's minimum illumination is 0.3 lux, it will not produce a clear image on a dark night without additional IR illumination.

To avoid such problems, find out the shooting conditions in advance and choose a camera that suits them. For example, illumination is 32,000 to 130,000 lux on a clear sunny day, about 100 lux on a very overcast day, and 1 lux or less at night.

Signal to Noise Ratio

Another parameter that determines video quality is the ratio of the power of the useful signal to the power of the noise. It is called the signal-to-noise ratio (S/N ratio) and is measured in decibels (dB).

The higher the value, the better. For example, high-quality video from a modern camera requires an S/N ratio of 40 dB or higher.
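The S/N ratio uses a logarithmic scale. As a result, 40 dB corresponds to a signal power 10,000 times the noise power, or an amplitude (RMS voltage) ratio of 100:1. The sketch below shows both conversions. The amplitude form rests on the usual assumption that power is proportional to the square of amplitude.

```python
import math

def snr_db_from_power(signal_power: float, noise_power: float) -> float:
    """S/N in decibels from a power ratio: 10 * log10(Ps / Pn)."""
    return 10 * math.log10(signal_power / noise_power)

def snr_db_from_amplitude(signal_rms: float, noise_rms: float) -> float:
    """The same S/N from RMS amplitudes (e.g. voltages): 20 * log10(Vs / Vn)."""
    return 20 * math.log10(signal_rms / noise_rms)

def power_ratio_from_db(snr_db: float) -> float:
    """Inverse conversion: how many times the signal power exceeds the noise power."""
    return 10 ** (snr_db / 10)

print(snr_db_from_power(10_000, 1))   # 40 dB
print(snr_db_from_amplitude(100, 1))  # 40 dB
print(power_ratio_from_db(40))        # 10,000
```

Each additional 10 dB means ten times less noise power relative to the signal.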

Noise reduction

Since video always contains some noise, manufacturers actively develop digital noise reduction (DNR) technologies. Two are currently widespread: 2D DNR and 3D DNR.

2D DNR is the older algorithm and has drawbacks. It processes each frame on its own and usually removes only noise in the foreground. It also makes image details slightly blurry.

3D DNR is newer and performs better. It also compares successive frames, which lets it remove noise, “snow”, and grain both nearby and in the background.
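The difference between the two approaches shows up even on a single row of pixels. The sketch below is a simplified illustration, not any vendor's actual DNR algorithm. The "2D" filter averages each pixel with its neighbours in the same frame. The "3D" (temporal) part averages the same pixel across successive frames, which suppresses noise in a static scene without smearing edges. Real 3D DNR also has to detect motion to avoid ghosting, and this sketch ignores that.

```python
from typing import List

def spatial_denoise(row: List[float]) -> List[float]:
    """2D-style filter: 3-tap mean over neighbouring pixels in the same frame."""
    out = []
    for i in range(len(row)):
        window = row[max(0, i - 1): i + 2]  # window is clipped at the borders
        out.append(sum(window) / len(window))
    return out

def temporal_denoise(frames: List[List[float]]) -> List[float]:
    """3D-style filter (temporal part only): mean of each pixel over successive frames."""
    n = len(frames)
    return [sum(frame[i] for frame in frames) / n for i in range(len(frames[0]))]

# A static scene with a sharp edge, plus alternating +/-10 noise in each frame.
clean = [0, 0, 0, 100, 100, 100]
frames = [[p + (10 if (i + t) % 2 else -10) for i, p in enumerate(clean)] for t in range(4)]

print(spatial_denoise(frames[0]))  # noise reduced, but the edge is smeared
print(temporal_denoise(frames))    # noise cancels and the edge stays sharp
```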

Frame frequency

The frame rate, measured in frames per second (FPS), determines how smooth the video is. The higher it is, the smoother the result.

A smooth picture requires at least 16 to 17 frames per second. Professional cameras can exceed 120 frames per second.

Keep in mind, however, that the higher the frame rate, the larger the video and the heavier the load on the transmission channel.
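The load grows linearly with the frame rate: doubling the FPS doubles the data. The sketch below computes the rate for uncompressed video. It assumes 8 bits per pixel, as on a monochrome sensor. Colour and higher bit depths multiply these figures, while the compression used in IP cameras reduces them considerably.

```python
def raw_bitrate_mbps(width: int, height: int, fps: float, bits_per_pixel: int = 8) -> float:
    """Uncompressed video data rate in Mbit/s (1 Mbit = 1,000,000 bits)."""
    return width * height * bits_per_pixel * fps / 1_000_000

# 720p at a few frame rates, assuming 8-bit monochrome pixels.
for fps in (17, 25, 120):
    print(f"{fps:>3} FPS: {raw_bitrate_mbps(1280, 720, fps):.0f} Mbit/s")
```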

Backlight compensation

To avoid blown-out areas in video recordings, manufacturers equip their cameras with technologies such as WDR, HLC, and BLC.

WDR stands for Wide Dynamic Range. It comes in two varieties: Digital WDR and True WDR. Digital WDR uses software algorithms that artificially brighten dark areas of the frame. True WDR captures several images at different exposures and combines them into one frame in which all objects have optimal brightness.
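The sketch below illustrates the True WDR idea on a single row of pixels. It is a simplified illustration, not any manufacturer's fusion algorithm. It assumes a linear sensor response, 8-bit values, two exposures whose durations differ by a known ratio, and a saturation threshold of 250. Wherever the long exposure is blown out, the short exposure is used instead, scaled to the same brightness units.

```python
from typing import List

def fuse_exposures(long_exp: List[int], short_exp: List[int],
                   exposure_ratio: float, saturation: int = 250) -> List[float]:
    """Merge a long and a short exposure into one wide-dynamic-range row.

    exposure_ratio = t_long / t_short. Where the long exposure reaches the
    (assumed) saturation level, the short exposure is scaled by the ratio so
    both sources are expressed in the same linear brightness units.
    """
    fused = []
    for long_val, short_val in zip(long_exp, short_exp):
        fused.append(float(long_val) if long_val < saturation else short_val * exposure_ratio)
    return fused

# Dark doorway on the left, bright window on the right; long exposure is 8x longer.
long_row = [40, 60, 255, 255]
short_row = [5, 8, 70, 120]
print(fuse_exposures(long_row, short_row, 8.0))  # [40.0, 60.0, 560.0, 960.0]
```

The fused values exceed the 8-bit range. A real camera would then tone-map them back into a displayable range.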

HLC stands for High Light Compensation. Its algorithms use the frame's average brightness to find and suppress blinding light sources, which makes the dark parts of the image distinguishable. The technology is useful where headlights or floodlights could otherwise spoil the image.

BLC stands for Back Light Compensation. The technology improves the exposure of the entire image using digital signal processors that divide the image into areas and adjust the lighting in each area. BLC simply brightens the frame, and this can cause overly lit areas of the image to turn into white spots.

IP and IK protection classes

You should pay attention to these characteristics when you buy a video camera for outdoor video surveillance or for shooting in rooms with high humidity and dust.

The IP (Ingress Protection) class indicates how well the device is protected against solid objects and water. A camera's IP rating consists of the letters IP followed by two digits. The first digit, from 0 to 6, indicates protection against solid objects. The second, from 0 to 8, indicates protection against water.

The most common outdoor security cameras are IP66 and IP67.

The IP66 rating means that the camera is completely dustproof and also protected against heavy seas and powerful water jets. Any water that gets inside the enclosure will not interfere with the device's operation.

The IP67 rating means that the device is completely dustproof and protected against temporary immersion in water. Water cannot enter in harmful quantities when the camera is immersed under specified conditions of time and depth (in the standard test, up to 1 m for 30 minutes).
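The two-digit IP code can be decoded mechanically. In the sketch below, the short descriptions are abbreviated paraphrases of the digit meanings in IEC 60529. The `X` placeholder, used when a property was not rated (as in IPX7), is handled too.

```python
import re
from typing import Optional, Tuple

# Abbreviated meanings of each digit (paraphrased from IEC 60529).
SOLIDS = {
    0: "no protection", 1: "objects over 50 mm", 2: "objects over 12.5 mm",
    3: "objects over 2.5 mm", 4: "objects over 1 mm", 5: "dust-protected",
    6: "dust-tight",
}
WATER = {
    0: "no protection", 1: "vertically dripping water",
    2: "dripping water, tilted up to 15 degrees", 3: "spraying water",
    4: "splashing water", 5: "water jets", 6: "powerful water jets",
    7: "temporary immersion", 8: "continuous immersion",
}

def parse_ip(code: str) -> Tuple[Optional[str], Optional[str]]:
    """Decode an IP code such as 'IP67' into (solids, water) descriptions.

    An 'X' in either position means that property was not rated (e.g. 'IPX7').
    """
    match = re.fullmatch(r"IP([0-6X])([0-8X])", code.strip().upper())
    if match is None:
        raise ValueError(f"not a valid IP code: {code!r}")
    solids, water = match.groups()
    return (
        None if solids == "X" else SOLIDS[int(solids)],
        None if water == "X" else WATER[int(water)],
    )

print(parse_ip("IP66"))  # ('dust-tight', 'powerful water jets')
print(parse_ip("IP67"))  # ('dust-tight', 'temporary immersion')
```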

The IK class, or anti-vandal standard, indicates the degree of protection against mechanical impacts. Its levels range from IK00 to IK10 and depend on how much impact energy, measured in joules (J), the enclosure can withstand.

The most common protection class among outdoor CCTV cameras is IK10. It means the device can withstand an impact energy of 20 J, equivalent to a 5 kg mass dropped from a height of 40 cm.
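The IK10 figure is easy to check, because the energy of a dropped mass is E = m·g·h. The sketch below uses the approximation g ≈ 9.81 m/s².

```python
G = 9.81  # approximate standard gravity, m/s^2

def impact_energy_j(mass_kg: float, drop_height_m: float) -> float:
    """Energy delivered by a falling mass: E = m * g * h."""
    return mass_kg * G * drop_height_m

def drop_height_for_energy_m(mass_kg: float, energy_j: float) -> float:
    """Height from which `mass_kg` must fall to deliver `energy_j`."""
    return energy_j / (mass_kg * G)

print(f"{impact_energy_j(5, 0.4):.2f} J")  # 19.62 J, i.e. about the 20 J of IK10
```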

Communication interfaces

After analyzing the cameras' characteristics, it is also worth considering the communication interfaces that manufacturers offer. The most common are:

  • USB 2.0 is a fairly inexpensive interface. The cable to the camera is typically up to 5 m long. The theoretical bandwidth of USB 2.0 is 480 Mbit/s, and real throughput is lower, so demanding streams can lose data.
  • USB 3.0 is faster, with a bandwidth of 5000 Mbit/s, and is therefore more reliable than USB 2.0. The maximum cable length is 8 m.
  • GigE (Gigabit Ethernet) is a low-cost interface that supports cables up to 100 m. Its bandwidth is 1000 Mbit/s.
  • CameraLink has a bandwidth of 6800 Mbit/s. This interface is very expensive but well suited to high resolutions. The maximum cable length is 10 m.
  • PoE (Power over Ethernet) delivers electrical power along with data over the same twisted-pair Ethernet cable, so one cable both powers the camera and carries its video. The bandwidth is that of the Ethernet link, up to 1000 Mbit/s.
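To compare these figures with a camera's output, you can estimate the stream's uncompressed data rate and keep only the interfaces with enough bandwidth. The sketch below assumes uncompressed 8-bit monochrome video and an arbitrary margin of 70% usable bandwidth. Neither figure comes from a vendor. IP cameras that compress their stream need far less bandwidth than this.

```python
from typing import Dict, List

# Nominal bandwidths in Mbit/s, as listed in this article; real throughput is lower.
INTERFACES: Dict[str, float] = {
    "USB 2.0": 480,
    "GigE": 1000,
    "PoE (Gigabit Ethernet)": 1000,
    "USB 3.0": 5000,
    "CameraLink": 6800,
}

def raw_rate_mbps(width: int, height: int, fps: float, bits_per_pixel: int = 8) -> float:
    """Uncompressed data rate of a video stream in Mbit/s."""
    return width * height * bits_per_pixel * fps / 1_000_000

def suitable_interfaces(required_mbps: float, usable_fraction: float = 0.7) -> List[str]:
    """Interfaces whose derated bandwidth covers the stream.

    usable_fraction is an assumed safety margin, not a vendor figure.
    """
    return [name for name, bw in INTERFACES.items() if bw * usable_fraction >= required_mbps]

for fps in (30, 60):
    need = raw_rate_mbps(1920, 1080, fps)
    print(f"1080p at {fps} FPS needs {need:.0f} Mbit/s:", suitable_interfaces(need))
```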


In our experience, a properly selected camera is one of the key factors in the success of a CV solution. The same device does not always suit different tasks, so choose carefully, after studying the characteristics relevant to your intended use cases.

For example, to recognize and track cars on the street in real time around the clock, we would recommend a two-dimensional IP camera with a 1/4-inch CMOS sensor and a moderately wide-angle lens with a focal length of 14 mm. The camera should have a DGS and an F/2.8 aperture.

To see and distinguish cars, you need a resolution of at least 720p and at least 17 FPS. If the area has no additional lighting at night, IR illumination and an IR-cut filter will come in handy.

Also, don't forget an S/N ratio of 40 dB or more and WDR support. Since the camera will work outdoors, IP67 and IK10 protection classes are desirable. As for the communication interface, PoE will do.

To recognize the faces of bank visitors, for example, you can also use a two-dimensional IP camera with a 1/4-inch CMOS matrix.

If the camera is located near the cash desk, a normal lens with an electronic shutter and an F/2.8 aperture will suffice. To recognize clients, you need a resolution of 720p, more than 17 FPS, and an S/N ratio of 40 dB or more. The protection classes should be IP66 and IK10. The communication interface can be either CameraLink or PoE.

In production, for example to check how much drink is in each bottle during filling, you need a machine vision camera. Such cameras handle this kind of task very well.

Likewise, to record instrument readings with cameras and algorithms, we recommend a machine vision camera.

Analog cameras suit tasks such as flying a drone in FPV (first-person view) mode. Pilots who like to fly low and fast among obstacles should always keep an analog system on board as a backup. An analog video stream is essentially instantaneous, with no noticeable latency, which greatly reduces the chance of crashing into a tree. However, low-latency digital FPV systems have already begun to appear.

To detect anomalies such as forest fires, you need a camera with a resolution of 2.07 megapixels (1920 × 1080) and 25 FPS, as well as IR illumination and IP67 protection.

To detect fast-moving objects, such as balls or sports cars, you need a frame rate of at least 30 FPS and an equally fast neural network.
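Frame rate matters for fast objects because the distance an object moves between frames equals its speed divided by the frame rate. The sketch below computes this distance. The speed of 108 km/h (30 m/s) is an illustrative value.

```python
def displacement_per_frame_m(speed_kmh: float, fps: float) -> float:
    """Distance an object travels between two consecutive frames, in metres."""
    return (speed_kmh / 3.6) / fps

# An object at an illustrative 108 km/h (30 m/s).
for fps in (17, 30, 60):
    print(f"{fps} FPS: {displacement_per_frame_m(108, fps):.2f} m between frames")
```

At 30 FPS, such an object moves about a metre between frames. For the detector to keep up in real time, each frame must also be processed within about 33 ms (1/30 s).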

For more information, visit https://www.dzoptics.com/en/
