Image Fusion Overview

Many Images Are Better Than One


Mobile camera platforms are increasingly present in industrial applications as well as in the consumer market, a trend fostered by the availability of cheap, small and powerful cameras. However, the sensors in these digital cameras are limited by their discretization of spatial resolution, dynamic range and temporal information. For instance, the number of pixel elements of a sensor is fixed and defines the resolution of the captured images.

On this site we discuss algorithmic solutions that use image fusion and image enhancement methods to overcome these hardware limitations. For static setups, image fusion techniques are already well studied. For moving platforms, however, the ego-motion of the camera needs to be compensated, requiring adequate image registration. We investigate how image fusion techniques can be extended and applied to produce useful results on images captured from moving camera platforms. We focus on three methods of image fusion: superresolution, high-dynamic-range (HDR) imaging and motion detection, as these are of particular interest for mobile cameras and closely related surveillance tasks.

The old saying "two eyes see more than one" captures the core idea: in statistics, combining multiple measurements yields more precise estimates than a single measurement alone, and image fusion techniques in computer vision exploit exactly this principle.


Superresolution


In certain applications, such as forensic investigations, the spatial resolution of a video is often not sufficient to recognize its content, for instance to identify a suspect by their face or a car by its license plate. The imaging process in cameras introduces several artifacts, including blur, downsampling and compression, which reduce the level of detail in the image. These artifacts result from the limited spatial discretization of the camera sensor, the digital processing inside the camera, and blurring caused by lenses and the atmosphere. To overcome them, image reconstruction methods such as superresolution [1] have been proposed. These combine multiple low-resolution input images to reconstruct a high-resolution image with finer details and a greater number of pixels. The following figure depicts typical improvements achievable with superresolution.

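As a rough illustration of the principle, the following Python sketch fuses registered low-resolution frames by naive shift-and-add. It assumes the per-frame subpixel shifts are already known from a registration step and omits the deblurring and regularization of full methods such as [1]; all names are illustrative only.

    import cv2
    import numpy as np

    def shift_and_add(frames, shifts, scale=2):
        """Naive multi-frame superresolution by shift-and-add fusion.

        frames -- grayscale low-resolution images of the same size
        shifts -- per-frame (dx, dy) subpixel shifts from registration
        scale  -- upsampling factor of the high-resolution grid
        """
        h, w = frames[0].shape
        size = (w * scale, h * scale)
        acc = np.zeros((h * scale, w * scale), np.float32)  # sample sums
        cnt = np.zeros_like(acc)                            # sample counts

        for img, (dx, dy) in zip(frames, shifts):
            # Upsample and place the frame on the high-resolution grid
            # at its registered subpixel position.
            up = cv2.resize(img.astype(np.float32), size,
                            interpolation=cv2.INTER_NEAREST)
            M = np.float32([[1, 0, dx * scale], [0, 1, dy * scale]])
            acc += cv2.warpAffine(up, M, size)
            cnt += cv2.warpAffine(np.ones_like(up), M, size)

        cnt[cnt == 0] = 1.0  # avoid division by zero at uncovered pixels
        return acc / cnt     # averaging; real methods also deblur/regularize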

HDR Imaging


Camera sensors are also limited in the range of light intensities, i.e. radiance values, they can measure and distinguish. The range of light hitting the sensor is usually compressed dramatically into a much smaller range of intensity values, e.g. 256 levels for the 8-bit images used in the common JPEG format. This causes very bright or very dark areas to appear over- or under-exposed in the captured image, especially if the 8-bit range is not used efficiently, even though the human eye can still perceive the details in those areas of the real scene. In applications such as surveillance, where cameras have to cover differently illuminated areas within a single view, image fusion can address this issue: multiple low-dynamic-range input images are recorded with different exposure settings, e.g. by varying the exposure time, and fused to recover an image with a much higher dynamic range [2]. The following figure illustrates an example of such a fusion.

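OpenCV implements the Debevec method [2] directly; a minimal usage sketch (file names and exposure times are placeholders):

    import cv2
    import numpy as np

    # Exposure-bracketed input images and their exposure times in seconds
    # (file names and times are placeholders).
    files = ["exp_short.jpg", "exp_mid.jpg", "exp_long.jpg"]
    images = [cv2.imread(f) for f in files]
    times = np.array([1 / 250.0, 1 / 30.0, 1 / 4.0], dtype=np.float32)

    # Recover the camera response curve, then merge the exposures
    # into a floating-point radiance map.
    response = cv2.createCalibrateDebevec().process(images, times)
    hdr = cv2.createMergeDebevec().process(images, times, response)

    # Tone-map the radiance map back to 8 bits for display.
    ldr = cv2.createTonemap(gamma=2.2).process(hdr)
    cv2.imwrite("fused_hdr.png", np.clip(ldr * 255, 0, 255).astype(np.uint8))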

Motion Detection


In surveillance applications with aerial camera platforms, detecting moving objects is an essential part of analyzing the scene and recognizing anomalies among ground-based objects. Comparing two or more images recorded at different times makes it possible to identify areas that have changed over time (see figure below). Typically, a background image is constructed by fusing multiple consecutive video frames, and all other frames are compared to this background; differences between them can indicate changes due to motion, but also registration artifacts caused by motion blur [3].
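
A minimal Python sketch of this scheme, assuming the frames have already been registered, i.e. the camera ego-motion is compensated; the temporal median as background model and the threshold value are simplifying choices:

    import cv2
    import numpy as np

    def detect_motion(frames, threshold=30.0):
        """Flag changed pixels against a fused background image.

        frames    -- grayscale frames, already registered to each other
        threshold -- minimum intensity difference counted as change
        """
        stack = np.stack([f.astype(np.float32) for f in frames])
        background = np.median(stack, axis=0)   # fusion: temporal median

        masks = []
        for f in frames:
            diff = cv2.absdiff(f.astype(np.float32), background)
            masks.append((diff > threshold).astype(np.uint8) * 255)
        return background.astype(np.uint8), masks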


Mosaicing


Typical cameras have a limited field of view, smaller than that of the human eye, for capturing a scene. Using special lenses and apertures, it is possible to increase this viewing angle significantly. In cases where the field of view is limited, e.g. because a fixed lens is used, and yet a larger scene needs to be explored, multiple images can be stitched together to form a large mosaic image. Such a panorama image is constructed from multiple small, overlapping input images [4]. The following figure illustrates this fusion process.

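OpenCV ships a high-level stitching pipeline for exactly this task; a minimal usage sketch (file names are placeholders):

    import cv2

    # Overlapping input images (file names are placeholders).
    files = ["left.jpg", "center.jpg", "right.jpg"]
    images = [cv2.imread(f) for f in files]

    # The stitcher registers the images, estimates the transformations
    # and blends the warped images into a single panorama.
    stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
    status, panorama = stitcher.stitch(images)
    if status == cv2.Stitcher_OK:
        cv2.imwrite("panorama.jpg", panorama)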

Structure From Motion


During the imaging process, the 3D nature of the real-world scene is lost as the light rays are projected onto the 2D plane of the camera sensor. However, it is possible to recover the 3D scene from multiple images recorded from different viewpoints. This can be achieved by moving a single camera to generate those different views, which is called structure from motion [5]. Another possible method is based on computing the visual hulls of 3D objects from their 2D recordings [6].
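
As an illustration of the classical correspondence-based variant (in contrast, [5] specifically avoids explicit correspondences), a minimal two-view sketch with OpenCV could look as follows; the intrinsic matrix K is assumed to be known from a prior calibration:

    import cv2
    import numpy as np

    def two_view_reconstruction(img1, img2, K):
        """Sparse 3D points and relative pose from two views.

        img1, img2 -- grayscale images taken from different viewpoints
        K          -- 3x3 intrinsic camera matrix (from calibration)
        """
        # 1. Detect and match local features between the two views.
        orb = cv2.ORB_create(2000)
        k1, d1 = orb.detectAndCompute(img1, None)
        k2, d2 = orb.detectAndCompute(img2, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(d1, d2)
        p1 = np.float32([k1[m.queryIdx].pt for m in matches])
        p2 = np.float32([k2[m.trainIdx].pt for m in matches])

        # 2. Estimate the essential matrix and recover the camera motion.
        E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC)
        _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)

        # 3. Triangulate the matched image points into 3D.
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, t])
        pts4d = cv2.triangulatePoints(P1, P2, p1.T, p2.T)
        return pts4d[:3] / pts4d[3]   # homogeneous -> Euclidean coordinates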


Multi-Focus Image Fusion


Many cameras have a lens system that can only focus on either very close or very distant objects, and after the image has been taken this point of focus cannot be changed. In such cases, multiple input images of the same scene, each with a different focus depth, can be merged to compute an image that is in focus at all depth levels, or to refocus at a different point [7]. The following figure depicts an example result generated by fusing two images.
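
A heavily simplified pixel-wise variant of this idea in Python ([7] additionally performs local focus estimation with mosaicking; the focus measure and window size here are ad-hoc choices):

    import cv2
    import numpy as np

    def fuse_multifocus(img_a, img_b, ksize=9):
        """Merge two differently focused images of the same scene.

        For each pixel, keep the value from the image with the higher
        local focus measure (absolute Laplacian response, averaged
        over a ksize x ksize window).
        """
        def focus_measure(img):
            lap = cv2.Laplacian(img.astype(np.float32), cv2.CV_32F)
            return cv2.blur(np.abs(lap), (ksize, ksize))

        mask = focus_measure(img_a) > focus_measure(img_b)
        return np.where(mask, img_a, img_b)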


Multi-Modal Image Fusion


In most cameras the image is generated from the read-out of a single sensor. However, multiple sensors that are sensitive to different wavelengths often capture complementary information about the scene. For instance, to detect humans robustly and independently of illumination and temperature conditions, it is helpful to combine two images captured by two cameras, one sensitive to visible light and the other to infrared light [8]. An example result of such a fusion is depicted in the figure below.

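As a toy illustration only (not the contour-based fusion of [8]): a per-pixel blend of registered visible and thermal images, weighted by the local contrast of the thermal channel; file names and the window size are placeholders.

    import cv2
    import numpy as np

    # Pixel-accurately registered visible and thermal images are assumed
    # (file names are placeholders).
    visible = cv2.imread("visible.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    thermal = cv2.imread("thermal.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # Weight each pixel by the local contrast of the thermal channel,
    # so that objects standing out thermally dominate the result.
    contrast = cv2.blur(np.abs(cv2.Laplacian(thermal, cv2.CV_32F)), (15, 15))
    w = cv2.normalize(contrast, None, 0.0, 1.0, cv2.NORM_MINMAX)
    fused = w * thermal + (1.0 - w) * visible
    cv2.imwrite("fused.png", fused.astype(np.uint8))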

References


[1] L. C. Pickup. Machine Learning in Multi-frame Image Super-resolution. PhD thesis, University of Oxford, February 2008.

[2] P. E. Debevec and J. Malik. Recovering high dynamic range radiance maps from photographs. SIGGRAPH, 1997.

[3] S. Ali and M. Shah. COCOA: Tracking in aerial imagery. International Conference on Computer Vision, 2005.

[4] R. Szeliski. Computer Vision: Algorithms and Applications. Springer, 2010.

[5] F. Dellaert, S. Seitz, C. Thorpe, and S. Thrun. Structure from motion without correspondence. Computer Vision and Pattern Recognition, 2000.

[6] G. Slabaugh, B. Culbertson, T. Malzbender, and R. Schafer. A survey of methods for volumetric scene reconstruction from photographs. Eurographics Conference on Volume Graphics, 2001.

[7] D. Fedorov, B. Sumengen, and B. S. Manjunath. Multi-focus imaging using local focus estimation and mosaicking. International Conference on Image Processing, 2006.

[8] J. Davis and V. Sharma. Background-subtraction using contour-based fusion of thermal and visible imagery. Computer Vision and Image Understanding, 2007.

