Assessing the impact of super-resolution on enhancing the spatial quality of historical aerial photographs

Date
2024-06-10
Authors
İncekara, Abdullah Harun
Publisher
Graduate School
Abstract
Resolution is the degree to which details in an image can be distinguished. Current studies generally prefer high-resolution (HR) images, but not every available image has sufficient resolution for its intended purpose. Because hardware and cost constraints do not always allow HR images to be acquired or procured, low-resolution (LR) images need to be enhanced. Techniques known as super-resolution (SR) make this possible. SR is defined as obtaining an HR image from an LR one. An LR image is generally regarded as a degraded version of its HR counterpart: when degradations are applied to an HR image, some information is lost and a lower-quality image, the LR image, results. In practice, however, the LR image is the one that is available and needs enhancement, while the HR image is unavailable. Moving from LR to HR is therefore an inverse problem, and solving it requires identifying the lost information and restoring it to the LR image. Current SR studies use deep learning (DL) based models. Various network designs are employed to improve model performance and image quality, chiefly linear learning, residual learning, recursive learning, multi-scale learning, dense connections, generative adversarial networks, and attention mechanisms. DL-based SR began with linear learning in the Super-Resolution Convolutional Neural Network (SRCNN) model. Models using residual learning, with deeper networks and higher performance, then gained prominence. Because the larger number of parameters in deeper networks poses practical challenges, recursive learning was introduced in image processing studies.
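The view of an LR image as a degraded HR image can be illustrated with a minimal sketch. The degradation used here (a box blur followed by decimation) is an assumption chosen for simplicity, not the degradation model of the thesis, and the function names `box_blur`, `downsample`, and `degrade` are hypothetical. Plain nested lists are used so the example runs with the Python standard library alone; real pipelines would normally use an array library.

```python
def box_blur(img, radius=1):
    """Average each pixel with its neighbours; coordinates are clamped at the edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def downsample(img, scale=2):
    """Keep every `scale`-th pixel in both directions."""
    return [row[::scale] for row in img[::scale]]


def degrade(hr, scale=2, radius=1):
    """Assumed degradation: blur first, then decimate (HR -> LR)."""
    return downsample(box_blur(hr, radius), scale)


# 8x8 checkerboard as a stand-in HR image
hr = [[float((x + y) % 2) * 255 for x in range(8)] for y in range(8)]
lr = degrade(hr)
print(len(lr), len(lr[0]))  # 4 4
```

The fine checkerboard detail is averaged away before decimation, so it cannot be read back from `lr` directly. This is the lost information that an SR model has to estimate when it inverts the process.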
Recursive learning, which shares parameters to control the total parameter count, allowed models to run much faster but introduced the vanishing gradient problem. In response, densely connected models incorporating both residual and recursive learning were proposed. Subsequently, visually high-quality images were obtained with generative adversarial network structures. Today, SR studies focus on attention mechanisms. In summary, to improve model performance, learning strategies were altered, various loss functions were tested, and network architectures were modified with various hyperparameters. All of these efforts have been purely algorithmic, and they have indeed produced satisfactory results, especially with attention mechanisms. Two issues have not yet been fully addressed in SR studies: deeper and more complex structures are impractical for real-time applications, and models built on common datasets do not deliver the expected performance when enhancing images for real engineering problems. For the former, the performance of lightweight network architectures should be increased. For the latter, approaches tailored to the specific problem should be introduced. Historical aerial photographs (HAPs) are remotely sensed (RS) images that have scarcely been evaluated in SR studies. In addition to the difficulties that affect the enhancement of RS images in general, HAPs have further constraints. The main ones are information loss when printed copies are converted to digital copies, acquisition hardware limited by the technology of the era, and the lack of spectral bands and color information. Since HAPs play a crucial role in solving present-day problems that are related to the past, they also need to be improved with SR techniques.
This thesis aims to enhance the spatial quality of grayscale HAPs with a DL-based SR model. To this end, approaches concerning the content and the structure of the dataset are proposed. Orthophotos of different years and resolutions, obtained from the General Directorate of Mapping, were used as the primary data source: 1954 at 30 cm, 1968 at 40 cm and 70 cm, 1982 at 10 cm, and 1993 at 40 cm. In the dataset-content approach, images of the residential, farmland, forest, and bare land classes were extracted separately from the orthophotos to create datasets. DL-based SR models cannot be used directly on HAPs because they are built on multi-spectral images. To overcome this limitation, artificial three-band images were created by duplicating the single band twice. Although the single-band image is numerically converted to a three-band image, its content does not change. To minimize this limitation, images of different resolutions from different years covering the same regions were used. This approach, which can be described as imitating a multi-spectral image, does not feed images with three genuinely different spectral bands into training, but it acts as if different spectral bands of the same image were included in training separately. Another limitation is the lack of color information, which follows from the grayscale nature of HAPs. It was minimized by using images with a wide range of intensities. Because different intensity values produce different gray tones, the widest possible range of intensity values was used to help distinguish objects that resemble each other, both within the same category and across categories.
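The band-duplication step described above can be sketched as follows. This is only an illustration with a hypothetical helper name, `gray_to_three_band`, and a nested-list image layout assumed for the example. It is not the thesis code.

```python
def gray_to_three_band(gray):
    """Return an H x W x 3 nested list in which all three bands repeat the gray value."""
    return [[[v, v, v] for v in row] for row in gray]


gray = [[0, 128], [200, 255]]           # tiny stand-in for a grayscale HAP patch
rgb_like = gray_to_three_band(gray)
print(rgb_like[0][1])                   # [128, 128, 128]
```

As the abstract notes, the result has the three-band shape that such models expect but carries no additional information, since every band holds the same values.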
A further limitation is that LR-HR image pairs from HAPs are poor in content, since only a limited number of classes is present depending on the year of acquisition. This was addressed by using larger images, so that during convolution the filters gather information from images containing more diversity. The proposed approach to dataset structure is based on the hierarchy of photo interpretation elements, which is expressed in levels. The first level involves color and tone, which are most pronounced in the bare land and forest areas of the orthophotos. The second level includes size, shape, and texture, and residential areas are the group that reflects these elements most strongly. The third level includes pattern, which is best reflected by farmland areas. Within this framework, the dataset is structured so that the first level consists of bare land and forest areas, the second level of residential areas, and the third level of farmland areas. The 1993 image was also used in the dataset-structure approach. An SRCNN model was trained separately on each of the three datasets. Two methodologies were used to obtain the final image from the separately trained models. In the first, the final image was the average of the three enhanced images. In the second, each enhanced image was divided into pieces of equal size, and a reference-free image quality metric was calculated for each piece. The final image was then assembled by taking, for each position, the piece for which the quality metric gave the better result. Both the dataset-content and the dataset-structure approaches were evaluated with reference-based image quality metrics as well as visual interpretation.
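The two ways of combining the separately enhanced images can be sketched as below. The abstract does not name the reference-free quality metric, so intensity variance inside each piece is used purely as a placeholder score, under the assumption that a higher score is better. The function names are hypothetical, and the image sides are assumed to be divisible by the piece size.

```python
def average_fusion(images):
    """First methodology: pixel-wise mean of equally sized grayscale images."""
    h, w = len(images[0]), len(images[0][0])
    return [[sum(img[y][x] for img in images) / len(images) for x in range(w)]
            for y in range(h)]


def tile_score(img, y0, x0, size):
    """Placeholder no-reference score (assumption): intensity variance in the tile."""
    vals = [img[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def tile_selection_fusion(images, size):
    """Second methodology: at each tile position, copy the best-scoring tile."""
    h, w = len(images[0]), len(images[0][0])   # assumed divisible by `size`
    out = [[0.0] * w for _ in range(h)]
    for y0 in range(0, h, size):
        for x0 in range(0, w, size):
            best = max(images, key=lambda im: tile_score(im, y0, x0, size))
            for y in range(y0, y0 + size):
                for x in range(x0, x0 + size):
                    out[y][x] = best[y][x]
    return out


# Three stand-in "enhanced" 4x4 images: flat, detailed on the left, detailed on the right
a = [[100, 100, 100, 100] for _ in range(4)]
b = [[0, 255, 100, 100] for _ in range(4)]
c = [[100, 100, 0, 255] for _ in range(4)]

averaged = average_fusion([a, b, c])
selected = tile_selection_fusion([a, b, c], size=2)
print(selected[0])  # [0, 255, 0, 255]
```

In this toy case, averaging dilutes the detail that only one image contains, while tile selection keeps it. With a real metric the choice of piece size and the handling of seams between pieces would also matter.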
In the content-based approach, pixel-based and structural-similarity-based metrics both showed improvement. Visual interpretation gave results consistent with the image quality metrics. This approach was also effective in reducing the softening effect on the output image. In the structure-based approach, creating the final image with the reference-free image quality metric gave better results. However, selecting the better image pieces requires more advanced image processing techniques.
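The abstract does not say which pixel-based metric was used. As an assumed example of a reference-based, pixel-based metric, the peak signal-to-noise ratio (PSNR) can be computed as in the sketch below, where the helper names are hypothetical and 8-bit grayscale values are assumed.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized grayscale images."""
    n = len(ref) * len(ref[0])
    return sum((r - t) ** 2
               for ref_row, test_row in zip(ref, test)
               for r, t in zip(ref_row, test_row)) / n


def psnr(ref, test, max_value=255.0):
    """PSNR in decibels; identical images give infinity."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


reference = [[10, 20], [30, 40]]
close = [[11, 20], [30, 40]]     # one pixel off by 1
far = [[50, 60], [70, 80]]       # every pixel off by 40
print(psnr(reference, close) > psnr(reference, far))  # True
```

A higher PSNR means the enhanced image is numerically closer to the reference. Structural-similarity-based metrics complement it by comparing local structure rather than individual pixel differences.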
Description
Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2024
Keywords
Aerial photographs, Image processing