Single-frame and multi-frame super-resolution on remote sensing images via deep learning approaches
dc.contributor.advisor | Sertel, Elif | |
dc.contributor.author | Wang, Peijuan | |
dc.contributor.authorID | 705172003 | |
dc.contributor.department | Satellite Communication and Remote Sensing | |
dc.date.accessioned | 2024-08-14T09:14:34Z | |
dc.date.available | 2024-08-14T09:14:34Z | |
dc.date.issued | 2022-07-29 | |
dc.description | Thesis(Ph.D.) -- Istanbul Technical University, Graduate School, 2022 | |
dc.description.abstract | As an important computer vision task, image super-resolution (SR) has been widely applied in remote sensing (RS), medical imaging, video surveillance, and biometrics. Image SR aims to restore high-resolution (HR) images by enhancing the spatial, spectral, or temporal resolution of low-resolution (LR) inputs. In recent years, great efforts have been made to improve SR approaches. One approach is to design deeper networks; however, this greatly increases computation and memory consumption. As a result, mechanisms such as cascading networks, attention mechanisms, and back projection have been proposed to improve the performance and the training process of complex networks. Satellite imagery is used in various fields, such as land cover/land use classification, road and building extraction, climate observation, and earthquake prediction. However, in some cases the resolution of satellite images cannot meet application requirements because of technology and cost limitations during satellite design; therefore, improving image resolution may be necessary. Since it is not possible to upgrade the equipment onboard a launched satellite, software-based SR algorithms deserve to be explored in the RS field. This thesis aims to enhance the spatial resolution of optical satellite imagery by using deep learning (DL) methods. Generally, image SR algorithms can be categorized as single-frame image SR (SFSR) and multi-frame image SR (MFSR). SFSR takes a single LR image as input, while MFSR aims to restore an HR image from multiple LR images, which can be acquired under different conditions and at different angles. Recent contributions to SR methods fall into two groups: (1) increasing the peak signal-to-noise ratio (PSNR); (2) improving the perceptual quality of the image. 
Nevertheless, some algorithms obtain a high PSNR but a low perceptual quality, and perceptual quality matters more to human observers. Therefore, this thesis has the following objectives: (1) explore a perceptual-driven approach to enhance the visual quality of SR images on single-frame and multi-frame RS imagery; (2) explore Generative Adversarial Network (GAN)-based models for single-frame and multi-frame RS image SR that address the multi-scale problem and are blind to the degradation model; (3) explore an image fusion method that can generate a super-resolved image of arbitrary size rather than a small patch. This thesis first gives an overview of single-frame and multi-frame RS image SR methods. The single-frame RS image SR methods are briefly classified into supervised and unsupervised methods. The former mainly includes Convolutional Neural Network (CNN)-based, GAN-based, attention-based, and back-projection-based methods. In addition, the commonly used attention mechanisms, including self-attention, channel attention, spatial attention, mixed high-order attention (MHOA), non-local attention (NLA), and non-local sparse attention (NLSA), are introduced. Moreover, loss functions including pixel-wise loss, perceptual loss, adversarial loss, and cycle consistency loss are presented. For single-frame RS image SR, an attention CNN-based SR method is proposed first. Although CNN-based algorithms have made outstanding achievements in computer vision tasks, traditional CNN methods treat the abundant low-frequency information in the LR inputs equally across channels. Attention-guided algorithms play a vital role in extracting informative features in various tasks, including image SR. With the attention mechanism, the proposed CNN-based method can further learn the deeper relationships among the different channels. 
Instead of simply integrating the attention module with the residual blocks, a Layer Attention Module (LAM) and a Spatial Attention Module (SAM) are proposed to further learn the relationships among the Residual Groups (RG). Moreover, the perceptual loss function is adopted in the training process to enhance the perceptual quality of the generated images, and random down-sampling is applied to strengthen the model's generalization ability. Secondly, an attention GAN-based super-resolution method is explored for single-frame RS images. CNN-based methods have made great contributions to increasing PSNR/SSIM values. Nevertheless, their outputs tend to be overly smooth and blurry. GANs can generate more realistic images than ordinary CNN-based methods and have been introduced to single-image super-resolution (SRGAN, ESRGAN, EEGAN). Standard GANs operate only on spatially local points in LR feature maps. The attention mechanism can directly learn the long-range dependencies in the feature maps of both the generator and the discriminator. By applying the attention mechanism, the network allocates attention based on the similarity of color and texture. Therefore, based on ESRGAN, an attention GAN-based method is proposed for single-frame RS image SR. ESRGAN is improved in two main respects: (1) the architecture of the residual blocks is further improved by adding more skip connections; (2) attention modules are added to the residual blocks for further feature extraction. Moreover, instead of working on aerial photographs or low-resolution and medium-resolution satellite images, we focus on Very High-Resolution (VHR) satellite imagery, such as Pleiades and WorldView-3. The spatial resolutions of the multispectral images for Pleiades and WorldView-3 are 2 m and 1.24 m, respectively. Furthermore, the attention CNN-based method is evaluated on the Pleiades and WorldView-3 datasets with scaling factors of 2, 4, and 8. 
The attention GAN-based method is evaluated on the Pleiades and WorldView-3 datasets with a scaling factor of 4. The experimental results show that the attention-based method provides better perceptual results both quantitatively and qualitatively. Lastly, we propose an attention GAN-based method for multi-frame RS image SR. Firstly, we introduce an attention mechanism to the generator and propose a space-based network that works on every single frame for better temporal information extraction. Secondly, we propose a novel attention module for better spatial and spectral information extraction. Thirdly, we apply an attention-based discriminator to enhance the discriminator's discriminative ability. The experimental results on the SpaceNet7 and Jilin-1 datasets demonstrate the superiority of the proposed model both quantitatively and qualitatively. | |
dc.description.degree | Ph. D. | |
dc.identifier.uri | http://hdl.handle.net/11527/25146 | |
dc.language.iso | en_US | |
dc.publisher | Graduate School | |
dc.sdg.type | Goal 9: Industry, Innovation and Infrastructure | |
dc.subject | deep learning | |
dc.subject | derin öğrenme | |
dc.subject | satellite images | |
dc.subject | uydu görüntüleri | |
dc.subject | remote sensing | |
dc.subject | uzaktan algılama | |
dc.title | Single-frame and multi-frame super-resolution on remote sensing images via deep learning approaches | |
dc.title.alternative | Derin öğrenme yaklaşımlarıyla uzaktan algılama görüntülerinde tek çerçeve ve çok çerçeve süper çözünürlük | |
dc.type | Doctoral Thesis |
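
The abstract names PSNR as one of the two quality targets for SR methods. As a minimal sketch of how that metric is commonly computed, assuming 8-bit imagery and NumPy, the snippet below may help. The function name `psnr` and the `max_value` parameter are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    `max_value` is the largest possible pixel value (assumed 255 for 8-bit data).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")

    # Mean squared error over all pixels and bands.
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr = rng.integers(0, 256, size=(64, 64, 3))            # stand-in HR image
    sr = np.clip(hr + rng.normal(0, 5, hr.shape), 0, 255)  # stand-in SR output
    print(f"PSNR: {psnr(hr, sr):.2f} dB")
```

A higher PSNR only reflects a lower pixel-wise error. As the abstract notes, it does not guarantee better perceptual quality, which is why perceptual and adversarial losses are considered alongside it.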