Transfer learning based super resolution of aerial images and the effects of the super resolution on object detection
Date
2023-02-01
Authors
Haykır, Aslan Ahmet
Publisher
Graduate School
Abstract
Data quality and resolution are important properties for computer vision tasks. Super-resolution is a powerful way to increase image resolution with low reconstruction error, and super-resolved images contain more information about the original scene than their low-resolution counterparts. In this thesis, we trained super-resolution models for aerial images using transfer learning, and we used those models to analyze the effects of super-resolution on object detection in aerial images. Because super-resolved images carry more detail about the original scene than their low-resolution versions, super-resolution can serve as a preprocessing step for computer vision tasks. Super-resolved images are also more meaningful to humans, can be generated from an original low-resolution image, and are beneficial in computer vision tasks such as medical image processing, pattern recognition, and object detection. Our aim is to apply super-resolution to aerial images in order to recover more information from their low-resolution versions. To do so, we used the Super-Resolution Generative Adversarial Network (SRGAN), a generative modeling approach based on Generative Adversarial Networks. We first trained a base model on the DIV2K dataset. We then used transfer learning, starting from this base model, to train separate models on two aerial image datasets, xView and DOTA, both of which contain aerial images captured by satellites, with the goal of achieving better image quality for aerial images. Widely used metrics such as PSNR and SSIM have shortcomings when measuring perceptual quality. To address this problem, we used the Perceptual Index (PI).
In addition, the Root Mean Squared Error (RMSE) is used to measure how close the generated images are to their original versions. Both metrics were used in the 2018 Perceptual Image Restoration and Manipulation workshop at the European Conference on Computer Vision. We observed that the pre-trained model produces images with better perceptual quality on xView and DOTA, whereas the models trained with transfer learning from the base model produce images with better reconstruction quality. Perceptual quality is measured with the PI and reconstruction quality with the RMSE. In the end, we had three models: a base model trained on the DIV2K dataset, one trained on the xView dataset using transfer learning, and one trained on the DOTA dataset using transfer learning. We ran object detection on the outputs of all of the models and evaluated it with the mean Average Precision (mAP) metric. The models with the best RMSE scores gave the best object detection results, while the models whose images are more meaningful to human perception were less effective for object detection. Both kinds of model performed better than naive up-sampling methods such as bicubic interpolation. We conclude that images with better perceptual quality are more meaningful to human perception, but images with low reconstruction error are more meaningful to computers. This indicates that computers see differently from humans. One key limitation of this work is the limited set of architectures we used to produce high-resolution images. Better results could be achieved with further training. Training the SRGAN end to end to improve object detection accuracy is also a possible direction for future work toward a better mAP.
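As an illustration of the two image-quality metrics named in the abstract, the following is a minimal sketch and not the thesis's evaluation code. It assumes grayscale images stored as nested lists of pixel values, and it assumes the PI is combined as in the 2018 challenge definition, PI = 0.5 * ((10 - Ma) + NIQE), where the Ma and NIQE scores are computed beforehand by external tools. The function names and the sample values are hypothetical.

```python
import math


def rmse(reference, reconstructed):
    """RMSE between two equally sized grayscale images (nested lists)."""
    ref = [p for row in reference for p in row]
    rec = [p for row in reconstructed for p in row]
    if not ref or len(ref) != len(rec):
        raise ValueError("images must be non-empty and of equal size")
    squared_error = sum((a - b) ** 2 for a, b in zip(ref, rec))
    return math.sqrt(squared_error / len(ref))


def perceptual_index(ma_score, niqe_score):
    """Combine precomputed Ma and NIQE scores into a PI (lower is better).

    Assumption: the two no-reference scores come from external tools;
    this function only applies the combining formula.
    """
    return 0.5 * ((10.0 - ma_score) + niqe_score)


# Illustrative 2x2 images: a high-resolution reference and a super-resolved output.
hr_image = [[10, 20], [30, 40]]
sr_image = [[12, 18], [30, 44]]

reconstruction_error = rmse(hr_image, sr_image)
pi_value = perceptual_index(ma_score=8.0, niqe_score=4.0)
print(f"RMSE: {reconstruction_error:.3f}, PI: {pi_value:.1f}")
```

A lower RMSE means the super-resolved image is closer to the reference, and a lower PI means better perceptual quality, so the two metrics can rank the same models differently, as the abstract reports.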
Description
Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2023
Keywords
Aerial photographs,
Image processing,
Machine learning