UGQE: Uncertainty guided query expansion in image retrieval

Date
2022
Authors
Öncel, Fırat
Journal Title
Journal ISSN
Volume Title
Publisher
Graduate School
Abstract
Image retrieval is one of the important subproblems in the computer vision domain. A typical image retrieval pipeline consists of a feature extractor and a search operation over an image database with a given similarity measure. With the dominance of deep learning, hand-crafted feature extraction techniques have been replaced by Convolutional Neural Network (CNN) based feature extractors, and images are represented by those extracted features. When a query is made against an image database, some of the retrieved images may be irrelevant to the query image. Those images should be eliminated in order to improve the performance of the image retrieval system. Query expansion is one way to perform that operation: it can be viewed as making a second search after the images retrieved by the first search are aggregated with the query image. The aggregation can be done in several ways, such as taking an average or a weighted average. However, classical query expansion techniques have drawbacks, such as failing to distinguish between relevant and irrelevant neighbors or assigning monotonic weights. Existing query expansion approaches do not consider the reliability of neighbors when selecting them and executing the expansion operation. Reliability per se is not straightforward to measure; however, it can be estimated as inversely proportional to the amount of uncertainty inherent in the neighbor selection. With the advent of neural network based function approximators, uncertainty quantification can be integrated into standard neural networks, adding the ability to say "I do not know" or "I am not certain" about an outcome. In this thesis, we integrate a pair-wise uncertainty quantification into the query expansion process in order to generate new features via a novel Uncertainty Guided Transformer Encoders (UGTE) method. The newly generated features are concatenated with the original features to enrich the overall feature representations.
These feature representations are then fed into Learnable Attention Based Transformer Encoders (LABTE) to assign weights to the neighbors. Our method thus consists of UGTE and LABTE: first we generate new features with UGTE, then we assign new weights to the neighbors with LABTE. Experimental results on standard image retrieval benchmarks show that our proposed method improves performance over the baseline, which consists of the LABTE framework alone. We use a CNN feature extractor trained on the Google Landmarks dataset. The transformer encoders are trained on the rSfM120k dataset, and the method is evaluated on the rOxford5k, rParis6k, and 1M Distractors datasets.
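The classical aggregation step described in the abstract (averaging the query descriptor with its retrieved neighbors, or weighting the neighbors before summing) can be sketched as follows. This is a minimal illustration, not the thesis's UGTE/LABTE formulation: the function names and the inverse-uncertainty weighting scheme are assumptions introduced here to show how reliability-aware weights could replace a plain average.

```python
import numpy as np

def average_query_expansion(query, neighbors):
    """Classical AQE: average the query with its top-k neighbor
    descriptors, then L2-normalize before the second search."""
    expanded = np.mean(np.vstack([query[None, :], neighbors]), axis=0)
    return expanded / np.linalg.norm(expanded)

def uncertainty_weighted_expansion(query, neighbors, uncertainties):
    """Illustrative sketch: weight each neighbor inversely to its
    estimated pair-wise uncertainty, so less reliable neighbors
    contribute less to the expanded query."""
    weights = 1.0 / (1.0 + np.asarray(uncertainties))  # reliability proxy (assumed form)
    weights = weights / weights.sum()                  # normalize weights
    expanded = query + (weights[:, None] * neighbors).sum(axis=0)
    return expanded / np.linalg.norm(expanded)

# Toy example: 4-D descriptors, 3 retrieved neighbors.
rng = np.random.default_rng(0)
q = rng.normal(size=4)
n = rng.normal(size=(3, 4))
u = [0.1, 0.5, 2.0]  # third neighbor estimated as least reliable
print(average_query_expansion(q, n))
print(uncertainty_weighted_expansion(q, n, u))
```

In both variants the expanded descriptor is L2-normalized so it can be compared to database features with the same cosine-similarity search used in the first pass.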
Description
Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2022
Keywords
Convolutional neural networks, Query expansion, Image retrieval
Citation