LEE - Computer Engineering - Master's
Browsing LEE - Computer Engineering - Master's by Author "Ekenel, Hazım Kemal"
Fight recognition from still images in the wild (Graduate School, 2022-06-22) Aktı, Şeymanur ; Ekenel, Hazım Kemal ; 504191539 ; Computer Engineering
Violence in general is a sensitive subject and can have a negative impact on both the people involved and witnesses. Fighting is one of the most common types of violence and can be defined as an act in which individuals intend to harm each other physically. In daily life, such situations may not be encountered often; however, violent content on social media is also a major concern for its users. Since violent acts, and fights in particular, are considered anomalous or intriguing by some, people tend to record these scenes and upload them to their social media accounts. Similarly, news agencies sometimes regard them as newsworthy material. As a result, fight scenes frequently become available on social media platforms. Some users may be sensitive to this kind of media content, and children, who can be harmed by the aggressive nature of fight scenes, also use social media. These facts make it necessary to detect and limit the distribution of violent content on social media. Some existing systems focus on violence and fight recognition in visual data. However, these works mostly propose methods for other domains, such as movies or surveillance cameras, and the social media case remains unexplored. Furthermore, even though most fight scenes shared on social media are video sequences, a non-negligible amount of image data also depicts violent fighting. However, no prior work tackles fight recognition from still images instead of videos. Thus, this thesis investigates the problem of fight recognition from still images. In this scope, a novel dataset, named Social Media Fight Images (SMFI), was first collected from social media images.
The dataset was collected from Twitter and Google Images, and some frames were included from the NTU CCTV-Fights video dataset. The fight samples were chosen from among those recorded in uncontrolled environments. In order to crawl a large amount of data, different keywords in various languages were used. The non-fight samples were also chosen from the data crawled from social media, in order to keep the domain consistent across the classes. The dataset is made publicly available by sharing links to the images. For the classification of the Social Media Fight Images dataset, several image classification methods were applied. First, Convolutional Neural Networks (CNNs) were employed for the task and their performance was assessed. Then, a more recent approach, the Vision Transformer (ViT), was exploited for classifying the fight and non-fight images. The comparison showed that the Vision Transformer gives better results on the dataset, achieving higher accuracy with less overfitting. A further experiment investigated the effect of varying dataset sizes on model performance. This is necessary because data shared on social media may later be deleted, so it is not always possible to retrieve the whole dataset. The model was therefore trained on different partitions of the dataset, and the results showed that, although using more data is better, the model still gives satisfactory performance even in the absence of 60% of the dataset. Following the successful results on the still-image fight recognition problem, another experimental study was conducted on classifying video-based datasets using a single frame from each sample. The experiment included four video-based fight datasets, and the results showed that three of them could be successfully classified without using any temporal information.
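The dataset-size experiment described above can be sketched as follows. The thesis does not specify the exact partitioning scheme, so this is a minimal sketch assuming nested, stratified subsets that preserve the fight / non-fight balance; the function name and file names are hypothetical.

```python
import random

def stratified_subsets(samples, labels, fractions, seed=42):
    """Split a labeled dataset into nested stratified subsets.

    Returns {fraction: list of (sample, label)} so a model can be trained
    on e.g. 40%, 60%, 80%, and 100% of the data while keeping the
    fight / non-fight class balance in every subset.
    """
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    for items in by_class.values():
        rng.shuffle(items)  # shuffle once, then take prefixes => nested subsets
    subsets = {}
    for frac in fractions:
        subset = []
        for y, items in by_class.items():
            k = round(len(items) * frac)
            subset.extend((s, y) for s in items[:k])
        subsets[frac] = subset
    return subsets

# Toy usage: 100 "images" labelled fight (1) / non-fight (0).
samples = [f"img_{i}.jpg" for i in range(100)]
labels = [i % 2 for i in range(100)]
parts = stratified_subsets(samples, labels, fractions=(0.4, 0.6, 0.8, 1.0))
print({f: len(v) for f, v in parts.items()})  # {0.4: 40, 0.6: 60, 0.8: 80, 1.0: 100}
```

Taking prefixes of a single shuffled order makes the subsets nested, so the 40% partition is contained in the 60% one; this isolates the effect of dataset size from the effect of which samples were drawn.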
This indicates that there might be a dataset bias in these three datasets, where the inter-class visual difference is high. Cross-dataset experiments also supported this hypothesis: models trained on these video datasets perform poorly on the other fight recognition datasets. Nonetheless, the network trained on the proposed SMFI dataset gave promising accuracy on the other datasets as well, showing that it generalizes the fight recognition problem better than the others.
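The cross-dataset protocol above can be sketched as a train/evaluate loop over dataset pairs. This is a hypothetical harness, not the thesis code: `train_fn` and `eval_fn` stand in for the actual model training and accuracy evaluation, and only SMFI is a dataset name taken from the text.

```python
def cross_dataset_matrix(datasets, train_fn, eval_fn):
    """Train on each dataset and evaluate on all the others.

    `datasets` maps a name to (train_split, test_split); `train_fn` and
    `eval_fn` are placeholders for the real training and evaluation.
    Returns {(train_name, test_name): accuracy}.
    """
    results = {}
    for train_name, (train_split, _) in datasets.items():
        model = train_fn(train_split)
        for test_name, (_, test_split) in datasets.items():
            if test_name == train_name:
                continue  # within-dataset accuracy is reported separately
            results[(train_name, test_name)] = eval_fn(model, test_split)
    return results

# Toy usage with stubbed training/evaluation (second name is illustrative):
stub_train = lambda split: sum(split) / len(split)  # "model" = mean label
stub_eval = lambda model, split: 0.5                # placeholder accuracy
ds = {"SMFI": ([0, 1], [0, 1]), "VideoFightsA": ([1, 1], [1, 0])}
print(cross_dataset_matrix(ds, stub_train, stub_eval))
```

A low off-diagonal accuracy for a dataset whose within-dataset accuracy is high is the signature of dataset bias the abstract describes.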
Occlusion robust and aware face recognition (Graduate School, 2023-05-25) Erakın, Mustafa Ekrem ; Ekenel, Hazım Kemal ; 504201532 ; Computer Engineering
Occluded faces, due to accessories such as sunglasses and face masks, present a challenge for current face recognition systems. This thesis provides a comprehensive exploration of the issues caused by occlusions, particularly upper-face and lower-face obstructions, in real-world scenarios. The increased prevalence of sunglasses and face masks, the latter due to the COVID-19 pandemic, has amplified the importance of addressing these problems. In this thesis, the Real World Occluded Faces (ROF) dataset is gathered: a collection of faces with both upper- and lower-face occlusions that serves as a critical resource for this area of study. In contrast to synthetic occlusion data, the ROF dataset provides an authentic representation of the problem, which our benchmark experiments have shown to be a significant impediment for even the most sophisticated deep face representation models. These models, while highly effective on synthetically occluded faces, exhibit substantial performance degradation when tested on the ROF dataset. This research comprises two distinct yet interconnected parts. The first stresses the vital role of real-world data in the design and refinement of occlusion-robust face recognition models. Our experiments demonstrate the increased challenges posed by real-world occlusions in comparison to their synthetic counterparts. This insight allows us to gauge the performance and limitations of various model architectures under different occlusion conditions. The second part presents a novel occlusion-robust and occlusion-aware face recognition system, designed to increase performance under occlusions caused by sunglasses and masks, with minimal impact on generic face recognition performance.
The system incorporates an occlusion-robust face recognition model, an occlusion-aware model, and an innovative layer that integrates the outputs of these models to minimize occlusion effects. This configuration ensures the system's resilience to occlusions, focusing less on occluded regions and more on the face as a whole. This thesis provides a thorough investigation of the challenges presented by occluded face recognition and proposes an innovative solution. It underscores the necessity of using real-world data for developing robust face recognition models and introduces a novel occlusion-aware face recognition system. This work has the potential to significantly enhance the performance of occluded face recognition methods in various real-world scenarios.
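The abstract does not spell out how the integration layer combines the two models' outputs. As one hypothetical interpretation, a matching score can down-weight face regions that the occlusion-aware model flags as occluded before comparing the occlusion-robust embeddings; all names and the region-wise layout below are assumptions, not the thesis design.

```python
import numpy as np

def fused_similarity(feat_a, feat_b, occ_a, occ_b):
    """Cosine similarity between two faces with occlusion-weighted regions.

    feat_*: (R, D) region-wise embeddings from an occlusion-robust model.
    occ_*:  (R,) occlusion probabilities from an occlusion-aware model
            (1.0 = fully occluded). The hypothetical integration step
            weights each region by how visible it is in *both* images,
            then compares the weighted, flattened embeddings.
    """
    w = (1.0 - occ_a) * (1.0 - occ_b)  # (R,) joint visibility per region
    a = (feat_a * w[:, None]).ravel()
    b = (feat_b * w[:, None]).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy check: same identity, but one region is corrupted by "sunglasses".
rng = np.random.default_rng(0)
face = rng.normal(size=(4, 8))         # 4 regions, 8-dim embedding each
occluded = face.copy()
occluded[0] = rng.normal(size=8)       # occlusion corrupts region 0
no_weight = fused_similarity(face, occluded, np.zeros(4), np.zeros(4))
weighted = fused_similarity(face, occluded, np.zeros(4), np.array([1.0, 0, 0, 0]))
print(weighted > no_weight)  # True: ignoring the occluded region helps
```

Ignoring the corrupted region recovers a near-perfect match between the two images of the same identity, which illustrates the "focus less on occluded regions" behaviour the abstract describes.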