A variational graph autoencoder for manipulation action recognition and prediction

dc.contributor.advisor Sarıel, Sanem
dc.contributor.advisor Aksoy, Eren Erdal
dc.contributor.author Akyol, Gamze
dc.contributor.authorID 504181561
dc.contributor.department Computer Engineering
dc.date.accessioned 2024-03-13T06:40:36Z
dc.date.available 2024-03-13T06:40:36Z
dc.date.issued 2022-06-23
dc.description Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2022
dc.description.abstract Despite decades of research, understanding human manipulation actions remains one of the most appealing and demanding problems in computer vision and robotics. Recognition and prediction of observed human manipulation activities have their roots in, for instance, human-robot interaction and robot learning from demonstration. The current research trend relies heavily on advanced convolutional neural networks to process structured Euclidean data, such as RGB camera images. However, to process high-dimensional raw input, these networks must be immensely computationally complex, and training them therefore requires large amounts of time and data. Unlike previous research, in this thesis a deep graph autoencoder is used to simultaneously learn recognition and prediction of manipulation tasks from symbolic scene graphs rather than from structured Euclidean data. The deep graph autoencoder model developed in this thesis requires less time and data for training. The network features a two-branch variational autoencoder structure: one branch recognizes the input graph type, and the other predicts future graphs. The proposed network takes as input a set of semantic graphs that represent the spatial relationships between subjects and objects in a scene. Scene graphs are used because of their flexible structure and their capability to model the environment. The network produces a label set reflecting the detected and predicted class types. Two separate datasets are used for the experiments: MANIAC and MSRC-9. The MANIAC dataset consists of 8 manipulation action classes (e.g., pushing, stirring) from 15 different demonstrations. MSRC-9 consists of 9 hand-crafted classes (e.g., cow, bike) across 240 real-world images. Two such distinct datasets are used in order to measure the generalizability of the proposed network.
On these datasets, the proposed model is compared to various state-of-the-art methods, and it is shown that the proposed model can achieve higher performance. The source code is released at https://github.com/gamzeakyol/GNet.
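As a rough illustration of the two-branch architecture described in the abstract, the sketch below builds a toy variational graph autoencoder in NumPy: a GCN-style encoder maps a small scene graph to latent node embeddings via the reparameterization trick, one branch pools the embeddings into graph-level action-class logits, and the other reconstructs a (future) adjacency matrix with an inner-product decoder. All layer sizes, weights, and the GCN/inner-product component choices are assumptions drawn from standard VGAE practice, not from the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    # Symmetric normalization D^-1/2 (A + I) D^-1/2, standard in GCN encoders
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, X, W):
    # One graph-convolution layer with ReLU activation
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy scene graph: 4 nodes (e.g. hand, knife, bread, table); edges = spatial relations
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.standard_normal((4, 6))  # node features (e.g. one-hot object labels)

# Hypothetical layer sizes; the thesis does not specify them
W1     = rng.standard_normal((6, 8)) * 0.1
W_mu   = rng.standard_normal((8, 4)) * 0.1
W_logv = rng.standard_normal((8, 4)) * 0.1
W_cls  = rng.standard_normal((4, 8)) * 0.1  # recognition head: 8 MANIAC action classes

A_norm = normalize_adj(A)
H = gcn_layer(A_norm, X, W1)

# Variational bottleneck: mean / log-variance heads + reparameterization trick
mu, logvar = H @ W_mu, H @ W_logv
Z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

# Branch 1 (recognition): mean-pool node embeddings, then classify the graph
graph_emb = Z.mean(axis=0)
class_logits = graph_emb @ W_cls        # shape (8,)

# Branch 2 (prediction): inner-product decoder gives edge probabilities
A_pred = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))  # shape (4, 4), entries in (0, 1)
```

In training, branch 1 would be fit with a classification loss on the logits and branch 2 with a reconstruction loss against the next frame's scene graph, plus the usual KL term on (mu, logvar).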
dc.description.degree M.Sc.
dc.identifier.uri http://hdl.handle.net/11527/24654
dc.language.iso en_US
dc.publisher Graduate School
dc.sdg.type Goal 9: Industry, Innovation and Infrastructure
dc.subject action
dc.subject aksiyon
dc.subject manipulation
dc.subject manipülasyon
dc.subject autoencoders
dc.subject otokodlayıcılar
dc.title A variational graph autoencoder for manipulation action recognition and prediction
dc.title.alternative Manipülasyon aksiyon tanıma ve tahminleme için değişimsel çizge otokodlayıcısı
dc.type Master Thesis
Files
Original bundle
Name: 504181561.pdf
Size: 2.01 MB
Format: Adobe Portable Document Format