A variational graph autoencoder for manipulation action recognition and prediction

dc.contributor.advisor Sarıel, Sanem
dc.contributor.advisor Aksoy, Eren Erdal
dc.contributor.author Akyol, Gamze
dc.contributor.authorID 504181561
dc.contributor.department Computer Engineering
dc.date.accessioned 2024-03-13T06:40:36Z
dc.date.available 2024-03-13T06:40:36Z
dc.date.issued 2022-06-23
dc.description Thesis (M.Sc.) -- İstanbul Technical University, Graduate School, 2022
dc.description.abstract Despite decades of research, understanding human manipulation actions remains one of the most appealing and demanding problems in computer vision and robotics. Recognition and prediction of observed human manipulation activities have their roots in, for instance, human-robot interaction and robot learning from demonstration. The current research trend relies heavily on advanced convolutional neural networks to process structured Euclidean data, such as RGB camera images. However, to process high-dimensional raw input, these networks must be immensely computationally complex, and training them therefore requires large amounts of time and data. Unlike previous research, in this thesis a deep graph autoencoder is used to simultaneously learn recognition and prediction of manipulation tasks from symbolic scene graphs rather than from structured Euclidean data. The deep graph autoencoder model developed in this thesis requires less time and data for training. The network features a two-branch variational autoencoder structure: one branch recognizes the input graph type, and the other predicts future graphs. The proposed network takes as input a set of semantic graphs that represent the spatial relationships between subjects and objects in a scene. Scene graphs are used because of their flexible structure and their capability to model the environment. The network produces a label set reflecting the detected and predicted class types. Two separate datasets are used for the experiments: MANIAC and MSRC-9. The MANIAC dataset consists of 8 manipulation action classes (e.g., pushing, stirring) from 15 different demonstrations. MSRC-9 consists of 9 hand-crafted classes (e.g., cow, bike) across 240 real-world images. Two such distinct datasets are used in order to measure the generalizability of the proposed network.
On these datasets, the proposed model is compared to various state-of-the-art methods, and it is shown that the proposed model can achieve higher performance. The source code is released at https://github.com/gamzeakyol/GNet.
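As a rough illustration of the two-branch architecture described in the abstract, the sketch below builds a toy variational graph autoencoder in NumPy: a GCN-style encoder maps a small scene graph to latent node embeddings via the reparameterization trick, one branch pools the embeddings into graph-level action-class logits, and the other reconstructs a (future) adjacency matrix with an inner-product decoder. All layer sizes, weights, and the GCN/inner-product component choices are assumptions drawn from standard VGAE practice, not from the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    # Symmetric normalization D^-1/2 (A + I) D^-1/2, standard in GCN encoders
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(A_norm, X, W):
    # One graph-convolution layer with ReLU activation
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy scene graph: 4 nodes (e.g. hand, knife, bread, table); edges = spatial relations
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.standard_normal((4, 6))  # node features (e.g. one-hot object labels)

# Hypothetical layer sizes; the thesis does not specify them
W1     = rng.standard_normal((6, 8)) * 0.1
W_mu   = rng.standard_normal((8, 4)) * 0.1
W_logv = rng.standard_normal((8, 4)) * 0.1
W_cls  = rng.standard_normal((4, 8)) * 0.1  # recognition head: 8 MANIAC action classes

A_norm = normalize_adj(A)
H = gcn_layer(A_norm, X, W1)

# Variational bottleneck: mean / log-variance heads + reparameterization trick
mu, logvar = H @ W_mu, H @ W_logv
Z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

# Branch 1 (recognition): mean-pool node embeddings, then classify the graph
graph_emb = Z.mean(axis=0)
class_logits = graph_emb @ W_cls        # shape (8,)

# Branch 2 (prediction): inner-product decoder gives edge probabilities
A_pred = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))  # shape (4, 4), entries in (0, 1)
```

In training, branch 1 would be fit with a classification loss on the logits and branch 2 with a reconstruction loss against the next frame's scene graph, plus the usual KL term on (mu, logvar).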
dc.description.degree M.Sc.
dc.identifier.uri http://hdl.handle.net/11527/24654
dc.language.iso en_US
dc.publisher Graduate School
dc.sdg.type Goal 9: Industry, Innovation and Infrastructure
dc.subject action
dc.subject aksiyon
dc.subject manipulation
dc.subject manipülasyon
dc.subject autoencoders
dc.subject otokodlayıcılar
dc.title A variational graph autoencoder for manipulation action recognition and prediction
dc.title.alternative Manipülasyon aksiyon tanıma ve tahminleme için değişimsel çizge otokodlayıcısı
dc.type Master Thesis
Files
Original bundle
Name: 504181561.pdf
Size: 2.01 MB
Format: Adobe Portable Document Format