Yüzdeki Nirengi Noktalarının Koşullu Regresyon Ormanları İle Saptanması (Facial Feature Detection Using Conditional Regression Forests)

Vural, Gencer

dc.contributor.advisor	Gökmen, Muhittin	tr_TR
dc.contributor.author	Vural, Gencer	tr_TR
dc.contributor.authorID	10077984	tr_TR
dc.contributor.department	Bilgisayar Mühendisliği	tr_TR
dc.contributor.department	Computer Engineering	en_US
dc.date	2015	tr_TR
dc.date.accessioned	2017-02-07T14:41:25Z
dc.date.available	2017-02-07T14:41:25Z
dc.date.issued	2015-06-25	tr_TR
dc.description	Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2015	tr_TR
dc.description	Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2015	en_US
dc.description.abstract	Computers are advancing in parallel with technology, and as they advance their prices are falling to cheaper, more affordable levels. The goal of using these increasingly widespread computers to build systems that make people's lives easier has given rise to many new research areas in computer science and has rapidly advanced many existing ones. One of these areas is computer vision, whose applications range from entertainment and industry to health and security, fields of great importance to people today. Computer vision encompasses many research fields used in these applications, such as object recognition, human-computer interaction, face recognition and face analysis. Facial feature detection, which is related to human-computer interaction and face analysis and also serves as a preliminary step in face recognition, is one of the active areas of computer vision. Although many studies have addressed facial feature detection in medium- and high-quality two-dimensional still images, detecting facial features in low-quality images, and detecting them in real time as most applications require, remain open problems in this field. Addressing these problems, this thesis aims to detect 10 facial features: the left corner of the left eye, the right corner of the left eye, the left corner of the right eye, the right corner of the right eye, the right side of the nose, the left side of the nose, the right corner of the mouth, the left corner of the mouth, the top of the mouth and the bottom of the mouth. To detect these features effectively and accurately, this study uses a framework that combines Regression Forests, which in recent years have proven to be an effective and versatile tool for solving computer vision problems, with Local Zernike Moments (LZM), a moment-based feature extraction method. 
Another aim of this study is to show that LZM, which has been used effectively and successfully in face recognition, can also be used successfully for facial feature detection. Patches to which LZM has been applied are used both for training the regression forests and for detecting facial features; during detection, the patches are evaluated by the decision forests after LZM is applied. The regression forests learn the spatial relationship between image patches and facial features. In addition, unlike the usual practice of training regression forests on a random subset of the whole training set, the forests in this study are trained conditionally on head pose. Because each forest then only needs to know the appearance and shape variations of its own head pose rather than all variations of the face, the trees learn more easily. The method essentially consists of training a decision forest to detect head pose, training the decision forests for facial feature detection conditionally on head pose, and, at test time, detecting the facial features with the conditional decision trees that match the head pose of the face. To evaluate the method, this thesis uses the Labeled Faces in the Wild (LFW) dataset, which contains 13,233 face images of 5,749 people captured under widely varying imaging conditions. This dataset was chosen because it is challenging: it contains different poses, lighting conditions, resolutions and image qualities, different facial expressions and genders, and factors that complicate detection, such as occlusion of parts of the face. The performance of the method was evaluated on LFW at several error tolerances. 
Tolerance is the maximum allowed value of the ratio of the localization error to the inter-ocular distance. With this method, which combines LZM and conditional regression forests, facial features are detected with an average success rate of 89.33% at 15% tolerance. Higher tolerances give higher success rates: at 20% tolerance, the same method detects facial features with an average success rate of 94.48%.	tr_TR
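Under this criterion, a facial feature counts as correctly detected when its localization error divided by the inter-ocular distance does not exceed the tolerance, and the success rate is the fraction of features that pass. A minimal sketch of that computation follows. It assumes the landmarks are stored in the order listed in the abstract (indices 0-1 for the left eye's corners, 2-3 for the right eye's) and that the inter-ocular distance is measured between eye centres taken as the midpoints of each eye's two corners; the thesis may define the eye positions differently, and `detection_rate` and its parameters are hypothetical names.

```python
import numpy as np


def detection_rate(pred, gt, left_eye=(0, 1), right_eye=(2, 3), tol=0.15):
    """Fraction of landmarks whose error / inter-ocular distance is <= tol.

    pred, gt: arrays of shape (n_images, n_landmarks, 2), in pixels.
    Assumption (not from the thesis): each eye centre is the midpoint of
    that eye's two annotated corners.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    left_centre = gt[:, list(left_eye)].mean(axis=1)          # (n_images, 2)
    right_centre = gt[:, list(right_eye)].mean(axis=1)        # (n_images, 2)
    iod = np.linalg.norm(left_centre - right_centre, axis=1)  # (n_images,)
    err = np.linalg.norm(pred - gt, axis=2)                   # (n_images, n_landmarks)
    # A landmark passes when its normalised error is within the tolerance.
    return float((err / iod[:, None] <= tol).mean())


# One synthetic face: eye centres at (30, 40) and (70, 40), so the
# inter-ocular distance is 40 px.
gt = np.array([[[20, 40], [40, 40], [60, 40], [80, 40], [55, 60],
                [45, 60], [60, 75], [40, 75], [50, 70], [50, 80]]], dtype=float)
pred = gt.copy()
pred[0, :9, 0] += 5   # 5 px / 40 px = 0.125 of the inter-ocular distance
pred[0, 9, 0] += 7    # 7 px / 40 px = 0.175
print(detection_rate(pred, gt, tol=0.15))  # 0.9
print(detection_rate(pred, gt, tol=0.20))  # 1.0
```

Because the error is normalised per image, the same pixel error counts as worse on a small face than on a large one, which is why raising the tolerance from 15% to 20% raises the reported rate.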
dc.description.abstract	Computers are advancing in parallel with technology, and as they advance their prices are falling to cheaper, more affordable levels. The goal of building systems that make people's lives easier with these increasingly widespread computers has led to many new research areas and to rapid progress in many existing ones. One of these areas is computer vision, whose applications range from entertainment and industry to healthcare and security, fields of great importance to people today. Computer vision includes many fields of study used in these applications, such as object recognition, human-computer interaction, face recognition and face analysis. Facial feature detection, which is related to human-computer interaction and face analysis and also serves as a preliminary step of face recognition, is one of the active fields of computer vision. Although there are many studies on facial feature detection in medium- and high-quality two-dimensional still images, detection in low-quality still images and real-time detection remain two of the open problems in this field. Studies on facial feature detection can be divided into two categories according to whether they use holistic or local features. Holistic methods build a model from the entire face region and test images against this model; however, they cannot cope well with lighting changes and low-resolution images. Local methods, which have been preferred and widely used in recent years, build a model from image patches around the facial landmarks; however, because they are built from limited face regions, they cannot cope with global variations of the face. Local methods therefore commonly also use global information about the face, such as head pose, to improve accuracy. 
Addressing these problems, this thesis aims to detect 10 facial features: the left corner of the left eye, the right corner of the left eye, the left corner of the right eye, the right corner of the right eye, the right side of the nose, the left side of the nose, the right corner of the mouth, the left corner of the mouth, the top of the mouth and the bottom of the mouth. To detect these features effectively and accurately, this study uses a framework combining Regression Forests, which have proven in recent years to be an effective and versatile tool for solving computer vision problems, with Local Zernike Moments (LZM), a moment-based feature extraction method. One aim of this thesis is to show that LZM, which has been used effectively and successfully for face recognition, can also be used successfully to detect facial features. Patches to which LZM has been applied are used both for training the regression forests and for detecting facial features; during detection, the patches are evaluated by the decision forests after LZM is applied. The regression forests learn the spatial relationship between image patches and facial features. In addition, unlike the common practice of training each forest on a random subset of the entire training set, the regression forests in this study are trained conditionally on head pose. Each forest therefore only has to handle the appearance and shape variations of its own head pose rather than those of the entire set, which makes the trees easier to learn. The method consists of training a decision forest for head pose detection, training decision forests for facial feature detection conditionally on head pose, and detecting facial features with the conditional decision trees that match the head pose of the face. Because the regression trees used for facial feature detection are selected from all trained trees according to head pose, the head pose must be detected first. 
For this purpose, a regression forest that assigns the face to one of 5 defined head poses ('left profile', 'left', 'front', 'right', 'right profile') is trained on the LFW dataset. This head pose forest consists of 15 trees with a maximum depth of 10. The regression forests for facial feature detection are trained on 13,233 images annotated with the coordinates of the 10 facial features, and 1,500 randomly selected images are used to train each regression tree. The faces in these images are detected, and the images are rescaled so that each face bounding box is 100×100 pixels. The face bounding boxes are then enlarged by 30% to ensure that they contain all facial features. Next, 200 patches of 20×20 pixels are randomly sampled from each image: 150 from inside the box and 50 from the rest of the image. Finally, a regression tree with a maximum depth of 20 is built by applying the training procedure described in this study to the collected patches. Detection follows the same preprocessing as training: the face in the image is detected first, the image is rescaled so that the face bounding box is 100×100 pixels, and the box is enlarged by 30% so that it contains all facial features. Patches of 20×20 pixels, mostly from inside the box, are then randomly sampled from the image and passed to the regression trees selected according to the head pose of the face. Each patch is evaluated at each node until it reaches a leaf in every tree, so a set of leaf nodes is obtained for each patch. The facial feature locations are then estimated from this set with a Gaussian kernel density estimator. 
Finally, each facial feature is localized by applying mean-shift to its predicted points. The method is tested on the Labeled Faces in the Wild (LFW) dataset, which contains 13,233 face images of 5,749 people with large variations in imaging conditions. This challenging dataset was chosen because it includes variations in pose, lighting, resolution, image quality, facial expression and gender, as well as challenging factors such as occlusion of parts of the face. The accuracy of the method is evaluated on LFW at several tolerance values, where tolerance is the maximum allowed ratio of the localization error to the inter-ocular distance. At 15% tolerance, the method, which combines LZM with conditional regression forests, achieves an average success rate of 89.33%. Larger tolerances give higher success rates: at 20% tolerance, the same method achieves an average success rate of 94.48%. The speed of the method was also measured. When the face location is given to the system as input, facial feature detection takes 56 milliseconds on average; when the system must also detect the face first, it takes 72 milliseconds on average. These results, obtained on a challenging dataset with variations in pose, lighting, resolution, image quality, facial expression and gender, are promising.	en_US
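The abstract states that the leaf predictions collected for all patches are pooled with a Gaussian kernel density estimator and that each facial feature is then localized with mean-shift. The sketch below illustrates that aggregation step for a single landmark, assuming the votes are 2-D pixel positions already gathered from the leaves of the pose-selected trees. The function names, the bandwidth, the optional per-vote weights and the choice of starting point are illustrative assumptions, not parameters reported in the thesis.

```python
import numpy as np


def kernel_density(queries, votes, weights, bandwidth):
    """Weighted, unnormalised Gaussian kernel density at each query point."""
    # Squared distances between every query and every vote: shape (Q, N).
    d2 = ((queries[:, None, :] - votes[None, :, :]) ** 2).sum(axis=2)
    return (weights[None, :] * np.exp(-d2 / (2.0 * bandwidth ** 2))).sum(axis=1)


def mean_shift_mode(votes, weights=None, bandwidth=5.0, max_iter=100, eps=1e-3):
    """Locate the dominant mode of 2-D votes with Gaussian-kernel mean-shift."""
    votes = np.asarray(votes, dtype=float)
    if weights is None:
        weights = np.ones(len(votes))
    weights = np.asarray(weights, dtype=float)
    # Start from the vote with the highest estimated density so that
    # scattered outlier votes do not drag the starting point away.
    x = votes[np.argmax(kernel_density(votes, votes, weights, bandwidth))]
    for _ in range(max_iter):
        # Gaussian kernel weight of each vote relative to the current estimate.
        k = weights * np.exp(-((votes - x) ** 2).sum(axis=1) / (2.0 * bandwidth ** 2))
        # Mean-shift update: kernel-weighted average of the votes.
        x_new = (k[:, None] * votes).sum(axis=0) / k.sum()
        if np.linalg.norm(x_new - x) < eps:
            return x_new
        x = x_new
    return x


# Hypothetical votes for one landmark: most patches vote near (50, 50),
# a minority vote at random positions.
rng = np.random.default_rng(0)
cluster = rng.normal([50.0, 50.0], 2.0, size=(80, 2))
outliers = rng.uniform(0.0, 100.0, size=(20, 2))
print(mean_shift_mode(np.vstack([cluster, outliers])))  # close to [50, 50]
```

Seeking the density mode rather than averaging all votes keeps the estimate on the main cluster; a plain mean of the example votes would be pulled toward the scattered outliers.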
dc.description.degree	Yüksek Lisans	tr_TR
dc.description.degree	M.Sc.	en_US
dc.identifier.uri	http://hdl.handle.net/11527/12969
dc.publisher	Fen Bilimleri Enstitüsü	tr_TR
dc.publisher	Institute of Science and Technology	en_US
dc.rights	İTÜ tezleri telif hakkı ile korunmaktadır. Bunlar, bu kaynak üzerinden herhangi bir amaçla görüntülenebilir, ancak yazılı izin alınmadan herhangi bir biçimde yeniden oluşturulması veya dağıtılması yasaklanmıştır.	tr_TR
dc.rights	İTÜ theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.	en_US
dc.subject	Görüntü İşleme	tr_TR
dc.subject	Bilgisayarla Görü	tr_TR
dc.subject	Yerel Zernike Momentleri	tr_TR
dc.subject	Nirengi Noktalarının Saptanması	tr_TR
dc.subject	Image Processing	en_US
dc.subject	Computer Vision	en_US
dc.subject	Local Zernike Moments	en_US
dc.subject	Facial Feature Detection	en_US
dc.title	Yüzdeki Nirengi Noktalarının Koşullu Regresyon Ormanları İle Saptanması	tr_TR
dc.title.alternative	Facial Feature Detection Using Conditional Regression Forests	en_US
dc.type	Master Thesis	en_US

Collections

FBE- Bilgisayar Mühendisliği Lisansüstü Programı - Yüksek Lisans (Institute of Science and Technology, Computer Engineering Graduate Program, M.Sc.)
