Configuring a Face Database Application and Applying Learning Methods for Facial Features to the Database
Yüz Veritabanı Uygulaması Yapılandırması Ve Yüz Özniteliklerinde Öğrenme Yöntemlerinin Veritabanına Uygulanması
Date
2014-10-27
Authors
Kozan, Gizem İrem
Publisher
Fen Bilimleri Enstitüsü
Institute of Science and Technology
Abstract
Personal identification applications make use of measurable biological characteristics of humans. These measurable characteristics are called biometric properties. Examples of biometric properties include fingerprint, retina, pupil, face, palm print, handwriting, DNA and voice. Identification applications are used for purposes such as determining and verifying a person's identity in access control. To identify people correctly from their biometric properties, a comparison is made against databases in which biometric properties are stored. A calculation using statistical methods is performed on the database, and the best-matching person is determined. In identification applications, the size and diversity of the database the application runs on have been observed to matter for performance. The performance metrics can be expressed as obtaining the correct result (accuracy), the time taken to obtain a result (speed), and ease of the method. A training set that covers more of the data in the database increases accuracy, but because more computation is performed, it requires training algorithms that produce results faster. A training set that is small relative to the gallery will reduce accuracy. The suitability of the method to be used should be evaluated with respect to the size of the database and of the training set. In this thesis, face biometrics was used and a corresponding database was created; traditional face recognition methods were investigated. The ways in which these methods are applied were evaluated. In face recognition, methods based on facial features, templates, three-dimensional masks or skin texture are applied to an available record, and in face verification the person is matched with a face record in the face database. Examples of records used for face recognition and verification include video frames, photographs and portrait drawings. Face recognition is practical because face records are easy to obtain.
Recordings made by surveillance cameras in public areas are used for face recognition, but recordings made without permission can be regarded as an intrusion on people's privacy. For this reason, it was confirmed that the face records used in the thesis are open to sharing and use, and the necessary permissions were obtained from their owners. Compared with other biometric properties, the face can lose effectiveness because of many factors such as face angle, ambient lighting, glasses, and hairstyles that cover the face. For such reasons, the results of human face recognition and of face matching by automatic applications differ. This thesis was carried out as part of TÜBİTAK-supported project no. 112E142, on matching drawings and patterns with photographs for face recognition by using the rules of caricature making. Facial features were classified in a sample face database created to study them. The face database stores photograph-caricature pairs. Data were collected for each face record through a web-based voting system with volunteer participants. Infrastructure was built for the voting system, and İTÜ servers were used. Learning methods were applied to the collected data in an attempt to identify the important facial features. The usefulness of the resulting caricature-photograph database and the suitability of the collected data for statistical processing were evaluated. Information is given on possible future work and on areas where the database could be used.
Personal identification relies on measurable biological and behavioural characteristics of humans. These measurable characteristics are called biometric properties. Fingerprint, retina, pupil, face, palm print, handwriting, DNA and voice are the most common examples. An identification application uses biometric properties to identify the right person and to authorize access. To do so, it compares the input data against a database that stores biometric data. After applying various methods and statistical calculations to the database, the application reports the best-matching person as the identified person. For this thesis, a literature survey was conducted to gain general knowledge of the identification process. The rate of success is discussed in terms of performance characteristics such as consistency, speed and method availability. Consistency describes how reliably an application identifies the right person, and it can be measured in several ways. It is mostly used to compare two methods applied to the same biometric property. Speed is another performance characteristic and refers to the time elapsed in identifying the person. Accuracy depends mostly on the sizes of the training set and the gallery. The method needs to be chosen to match the characteristics of the biometric database. Biometric database size and variance are important when the performance of an identification process is discussed. As the world population increases, the databases that keep biometric data grow. To handle this growth, identification processes must handle large volumes of data while meeting performance requirements such as consistency and speed. New identification methods have been developed to keep up with the amount of data. For this thesis, face biometrics was used for personal identification and a face database was built.
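The matching step described above, comparing an input record against a stored gallery and reporting the best match, can be illustrated with a minimal sketch. The abstract does not say which statistical method the thesis uses, so the nearest-neighbour rule, the Euclidean distance, the function name `identify` and the three-value feature vectors below are all assumptions made for illustration only.

```python
import math


def identify(probe, gallery):
    """Return (person_id, distance) for the gallery entry closest to the probe vector."""
    best_id, best_dist = None, math.inf
    for person_id, features in gallery.items():
        # Euclidean distance between feature vectors (math.dist needs Python 3.8+).
        dist = math.dist(probe, features)
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id, best_dist


# Hypothetical gallery: one made-up feature vector per enrolled person.
gallery = {
    "person_a": [0.1, 0.9, 0.3],
    "person_b": [0.8, 0.2, 0.5],
}

best_id, distance = identify([0.75, 0.25, 0.45], gallery)
print(best_id, round(distance, 3))
```

The loop visits every gallery entry, which is why a larger gallery costs more time per identification, the speed trade-off the abstract points to.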
Traditional methods and algorithms for face recognition were researched and evaluated. Existing example databases were accessed with permission from their administrators and consulted for reference. The owners of the records in the face database granted permission for their use, which allowed this work to be published safely. Face recognition is less effective when face records are affected by environmental factors such as camera angle, background lighting, and occlusion by glasses or by hair covering the face; nevertheless, the face is a commonly used biometric in applications. Normalization procedures and rotations are not discussed in this thesis, but their effects were considered when selecting appropriate face records for the face database. The conditions under which a face is recognizable were discussed. This thesis was conducted as part of TUBITAK Project 112E142, which aims to match drawings, textures and photographs by using caricature rules for face recognition. Face features were grouped into categories so that data could be collected fairly and the features kept as independent as possible. These categories were created in the face database to collect data in the corresponding fields. Volunteers were accepted for the survey to collect face feature data. Each volunteer is shown face records from the database and selects the best-fitting face properties in each category presented in the poll. To make the survey available on the internet, several systems were designed and implemented. One system was built to run on a cloud platform, for high availability and lower cost. A second system was built as a virtual machine running the Linux operating system, for demonstrations. For more privacy and protection, a website and a database were set up on Istanbul Technical University servers for public use and as the entry point to the voting application. Mechanisms were put in place to reject unauthorized access to the database and tampering with the table data to be used for face recognition.
Then, poll results were recorded in the database for each face record. Results could be retrieved with optional outputs chosen by the user: by the people who took the survey, by specific face feature, or by number of votes. The database and the voting results may also be used in future work and in different implementations, as discussed in the last part of this thesis. Future studies and relevant works are mentioned and several examples are given. There were many challenges during this thesis. Most face recognition algorithms have been developed since the 1990s and draw on many statistical techniques, so there are many mathematical expressions to analyse in search of a better solution to the problem. Human perception was another challenge, as many volunteers voted for different face features for the same face record. A process was needed to normalize the votes and decide which face feature to keep as valuable data and which to ignore. To resolve this problem, a threshold value was defined for each category. A face feature chosen by a majority of voters (more than 50%) is taken as the voted feature, whereas the other face features in the same feature category are ignored. Because the face database contained many photos and caricatures and records were sorted by face id, volunteers voted mostly on records at the beginning of the database. As volunteers tended to leave halfway through the voting process, the rest of the database was never voted on. To resolve this problem, face ids were presented to the volunteers in random order so that every face id would be voted on at least once. An example dataset, including 25 photos and 25 caricatures for the same person, was formed to get a quick view of the results of the learning methods. This set was voted on by volunteers, and the sum of votes for each face id in the example dataset was calculated to check whether enough votes had been cast.
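The majority rule described above (a feature counts only if more than 50% of voters chose it) can be sketched as follows. This is an illustration under assumptions, not the thesis implementation: the function name `majority_feature`, the feature labels, and the choice to leave empty answers out of the denominator are all hypothetical.

```python
from collections import Counter


def majority_feature(votes, threshold=0.5):
    """Return the feature chosen by more than `threshold` of the non-empty votes, else None."""
    # Assumption: empty answers (None or "") are dropped before counting.
    cast = [v for v in votes if v]
    if not cast:
        return None
    feature, count = Counter(cast).most_common(1)[0]
    # Strictly greater than the threshold, so an exact 50/50 split yields no winner.
    return feature if count / len(cast) > threshold else None


# Hypothetical votes for one face record in one feature category.
nose_votes = ["wide", "wide", "narrow", "wide", None]
print(majority_feature(nose_votes))          # wide
print(majority_feature(["wide", "narrow"]))  # None
```

Returning `None` when no feature passes the threshold leaves the category undecided for that face record rather than forcing a choice.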
Volunteers also asked for the voting time to be reduced. Voting usually took three to five minutes per face record, so voting on the whole example dataset lasted about 4 hours per volunteer. Reducing this time was not an easy task. Another problem arose when some voters left most of the categories in the poll empty, which significantly reduced the amount of usable data for analysis. Apart from asking people to fill in all categories, the only remedy was to exclude empty votes during data verification. Empty votes were cleaned up manually. The voting process was designed to collect data, not to give any feedback to volunteers, yet many volunteers asked to see the results after voting, to check whether they had voted the same way as others. This was not a goal of the thesis or the project, but it was requested as a way to make the survey more attractive and draw more volunteers. The intention was neither to let users choose with other volunteers' perception as a reference nor to pay for each completed survey. Some polling sites were examined to find ways to attract more volunteers. Payment was not an option, as no funding had been granted for the voting process at the outset. To attract more volunteers, a new application for Android devices such as smartphones and tablets was developed. Because this research is not intended as a commercial product and many users of these platforms prefer free applications, developing the poll application for these platforms was seen as an opportunity to reach a greater audience. To protect the privacy of the voters, no personal data was requested from volunteers. Even their passwords were stored as hashes in the database to keep unintended users away. A common protection tool, a captcha, was added to the user registration step.
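Storing passwords as hashes, as mentioned above, can be sketched with the Python standard library. The abstract does not name the hash function or scheme that was used, so the salted PBKDF2-HMAC-SHA256 approach, the iteration count and the function names below are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not taken from the thesis


def hash_password(password, salt=None):
    """Return (salt, digest) to store in place of the plain-text password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("example-passphrase")
print(verify_password("example-passphrase", salt, stored))
print(verify_password("wrong-guess", salt, stored))
```

Only the salt and the digest would be stored, so a leaked user table does not reveal the passwords themselves.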
Adding the captcha caused a problem of its own: the ITU servers did not accept a verification tool that sends information from unknown sources, so code was written to generate a unique captcha image made of dots and lines, to keep web bots from accessing the database. The website was designed to be independent of the size of the face database. When a new face feature or a new face feature category is added to the database, the website does not need to be changed to reflect it. The poll table in the face database recorded the date of each vote and the username of the voter, which gave flexibility in choosing which data to pass to the statistical methods. Select queries could be used to gather data for specific users or for a selected time interval. Researchers continue to develop new concepts for statistical analysis and data mining. In the future, when human perception of faces and mathematical algorithms for face recognition give similar results, the face recognition problem will be solved with the help of concept studies like this thesis. The face database formed in this thesis is intended for academic use and can be referenced in future related studies.
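The select queries mentioned above, which filter votes by voter or by time interval, might look like the following sketch. The abstract states only that the poll table recorded the vote date and the username, so the table layout, the column names (`face_id`, `category`, `feature`, `username`, `voted_at`), the sample rows and the use of SQLite are assumptions for illustration.

```python
import sqlite3

# In-memory database with a hypothetical poll table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE poll (face_id INTEGER, category TEXT, feature TEXT, "
    "username TEXT, voted_at TEXT)"
)
conn.executemany(
    "INSERT INTO poll VALUES (?, ?, ?, ?, ?)",
    [
        (1, "nose", "wide", "volunteer_a", "2014-03-01"),
        (1, "nose", "narrow", "volunteer_b", "2014-03-15"),
        (2, "chin", "pointed", "volunteer_a", "2014-04-02"),
    ],
)


def votes_between(conn, start, end, username=None):
    """Return votes cast between start and end (inclusive), optionally for one voter."""
    sql = (
        "SELECT face_id, category, feature, username FROM poll "
        "WHERE voted_at BETWEEN ? AND ?"
    )
    params = [start, end]
    if username is not None:
        sql += " AND username = ?"
        params.append(username)
    sql += " ORDER BY voted_at"
    # Placeholders keep user-supplied values out of the SQL text.
    return conn.execute(sql, params).fetchall()


print(votes_between(conn, "2014-03-01", "2014-03-31"))
print(votes_between(conn, "2014-01-01", "2014-12-31", username="volunteer_a"))
```

Dates are stored as ISO 8601 strings so that `BETWEEN` compares them correctly as text.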
Description
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2014
Keywords
Face,
Biometric Property,
Feature,
Database