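Applying Classification Techniques to Match Candidates and Job Advertisements in Job Recommendation Systems (İş Öneri Sistemlerinde Adayların Ve İlanların Eşleştirilmesi İçin Sınıflandırma Tekniklerinin Uygulanması)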

Özcan, Gözde

Date

2017-01-11

Authors

Özcan, Gözde

Publisher

Fen Bilimleri Enstitüsü (Institute of Science and Technology)

Abstract

In recent years, the growing number of Internet users and the corresponding growth in available information have made finding a suitable job or candidate a difficult and time-consuming process. Job recommendation systems were developed so that the desired job or candidate can be found easily within this large volume of data. Such systems are now widely used in e-commerce. Many companies use online recruitment platforms as their primary hiring channel because these platforms help reduce hiring time and advertising costs. The most common approach to job search is for a user to enter search criteria on a job search site and for the system to return the advertisements or candidates that match them; the user then evaluates the results against his or her own criteria. When the number of matches is large, finding the most suitable advertisement or candidate becomes difficult and wastes time. Job recommendation systems aim to replace this manual searching by recommending the most suitable advertisements or candidates directly. They are designed to return to a candidate a list of jobs that fit the candidate's preferences and that the candidate can apply for, or to an employer a list of candidates that fit its requirements. Typically, a job recommendation system presents a list of recommendations both to candidates and to advertisements. While looking for suitable matches, these systems should also avoid recommending advertisements or candidates that are unrelated to the user's profile. High-quality job recommendation systems therefore use approaches that tailor recommendations to the characteristics of the users. Many different strategies exist for job recommendation.
This thesis aims at a job recommendation system that jointly uses users' similarities, interaction features, and preference information. The proposed system first finds the candidates similar to the target candidate and the advertisements those similar candidates applied to. The responses that the advertisements would give to the target candidate are then estimated from the advertisements' preference information. Finally, the advertisements that reciprocally prefer the target candidate are recommended to the candidate as a ranked list. Different classification methods are used to predict the advertisements' response values for the target candidate, and the performance of the proposed system is measured for each of them. A new method based on candidates' preference information is used to determine the order of the advertisements in the recommended list.
The model was evaluated on a data set obtained from a job search site. The data set consists of 7455 applications made to 10 advertisements and does not contain the advertisements' responses to those applications. The work therefore focuses first on determining these response values, for which two methods were used. The first method uses records of the advertisements' activity on candidates' résumés. Rules based on this activity information indicate which applications are likely to have received a positive response and which a negative one; applying them yielded a positive response for 748 applications in total. The second method is Trust Rank (TR), a method used to detect unwanted (spam) users on the web. It was chosen because of the similarity between spam detection and the problem of determining how the advertisements in this data set respond to applications. TR uses shared resource/tag similarities between users, on the premise that users who assign the same tag to the same resource, that is, who behave in the same way, should be similar. With this approach, a small number of pages whose spam status is known are used to determine the labels of pages whose status is unknown. The same approach was judged suitable for this data set, under the assumption that the candidates to whom an advertisement responds positively are similar to one another. Under this assumption, TR was applied to a small number of manually labelled applications to label the advertisements' responses as positive or negative, giving a positive response value for 1967 applications.
The proposed method consists of four parts. The first part builds the list of advertisements to be recommended to the candidate: to make a recommendation to a candidate who is new to the system, the candidates similar to this candidate are found, and the advertisements they previously applied to are listed. The second part finds the advertisements' response values: for each advertisement, classification methods are trained on the response values obtained in the labelling step in order to estimate how the advertisements that similar candidates applied to would respond to the target candidate. This yields, for every such advertisement, a prediction of whether it would prefer the target candidate. The third part finds the advertisements' importance values. A confidence value is computed for each candidate from the candidate's previous applications; it is derived from the similarities among the advertisements the candidate applied to. If these advertisements are highly similar to one another, the candidate's confidence value is high; if not, it is low. Each candidate passes this confidence value on to the advertisements he or she applied to, and the importance value of an advertisement is the sum of the confidence values of the candidates who applied to it. These importance values are taken as the advertisements' score values. The fourth and final part builds the ranked recommendation list: for each advertisement to be recommended, its response value for the candidate is multiplied by its score value, and the advertisements are recommended to the target candidate in order of the resulting values.
Applying the four steps separately to the response values produced by each of the two labelling methods gave two studies, whose results were compared using the MAP (mean average precision) and TNR (true negative rate) measures. The experiments based on TR-derived response values gave higher accuracy than those based on rule-derived response values.
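The seed-based labelling idea behind the TR step can be illustrated with a minimal sketch. It follows TrustRank as it is commonly described, a PageRank variant whose random jumps return only to a small set of trusted seeds, and it is not the thesis's implementation: the similarity graph between applications, the choice of seed, the damping factor, and the zero threshold are all hypothetical, and the abstract does not say how resource/tag similarity was encoded.

```python
from typing import Dict, List, Set


def propagate_trust(
    neighbours: Dict[str, List[str]],
    trusted_seeds: Set[str],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Biased PageRank: random jumps land only on the trusted seed nodes."""
    nodes = list(neighbours)
    # Seed distribution: uniform over the seeds, zero everywhere else.
    seed_mass = {
        n: (1.0 / len(trusted_seeds) if n in trusted_seeds else 0.0) for n in nodes
    }
    trust = dict(seed_mass)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * seed_mass[n] for n in nodes}
        for n in nodes:
            out = neighbours[n]
            if not out:
                continue  # isolated node: nothing to pass on in this sketch
            share = damping * trust[n] / len(out)
            for m in out:
                nxt[m] += share
        trust = nxt
    return trust


# Hypothetical similarity graph: an edge joins two applications whose
# candidates are considered similar. "a1" is a manually labelled positive.
graph = {
    "a1": ["a2"],
    "a2": ["a1", "a3"],
    "a3": ["a2"],
    "a4": ["a5"],
    "a5": ["a4"],
}
trust = propagate_trust(graph, trusted_seeds={"a1"})

# Assumed decision rule: any application that received trust is positive.
positive = {app for app, score in trust.items() if score > 0.0}
print(sorted(positive))
```

In this toy graph the applications connected to the labelled seed receive non-zero trust, while the two applications in the disconnected component stay at zero and would be labelled negative.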
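The first two parts of the method (collecting advertisements through similar candidates, then predicting each advertisement's response) could look roughly like the sketch below. The abstract names neither the similarity measure nor the classifiers, so Jaccard similarity over skill sets and a per-advertisement k-nearest-neighbour vote are stand-ins chosen for illustration; every name and data value is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_candidates(target: Set[str], profiles: Dict[str, Set[str]], k: int) -> List[str]:
    """Part 1a: the k existing candidates most similar to the target."""
    ranked = sorted(profiles, key=lambda c: jaccard(target, profiles[c]), reverse=True)
    return ranked[:k]


def candidate_ads(similar: List[str], applications: Dict[str, List[str]]) -> List[str]:
    """Part 1b: advertisements the similar candidates applied to, without repeats."""
    seen: List[str] = []
    for cand in similar:
        for ad in applications.get(cand, []):
            if ad not in seen:
                seen.append(ad)
    return seen


def predict_response(
    target: Set[str],
    ad: str,
    labelled: Dict[str, List[Tuple[str, int]]],
    profiles: Dict[str, Set[str]],
    k: int = 3,
) -> int:
    """Part 2: majority vote among the ad's k most similar labelled applicants."""
    examples = labelled.get(ad, [])
    nearest = sorted(
        examples, key=lambda e: jaccard(target, profiles[e[0]]), reverse=True
    )[:k]
    positives = sum(label for _, label in nearest)
    return 1 if nearest and positives * 2 > len(nearest) else 0


profiles = {
    "c1": {"python", "ml", "sql"},
    "c2": {"python", "sql"},
    "c3": {"sales", "crm"},
}
applications = {"c1": ["ad1", "ad2"], "c2": ["ad2"], "c3": ["ad2", "ad3"]}
# (candidate, response) pairs per advertisement; 1 = positive, 0 = negative.
labelled = {
    "ad1": [("c1", 1)],
    "ad2": [("c1", 0), ("c2", 0), ("c3", 1)],
    "ad3": [("c3", 1)],
}

target = {"python", "ml"}  # a new candidate with no application history
similar = similar_candidates(target, profiles, k=2)
ads = candidate_ads(similar, applications)
responses = {ad: predict_response(target, ad, labelled, profiles) for ad in ads}
print(similar, ads, responses)
```

Because the target candidate needs only a profile and no application history, this path is what lets the approach handle new (cold-start) candidates.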
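The third and fourth parts reduce to a short computation, sketched here under stated assumptions. The abstract says only that a candidate's confidence value is derived from the similarities among the advertisements the candidate applied to; taking the mean pairwise Jaccard similarity over advertisement keyword sets is one possible reading, not the thesis's definition. The summation into importance values and the final multiplication follow the abstract. All names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def confidence(applied: List[str], features: Dict[str, Set[str]]) -> float:
    """Mean pairwise similarity of the advertisements one candidate applied to."""
    pairs = list(combinations(applied, 2))
    if not pairs:
        return 0.0  # assumption: a single application carries no evidence
    return sum(jaccard(features[x], features[y]) for x, y in pairs) / len(pairs)


def importance(
    applications: Dict[str, List[str]], features: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Each candidate adds his or her confidence value to every ad applied to."""
    scores = {ad: 0.0 for ad in features}
    for applied in applications.values():
        conf = confidence(applied, features)
        for ad in applied:
            scores[ad] += conf
    return scores


def rank(predicted_response: Dict[str, int], scores: Dict[str, float]) -> List[str]:
    """Order ads by predicted response multiplied by importance score."""
    final = {ad: predicted_response[ad] * scores[ad] for ad in predicted_response}
    return sorted(final, key=final.get, reverse=True)


features = {
    "ad1": {"python", "ml"},
    "ad2": {"python", "ml", "sql"},
    "ad3": {"sql", "sales"},
}
applications = {"c1": ["ad1", "ad2"], "c2": ["ad2", "ad3"], "c3": ["ad1", "ad3"]}

scores = importance(applications, features)
# Response values as they might come from the classification step.
predicted_response = {"ad1": 1, "ad2": 0, "ad3": 1}
ranking = rank(predicted_response, scores)
print(ranking)
```

Here "ad2" has the highest importance score but a predicted negative response, so the multiplication pushes it to the bottom of the list, which is the reciprocal behaviour the abstract describes.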
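For reference, the two evaluation measures can be computed as in the sketch below. It assumes the usual definitions (average precision normalised by the number of relevant items, and TNR = TN / (TN + FP)); the abstract does not state which variants or cut-offs were used, and the example data are invented.

```python
from typing import Dict, List, Sequence, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@i taken at each position i that holds a relevant item."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(
    rankings: Dict[str, List[str]], relevants: Dict[str, Set[str]]
) -> float:
    aps = [average_precision(rankings[u], relevants[u]) for u in rankings]
    return sum(aps) / len(aps) if aps else 0.0


def true_negative_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual negatives (0) that were also predicted negative."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp) if (tn + fp) else 0.0


# Hypothetical recommendation lists and ground truth for two candidates.
rankings = {"u1": ["ad1", "ad3", "ad2"], "u2": ["ad2", "ad1"]}
relevants = {"u1": {"ad1", "ad2"}, "u2": {"ad1"}}
map_score = mean_average_precision(rankings, relevants)

# Hypothetical true and predicted response values (1 = positive).
tnr = true_negative_rate([0, 0, 1, 0], [0, 1, 1, 0])
print(round(map_score, 3), round(tnr, 3))
```

MAP rewards placing relevant advertisements near the top of each list, while TNR checks how reliably unsuitable advertisements are kept out, which matches the abstract's requirement that unrelated items should not be recommended.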

Description

Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2017

Keywords

Reciprocal Job Recommendation, Classification Methods, Cold Start Problem, Trust Rank