Hibrit Film Öneri Sistemi

Uluyağmur, Mahiye

Hibrit Film Öneri Sistemi

Dosyalar

704091021.pdf (1.22 MB)

Yazarlar

Uluyağmur, Mahiye

Yayınevi

Bilişim Enstitüsü
Institute of Informatics

Özet

The number and kind of available content and the number of users who can view them have increased tremendously in both movie/television and music domains. Therefore, recommendation systems that can accurately recommend to a certain user the set of products that he would most likely be interested and as fast as possible, have become important. While content based recommendation systems use features of products a user has viewed so far and they are domain dependent, domain independent collaborative filtering systems use only the ratings given to each product by a number of users. There are some shortcomings of both collaborative and content-based recommendation systems. Cold-start problem is one of the most important problem of the collaborative filtering systems. If a movie is not watched in the training set, this movie can not be recommended to any user. Content-based system can solve this problem. Moreover if a user is new in the system namely if s/he did not watch any movies, collaborative filtering system can not recommend any movies to this user either. In order to solve the new user problem user demographics can be used, however they tend to be not so reliable for many domains. In our system we first observe the watching behavior of a user for a number of movies and then do recommendations. Content-based recommendation systems rely on content features which need to be extracted. Rating matrices are generally sparse and high dimensional matrices, so it is costly to work with large matrices. In collaborative filtering system matrix factorization methods can generate low dimensional user and item factors to solve the sparsity problem. Content-based recommendation systems rely on content data gathered for a specific user and if too complex models are chosen they may suffer from overspecialization. Different hybrid recommendation systems that integrate content and collaborative recommendation systems have been proposed in the literature. In this thesis, content-based, collaborative and hybrid TV movie recommendation methods are proposed and evaluated. In the content-based recommendation system as the content for a movie, we use information such as movie actor, producer, genre and also words obtained from the movie summaries. In addition to these fields, computed (implicit) ratings which users give to the movies are used in the content-based system. Another recommendation method used in this work is the collaborative filtering method. Collaborative filtering method uses only users? ratings for movies. In this project, we also propose a hybrid movie recommendation system which uses a linear combination of recommendations proposed by the content-based and collaborative filtering methods. Recommendation systems need user ratings. However, for the TV recommendation problem, we do not have explicit ratings from the users. In this thesis, we used the implicit ratings of the movies, which are generated as the percentage of the movie watched by the user over all presentations of the movie. Therefore if a user watched a movie multiple times or different parts in different sessions, the implict rating reflects that. Another contribution of the thesis is the use of different performance evaluation criteria for TV movie recommendation. We evaluate performance of the movie recommendation system by using four evaluation measures. Two of them are the well known information retrieval performance measurements precision and recall. Precision is determined in our system as the number of movies watched by the user in top 10 recommendations divided by 10. High precision means system hits many correct movies in the top 10 recommendation. If a user has watched a lot of content, his/her precision is naturally high. Recall solves this problem since it divides the top 10 hits by the number of movies user u watched in the test set. In addition, two other performance evaluation measures are developed in this thesis: normalized precision and rating weighted normalized precision. Precision gets higher as the number of movies that a user watched in the test set increases and it also gets higher as the number of movies in the test set decreases. Normalized precision takes into account the number of the movies in the test set. Ratio of the number of movies watched by a user and the number of movies in the test set can be used as a normalization term for each user. Normalized precision is precision normalized by this ratio. This ratio is proportional to how much better a recommendation is compared to a uniform random recommendation system to a user who watches movies uniformly random. A recommendation system which recommends movies watched by the user with high ratings is more preferable to another system that recommends the same number of watched movies with low ratings. Rating weighted normalized precision (RWNP) performance measure takes into account the users ratings for the test movies. It is computed as the sum of the ratings of the watched movies in the top 10 recommendations and divided by the ratio of the number of movies that are watched by the user in the test set and the total number of the movies in the test set. The content-based recommendation system uses actor, genre, director and keyword features of movies watched by a user. In the feature extraction phase, first of all a movie-feature matrix which contains the features of all movies in the training set, is created. For a particular user, an existing feature in a watched movie is scaled by the implicit rating for that movie and the sum of the user?s weights for the movie?s features divided by the number of movies that the user watched in the training set gives the weight of a feature for that user. These features are reference features for the recommendation of the test set movies. If a feature weight for a user is greater than the other feature weights it means that the user gives more importance to this feature than the others. This feature weight computation is done separately for four different feature sets: actor, genre, director and keyword. In the test set when movie i will be recommended to the user u, firstly features of the movie i are extracted. Assume that actor feature set is chosen, which actor features movie i contains and whether user u watched such a movie which contains one of these actor features is investigated. If user u watched a movie which contains the actors of the movie i in the training set, then user u rating for movie i is determined by summation of the actor features weights of the user u. While generating ratings according to actor feature set, since usually movies have more than one actor, all available feature weights are summed. On the other hand, according to the director feature set generally there is one director for each movie, so user weights for director features are used directly as ratings for movies. It is observed that ratings generated using the director feature set are more successful than the others, while the genre feature set is also quite successful. Content-based recommendations for each feature set are also combined using three different strategies. Before combination, all generated ratings are normalized to 0-1 range using min-max normalization. In the first combination scheme, different feature sets? ratings are summed directly to generate a new rating for a user to a movie. The second combination scheme takes a weighted sum of the ratings for each feature set. The weight of a feature set is determined as the exponential of the negative mean absolute training set error between the actual ratings and the predicted ratings for that feature set. Weighted sum combination gives better results than sum. The third strategy aims to use the feature set which is likely to be the most successful for a particular user. The feature set with the minimum mean absolute training error for the user is chosen as the feature set to be used for test recommendations. In collaborative filtering, generally explicit feedback recommendation methods where users rank movies explicitly such as likes or dislikes or using scores, are used. However, in TV program recommendation problem, as in many other areas, it is difficult to request the explicit ratings from the user for the programs. Instead of ratings, there is information on how long the user watched an item. For such problems, instead of explicit recommendation methods, implicit methods should be used. In this work, we process the time durations for which users watch the programs to obtain implicit ratings and similar to prior work of others use these ratings for implicit recommendation. The user-movie matrix, which contains the users? ratings for movies can be used to assess similarities between users and movies and hence, for example, movies liked by users similar to the current user can be recommended. However, the user-movie matrix is a very sparse matrix and most user-user and item-item similarities may happen to be just zero. Matrix factorization techniques are used to represent each movie and user in a small number of reduced dimensions where user-item similarities are as close as possible to the ratings given in the training set. We first use the implicit computed ratings as if they are explicit ratings and use explicit matrix factorization methods. While learning the matrix factors, we introduce adaptive learning rate to speed up the learning and we also introduce user/item adaptive regularization. We also use implicit matrix factorization and compare it with the other recommendation methods. Since matrix factorization is a costly procedure which involves many parameters, we also used count based collaborative filtering to measure user-user similarities. In this method when movie i will be recommended to the user u, first the set of users who watched movie i in the training set is obtained. For each of these users, the count of movies liked by user u and that user is used as a similarity between the users. Count based collaborative filtering predicts ratings as the similarity weighted ratings of the users similar to user u for movie i.. In this thesis, we propose two hybrid movie recommendation systems by combining content-based and collaborative recommendation system ratings linearly. The first hybrid system HybridCommonMovie is obtained by combining content-based system and count based collaborative filtering system. The second one HybridMF is generated by combining content-based system and matrix factorization based collaborative filtering system. A weight parameter is used to adjust the contribution of the methods in the linear combination. Experiments were performed to assess the performance of the recommendation algorithms for thirteen months of data. Among all the methods experimented with, the best results are obtained with the HybridCommonMovie systems. For this recommendation system, averaged over all users, precision, recall, normalized precision and rating weighted normalized precision results are better than the other recommendation systems. HybridCommonMovie method also is the method which has the smallest number of parameters that need to be adjusted for different datasets, therefore is the preferred recommendation method for the TV recommendation dataset used in this thesis.

Açıklama

Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Bilişim Enstitüsü, 2012
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Informatics, 2012

Anahtar kelimeler

Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol, Öneri sistemleri, Computer Engineering and Computer Science and Control, Suggestion systems

URI

http://hdl.handle.net/11527/12211

Koleksiyonlar

Bilgisayar Bilimleri Lisansüstü Programı - Yüksek Lisans

Tam öğe sayfası