Estimation of Missing Data in Correspondence Analysis and an Application (Karşılaştırma Analizinde Eksik Verilerin Tahmini ve Bir Uygulama)

Date
1997
Authors
Akhisar, İlyas
Publisher
Fen Bilimleri Enstitüsü
Institute of Science and Technology
Abstract
In multivariate data analysis, Correspondence Analysis is one of the methods most frequently used to examine dependence when no prior information is available about the structure of the relations between variables. Here it is constructed with a tensorial approach, which yields a representation of the sets of variables and individuals in lower-dimensional subspaces. For cases where the data table is incomplete, this study proposes an iterative method, within the framework of Correspondence Analysis, for estimating the missing data from the available data. In the application section, Correspondence Analysis is applied to data on Turkey's imports of 20 different products from 30 countries between 1986 and 1995, arranged as a two-way data table, and the results are interpreted. Finally, some randomly selected cells of the 1995 import table are emptied, their values are estimated with the proposed method, and the agreement of the estimates with the initial values is tested.
The purpose of data analysis methods is to represent large data sets. Although the techniques used here are old, their refinements are quite recent. These methods, which are used to understand and interpret the relations between individuals and variables through characteristics obtained from the data table, can be examined in two groups. On the other hand, correspondence analysis techniques require complete data; otherwise the interpretation of the factors may become impossible. First of all, the data to which correspondence analysis is applied should be understood in certain respects, i.e., the data may contain specific errors in terms of homogeneity and exhaustivity. In this study, correspondence analysis is investigated from the point of view of tensor algebra, and the estimation of missing data is examined mainly by this method. The method was first considered in the 1930s by H. Hotelling, C. Spearman and K. Pearson, and was then improved in 1960 by J. P. Benzécri.

Let $I$ and $J$ be two finite sets. We consider the rectangular table

$$K_{IJ} = \{\, k(i,j) \mid i \in I,\ j \in J \,\}, \qquad k(i,j) > 0,$$

and obtain from $K_{IJ}$ the frequency table

$$F_{IJ} = \{\, f_{ij} \mid i \in I,\ j \in J \,\}, \qquad f_{ij} = \frac{k(i,j)}{k}, \qquad k = \sum \{\, k(i,j) \mid i \in I,\ j \in J \,\}.$$

Dividing every $f_{ij}$ ($i \in I$) by the mass $f_j$, which is the sum of the $j$-th column, yields

$$F_I^J = \{\, f_i^j = f_{ij}/f_j \mid i \in I,\ j \in J \,\}.$$

This table can be considered as a mixed tensor of type (1,1). By assigning the mass $f_j$ to each profile $f_I^j$, we obtain the set of points denoted by $N(J)$,

$$N(J) = \{\, (f_I^j, f_j) \mid j \in J \,\} \subset R_I,$$

which is the set of variables. In the same manner, the table $F_J^I$ and the set of individuals $N(I)$ can be obtained from $F_{IJ}$:

$$N(I) = \{\, (f_J^i, f_i) \mid i \in I \,\} \subset R_J.$$

In the space $R_I$ we define a metric by the quadratic form

$$m^{II} = \{\, m^{ii'} \mid i, i' \in I \,\}, \qquad m^{ii} = \frac{1}{f_i}.$$

The matrix $M$ that represents this quadratic form defines an isometry of $R_I$ onto $R^I$.
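The construction above (frequency table, column profiles, masses and metric) can be illustrated with a minimal NumPy sketch. It is not taken from the thesis: the function name `profiles_and_metric` and the small table of counts are made up for illustration, and the metric is assumed to be the usual diagonal chi-square metric with entries $1/f_i$.

```python
import numpy as np

def profiles_and_metric(K):
    """Frequency table, masses, column profiles and chi-square metric of a table K.

    Hypothetical helper for illustration; assumes every row and column total is positive.
    """
    K = np.asarray(K, dtype=float)
    k = K.sum()                  # grand total k
    F = K / k                    # frequency table f_ij = k(i,j) / k
    f_i = F.sum(axis=1)          # row masses f_i
    f_j = F.sum(axis=0)          # column masses f_j
    Z = F / f_j                  # column profiles f_i^j = f_ij / f_j (columns sum to 1)
    M = np.diag(1.0 / f_i)       # assumed diagonal metric with m^{ii} = 1 / f_i
    return F, f_i, f_j, Z, M

# Made-up 2 x 3 table of counts, for illustration only.
K = np.array([[10, 20, 30],
              [20, 10, 10]])
F, f_i, f_j, Z, M = profiles_and_metric(K)

# Squared distance between the first two column profiles under the metric M.
diff = Z[:, 0] - Z[:, 1]
d2 = diff @ M @ diff
print(d2)
```

Each column of `Z` is one point of $N(J)$ and carries the mass stored in `f_j`; the distance computed at the end is the quadratic form of the metric applied to the difference of two profiles.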
If $u^\alpha$ is a unit vector in the space $R_I$ with respect to the metric $m^{II}$ with matrix $M$, the projection of $f_I^j$ on the axis $\Delta u^\alpha$ is denoted by $\psi_\alpha^j$ and is defined in tensorial form by

$$\psi_\alpha^j = M(u^\alpha, f_I^j) = m^{II}(u^\alpha, f_I^j) = \varphi_\alpha^I \circ f_I^j.$$

Here the symbol $\circ$ is the transition defined between the measure $f_I^j$ and the function $\varphi_\alpha^I$, which is the orthogonal projection on the axis $\Delta u^\alpha$. At the same time, it is the density of $u^\alpha$ with respect to $f_I$. The center of gravity $g_I$ of the set of points is obtained as

$$g_I = \sum_{j \in J} f_j \, f_I^j,$$

and it can be written as a tensor, $g_I = f_I^J \circ f_J$. The quadratic inertia of the set of points about the subspace orthogonal to $M^{-1}([\ldots])$ can be written as a tensor as follows:

$$\sigma_{II} = (f_I^J \otimes f_I^J) \circ f_J.$$

Let $Z$ be the matrix corresponding to $F_I^J$, $V$ the matrix corresponding to the tensor $\sigma_{II}$, and $D_p$ the diagonal matrix of weights whose general element is $d_{jj} = f_j$. Then

$$V = Z \, D_p \, Z',$$

where $V$ is a variance-covariance matrix. The aim of correspondence analysis is to find the subspace $H_q$ nearest to the set $N(J)$. It is spanned by the axes $\{\Delta u^\alpha \mid \alpha \in [1, q]\}$. The principal component of any point of the set on the $\alpha$-th axis is then obtained as the projection $M(u^\alpha, f_I^j) = \psi_\alpha^j$.

For various reasons, the data table can contain a number of missing values, and the existing data must then be interpreted statistically. However, interpreting the factors requires a complete data table. In this study we suggest an iterative method that uses the existing data to estimate the missing cells (values) of the data matrix, based on the relation between $k(i,j)$ and the factors:

$$\frac{k \, k(i,j)}{k(i) \, k(j)} = 1 + \sum_{\alpha} \lambda_\alpha^{-1/2} F_\alpha(i) \, G_\alpha(j).$$

If we take

$$\lambda_1^{-1/2} F_1(i) \, G_1(j), \qquad \sum_{\alpha=1}^{2} \lambda_\alpha^{-1/2} F_\alpha(i) \, G_\alpha(j), \qquad \sum_{\alpha=1}^{3} \lambda_\alpha^{-1/2} F_\alpha(i) \, G_\alpha(j), \ \ldots$$

instead of the full summation, we obtain the 1st, 2nd, 3rd, ... approximations of $\{k(i,j) \mid i \in I,\ j \in J\}$. The approximations at all subsequent steps are easily obtained in the same way.
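As a rough illustration of how such an iteration might look, the sketch below fills missing cells by repeatedly applying the truncated reconstitution formula. It is not the thesis's algorithm, whose exact steps are not given in this abstract. The names `reconstitute` and `impute_missing`, the starting values (the independence estimate $k(i)k(j)/k$ from the observed margins), the stopping rule and the small test table are all assumptions. The factors are computed from a singular value decomposition of the standardized residuals, a common way to carry out correspondence analysis numerically.

```python
import numpy as np

def reconstitute(K, q):
    """Rank-q reconstitution k(i)k(j)/k * (1 + sum_a lambda_a^(-1/2) F_a(i) G_a(j)).

    Hypothetical helper; assumes all row and column totals of K are positive.
    """
    K = np.asarray(K, dtype=float)
    k = K.sum()
    r = K.sum(axis=1) / k                     # row masses f_i
    c = K.sum(axis=0) / k                     # column masses f_j
    E = np.outer(r, c)                        # independence model f_i * f_j
    S = (K / k - E) / np.sqrt(E)              # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    keep = s[:q] > 1e-12                      # drop numerically null axes
    U_q, s_q, V_q = U[:, :q][:, keep], s[:q][keep], Vt[:q][keep].T
    F_a = U_q * s_q / np.sqrt(r)[:, None]     # row factors F_a(i); lambda_a = s_a**2
    G_a = V_q * s_q / np.sqrt(c)[:, None]     # column factors G_a(j)
    corr = (F_a / s_q) @ G_a.T                # sum_a lambda_a^(-1/2) F_a(i) G_a(j)
    return k * E * (1.0 + corr)

def impute_missing(K, q=1, max_iter=500, tol=1e-10):
    """Fill NaN cells of K by iterating the rank-q reconstitution (illustrative sketch)."""
    K = np.array(K, dtype=float)
    missing = np.isnan(K)
    if not missing.any():
        return K
    obs = np.where(missing, 0.0, K)
    # Assumed starting point: independence estimate from the observed margins.
    start = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
    K[missing] = start[missing]
    for _ in range(max_iter):
        new = np.clip(reconstitute(K, q)[missing], 0.0, None)  # keep estimates non-negative
        change = np.max(np.abs(new - K[missing]))
        K[missing] = new                       # observed cells are never modified
        if change <= tol * K.sum():
            break
    return K

# Made-up table with independent rows and columns; one cell is blanked and re-estimated.
K = np.outer([1.0, 2.0, 3.0], [2.0, 3.0, 5.0])
K_missing = K.copy()
K_missing[1, 2] = np.nan
K_hat = impute_missing(K_missing, q=0)
print(K[1, 2], K_hat[1, 2])
```

The number of retained axes `q` has to be smaller than the number of non-trivial axes of the completed table; otherwise the reconstitution reproduces the current table exactly and the missing cells never move from their starting values. The toy table has no association between rows and columns, so `q=0` (the independence term alone) is the appropriate choice there.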
In the last section, a (30×200) data matrix is created from Turkey's imports of 20 different products from 30 countries over the last decade (1986-1995), and it is investigated statistically by correspondence analysis. These data matrices are considered as two-way tables, and the correspondence analysis method is applied to them. The results of the analysis reflect the variation and the characteristics of Turkey's imports over the last decade. We selected and emptied some cells of the data table by generating random numbers, completed the data table using the proposed method, and obtained values close to the initial values.
Description
Thesis (Ph.D.) -- İstanbul Technical University, Institute of Science and Technology, 2010
Keywords
comparative analysis, data analysis
Citation