## DCT Image Coding Applications


##### Date

1996

##### Authors

Dipçin, Vadi

##### Publisher

Fen Bilimleri Enstitüsü

##### Abstract

Today, digital technology is among the most important developments affecting daily life, and one of the most significant capabilities it provides is digital television. In this intensively studied field, one of the most important problems is undoubtedly the size of digital picture data. Compression of digital picture information is a necessity for system performance and cost, in storage, processing, and broadcasting alike. Work toward this goal has produced two widely used standards: JPEG for still images and MPEG for moving pictures, both present in commercial applications. Nevertheless, research on better coding methods continues. This thesis focuses on alternative coding methods for still images. All of the methods are built on the 8*8 Discrete Cosine Transform (DCT) used by JPEG and MPEG. The introductory chapters present the concepts and methods of image coding in general terms. The third chapter examines a method based on applying a cascade DCT to the image data. The fourth chapter covers vector quantization applications based on various approaches. In the fifth chapter, coding is performed by dividing the picture into small, similar parts before the transform is applied. In the application of the sixth chapter, the linear scalar quantization of the JPEG standard is replaced by nonuniform scalar quantization together with a suitable run-length coding. While the first three methods failed to deliver the desired performance, the nonuniform scalar quantization application showed notable efficiency, yielding a coding method with advantages over the JPEG standard. This coding method can easily be adapted to JPEG and MPEG, allowing these standards to operate more efficiently.

The very rapid development of digital technology in the last 30 years has brought many new aspects to our lives. Digital systems have become cheaper and more sophisticated, making possible many things that were once thought pure imagination. One step of this development is digital TV. Such systems will provide higher picture quality in narrower bandwidths, better compatibility, and other broadcasting advantages. However, the size of digital images poses an important problem. A picture with 720*576 luminance resolution and 360*576 chrominance resolution (for each chrominance component) occupies about 830 kB (CCIR 601, 4:2:2 standard). A motion picture built from such frames requires about 10 GB/min. Not only the transmission but also the storage of such a large amount of information is impractical and very expensive. This creates the need for compression of digital picture information. The best-known standard for still images is JPEG (Joint Photographic Experts Group); the worldwide standard for motion pictures is MPEG (Moving Picture Experts Group). MPEG-2 is now being used for digital broadcasting of analog TV signals such as PAL or NTSC.

This thesis proposes a new coding system for still images. All coding systems obtained and tested depend on the 8*8 DCT (Discrete Cosine Transform), which is also used by both JPEG and MPEG. The N*N forward and inverse DCT are defined as

$$F(u,v) = \frac{2}{N}\,C(u)C(v)\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\cos\!\left[\frac{(2x+1)u\pi}{2N}\right]\cos\!\left[\frac{(2y+1)v\pi}{2N}\right] \quad (1)$$

$$f(x,y) = \frac{2}{N}\sum_{u=0}^{N-1}\sum_{v=0}^{N-1} C(u)C(v)\,F(u,v)\cos\!\left[\frac{(2x+1)u\pi}{2N}\right]\cos\!\left[\frac{(2y+1)v\pi}{2N}\right] \quad (2)$$

where $C(u),C(v) = 1/\sqrt{2}$ for $u,v = 0$ and $C(u),C(v) = 1$ for $u,v \neq 0$.

In image coding systems, a quantization step is usually applied after the transform. There are two main kinds of quantization: scalar quantization and vector quantization. In scalar quantization, each transform coefficient is processed separately: coefficients are either divided by certain divisors, or set to certain quantized values.
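To fix ideas, equations (1) and (2) can be transcribed directly into code. The sketch below is a naive O(N^4) loop, sufficient for 8*8 blocks but not the fast DCT a real codec would use, and it is an illustration rather than the thesis's own implementation:

```python
import numpy as np

def dct2(block):
    """Naive 2-D forward DCT of an N*N block, following equation (1)."""
    N = block.shape[0]
    F = np.zeros((N, N))
    for u in range(N):
        for v in range(N):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0
            cv = 1 / np.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += block[x, y] \
                        * np.cos((2 * x + 1) * u * np.pi / (2 * N)) \
                        * np.cos((2 * y + 1) * v * np.pi / (2 * N))
            F[u, v] = (2 / N) * cu * cv * s
    return F

def idct2(F):
    """Naive 2-D inverse DCT, following equation (2)."""
    N = F.shape[0]
    f = np.zeros((N, N))
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    cu = 1 / np.sqrt(2) if u == 0 else 1.0
                    cv = 1 / np.sqrt(2) if v == 0 else 1.0
                    s += cu * cv * F[u, v] \
                        * np.cos((2 * x + 1) * u * np.pi / (2 * N)) \
                        * np.cos((2 * y + 1) * v * np.pi / (2 * N))
            f[x, y] = (2 / N) * s
    return f
```

With these normalization factors the pair is orthonormal, so without quantization the inverse reconstructs the block exactly; all loss in the coding systems below comes from the quantization step, not the transform.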
If division is used, the quotients are coded and stored. During decoding, the quotients are multiplied by the divisors used for division, and the transform coefficients are obtained. The critical point is that the quotients are rounded to integers before coding, so the transform coefficients obtained after the multiplication are not equal to the original values. Thus the picture obtained from the coded data is not exactly the same as the original, and an error occurs. This kind of quantization is linear scalar quantization.

The second method of scalar quantization is nonuniform scalar quantization. In this method, certain quantization values are determined according to the probable value set of the coefficients. For each coefficient, the quantization values are scanned and the closest one is chosen. Statistically, most coefficients have small values, so small values are quantized more precisely. For each quantization value, the index number is coded and stored. Lloyd-Max iteration can be used to optimize the codebooks.

In vector quantization, coefficients are coded in groups, and the grouping may change with the application. The sizes of the vectors and of the codebook determine the quality of the coded picture. The LBG (Linde-Buzo-Gray) algorithm can be used to optimize the codebooks. For each vector, the whole codebook is scanned and the closest code is chosen, using the MSE (mean squared error) measure for comparison.

All these quantization methods are lossy and cause error in the pictures. The amount of error matters, since it may cause visible artifacts in the picture. To control this, the human visual system must be considered. The human eye cannot see all high-frequency patterns, so quantization systems are mostly designed to remove high-frequency information and keep the relatively lower-frequency information.
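The three quantizer families described above can be sketched as follows. The divisor and codebook contents in the usage notes are made-up illustrations, not the trained codebooks of the thesis:

```python
import numpy as np

def lsq_encode(coeff, divisor):
    """Linear scalar quantization: divide, round, store the quotient.
    The rounding is the (only) source of error."""
    return int(round(coeff / divisor))

def lsq_decode(quotient, divisor):
    """Decode by multiplying back with the same divisor."""
    return quotient * divisor

def nsq_encode(coeff, codebook):
    """Nonuniform scalar quantization: scan a fixed set of quantization
    values and store the index of the closest one."""
    return int(np.argmin(np.abs(np.asarray(codebook) - coeff)))

def nsq_decode(index, codebook):
    return codebook[index]

def vq_encode(vector, codebook):
    """Vector quantization: scan the whole codebook (rows = code vectors)
    and choose the code with the smallest squared error to the input."""
    d = np.sum((np.asarray(codebook) - np.asarray(vector)) ** 2, axis=1)
    return int(np.argmin(d))
```

For example, `lsq_decode(lsq_encode(37, 16), 16)` returns 32 rather than 37, which is exactly the rounding loss described above; `nsq_encode(3, [-8, -2, 0, 2, 8])` picks the index of the value 2, the closest codebook entry.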
Scalar or vector, all quantizers must be designed with the visual effects on the coded pictures in mind.

After quantization, the last step is lossless coding. This includes methods such as Huffman coding and run-length coding (RLE). Huffman coding reduces the average codeword length of a message sequence toward the entropy, which is the average amount of information in the sequence. RLE is a different scheme: if many consecutive coefficients are equal, they are counted and the count is stored together with the repeated value. This can be very efficient when long runs of equal coefficients occur.

In the thesis, four methods are studied. The first is the application of a cascade DCT. The 8*8 DCT is applied to 256*256-pixel, 8 bit/pixel (256 gray level) pictures. The DCT coefficients, which take values between -1024 and 1023, are then scaled to [0,255] by adding 1024 and dividing by 8. All the DCT coefficients of different 8*8 blocks with the same index are then grouped to obtain 32*32 blocks (there are 1024 coefficients for each of the 64 frequency components, because the picture size is 256*256). Quantization sends many coefficients to zero; to exploit this, selected frequency components may be set to zero explicitly. Such a zero-setting process can be defined (zonal coding): the "level" of the process is the number of anti-diagonals set to zero, beginning from the bottom-right corner of the 8*8 matrix and proceeding toward the top left. An example is given in Figure 1 (zero-setting process at the 1st and 3rd levels). After applying the 8th-level zero-setting process, the grouping is done and the second DCT is applied. The main objective of this method is to represent the image data with a set of numbers that gets smaller after each DCT step; therefore another zero-setting process is applied after the second DCT.
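The zero-setting (zonal coding) rule, with the level counted as anti-diagonals cleared starting from the bottom-right corner, can be sketched as:

```python
import numpy as np

def zero_set(block, level):
    """Zero the `level` anti-diagonals of an N*N coefficient block that
    lie closest to the bottom-right (highest-frequency) corner."""
    N = block.shape[0]
    out = block.copy()
    for x in range(N):
        for y in range(N):
            # Anti-diagonal distance from the bottom-right corner:
            # 0 for (N-1, N-1), up to 2N-2 for (0, 0).
            if (2 * N - 2) - (x + y) < level:
                out[x, y] = 0
    return out
```

At level 1 only the single bottom-right coefficient is cleared; at level 8 every coefficient with x + y > 6 is cleared, removing the upper half of the frequency plane, which matches the low-pass character of the process discussed below.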
The second zero setting is done at level 4 and at level 8, and the picture is then reconstructed from this data. The effect of this coding on the picture is interesting: patterns (like the nostrils of Tiffany) are copied onto adjacent blocks, and the distance between repetitions is exactly 8 pixels. The reason is the zero-setting process after the second DCT. Zero setting is a low-pass process that makes adjacent values closer or equal, so the coefficients of the first DCT are made closer or equal by the zero setting applied after the second DCT. Frequency components of neighboring blocks therefore become similar, and the inverse DCT yields similar patterns. This means that the information obtained after the second DCT must be kept at a better resolution than that of the first DCT; hence this cascade DCT technique does not bring any advantage for coding systems.

The second method studied is vector quantization of the DCT coefficients. This is done in several ways, using 2*2, 4*1, and 8*1 vectors and one or two codebooks. For a preliminary study, only the DCT coefficients (0,1) (first-order vertical frequency component) and (1,0) (first-order horizontal frequency component) are quantized. It is confirmed that larger vector sizes (with equal codebook size) give worse results than smaller ones. In the two-codebook application, the first codebook quantizes the coefficients and the second quantizes the error vectors; this improves picture quality compared with the one-codebook application. On the other hand, no generally usable single codebook is obtained, and all the quantization processes cause blocking effects in the pictures.

The third method divides pictures into smaller, similar parts in the spatial domain. The division ratio may change; for 256*256-pixel pictures, a factor of 64 is used in order to obtain a good compression ratio.
The process is as follows. The pixels of the top-left 8*8 block are selected as the top-left pixels of 64 small pictures; the pixels of the neighboring 8*8 block to the right become the pixels to the right of the first ones, and so on. In this way, 64 similar but distinct, offset-decimated small copies of the full picture are obtained. This is done to obtain higher similarity: the DCT coefficients of similar pictures should be similar, which could give good compression ratios. However, the similarities are not as high as expected. Since there are 64 similar small pictures, 64 DCT coefficients correspond to each frequency component and their values are expected to be close; in fact their distribution is about ±100 around the mean, which is not enough. Because the similarity is not strong enough, the coefficients are requantized to the [0,255] range, as in the cascade DCT application, so that the distribution becomes more convenient. The differences of the 64 regrouped coefficients from their mean are then calculated, the differences are vector quantized (4*4), and the vector indexes and the mean are stored. The picture reconstructed from the coded data has strong noise. Because the picture is divided and neighboring pixels are transformed and quantized separately, there are no blocking effects; the spatial error is like white noise. There are also some contour effects. Because of the coding noise and the contour effects, this method is not recommended for further use.

The last application is nonuniform scalar quantization (NSQ), implemented so as to improve JPEG's coding system. The method first applies the 8*8 DCT as in JPEG and then reorders the coefficients of each 8*8 block as JPEG does, in the zig-zag scan order. At this point JPEG uses linear scalar quantization, then run-length coding (RLE), and finally Huffman coding. The method in this thesis uses NSQ, run-length coding, and a Huffman table.
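Returning to the third method, its division step amounts to offset decimation and can be sketched in a few lines (an illustration of the described process, not the thesis's code):

```python
import numpy as np

def offset_decimate(img, k=8):
    """Split an N*N picture into k*k offset-decimated small copies.
    Sub-picture (i, j) collects the pixels at positions (x*k + i, y*k + j),
    so each copy is a shifted, k-fold-subsampled version of the full image."""
    return [img[i::k, j::k] for i in range(k) for j in range(k)]
```

For a 256*256 picture with k = 8 this yields 64 sub-pictures of 32*32 pixels each; adjacent pixels of the original end up as the top-left pixels of different sub-pictures, which is why the copies are expected to be similar.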
In JPEG's method, RLE uses additional bits to record the exact values of the coefficients. The number of consecutive zeros is determined, and the first nonzero coefficient is coded together with this count. Only 4 bits are available to store the coefficient's value, and this is not enough because the quantized values may range from -1024 to 1023. So the 4 bits store group numbers that indicate intervals, and additional bits are required to represent the exact value of the quantized coefficient within the particular interval. In NSQ, on the other hand, only index numbers are stored. The maximum codebook size needed to match the picture quality of JPEG at quality level 50 (IJG implementation) is 25, and the codebook size is less than 16 for 90% of the frequency components. Thus at most 5 (mostly 4 or fewer) bits are needed to express the exact value of a coefficient. The RLE method constructed for NSQ can supply these 5 bits, so there is no need for the additional bits, which cannot be compressed by the Huffman code. A further advantage is that the entropy of the coefficients is smaller than with linear scalar quantizers, because the source alphabet contains fewer messages. The obtained pictures are compared with JPEG pictures at a quality level around 50. NSQ compresses pictures about 5% better at the same quality, even when JPEG uses an optimized Huffman table. Examples of the results are given in Table 1 and Table 2. Using NSQ, the JPEG standard can be made more efficient and faster, because NSQ needs no multiplications or divisions: only subtraction is needed, to compare each coefficient with the quantization values in the codebook when searching for the closest quantized value. Even the subtraction is not needed during decoding.
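The (zero-run, index) run-length idea described for NSQ could be sketched as follows; this is an illustrative reconstruction, not the exact bitstream format of the thesis:

```python
def rle_encode(indices):
    """Encode a zig-zag-ordered list of NSQ codebook indices as
    (zero_run, index) pairs; zeros need no extra bits of their own."""
    pairs, run = [], 0
    for idx in indices:
        if idx == 0:
            run += 1
        else:
            pairs.append((run, idx))
            run = 0
    pairs.append((run, 0))  # terminator: trailing zeros (end-of-block analogue)
    return pairs

def rle_decode(pairs):
    """Expand (zero_run, index) pairs back into the index sequence."""
    out = []
    for run, idx in pairs:
        out.extend([0] * run)
        if idx != 0:
            out.append(idx)
    return out
```

Since NSQ stores small index numbers rather than interval group numbers plus extra magnitude bits, the second element of each pair fits in the 5 (mostly 4 or fewer) bits mentioned above and remains fully Huffman-compressible.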
On the other hand, NSQ certainly yields more efficient compression than JPEG's linear scalar quantization at the same picture quality. These two advantages are the main reasons for further study of the new coding system: the JPEG standard can be modified to use NSQ.

Table 1: Comparison of NSQ with regular JPEG. All the JPEG pictures are compressed at quality level 50; the resulting file size is given in the 7th column. If a custom optimized Huffman table is constructed, which requires an additional pass, then the file sizes in the 8th column are obtained.

Table 2: Comparison of NSQ with regular JPEG, given as the percentage compression improvement gained by NSQ.

The only disadvantage of NSQ is that it is not as flexible as linear quantization in choosing different quality levels. To code pictures at different quality levels, the NSQ codebooks must be changed and the RLE method must be modified to accommodate codebooks of different sizes. There are two possibilities. The first is to fix a constant quantizer whose quality level and compression ratio are acceptable; this is useful where different quality levels are not required (and may be useful for MPEG). The second is to construct different quantizers for different quality levels; in this case the RLE implementation details must be determined separately for each. In either case, NSQ keeps its advantage over linear scalar quantization by providing better compression.


##### Description

Thesis (M.Sc.) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 1996

##### Keywords

Image coding