##
Entropi kodlama ile EKG veri sıkıştırması

Entropi kodlama ile EKG veri sıkıştırması

##### Dosyalar

##### Tarih

1995

##### Yazarlar

Aktener, Abdurrahman

##### Süreli Yayın başlığı

##### Süreli Yayın ISSN

##### Cilt Başlığı

##### Yayınevi

Fen Bilimleri Enstitüsü

Institute of Science and Technology

Institute of Science and Technology

##### Özet

Veri sıkıştırma, hem veri saklama, hem de iletişim alanında büyük kolaylıklar sağlamaktadır. Bu yolla, aynı ortamda daha fazla bilgi saklanabilmekte veya aynı bant genişliğinde daha fazla bilgi daha kısa sûrede gönderilebilmektedir. Elektrokardiyogram (EKG) kalpdeki elektriksel aktiviteyi temsil eden grafiksel bir gösterimdir. Bu çalışmada, BKG verisi için uygulanan sıkıştırma teknikleri tanımlanmış ve EKG verisini sıkıştırmak için Huffman entropi kodlama metodu uygulanmıştır. Huffman kodlamasında kodkelimelerin kaynak mesajlara eşlenmesi, kaynak mesajların verideki olasılıklarına göre yapılır. Veride sık geçen mesajlar ( yüksek olasılıklı mesajlar) kısa kodkelimelere, nadir geçenler (düşük olasılıklı) uzun kodkelimelere atanır. Bu olasılıklar, sıkıştırma başlamadan önece veri bir kez okunduktan sonra belirlenir. Dinamik yöntemlerde ise, mesaj kümesinden kodkelimesine olan eşleme zamanla değişmektedir, örneğin dinamik Huffman kodlamasmda, kodkelimelerin mesajlara eşlenmesi herhangi bir zamandaki olasılık değerlerine göre değişir. Sıkıştırma oranım arttırmak amacıyla verideki sıfırların fazlalığı göz önünde tutularak verinin işlenmesinde tekrar uzunluğu kodlama yöntemi de uygulanmıştır. Bu yöntemde, sıkıştırma anında arka arkaya gelen semboller, sadece sembol ve tekrar sayısı ile değiştirilir. Genişletirken ise tersi işlem olan sayı kadar sembol eklenmektedir. Bu çalışmada kullanılan sıkıştırma algoritmasında ilk olarak örneklerin ayrık kosinüs dönüşümü katsayıları hesaplanmakta ve katsayılara normalizasyon ve eşik değeri işlemleri uygulanmaktadır. Kodlama algoritmasında ise Huffman ve tekrar uzunluğu kodlamaları kullanılmıştır. Tezin içeriğinde ilk bölümde elektrokardiogram verisi için daha önce tanımlanmış sıkıştırma teknikleri sunuldu, İkinci bölümde, üzerinde çalışılan veri olan elektrokardiyogram verisinin analizi amacıyla temel olarak kalbin elektrofîzyolojisi anlatıldı. Tezin amacı veri sıkıştırma olduğundan üçüncü ve dördüncü bölümlerde veri sıkıştırmada genel kavramlar ve veri sıkıştırma yöntemleri anlatıldı. Son olarak beşinci bölümde, kullanılan algoritma ve sonuçlan verilmiştir.

In this study, a variety of data compression techniques for ECG signals are presented. The study especially concentrates on Huffman DCT -based entopy coding algorithm. The electrocardiogram (ECG) is a graphic representation of the electrical activity of the heart ECG is recorded from the body surface and it assists an experienced interpreter in giving a prognosis. Compression of digital ECG signals has been a research due to the developments in the techniques of digital processing of analog signals. By the usage of digital signal processing to interpret, store or transmit ECG signals, a need to have memory chips with large capacities has arisen This makes the compression of digital ECG signals a necessity. The discrete cosine transform (DCT) was first applied to image compression in Ahmed, Natarajan, and Rao's pioneering work, in which they showed mat this particular transform was very close to the KLH ( Karhulenen-Loeve-Hotelling) transform, a transform mat produces uncorrelated coefficients. Decorrelation of the coefficients is very important for compression, because each coefficient can men be treated independently without loss of compression efficiency. The DCT is an orthogonal transform having some of the features of a transformation to the frequency domain It is equivalent to a discrete Fouirer transform (OFT) of a symmetricized extension of the input set of samples and fast algorithms are available for efficiently computing the DCT. The DCT has an advantage over the DFT in that it is a purely real transform if the input vector is real The one dimensional DCT is defined as; Cx(0) = ^Z*(») Cx(k) = jf^m^2"^ l)k te=U, >N-1 Where C*(k) is the k* DCT coefficient VUl Data compression is the process of transforming a series of bytes into a new string mat contains the same information but whose length is as small as possible. Data compression has important applications in the areas of information storage and data communication. Most computer applications nedd a large amount of data storage. Also, computer communication networks transfer large volumes of data over commumcation lines. Compressing data to be stored or transfered reduces the cost of storage and transmissioa When files are stored in compressed form, the capacity of storage medium is approximately doubled. Smilarry, when the amount of data to be transmitted is reduced, the capacity of commumcation channel is increased. Thus, data compression provides an echonomical and indispensible option to store and transmit the informatioa Data compression may be achieved in either software or hardware. When it is hardware, a compressor a hardware device that reads a series of of bytes of data and converts it into a shorter string; and decompressor is a device that converts the compressed data back to its original form. When data compression is done in software, a compression algorithm does the former, and an expansion or decompression algontm does the latter. The logic that enables data compression is that the same information retained in the file after some bytes are replaced with shorter representations. In other words, data compression is achieved by removing the redundancy in the file. There are several types of redundancy found in the files. These categories are not independent, but overlap to some extend. The major redundancy types are data distribution, data repetition, high usage patterns and positional redundancy. In typical files, some bytes occur more frequently than others. Moreover, some bytes may never be used. Thus, a few bits on the average might be saved in the representation of bytes, using less number of bits for most frequently used bytes and more number of bits for least frequently used bytes. In some files, long repetitions of a single value occurs. These files can be encoded more compactly than by just repeating, the value symbol. Examples, strings of zeros. In general, data compression techniques can be divided into two categories, the nonreversible and reversible. In nonreversible techniques, the size of the physical representation of data is reduced while a subset of the information is preserved. For example, in some data sets 'leading zeros" may be considered as irrevelant informatioa Note that these techniques are, by definition, on the semantics of the data IX In reversible techniques, all of the information is considered to be relevant, and compression/decompression process recovers Ike original data The reversible procedures can further divided into two groups, semantic independent and semantic dependent techniques. Semantic dependent techniques can be used on any data with varying degrees of effectiveness. They do not use any information regarding the information content of the data. On the other hand, semantic dependent techniques depend on the context and semantics of the data to provide for redundancy reduction A code is a mapping of source messages into codewords. Codes can be categorized as block-block, block-variable, variable-block or variable block, where block block indicates that the source messages and codeword are of fixed length and variable-variable codes map variable-length source messages into variable-length codewords. Data compression schemes are also classified as either static or dynamic. A static method is one which the mapping from the set of messages to be set of codewords is fixed before transmission begins, so mat a given message is represented by the same codeword every time it appears in the message ensemble. In these methods, the mapping is based on the probabilities with which the source messages appear in the source ensemble. Messages that appear frequently are represented by short codewords; messages with smaller probabilities map to longer codewords. These probabilities are determined before transmission begins. A code is dynamic if the maping from the set of messages to the set of codewords changes over time. These methods involve computing an approximation to the probabilities of occurrence, as the ensemble is being transmitted. The assignment of codewords to messages is based on the values of the relativi frequencies of occurrence at each point in time. A message may be represented by a short codeword early in transmission because it occurs frequently at the begining of the ensemble, even though its probability of occurence over the total ensemble is low. Later, when the more probable messages begin to occur with higher frequency, the short codeword will be mapped to one of the higher probability messages, and it will be mapped to a longer codeword. Dynamic codes are also referred to in the literature as adaptive, in mat they adapt to changes in ensemble characteristics over time. All of the adaptive methods are one-pass methods; only one scan of the ensemble is required. Static methods require two passes; one pass to compute probabilities and determine the mapping, and a second pass for transmission Thus, as long as the encoding and decoding times of an X adaptivehod are not substantially greater Ihan those of a static method, the fact that an initial scan is not needed implies a speed improvement in the adaptive case, hi addition, the mapping determined in the first pass of a static coding method must be transmitted along with compressed ensemble by the encoder to the decoder. There are two dimensions along which the data compression schemes may be measured: algorithm complexity and amount of compression When data compression is used in a data transmission application, the goal is speed. Speed of transmission depends on me number of bits sent, the time required for the encoder to generate the coded message, and the time required for the decoder to recover the original ensemble, hi a data storage application, altough the degree of compression is the primary concern, it is nonetheless necessary that the algorithm be efficient in for the schemes to be practical. For a static scheme, mere are three algoritms to analyse: the map construction algorithm, the encoding algorithm, and the decoding algorithm. For a dynamic method, there are just two algorithms: the encoding algorithm and the decoding algorithm Several common measures of compression have been suggested: average message length and compression ratio. Average message length is clearly the average length of a coded message. Compression ratio is defined in several ways. The ratio of the length of the coded message to the length of the original ensemble gives the most common meaning. Semantic dependent data compression techniques are designed to respond to specific types of applications, one of which is image representation and processing. One of these methods run-length encoding where succesive occurrences of a symbol is replaced by a single symbol along with its frequency. The other is difference mapping, in which the image is represented as an array of differences in brightness (or color) between adjacent pixels rather than the brihtness value themselves. Semantic dependent data compression is also of interest in business data processing. The runs of zeros can compressed using runlength encoding. Semantic independent data compression techniques are based on the probabilities of messages in the ensemble. Some are static codings that are Shannon - Fano coding, static Huffman coding, universal coding and arithmetic coding. The other is dynamic one that is dynamic Huffman coding. Huffman coding takes the probabilities of source letters and constructs a full binary tree. Initially, mere is a set of singleton trees, each having the xi probability of a source letter. At each step, the method merges the two smallest probabilities into a tree whose probability is the total of two probabilities and whose root has two children mat are two probabilities. The processed is continued until one tree whose probability is 1 is left The codewords are assign to each message following the path from root to leaves, such as right child meaning a 0 and left child meaning a 1. Adaptive Huffman coding determines the mapping from source messages to codewords on the basis of a running estimate of the source message probabilities. The code is adaptive and requires only one pass over the data The encoder learns the characteristics of the source. The decoder must learn along with the encoder by contiunahy updating the Huffman tree in order to stay in syncronization with the encoder.

In this study, a variety of data compression techniques for ECG signals are presented. The study especially concentrates on Huffman DCT -based entopy coding algorithm. The electrocardiogram (ECG) is a graphic representation of the electrical activity of the heart ECG is recorded from the body surface and it assists an experienced interpreter in giving a prognosis. Compression of digital ECG signals has been a research due to the developments in the techniques of digital processing of analog signals. By the usage of digital signal processing to interpret, store or transmit ECG signals, a need to have memory chips with large capacities has arisen This makes the compression of digital ECG signals a necessity. The discrete cosine transform (DCT) was first applied to image compression in Ahmed, Natarajan, and Rao's pioneering work, in which they showed mat this particular transform was very close to the KLH ( Karhulenen-Loeve-Hotelling) transform, a transform mat produces uncorrelated coefficients. Decorrelation of the coefficients is very important for compression, because each coefficient can men be treated independently without loss of compression efficiency. The DCT is an orthogonal transform having some of the features of a transformation to the frequency domain It is equivalent to a discrete Fouirer transform (OFT) of a symmetricized extension of the input set of samples and fast algorithms are available for efficiently computing the DCT. The DCT has an advantage over the DFT in that it is a purely real transform if the input vector is real The one dimensional DCT is defined as; Cx(0) = ^Z*(») Cx(k) = jf^m^2"^ l)k te=U, >N-1 Where C*(k) is the k* DCT coefficient VUl Data compression is the process of transforming a series of bytes into a new string mat contains the same information but whose length is as small as possible. Data compression has important applications in the areas of information storage and data communication. Most computer applications nedd a large amount of data storage. Also, computer communication networks transfer large volumes of data over commumcation lines. Compressing data to be stored or transfered reduces the cost of storage and transmissioa When files are stored in compressed form, the capacity of storage medium is approximately doubled. Smilarry, when the amount of data to be transmitted is reduced, the capacity of commumcation channel is increased. Thus, data compression provides an echonomical and indispensible option to store and transmit the informatioa Data compression may be achieved in either software or hardware. When it is hardware, a compressor a hardware device that reads a series of of bytes of data and converts it into a shorter string; and decompressor is a device that converts the compressed data back to its original form. When data compression is done in software, a compression algorithm does the former, and an expansion or decompression algontm does the latter. The logic that enables data compression is that the same information retained in the file after some bytes are replaced with shorter representations. In other words, data compression is achieved by removing the redundancy in the file. There are several types of redundancy found in the files. These categories are not independent, but overlap to some extend. The major redundancy types are data distribution, data repetition, high usage patterns and positional redundancy. In typical files, some bytes occur more frequently than others. Moreover, some bytes may never be used. Thus, a few bits on the average might be saved in the representation of bytes, using less number of bits for most frequently used bytes and more number of bits for least frequently used bytes. In some files, long repetitions of a single value occurs. These files can be encoded more compactly than by just repeating, the value symbol. Examples, strings of zeros. In general, data compression techniques can be divided into two categories, the nonreversible and reversible. In nonreversible techniques, the size of the physical representation of data is reduced while a subset of the information is preserved. For example, in some data sets 'leading zeros" may be considered as irrevelant informatioa Note that these techniques are, by definition, on the semantics of the data IX In reversible techniques, all of the information is considered to be relevant, and compression/decompression process recovers Ike original data The reversible procedures can further divided into two groups, semantic independent and semantic dependent techniques. Semantic dependent techniques can be used on any data with varying degrees of effectiveness. They do not use any information regarding the information content of the data. On the other hand, semantic dependent techniques depend on the context and semantics of the data to provide for redundancy reduction A code is a mapping of source messages into codewords. Codes can be categorized as block-block, block-variable, variable-block or variable block, where block block indicates that the source messages and codeword are of fixed length and variable-variable codes map variable-length source messages into variable-length codewords. Data compression schemes are also classified as either static or dynamic. A static method is one which the mapping from the set of messages to be set of codewords is fixed before transmission begins, so mat a given message is represented by the same codeword every time it appears in the message ensemble. In these methods, the mapping is based on the probabilities with which the source messages appear in the source ensemble. Messages that appear frequently are represented by short codewords; messages with smaller probabilities map to longer codewords. These probabilities are determined before transmission begins. A code is dynamic if the maping from the set of messages to the set of codewords changes over time. These methods involve computing an approximation to the probabilities of occurrence, as the ensemble is being transmitted. The assignment of codewords to messages is based on the values of the relativi frequencies of occurrence at each point in time. A message may be represented by a short codeword early in transmission because it occurs frequently at the begining of the ensemble, even though its probability of occurence over the total ensemble is low. Later, when the more probable messages begin to occur with higher frequency, the short codeword will be mapped to one of the higher probability messages, and it will be mapped to a longer codeword. Dynamic codes are also referred to in the literature as adaptive, in mat they adapt to changes in ensemble characteristics over time. All of the adaptive methods are one-pass methods; only one scan of the ensemble is required. Static methods require two passes; one pass to compute probabilities and determine the mapping, and a second pass for transmission Thus, as long as the encoding and decoding times of an X adaptivehod are not substantially greater Ihan those of a static method, the fact that an initial scan is not needed implies a speed improvement in the adaptive case, hi addition, the mapping determined in the first pass of a static coding method must be transmitted along with compressed ensemble by the encoder to the decoder. There are two dimensions along which the data compression schemes may be measured: algorithm complexity and amount of compression When data compression is used in a data transmission application, the goal is speed. Speed of transmission depends on me number of bits sent, the time required for the encoder to generate the coded message, and the time required for the decoder to recover the original ensemble, hi a data storage application, altough the degree of compression is the primary concern, it is nonetheless necessary that the algorithm be efficient in for the schemes to be practical. For a static scheme, mere are three algoritms to analyse: the map construction algorithm, the encoding algorithm, and the decoding algorithm. For a dynamic method, there are just two algorithms: the encoding algorithm and the decoding algorithm Several common measures of compression have been suggested: average message length and compression ratio. Average message length is clearly the average length of a coded message. Compression ratio is defined in several ways. The ratio of the length of the coded message to the length of the original ensemble gives the most common meaning. Semantic dependent data compression techniques are designed to respond to specific types of applications, one of which is image representation and processing. One of these methods run-length encoding where succesive occurrences of a symbol is replaced by a single symbol along with its frequency. The other is difference mapping, in which the image is represented as an array of differences in brightness (or color) between adjacent pixels rather than the brihtness value themselves. Semantic dependent data compression is also of interest in business data processing. The runs of zeros can compressed using runlength encoding. Semantic independent data compression techniques are based on the probabilities of messages in the ensemble. Some are static codings that are Shannon - Fano coding, static Huffman coding, universal coding and arithmetic coding. The other is dynamic one that is dynamic Huffman coding. Huffman coding takes the probabilities of source letters and constructs a full binary tree. Initially, mere is a set of singleton trees, each having the xi probability of a source letter. At each step, the method merges the two smallest probabilities into a tree whose probability is the total of two probabilities and whose root has two children mat are two probabilities. The processed is continued until one tree whose probability is 1 is left The codewords are assign to each message following the path from root to leaves, such as right child meaning a 0 and left child meaning a 1. Adaptive Huffman coding determines the mapping from source messages to codewords on the basis of a running estimate of the source message probabilities. The code is adaptive and requires only one pass over the data The encoder learns the characteristics of the source. The decoder must learn along with the encoder by contiunahy updating the Huffman tree in order to stay in syncronization with the encoder.

##### Açıklama

Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 1995

Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 1995

Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 1995

##### Anahtar kelimeler

Elektrokardiyografi,
Entropi,
Kodlama,
Veri sıkıştırma,
Electrocardiography,
Entropy,
Coding,
Data compression