Elektrokardiyogram verilerinin iyileştirilmiş yapay arı kolonisi (MABC) algoritması ile analizi

Dilmaç, Selim

dc.contributor.advisor	Ölmez, Tamer
dc.contributor.author	Dilmaç, Selim
dc.contributor.authorID	10180405
dc.contributor.department	Elektronik Mühendisliği
dc.contributor.department	Electronics Engineering
dc.date	2017
dc.date.accessioned	2020-09-21T12:48:03Z
dc.date.available	2020-09-21T12:48:03Z
dc.date.issued	2017
dc.description	Tez (Doktora) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2017
dc.description	Thesis (Ph.D.) -- İstanbul Technical University, Institute of Science and Technology, 2017
dc.description.abstract	Analysis of electrocardiogram (ECG) signals is widely used in the diagnosis of various heart diseases. Arrhythmias, which are beats that differ from normal ones, are disturbances or irregularities of the heart rhythm. Long-term electrocardiogram recordings are taken to detect beat irregularities that do not appear in the short recordings made in a hospital setting. A one-day recording can contain more than one hundred thousand beats. Because examining such a large number of beats is laborious and time consuming for cardiologists, computer-aided methods for automatic ECG analysis have been developed and continue to be developed. Various classification methods are used in the analysis of normal and arrhythmic beat types. Classification is a supervised system that maps sample vectors in the feature space to a set of class labels. In essence, it assigns pre-defined class labels to sample patterns. Classification methods include long-established classical methods such as k-nearest neighbour (K-NN) and the nearest mean classifier (NMC), as well as newer methods developed in recent years. Examples of the latter are the self-organizing map (SOM), which is based on artificial neural networks, and nature-inspired methods such as ant colony optimization (ACO) and the artificial bee colony (ABC). The Massachusetts Institute of Technology - Beth Israel Hospital (MIT-BIH) database is used most often as a standard source for evaluating the performance of the developed methods. This database, which is also used in the thesis, is made openly available by PhysioNet. Nature-inspired methods, a subset of artificial intelligence algorithms, have been widely used over the last few decades.
The artificial bee colony algorithm was developed with inspiration from the social behaviour of honey bee colonies, in particular the process of searching for and exploiting food sources. The algorithm is used to solve a wide variety of real-world engineering problems, and the literature also contains studies on its use in clustering for classification. However, before this thesis the ABC algorithm had not been used for the analysis of ECG data. It was found that the artificial bee colony algorithm needed improvements when used for analysing ECG data and classifying heart beats. The modifications made increased the performance of the method, and the resulting method was named "Modified ABC (MABC)". The decision criterion used in the original ABC algorithm while searching for the best solution was improved in MABC so as to increase the classification performance. In addition, a control parameter, the scout bee conversion threshold ratio (SCTR), was added to the employed bee to scout bee conversion mechanism, which is used to find the global best solution without getting stuck in locally good solutions. This prevents the good solutions found so far from being abandoned by mistake. To determine the feature set to be used, 38 features in the time domain and 60 in the frequency domain were extracted on the basis of the main characteristics of the electrocardiogram signal. Among these, the features with the highest discriminative power were determined by divergence analysis, and a total of 15 features were used. K-fold cross-validation showed that the performance of the MABC algorithm developed in the thesis is independent of the sample vectors used in the training stage. The optimum number of clusters in the data set was examined by validity index analysis and confirmed as eight clusters for the beat types belonging to eight classes.
When the MABC-based classifier is applied to the analysis of ECG data, high classification performance is obtained quickly. In the thesis, a total of eight beat types from the MIT-BIH database, namely types "N, j, V, F, f, A, a and R", were classified. Time- and frequency-domain features were used together to increase the classification performance. The developed method was also applied to various data sets other than ECG data, with positive results. The thesis thus yields a fast, general-purpose classifier with high classification performance. A graphical user interface (GUI) has been prepared so that other researchers can easily use the developed MABC algorithm.
dc.description.abstract	Electrocardiogram (ECG) analysis is one of the most commonly used tools for diagnosing heart diseases. An abnormal heart rhythm is called an arrhythmia. Arrhythmias are caused by problems in the heart's electrical system and can indicate heart disease. Some arrhythmias appear on the electrocardiogram only under certain conditions of daily life, such as exertion or excitement. To detect these rare arrhythmic events, long-term records of at least 24 hours must be analysed. However, manual analysis of such long records by visual inspection is too time consuming for a cardiologist to be practical. Heart beats can also have different characteristics in different patients, so heart beat types (classes) may overlap, which makes arrhythmia classification harder. Developing computer-aided heart beat classification systems that make the analysis easier and more accurate is therefore an active research topic. An ECG signal consists of distinct parts, such as the P-wave, the QRS complex and the T-wave. The characteristics of these waves, such as morphology, area, length, amplitude and time intervals, differ between normal and arrhythmic beats. These characteristics are called time-domain features. In addition, the ECG signal has characteristics in the frequency domain, called frequency-domain features. Some arrhythmic beat types can be classified using time-domain features alone, whereas others need both time- and frequency-domain features. Using more features can improve classification accuracy, but increasing the dimension of the feature space also increases the computation time considerably. To find the optimum number of features, feature reduction methods such as divergence analysis can be applied.
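The abstract does not say which form of divergence analysis is used. As a minimal sketch, the code below assumes two classes and a univariate Gaussian model per feature, and ranks features by the symmetric Kullback-Leibler divergence between the two class distributions. The function names `gaussian_divergence` and `rank_features` are hypothetical and are not taken from the thesis.

```python
from statistics import fmean, pvariance


def gaussian_divergence(a, b, eps=1e-12):
    """Symmetric KL divergence between two 1-D samples.

    ASSUMPTION: each sample is modelled as a univariate Gaussian.
    eps guards against a zero variance.
    """
    m1, m2 = fmean(a), fmean(b)
    v1, v2 = pvariance(a) + eps, pvariance(b) + eps
    spread_term = 0.5 * (v1 / v2 + v2 / v1 - 2.0)
    mean_term = 0.5 * (m1 - m2) ** 2 * (1.0 / v1 + 1.0 / v2)
    return spread_term + mean_term


def rank_features(class_a, class_b):
    """Return feature indices ordered from most to least discriminative.

    class_a and class_b are lists of feature vectors of equal length.
    """
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        col_a = [row[j] for row in class_a]
        col_b = [row[j] for row in class_b]
        scores.append((gaussian_divergence(col_a, col_b), j))
    return [j for _, j in sorted(scores, reverse=True)]
```

Keeping only the first few indices returned by `rank_features` gives a reduced feature set. A multi-class version would need some way of combining pairwise divergences, which is not specified here.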
In this thesis, ECG signals are taken from the MIT-BIH arrhythmia database, which was developed by the Massachusetts Institute of Technology from ECG recordings obtained by the Beth Israel Hospital Arrhythmia Laboratory. As the first step of the analysis, software was developed to calculate the time-domain features. A total of 38 time-domain features are reduced to 7 by divergence analysis. As a second step, the discrete Fourier transform (DFT) of a 256-sample window around the R-peak is computed. About 60 of the DFT coefficients have non-zero values; they are reduced to 8 frequency-domain features, again by divergence analysis. In total, 15 time- and frequency-domain features are used to classify the heart beat types. The resulting feature vectors represent the heart beats to be analysed and classified. Before classification, the data set of feature vectors is divided into two subsets, a training set and a test set. The training set is used to train the classification algorithm, while the test set is used to measure the performance of the classifier. Cross-validation techniques can be used to verify that the performance of the classification algorithm is independent of the training set; in this thesis, the K-fold cross-validation method is used. Classification is a supervised method that maps objects in the feature space onto pre-defined class labels. Clustering, a related term, is an unsupervised learning method that autonomously partitions the data into clusters. Most classification methods use clustering as the first stage of the classification process. Various classification methods are used in the literature.
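The frequency-domain step and the K-fold split described above can be sketched as follows. The sketch assumes that the 256-sample window is centred on the R-peak index and that the magnitudes of the DFT coefficients serve as candidate features; the abstract states neither detail. The names `window_around`, `dft_magnitudes` and `k_fold_indices` are hypothetical.

```python
import cmath
import math


def window_around(signal, r_index, size=256):
    """Cut `size` samples around the R-peak (ASSUMPTION: centred window)."""
    start = r_index - size // 2
    if start < 0 or start + size > len(signal):
        raise ValueError("window does not fit inside the signal")
    return signal[start:start + size]


def dft_magnitudes(window, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients (naive O(N*K) DFT)."""
    n = len(window)
    mags = []
    for k in range(n_coeffs):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window))
        mags.append(abs(coeff))
    return mags


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds.

    Indices are dealt out round-robin; shuffle the data beforehand
    if its order is not random.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]
```

Each fold is used once as the test set while the remaining folds form the training set, so every sample is tested exactly once.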
In addition to classical methods such as the nearest mean classifier, k-means, k-nearest neighbour and neural network classifiers, many nature-inspired algorithms have been proposed more recently, such as ant colony optimization (ACO) and the artificial bee colony (ABC) algorithm. The nearest mean classifier (NMC) and the neural network (NN) classifier are widely used classical methods for data classification. NMC is a fast algorithm that uses the similarity between patterns to classify them. A self-organizing map (SOM) is a type of artificial neural network (ANN). A SOM network is trained by unsupervised learning to produce a low-dimensional representation of the input space of the training samples, called a map. Unknown patterns can then be classified using the class information of the output nodes in the map. K-nearest neighbour (K-NN) is a fast and simple classification method. It uses all training samples and classifies a test sample by a majority vote of its nearest neighbours in the training set, according to a similarity measure. One of the most commonly used distance metrics is the Euclidean distance. Nature-inspired algorithms may imitate a process in nature or simulate the intelligent behaviour of social animal groups such as bird flocks and ant or bee colonies. Algorithms of the latter kind are called swarm intelligence (SI) algorithms. Ant colony optimization is a meta-heuristic, nature-inspired method: a multi-agent system inspired by the collective behaviour of ant colonies. It performs clustering by simulating the interactions between ants through pheromones; clusters are formed by choosing the shortest distances between nodes. K-NN can then be used for the classification phase. The ABC algorithm was first proposed by Karaboga, inspired by the foraging behaviour of honey bee colonies. It has been used in many application areas, including data clustering.
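The two classical baselines mentioned above, NMC and K-NN with Euclidean distance, can be illustrated with a minimal sketch. It is a textbook formulation rather than the thesis implementation, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def nmc_fit(X, y):
    """Nearest mean classifier: return {label: mean vector of that class}."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def nmc_predict(centers, x):
    """Assign the label whose class mean is closest in Euclidean distance."""
    return min(centers, key=lambda label: math.dist(centers[label], x))


def knn_predict(X, y, x, k=3):
    """Majority vote among the k training samples nearest to x."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

NMC stores one vector per class, so prediction cost grows with the number of classes, whereas K-NN keeps every training sample and compares the test sample against all of them.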
However, no study on using ABC for electrocardiogram analysis had been published before this thesis. ABC is a simple and flexible algorithm that uses few control parameters. Its random search mechanism allows it to converge to the global best solution. The success of the algorithm rests mainly on the foraging behaviour of the honey bee colony and its learning, memorizing and information-sharing characteristics. In real life, forager bees search for food locations randomly. They evaluate the amount of nectar in a food source, memorize it, and share the location and quality of the food with other colony members in the hive through a special communication method. The ABC algorithm is built on a model of this foraging behaviour. A possible solution of the problem corresponds to a food source location around the hive, and the amount of nectar in that food source corresponds to the fitness value of the solution. There are three types of foraging bees in ABC: scout bees, employed bees and onlooker bees. Scout bees search the solution space randomly for new solutions, without using any prior information. A scout bee becomes an employed bee when it starts to work on a location. Employed bees calculate the fitness of their solution, memorize it and share it with the onlooker bees. The number of employed bees in the hive equals the number of solutions in the ABC algorithm. Onlooker bees receive this information from the employed bees and select a location to exploit depending on the fitness values of the locations. Because of the probabilistic selection model in ABC, solutions with higher fitness values are visited by more onlooker bees. Each onlooker bee searches for new locations near the selected employed bee. Both employed and onlooker bees use a greedy selection mechanism to decide which of the previous and new locations is better.
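The bee roles described above can be sketched as a small optimizer. This is a minimal version of the commonly published basic ABC for minimizing a cost function, not the thesis code. The neighbour rule (perturb one coordinate relative to another random source), the fitness mapping 1/(1+cost), the single scout per cycle, and all parameter defaults are assumptions taken from that common formulation.

```python
import random


def abc_minimize(cost, dim, bounds, n_sources=10, limit=20, mcn=200, seed=0):
    """Basic ABC sketch: returns (best_position, best_cost)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def fitness(c):
        # Higher fitness for lower cost (common ABC convention).
        return 1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c)

    sources = [random_source() for _ in range(n_sources)]
    costs = [cost(s) for s in sources]
    trials = [0] * n_sources
    i0 = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = list(sources[i0]), costs[i0]

    def try_neighbour(i):
        # Perturb one coordinate relative to another randomly chosen source.
        k = rng.choice([m for m in range(n_sources) if m != i])
        j = rng.randrange(dim)
        cand = list(sources[i])
        cand[j] += rng.uniform(-1.0, 1.0) * (sources[i][j] - sources[k][j])
        cand[j] = min(max(cand[j], lo), hi)
        c = cost(cand)
        if c < costs[i]:  # greedy selection keeps the better location
            sources[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for _ in range(mcn):
        for i in range(n_sources):  # employed bee phase
            try_neighbour(i)
        weights = [fitness(c) for c in costs]
        for _ in range(n_sources):  # onlooker phase: fitter sources drawn more often
            try_neighbour(rng.choices(range(n_sources), weights=weights)[0])
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:  # memorize the best solution so far
            best, best_cost = list(sources[i_best]), costs[i_best]
        worst = max(range(n_sources), key=trials.__getitem__)
        if trials[worst] > limit:  # scout phase: abandon a stagnant source
            sources[worst] = random_source()
            costs[worst] = cost(sources[worst])
            trials[worst] = 0
    return best, best_cost
```

For example, `abc_minimize(lambda v: sum(t * t for t in v), dim=2, bounds=(-5.0, 5.0))` searches for the minimum of a two-dimensional sphere function. The best solution is memorized separately so that a later scout reset cannot lose it.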
If the fitness of a location has not improved during the last "limit" iterations, the location is abandoned: its employed bee becomes a scout bee and starts searching from a random location. This mechanism lets the algorithm search for the global best location and avoid converging to local optima. The whole process is repeated for a pre-defined number of cycles, called the maximum cycle number (MCN). In a classification problem, the ABC algorithm performs the clustering during the training phase and provides the best cluster centre locations. In the test phase, each test sample is assigned the class label of the cluster centre at the shortest Euclidean distance from it. The studies conducted within the scope of the thesis showed that the algorithm needed to be improved, and the modifications listed below were made. The modified version is called the Modified ABC (MABC) algorithm. First, the objective function behind the fitness value maximizes the classification accuracy instead of minimizing the sum of the distances of the training samples to their cluster centres. With the original objective, when one class has many more samples than the others, more than one cluster centre moves towards that class, which lowers the classification accuracy. Second, a new control parameter called SCTR (Scout Conversion Threshold Ratio) is introduced. Without it, good solutions, even the best one, can be abandoned if they do not improve during the last "limit" iterations. Third, in the initialisation stage the algorithm starts its search from randomly selected training sample vectors instead of random locations. Finally, borders of the search space have been defined. When the MABC algorithm is applied to electrocardiogram data analysis, high classification accuracy is achieved in a short time.
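Two of these ideas can be sketched in code: labelling by the nearest cluster centre, and using classification accuracy as the fitness. The abstract does not define how SCTR is applied, so `should_abandon` below is only one assumed interpretation: a stagnant source is abandoned only when its fitness is below `sctr` times the best fitness found so far. All names and the thresholding rule are hypothetical.

```python
import math


def classify(centers, x):
    """centers is a list of (label, vector); return the label of the nearest one."""
    return min(centers, key=lambda c: math.dist(c[1], x))[0]


def accuracy_fitness(centers, X, y):
    """Fitness of a candidate set of centres = fraction of training samples
    whose nearest-centre label matches the true label."""
    hits = sum(classify(centers, x) == label for x, label in zip(X, y))
    return hits / len(X)


def should_abandon(trials, limit, fitness, best_fitness, sctr):
    """ASSUMED SCTR rule, not taken from the thesis: a source that has
    stagnated for more than `limit` trials is abandoned only if its fitness
    is below sctr * best_fitness, so near-best solutions are kept."""
    return trials > limit and fitness < sctr * best_fitness
```

Under this assumed rule, a solution at or near the best fitness is never turned into a scout however long it stagnates, which matches the stated aim of not discarding good solutions.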
The examined MIT-BIH data set consists of eight heart beat types in total: "N, j, V, F, f, A, a and R". Within the scope of the thesis, a training data set of 2834 sample vectors was used. A total of 8735 test sample vectors belonging to the eight classes were classified using the cluster centre coordinates obtained in training. The classification accuracy achieved by the MABC algorithm is 97.18%, which places it among the best-performing algorithms. The MABC algorithm has also been applied to data sets other than ECG data, with high classification accuracy. In this way, the thesis yields a fast, general-purpose classifier with high classification performance. A graphical user interface (GUI) has been prepared so that other researchers can easily use the MABC algorithm. For future studies, it is advisable to develop a hybrid classification method that combines the MABC algorithm with other classification methods.
dc.description.degree	Doktora
dc.description.degree	Ph.D.
dc.identifier.uri	http://hdl.handle.net/11527/18503
dc.language.iso	tur
dc.publisher	Fen Bilimleri Enstitüsü
dc.publisher	Institute of Science and Technology
dc.rights	Kurumsal arşive yüklenen tüm eserler telif hakkı ile korunmaktadır. Bunlar, bu kaynak üzerinden herhangi bir amaçla görüntülenebilir, ancak yazılı izin alınmadan herhangi bir biçimde yeniden oluşturulması veya dağıtılması yasaklanmıştır.
dc.rights	All works uploaded to the institutional repository are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Elektrokardiyografi
dc.subject	Kalp Hastalıkları Tanısı
dc.subject	Electrocardiography
dc.subject	Heart Diseases Diagnosis
dc.title	Elektrokardiyogram verilerinin iyileştirilmiş yapay arı kolonisi (MABC) algoritması ile analizi
dc.title.alternative	Analysis of electrocardiogram data by using modified artificial bee colony (MABC) algorithm
dc.type	Doctoral Thesis

Collections

FBE - Electronics Engineering Graduate Program - Doctorate