Multi-label classification of 12-lead ECG signal using a mixture-of-experts transformer model

Çelik, Atalay

Files

528211093.pdf (2.1 MB)

Date

2025-07-01

Authors

Çelik, Atalay

Publisher

Graduate School

Abstract

An electrocardiogram (ECG) measures the electrical activity of the heart and is an important indicator for detecting cardiac abnormalities. In general, ECG signals are measured with 10 electrodes, and 12 leads are derived from these measurements. This captures activity from different angles of the heart, so each lead carries different information about the cardiac rhythm. Some diagnoses can be made by analyzing these ECG records in detail and examining specific characteristics of the signal, such as peaks and the transitions between peaks. Abnormal patterns in these signals can be detected reliably only by domain experts, and even then challenges remain in practice. Automating this process has interested researchers for a long time. Most early work focuses on peak detection and beat classification. Advances in machine learning and deep learning have matured for these tasks and produced successful implementations with high accuracy and effectiveness. Automatic detection of the diagnosis is a more complex problem and is widely studied. Competitions such as the PhysioNet CinC competition have targeted this domain for several years, attracting great interest and producing strong results. The availability of ECG datasets has accelerated this research; several high-quality datasets are open and available for research purposes. As the transformer architecture, popularized by large language models, has become more prevalent, new approaches to the problem have emerged. Several studies implement transformer-based models for the ECG classification task. The built-in attention module of the model, combined with high compute capability, creates great potential for signal processing. The use of these models for time-series data is still scarce and under development.
Deep learning methods such as long short-term memory (LSTM) networks and gated recurrent units (GRU) dominate and are commonly used. Although there are strong studies using the transformer architecture, they mostly focus on forecasting in fields such as finance and weather. Another branch of transformer-based models is the mixture-of-experts approach, in which multiple experts are placed within the model and their activation is controlled by the incoming data. At the time of this study, the literature lacks implementation use cases with these model characteristics. This study aims to implement a mixture-of-experts transformer model for classifying ECG records into multiple diagnoses in a multi-label manner. Each record contains 10 seconds of ECG sampled at 500 Hz, with demographic features available. There are 26 diagnosis labels in total, and each record has one or more labels attached; these labels are the prediction target. The study uses 81926 labeled ECG records from six datasets. For preprocessing, digital signal filtering with a finite impulse response (FIR) filter is applied and each record is normalized. Different preprocessing configurations are tested, and the best parameter set is chosen according to the signal-to-noise ratio. For outlier analysis, a three-way voting system is used with three methods: Z-score, principal component analysis and isolation forest. Records that receive 2 out of 3 votes are removed from the dataset. Another important step is extracting additional features from the ECG records to feed into the model. In this study, several methods are used to extract features such as peaks and offset values. The model input is constructed from the signal values of the 12 ECG channels and the extracted signal features.
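The outlier vote described above can be illustrated with a minimal sketch. It is not the thesis code: the function names, the Z-score threshold of 3.0 and the reading of "2 out of 3" as "at least 2" are assumptions made for illustration. Each detector is assumed to return one boolean flag per record, and the vote simply counts flags.

```python
import numpy as np

def zscore_flags(features, threshold=3.0):
    """Flag records where any feature is more than `threshold` standard
    deviations from its column mean (threshold is an assumed value)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant columns
    z = np.abs((features - mean) / std)
    return (z > threshold).any(axis=1)

def vote_outliers(*flag_arrays, min_votes=2):
    """Combine boolean outlier flags from several detectors.
    A record is marked as an outlier when at least `min_votes` detectors flag it."""
    votes = np.sum(np.stack(flag_arrays), axis=0)
    return votes >= min_votes

# Hypothetical flags from three detectors for four records.
z_flags = np.array([True, False, True, False])
pca_flags = np.array([True, True, False, False])
forest_flags = np.array([False, True, True, False])

remove = vote_outliers(z_flags, pca_flags, forest_flags)
kept_indices = np.flatnonzero(~remove)
```

The PCA and isolation forest flags are hard-coded here to keep the sketch dependency-free. In practice the third array could come from scikit-learn's IsolationForest, whose fit_predict method returns -1 for outliers.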
The extracted signal features are then concatenated with the demographic features. The signals and features are passed through 1D convolutional layers to enhance the time-dependent features, and all of them are projected into the model. The main model block consists of standard encoder blocks with self-attention layers, pre-layer normalization and skip connections. After three standard encoder blocks, three encoders with mixture-of-experts blocks are used. The transformer attends to important information across time tokens. The mixture-of-experts modules have gating networks that route the time tokens to different experts depending on their characteristics. The training setup is designed to allow experiments with different configurations. Each parameter is optimized on a uniform subset of the data. The main model is trained on all data with the optimized configuration for 20 epochs, with a learning rate of 3e-3 and cosine annealing. A batch size of 16 is used with 16 gradient accumulation steps, giving an effective batch size of 256. The training runs use dropout, warm-up steps and early stopping. The model's inner dimension is set to 96 and the feed-forward dimension to 384; there are 6 encoders with 6 attention heads each. Each mixture-of-experts block is composed of 4 experts with top-1 routing. Task-specific metrics are constructed for testing and evaluation. The model is trained on 80 % of the data, tuned on a 10 % validation set and tested on the remaining 10 % test set. On 6 datasets and 26 diagnoses in the multi-label setting, the tests yield a 59.98 % macro F1 score, a 54.17 % exact match score, 55.33 % top-1, 75.22 % top-2 and 84.80 % top-3 accuracy, and a 95.74 % macro AUC-ROC. The model achieves strong results in a difficult ECG classification scenario, a task that is complex, multi-faceted and requires a high level of expertise.
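To make the top-1 routing concrete, the following is a minimal NumPy sketch of a mixture-of-experts feed-forward layer using the dimensions quoted in the abstract (model dimension 96, feed-forward dimension 384, 4 experts). It is not the thesis implementation: the function name, the ReLU activation, the random weights and the scaling of each expert output by its gate probability are all assumptions made for illustration.

```python
import numpy as np

def top1_moe(tokens, gate_w, expert_w1, expert_w2):
    """Route each time token to exactly one expert feed-forward network.

    tokens:    (T, d_model) time tokens
    gate_w:    (d_model, n_experts) gating weights
    expert_w1: (n_experts, d_model, d_ff) first layer of each expert
    expert_w2: (n_experts, d_ff, d_model) second layer of each expert
    """
    logits = tokens @ gate_w
    logits = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    choice = probs.argmax(axis=1)  # top-1: index of the single selected expert per token

    out = np.zeros_like(tokens)
    for e in range(gate_w.shape[1]):
        mask = choice == e
        if not mask.any():
            continue  # no token was routed to this expert
        hidden = np.maximum(tokens[mask] @ expert_w1[e], 0.0)  # ReLU (assumed activation)
        # Scale by the gate probability (an assumed, commonly used convention).
        out[mask] = (hidden @ expert_w2[e]) * probs[mask, e][:, None]
    return out, choice

# Dimensions taken from the abstract; the weights are random placeholders.
d_model, d_ff, n_experts = 96, 384, 4
rng = np.random.default_rng(0)
tokens = rng.normal(size=(10, d_model))
gate_w = rng.normal(size=(d_model, n_experts))
expert_w1 = rng.normal(size=(n_experts, d_model, d_ff)) * 0.1
expert_w2 = rng.normal(size=(n_experts, d_ff, d_model)) * 0.1

output, chosen_expert = top1_moe(tokens, gate_w, expert_w1, expert_w2)
```

Because only one expert runs per token, the compute per token stays close to that of a single feed-forward block while the total parameter count grows with the number of experts.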

Description

Thesis (M.Sc.) -- Istanbul Technical University, Graduate School, 2025

Subject

Electrocardiogram (ECG), transformers, machine learning, deep learning

URI

http://hdl.handle.net/11527/27929

Collections

LEE - Big Data and Business Analytics Master's Program
