Novel fractional order calculus-based audio processing methods and their applications on neural networks for classification and synthesis problems
dc.contributor.advisor | Kırcı, Mürvet | |
dc.contributor.author | Yazgaç, Bilgi Görkem | |
dc.contributor.authorID | 504162208 | |
dc.contributor.department | Electronics Engineering | |
dc.date.accessioned | 2025-02-05T12:53:37Z | |
dc.date.available | 2025-02-05T12:53:37Z | |
dc.date.issued | 2023-10-24 | |
dc.description | Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2023 | |
dc.description.abstract | This dissertation explores the application of the Fractional Order Calculus (FOC) framework to contemporary problems in audio signal processing. One crucial aspect of present audio signal processing approaches is their reliance on large amounts of data, which necessitates appropriate tools for increasing the amount of available data. Another important aspect concerns the methods used to produce inference models. The neural network approaches dominating the field often require optimization of a large number of parameters. As a result, digital signal processing (DSP) tools are being repurposed to reduce the number of parameters in neural models. The introductory chapter provides an overview of the dissertation's purpose, which is to investigate whether FOC can provide novel methods to solve problems in neural network-based audio signal classification and reconstruction. Chapter 2 introduces the FOC framework by explaining its capabilities and complexities. While the complexities of FOC have often caused it to be overlooked in engineering applications, its capabilities have attracted the interest of many researchers in various fields, including audio processing, time series estimation, and image enhancement. By providing examples of FOC-based applications in audio signal processing, this chapter aims to familiarise the reader with the FOC concept. The dissertation is structured such that each chapter focuses on a specific application of audio signal processing. Chapter 3 tackles the problem of audio classification, in which signals are categorised as speech, music, or environmental sound. Due to the limited availability of data for environmental sound signals, data augmentation methods remain crucial for Environmental Sound Classification (ESC) problems. The chapter presents three FOC-based data augmentation methods: Fractional Order Mask, Fractional Order Frequency Scale, and Fractional Order Mel Scale. 
The Fractional Order Mask and Fractional Order Mel Scale methods are applied to Mel Spectrogram and Log-Mel Spectrogram representations of environmental sound data. Experiments on the ESC problem with neural architectures demonstrate their effectiveness as data augmentation tools in improving the accuracy of neural network models. The findings indicate that employing a data augmentation procedure in combination with the proposed methods can yield a boost of approximately 7.7% in performance for a 5-layer CNN when Log-Mel Spectrograms are used as input. Similarly, the augmented dataset resulted in an increase of over 9% in performance for an 18-layer ResNet. Chapter 4 delves into audio synthesis and its importance in reconstructing time domain representations of audio signals. The history of vocoding methods and their relation to signal reconstruction approaches are discussed. The chapter focuses on phase reconstruction with methods such as SPSI and with spectral consistency based iterative methods such as the Griffin-Lim Algorithm (GLA) and its more recent variants. In this chapter, an FOC-based method is proposed. The method models a signal's Power Spectral Density Function (PSDF) using Fractional Differential Equations (FDE) in order to estimate the instantaneous frequency of a peak in a windowed audio spectrum. This method proves effective in phase reconstruction. The results show that using the FOC framework provided up to 4% better quality than SPSI. The experiments also highlight the proposed method's effectiveness as an initial phase estimator for spectral consistency based iterative methods. Chapter 5 explores the contemporary research topic of Neural Audio Synthesis (NAS). | |
dc.description.degree | Ph.D. | |
dc.identifier.uri | http://hdl.handle.net/11527/26369 | |
dc.language.iso | en_US | |
dc.publisher | Graduate School | |
dc.sdg.type | Goal 6: Clean Water and Sanitation | |
dc.sdg.type | Goal 9: Industry, Innovation and Infrastructure | |
dc.sdg.type | Goal 17: Partnerships to achieve the Goal | |
dc.subject | neural networks | |
dc.subject | sinir ağları | |
dc.subject | audio processing methods | |
dc.subject | ses işleme yöntemleri | |
dc.title | Novel fractional order calculus-based audio processing methods and their applications on neural networks for classification and synthesis problems | |
dc.title.alternative | Kesirli mertebeden kalkülüs temelli yeni ses işleme yöntemleri ve bunların sinir ağları üzerinde sınıflandırma ve sentez problemlerine uygulanması | |
dc.type | Doctoral Thesis |