Novel fractional order calculus-based audio processing methods and their applications on neural networks for classification and synthesis problems

dc.contributor.advisor Kırcı, Mürvet
dc.contributor.author Yazgaç, Bilgi Görkem
dc.contributor.authorID 504162208
dc.contributor.department Electronics Engineering
dc.date.accessioned 2025-02-05T12:53:37Z
dc.date.available 2025-02-05T12:53:37Z
dc.date.issued 2023-10-24
dc.description Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2023
dc.description.abstract This dissertation explores the application of the Fractional Order Calculus (FOC) framework to contemporary problems in audio signal processing. One crucial aspect of current audio signal processing approaches is their reliance on large amounts of data, which calls for appropriate tools for increasing the amount of available data. Another important aspect concerns the methods used to produce inference models. The neural network approaches dominating the field often require optimization of a large number of parameters. As a result, digital signal processing (DSP) tools are being repurposed to reduce the number of parameters in neural models. The introductory chapter provides an overview of the dissertation's purpose, which is to investigate whether FOC can provide novel methods for solving problems in neural network-based audio signal classification and reconstruction. Chapter 2 introduces the FOC framework by explaining its capabilities and complexities. While the complexities of FOC have often caused it to be overlooked in engineering applications, its capabilities have attracted the interest of many researchers in various fields, including audio processing, time series estimation, and image enhancement. By providing examples of FOC-based applications in audio signal processing, this chapter aims to build familiarity with the FOC concept. The dissertation is structured such that each chapter focuses on a specific application of audio signal processing. Chapter 3 tackles the problem of audio classification, in which signals are categorized as speech, music, or environmental sound. Due to the limited availability of data for environmental sound signals, data augmentation methods remain crucial for Environmental Sound Classification (ESC) problems. The chapter presents three FOC-based data augmentation methods: Fractional Order Mask, Fractional Order Frequency Scale, and Fractional Order Mel Scale. 
The Fractional Order Mask and Fractional Order Mel Scale methods are applied to Mel Spectrogram and Log-Mel Spectrogram representations of environmental sound data. Experiments on the ESC problem with neural architectures demonstrate their effectiveness as data augmentation tools in improving the accuracy of neural network models. The findings indicate that employing a data augmentation procedure in combination with the proposed methods can yield a boost of approximately 7.7% in performance for a 5-layer CNN when Log-Mel Spectrograms are used as input. Similarly, the augmented dataset resulted in an increase of over 9% in performance for an 18-layer ResNet. Chapter 4 delves into audio synthesis and its importance in reconstructing time-domain representations of audio signals. The history of vocoding methods and their relation to signal reconstruction approaches are discussed. The chapter focuses on phase reconstruction with methods such as SPSI and with spectral consistency-based iterative methods such as the Griffin-Lim Algorithm (GLA) and its novel variants. In this chapter, an FOC-based method is proposed. The method models a signal's Power Spectral Density Function (PSDF) using Fractional Differential Equations (FDE), estimating the instantaneous frequency of a peak in a windowed audio spectrum. This method proves effective in phase reconstruction. The results show that using the FOC framework provided up to 4% better quality than SPSI. The experiments also highlight the proposed method's effectiveness as an initial phase estimator for spectral consistency-based iterative methods. Chapter 5 explores the contemporary research topic of Neural Audio Synthesis (NAS).
dc.description.degree Ph.D.
dc.identifier.uri http://hdl.handle.net/11527/26369
dc.language.iso en_US
dc.publisher Graduate School
dc.sdg.type Goal 6: Clean Water and Sanitation
dc.sdg.type Goal 9: Industry, Innovation and Infrastructure
dc.sdg.type Goal 17: Partnerships to achieve the Goal
dc.subject neural networks
dc.subject sinir ağları
dc.subject audio processing methods
dc.subject ses işleme yöntemleri
dc.title Novel fractional order calculus-based audio processing methods and their applications on neural networks for classification and synthesis problems
dc.title.alternative Kesirli mertebeden kalkülüs temelli yeni ses işleme yöntemleri ve bunların sinir ağları üzerinde sınıflandırma ve sentez problemlerine uygulanması
dc.type Doctoral Thesis
Files
Original bundle
Name: 504162208.pdf
Size: 4.59 MB
Format: Adobe Portable Document Format