Novel fractional order calculus-based audio processing methods and their applications on neural networks for classification and synthesis problems

Yazgaç, Bilgi Görkem

dc.contributor.advisor	Kırcı, Mürvet
dc.contributor.author	Yazgaç, Bilgi Görkem
dc.contributor.authorID	504162208
dc.contributor.department	Electronics Engineering
dc.date.accessioned	2025-02-05T12:53:37Z
dc.date.available	2025-02-05T12:53:37Z
dc.date.issued	2023-10-24
dc.description	Thesis (Ph.D.) -- Istanbul Technical University, Graduate School, 2023
dc.description.abstract	This dissertation explores the application of the Fractional Order Calculus (FOC) framework to contemporary problems in audio signal processing. One crucial aspect of present audio signal processing approaches is their reliance on large amounts of data, which necessitates appropriate tools for increasing the amount of available data. Another important aspect concerns the methods used to produce inference models. The neural network approaches dominating the field often require optimization of a large number of parameters. As a result, digital signal processing (DSP) tools are being repurposed to reduce the number of parameters in neural models. The introductory chapter provides an overview of the dissertation's purpose, which is to investigate whether FOC can provide novel methods for solving problems in neural network-based audio signal classification and reconstruction. Chapter 2 introduces the FOC framework by explaining its capabilities and complexities. While the complexities of FOC have often caused it to be overlooked in engineering applications, its capabilities have attracted the interest of many researchers in various fields, including audio processing, time series estimation, and image enhancement. By providing examples of FOC-based applications in audio signal processing, this chapter aims to familiarize the reader with the FOC concept. The dissertation is structured such that each chapter focuses on a specific application of audio signal processing. Chapter 3 tackles the problem of audio classification, which is categorized according to whether the signals are speech, music, or environmental sound. Due to the limited availability of data for environmental sound signals, data augmentation methods remain crucial for Environmental Sound Classification (ESC) problems. The chapter presents three FOC-based data augmentation methods: Fractional Order Mask, Fractional Order Frequency Scale, and Fractional Order Mel Scale.
The Fractional Order Mask and Fractional Order Mel Scale methods are applied to Mel Spectrogram and Log-Mel Spectrogram representations of environmental sound data. Experiments on the ESC problem with neural architectures demonstrate their effectiveness as data augmentation tools in improving the accuracy of neural network models. The findings indicate that employing a data augmentation procedure in combination with the proposed methods can yield a boost of approximately 7.7% in performance for a 5-layer CNN when Log-Mel Spectrograms are used as input. Similarly, the augmented dataset resulted in an increase of over 9% in performance for an 18-layer ResNet. Chapter 4 delves into audio synthesis and its importance in reconstructing time domain representations of audio signals. The history of vocoding methods and their relation to signal reconstruction approaches are discussed. The chapter focuses on phase reconstruction with methods such as SPSI and with spectral-consistency-based iterative methods such as the Griffin-Lim Algorithm (GLA) and its recent variants. In this chapter, an FOC-based method is proposed. It models a signal's Power Spectral Density Function (PSDF) using Fractional Differential Equations (FDE), estimating the instantaneous frequency of a peak in a windowed audio spectrum. This method proves effective in phase reconstruction. The results show that using the FOC framework provided up to 4% better quality than SPSI. The experiments also highlight the proposed method's effectiveness as an initial phase estimator for spectral-consistency-based iterative methods. Chapter 5 explores the contemporary research topic of Neural Audio Synthesis (NAS).
dc.description.degree	Ph.D.
dc.identifier.uri	http://hdl.handle.net/11527/26369
dc.language.iso	en_US
dc.publisher	Graduate School
dc.sdg.type	Goal 6: Clean Water and Sanitation
dc.sdg.type	Goal 9: Industry, Innovation and Infrastructure
dc.sdg.type	Goal 17: Partnerships to achieve the Goal
dc.subject	neural networks
dc.subject	sinir ağları
dc.subject	audio processing methods
dc.subject	ses işleme yöntemleri
dc.title	Novel fractional order calculus-based audio processing methods and their applications on neural networks for classification and synthesis problems
dc.title.alternative	Kesirli mertebeden kalkülüs temelli yeni ses işleme yöntemleri ve bunların sinir ağları üzerinde sınıflandırma ve sentez problemlerine uygulanması
dc.type	Doctoral Thesis

Files

Original bundle

Name: 504162208.pdf
Size: 4.59 MB
Format: Adobe Portable Document Format

Collections

LEE- Elektronik Mühendisliği-Doktora