LEE- Müzik-Doktora
Konu "derin öğrenme" ile LEE- Müzik-Doktora'a göz atma
Item: Multipart music transcription using deep neural networks (Graduate School, 2025-04-17) Germen, Emin; Karadoğan, Can; 409072004; Music

This research presents a comprehensive framework for automatic music transcription, specifically designed to replicate the auditory capabilities of a "trained ear" in identifying and interpreting complex musical interactions. The traditional Turkish instruments Qanun and Oud are the focal point of this study, which addresses the challenges of polyphonic music transcription in non-Western musical contexts. Using a foundational corpus and advanced machine learning models, the research aims to bridge the gap between traditional auditory analysis and contemporary computational approaches. The study emphasizes the importance of crafting a robust yet basic corpus capable of simulating essential auditory tasks while capturing the unique timbral and harmonic characteristics of these instruments.

A pivotal aspect of this research is the development of a specialized corpus designed to emulate the core perceptual abilities of a trained ear. The corpus incorporates systematic combinations of musical notes played on the Qanun and Oud, including sustained tones, chromatic sequences, and randomized patterns. These combinations simulate a wide spectrum of musical scenarios, encompassing monophonic and polyphonic textures as well as overlapping harmonic interactions. Despite its basic design, the corpus provides a detailed representation of the dynamic interplay between the two instruments, enabling computational models to learn critical aspects of pitch recognition, timbral distinction, and harmonic understanding.

The corpus generation process begins with the systematic recording of individual notes and their combinations. Each recording captures the transient and sustained qualities of the Qanun and Oud, highlighting their contrasting timbres: the bright, resonant sound of the Qanun versus the dark, mellow tone of the Oud. This structured approach ensures that the dataset reflects real-world auditory challenges, such as identifying simultaneous pitches and distinguishing between overlapping harmonic structures. The inclusion of randomized patterns introduces variability, further improving the corpus's ability to mimic real-world musical performances.

To analyze and transcribe the complex interactions captured in the corpus, a Deep Neural Network (DNN) and a Convolutional Neural Network (CNN) were developed. These models are trained on a carefully curated feature set, including the Short-Time Fourier Transform (STFT), Constant-Q Transform (CQT), Spectral Centroid (SC), and Band Energy Ratio (BER). Each feature contributes to a holistic representation of the audio signals, capturing their temporal, spectral, and energetic characteristics. The integration of these features enables the models to extract meaningful information from the data, such as note onset times, harmonic structures, and timbral nuances.

The DNN architecture consists of six layers, each optimized to handle the multidimensional nature of the input data. Its ReLU activation functions and softmax output layer allow the model to classify 37 distinct musical notes across three octaves. Meanwhile, the CNN model leverages convolutional layers to analyze spectrogram images, offering an alternative approach to learning musical patterns.
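The four features named above can be reproduced with standard audio-analysis tooling. The following is a minimal sketch using the librosa library; the sample rate, frame parameters, and the 1 kHz split frequency for the band energy ratio are illustrative assumptions, not the settings used in the thesis.

```python
# Sketch of the feature set named in the abstract: STFT, CQT,
# spectral centroid, and band energy ratio. Parameter values are
# illustrative assumptions, not the thesis settings.
import numpy as np
import librosa

def extract_features(path, split_hz=1000.0):
    y, sr = librosa.load(path, sr=22050, mono=True)

    # Short-Time Fourier Transform magnitude (frequency bins x frames)
    stft = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

    # Constant-Q Transform magnitude (log-spaced frequency bins)
    cqt = np.abs(librosa.cqt(y, sr=sr, hop_length=512))

    # Spectral centroid per frame, in Hz
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=512)

    # Band energy ratio: energy below split_hz over energy above it, per frame
    freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)
    split_bin = np.searchsorted(freqs, split_hz)
    power = stft ** 2
    ber = power[:split_bin].sum(axis=0) / (power[split_bin:].sum(axis=0) + 1e-10)

    return stft, cqt, centroid, ber
```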
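The abstract specifies a six-layer DNN with ReLU activations and a 37-way softmax output; one plausible reading is five ReLU hidden layers plus the softmax layer. The layer widths, input dimensionality, and training settings below are placeholders chosen for illustration, not the architecture reported in the thesis.

```python
# Hypothetical six-layer feed-forward classifier: five ReLU hidden
# layers plus a 37-way softmax output. Widths, input size, and
# training settings are assumptions.
from tensorflow.keras import layers, models

NUM_CLASSES = 37   # 37 note classes across three octaves (from the abstract)
INPUT_DIM = 512    # assumed length of one flattened feature vector

dnn = models.Sequential([
    layers.Input(shape=(INPUT_DIM,)),
    layers.Dense(1024, activation="relu"),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

dnn.compile(optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"])
```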
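The CNN branch operates on spectrogram images rather than feature vectors. Only the overall idea, convolutional layers feeding a 37-way softmax, comes from the abstract; the input patch size, filter counts, and kernel sizes in this sketch are assumptions.

```python
# Hypothetical CNN over fixed-size spectrogram patches; only the
# 37-way softmax output follows the abstract, everything else is assumed.
from tensorflow.keras import layers, models

cnn = models.Sequential([
    layers.Input(shape=(128, 128, 1)),           # assumed spectrogram patch size
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(37, activation="softmax"),      # 37 note classes
])

cnn.compile(optimizer="adam",
            loss="categorical_crossentropy",
            metrics=["accuracy"])
```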
The CNN architecture is particularly effective at recognizing patterns in visual representations of audio signals, such as pitch contours and harmonic structures, making it a valuable complement to the DNN.

Transient detection and onset analysis are critical components of this framework, providing the temporal precision necessary for accurate music transcription. Transients, characterized by rapid changes in amplitude and frequency, mark the beginning of new sound events, such as the attack phase of a note. Onset analysis further refines this process by pinpointing the exact start times of these events, enabling the models to capture intricate rhythmic and melodic details. In traditional instruments such as the Oud and Qanun, acoustically limited sustain durations often led the model to misinterpret sustained notes, which, despite being musically longer, were incorrectly classified as rests. To address this issue, a heuristic method was developed: using data-driven statistical analysis, time segments misclassified as rests were reinterpreted to better align with plausible note durations. As a result, note lengths were represented more realistically, leading to a notable improvement in overall transcription accuracy.

The proposed framework has shown significant success in transcribing two-part music played by the Qanun and Oud, achieving high accuracy in pitch and timbral recognition. The corpus, though basic in construction, has proven effective in capturing the essential harmonic and melodic characteristics of these instruments. This foundational work lays the groundwork for further advances in transcribing more complex musical frameworks, such as Maqam music, which features intricate microtonal scales.

The research has broad implications for musicology, auditory science, and machine learning. By bridging traditional musical practices with modern computational tools, the framework contributes to the development of culturally informed auditory systems, advancing the field of automatic music transcription. Furthermore, the corpus and models developed in this study can serve as valuable resources for musicians, educators, and researchers, fostering a deeper understanding of diverse musical traditions and enhancing the accessibility of non-Western music in digital formats.

This study demonstrates the potential of combining basic corpus design with advanced machine learning techniques to achieve robust and accurate music transcription. By focusing on the Qanun and Oud, the research highlights the importance of culturally specific datasets in addressing the unique challenges of non-Western music transcription. The proposed framework not only replicates the critical auditory capabilities of a trained ear but also provides a scalable foundation for future research in complex musical systems. Through this work, significant progress has been made in bridging the gap between traditional auditory analysis and modern computational approaches, offering new avenues for exploring and preserving the rich diversity of the global musical heritage.
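Transient and onset detection of the kind described in the abstract is available off the shelf. This sketch uses librosa's onset detector as a stand-in for the thesis pipeline; the file name and hop length are assumed.

```python
# Locate note onsets (attack transients) in a recording.
# librosa's default onset detector is used here as a stand-in;
# the thesis may use a different transient-detection scheme.
import librosa

y, sr = librosa.load("qanun_oud_duet.wav", sr=22050, mono=True)  # hypothetical file

onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=512)
onset_frames = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr,
                                          hop_length=512)
onset_times = librosa.frames_to_time(onset_frames, sr=sr, hop_length=512)
print(onset_times)  # start times (in seconds) of detected sound events
```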
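The rest-correction heuristic is described only in outline in the abstract, so the following is one plausible reading: short runs of frames that the classifier labels as rests immediately after a note are treated as decayed sustain and folded back into that note's duration. The function name, the rest label, the threshold rule, and the per-frame label representation are all assumptions; in the thesis the threshold is derived statistically.

```python
# Hypothetical sketch of the rest-reclassification heuristic: short
# "rest" runs that directly follow a note are assumed to be decayed
# sustain and are merged back into the preceding note.
import numpy as np

REST = -1  # assumed label for frames classified as rests

def reclassify_rests(frame_labels, max_rest_frames):
    """Merge short rest runs into the preceding note.

    frame_labels   : per-frame class labels from the model (REST or a note id)
    max_rest_frames: rest runs at or below this length are treated as sustain;
                     here it is simply passed in as a parameter.
    """
    labels = np.asarray(frame_labels).copy()
    i = 0
    while i < len(labels):
        if labels[i] == REST and i > 0 and labels[i - 1] != REST:
            j = i
            while j < len(labels) and labels[j] == REST:
                j += 1
            if j - i <= max_rest_frames:
                labels[i:j] = labels[i - 1]  # extend the preceding note
            i = j
        else:
            i += 1
    return labels

# Example: a short rest gap inside a sustained note is filled in,
# while the longer rest run before the next note is kept.
print(reclassify_rests([5, 5, REST, REST, 5, REST, REST, REST, REST, 7], 2))
```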