ISTANBUL TECHNICAL UNIVERSITY
GRADUATE SCHOOL OF ARTS AND SOCIAL SCIENCES

A SUBJECTIVE LISTENING TEST ON THE PREFERENCE OF TWO DIFFERENT STEREO MICROPHONE ARRAYS ON HEADPHONES AND SPEAKERS LISTENING SETUPS

M.A. THESIS

Mertcan İÇUZ (409151114)

Dr. Erol Üçer Center for Advanced Studies in Music
Music Program

Thesis Advisor: Asst. Prof. Dr. Taylan ÖZDEMİR

JUNE 2018

Thesis Advisor: Asst. Prof. Dr. Taylan ÖZDEMİR, Istanbul Technical University

Jury Members:
Asst. Prof. Dr. Taylan ÖZDEMİR, Istanbul Technical University
Assoc. Prof. Can KARADOĞAN, Istanbul Technical University
Asst. Prof. Yahya Burak TAMER, Bahçeşehir University

Mertcan İÇUZ, an M.A. student of the ITU Graduate School of Arts and Social Sciences (409151114), successfully defended the thesis entitled “A SUBJECTIVE LISTENING TEST ON THE PREFERENCE OF TWO DIFFERENT STEREO MICROPHONE ARRAYS ON HEADPHONES AND SPEAKERS LISTENING SETUPS”, which he prepared after fulfilling the requirements specified in the associated legislations, before the jury whose signatures are below.
Date of Submission: 04 May 2018
Date of Defense: 05 June 2018

To my family,

FOREWORD

I would like to thank my advisor Taylan Özdemir for guiding me in shaping my thesis; Pieter Snapper for showing us the professional approach a sound engineer should adopt and for being a professional role model for us; Jerfi Aji for teaching us music theory in such an enjoyable manner; our librarian Özlem Gürkan for creating such a wonderful space to study and always giving us sweets and coffee; and the whole MIAM community, which helped this creative environment exist. I would like to thank my best friend Gieorgos Karazeris for standing by me in difficult times. Finally, I would like to express my everlasting gratitude to my family and my friends for giving me the support and opportunity to pursue my goals.

May 2018
Mertcan İÇUZ
Sound Engineer

TABLE OF CONTENTS

FOREWORD
TABLE OF CONTENTS
ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
SUMMARY
ÖZET
1. INTRODUCTION
1.1 Aim and Method
1.2 An Overview of Music Recording History
1.3 Stereo Recording versus Multi-Microphone Recording
2. THEORY
2.1 A Brief Overview of Human Sound Localization System
2.2 An Overview of Stereophonic Recording Techniques
2.3 ITD versus IID
2.3.1 Localization versus spaciousness
2.3.2 Four types of stereo microphone configurations
2.3.3 Recording angles and angular distortion
2.4 Listening through Headphones versus Speakers
2.4.1 Practical/pragmatic considerations
3. RECORDING AND LISTENING TESTS
3.1 Three Different Listening Setups and Locations
3.2 Three Different Listening Spaces
3.2.2 Listening spaces
3.2.2 Listening setups and preparation of samples
4. RESULTS AND DISCUSSION
Listening Nr. 1 – Music Technologies Dept. Studio at İTU State Conservatory
4.1 Listening Nr. 2 – MIAM Studio control room
4.2 Listening Nr. 3 – DigiLab
5. CONCLUSION
5.1 Main Idea
5.2 Conclusions
5.3 Future Work
REFERENCES
APPENDICES
CURRICULUM VITAE

ABBREVIATIONS

ADC : Analog to Digital Converter
ALH : Albert Long Hall
DIN : Deutsches Institut für Normung
HRTF : Head Related Transfer Function
IID : Inter-aural/channel Intensity Difference
ITD : Inter-aural/channel Time Difference
LUFS : Loudness Units relative to Full Scale
ORTF : Office de Radiodiffusion Télévision Française
PA : Public Address
RMS : Root Mean Square
SRA : Stereophonic Recording Angle

LIST OF TABLES

Table 3.1: Listening rooms.
Table 3.2: LUFS and peak levels of recordings.
Table 4.1: Number of subjects for the listening test.
Table 4.2: Abbreviations used for the analysis of the listening tests.
Table 4.3: Results of the listening test at M.T. Studio.
Table 4.4: Results of the listening test conducted at MIAM studio control room.
Table 4.5: Results of the listening test conducted at DigiLab.
Table A.1: The four stereo microphone configurations.
Table A.2: Comparison of data from Wittek and recordings with three source positions.
Table A.3: Comparison of the data given by Wittek to the measured data for sine and cowbell signals.
Table A.4: Calculated time differences as a result of microphone spacing.
Table A.5: Comparison of near-coincident pair’s (DIN) data.
Table A.6: Listening orders for the test.

LIST OF FIGURES

Figure 2.1: Jecklin Disc (Josephson Engineering).
Figure 3.1: Frequency response graph of OM1.
Figure 3.2: Frequency response graph of CM3.
Figure 3.3: Semplice Quartet rehearsing at Albert Long Hall.
Figure 3.4: GitarLive performance space.
Figure 3.5: MİAM studio control room.
Figure 3.6: DigiLab.
Figure 4.1: M.T. Studio listening room.
Figure 4.2: ATC SCM200ASL Pro speaker (ATC speakers).

SUMMARY

The main purposes of this thesis are to outline the general theoretical background on which modern commercial stereophonic sound recording is built and to explore further possibilities for stereophonic recording in light of modern music listening trends. Stereo recording is considered by many recording engineers and musicians to be exclusive to classical production, yet it has its uses in recording other genres as well. Producing acoustical music via stereo/surround microphone configurations requires a great deal of experience, and the sound engineer has to be equipped with appropriate devices (such as quality microphones, precise monitoring equipment and a suitable post-production room) as well as background knowledge of and listening experience in the genre. As opposed to the multi-microphone recording process, in which each sound source/instrument may be isolated acoustically and recorded at separate times, a stereo recording is the mixture of all sound sources’ acoustical output combined with the space in which the performance takes place. Technically speaking, this type of recording has to conform to many norms and criteria set by producers and audiences. The engineer is responsible for adjusting the stereophonic panorama and the ratio of diffuse to direct sound, as well as for the choice of microphones and other equipment in the signal chain. There are endless combinations of the criteria mentioned above, and there is no single correct way to record a given ensemble or performance space.
Thus, the success and aesthetic appreciation of a sound recording are, on many levels, the sound engineer’s responsibility.

There are four types of stereo microphone configurations: coincident, near-coincident, spaced and baffled pairs. Among them, near-coincident and baffled pairs were chosen for the test examples. When we record a sound source with two microphones and play it back through speakers, there are two types of differences between the two signals: a) the inter-channel time/intensity difference that results from the later arrival of acoustical energy at the microphone that is more distant from the source, and b) the inter-aural time difference that results from the spacing of our ears, i.e., the sound emitted by the left speaker arrives at our right ear later than at our left ear, and vice versa. In headphone playback, this inter-aural time difference is absent. One of the factors observed in the analysis of the results was this difference and its effect on people’s subjective appreciation.

Another phenomenon observed was the spacing of the microphones in the selected pairs and its relation to the subjective evaluation of the recordings. The starting point for both selected pairs was the human head, which is the main element of the human hearing system. For the DIN (near-coincident) setup, the spacing is 20 cm, which is close to the average human ear spacing of 17-20 cm. Thus, when a recording made with the DIN configuration is played back through headphones, there is supposed to be a minimum amount of extra inter-aural intensity/time difference added by the recording or playback method. We can, in that case, say that the microphones transfer the sounds to our ears with the same time delay that we would experience if we were listening with our own ears in the original acoustical environment. One question then emerges: how does a recording made with a time difference close to the inter-aural one affect a person’s subjective evaluation of the recorded sound?
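As a rough illustration of the time differences discussed above, the sketch below estimates the time-of-arrival difference between two spaced microphones for a distant source. It relies on the common far-field approximation (path difference = spacing × sin of the source angle), which is an assumption of this example and not a formula taken from the thesis. The function name, the 343 m/s speed of sound and the chosen angles are likewise illustrative, and microphone directivity and any baffle are ignored. The two spacings are the 17 cm and 20 cm limits of the ear-spacing range quoted above.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value: dry air at roughly 20 degrees C


def inter_channel_time_difference(spacing_m, source_angle_deg):
    """Far-field time-of-arrival difference (in seconds) between two microphones.

    spacing_m: distance between the two capsules in metres.
    source_angle_deg: source direction measured from the array's centre line
    (0 = straight ahead, 90 = fully to one side).
    """
    path_difference_m = spacing_m * math.sin(math.radians(source_angle_deg))
    return path_difference_m / SPEED_OF_SOUND_M_S


# Compare the two ends of the 17-20 cm range for a few source directions.
for spacing_m in (0.17, 0.20):
    for angle_deg in (0, 30, 90):
        dt_ms = inter_channel_time_difference(spacing_m, angle_deg) * 1000.0
        print(f"spacing {spacing_m * 100:.0f} cm, source at {angle_deg:>2} deg: {dt_ms:.3f} ms")
```

Under these assumptions the largest possible difference, for a source fully to one side, stays well below one millisecond for both spacings.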
For the baffled pair, a spacing of 17 cm was again chosen with human anatomy in mind; in this case, however, there is also a baffle disc that imitates the way the human head attenuates frequencies by varying amounts depending on the direction of the sound source. Another question to be asked is: how does the baffle effect carry over to headphone playback with regard to the audience’s subjective appreciation?

In this thesis, the distances of the selected microphone configurations to the sound sources were identical. The varying criteria for the recordings were microphone polar patterns and configuration patterns. The headphones were identical for all listening tests, yet the listening rooms and speaker types varied to reflect the different listening conditions that arise in speaker playback, since listening rooms differ in size, acoustical treatment and playback equipment. For the recordings, three places with different sizes and acoustical treatments were chosen. The purpose of diversifying the recording places was to test different ratios of wanted to unwanted signal (noise, reflections) with the different configurations, which also varied in polar pattern (with omni-directional pairs picking up more unwanted sound). The sound sources in the recordings (music ensembles) also varied greatly in size, instrument characteristics and timbre.

For the listening tests, which were conducted in three rooms of different size and acoustical treatment, the subjects listened to the two recordings made with the two selected stereo configurations and were asked to state which one they preferred. After that, they listened to the two recordings again (in the same order or in reverse), but this time on the other listening medium; e.g., if they had listened via headphones the first time, they would listen via speakers the second time, or vice versa.
The order of the stereo configurations and listening media was chosen randomly from the eight possible combinations.

KULAKLIK VE HOPARLÖR DİNLEME DÜZENLERİNDE SEÇİLEN İKİ MİKROFON DİZİLİMİNİN TERCİHİNİN ÖZNEL ÖLÇÜMÜ ÜZERİNE BİR DENEY

ÖZET

The main purpose of this thesis is to lay out the general theoretical literature on which today’s commercial stereophonic sound recording is built and to explore new directions in view of new music listening habits. Many recording engineers and musicians consider stereo recording to be suited only to classical music production, but it is also used in other genres. Producing acoustic music with stereo/surround microphone configurations requires considerable experience, and the sound engineer needs not only suitable equipment (adequate microphones, precise monitoring equipment and a suitable post-production room) but also listening experience in and background knowledge of the genre. Unlike the multi-microphone recording method, in which each sound source/instrument can be acoustically isolated and recorded at different times, a stereo recording is the result of blending the acoustic output of all sound sources with the acoustic information of the space in which they are performed. Technically speaking, this kind of recording has to satisfy many criteria set by producers and listeners. The engineer is responsible for adjusting the stereophonic panorama and the ratio of diffuse to direct sound, as well as for choosing the microphones and the other equipment in the signal chain. The criteria mentioned above offer the producer endless possibilities, and there is no single valid and correct method for a given set of performers and a given space. For this reason, the success and aesthetic appreciation of a sound recording are the sound engineer’s responsibility.
Stereo microphone configurations can be grouped into four categories: coincident, near-coincident, baffled and spaced configurations. Of these, the near-coincident and baffled configurations were selected for the test. If we record a sound source with two microphones and play the recording back through two loudspeakers, two kinds of delay arise between the two signals: a) the inter-channel delay, which arises because the distance between the microphones makes the sound reach one microphone later than the other, and b) the inter-aural delay, which arises because the two signals, when played back through two loudspeakers, reach each of our ears with different time delays. That is, the sound emitted by the left loudspeaker reaches our right ear later than our left ear, and vice versa. In headphone playback, this inter-aural delay does not occur. One of the phenomena observed while evaluating the test results was this difference and its effect on people’s subjective listening preferences.

Another phenomenon observed was the distance between the microphones in the selected configuration and its effect on people’s appreciation. The starting point of both selected configurations was the human head, the fundamental determinant in our hearing system. For the DIN configuration, the distance between the microphones is 20 centimetres, which corresponds to the distance between the two human ears, 17-20 centimetres on average. Thus, when a recording made with DIN is played back through headphones, only a minimal amount of inter-channel delay added by the playback method is expected. In this case, we hear these sounds with the inter-aural delay we would hear if our head were present in their acoustic environment.
A question then arises: to what extent does a recording made with only the inter-aural delay affect a person’s subjective evaluation of the recorded sound? In the baffled configuration, the microphones are again 17 centimetres apart, with human anatomy in mind. In addition, a baffle disc is placed between the two microphones in this configuration to imitate the difference that the human head creates between the sounds reaching the two ears by blocking frequencies. Another question is: how does this baffle affect people’s subjective appreciation when listening through headphones?

In this thesis, the distances from the microphone configurations to the sound sources are identical. The variables for the recordings are the microphones’ polar patterns and the configuration types. The same headphones were used across the listening environments and setups, but the listening rooms and loudspeaker types varied, reflecting the many different listening environments that exist. For the recordings, three venues of different size and acoustic treatment were chosen. The reason for this variety of recording venues was to take into account how the ratio of wanted to unwanted sound (such as noise and reflections) mixed into the recording varies with different microphone polar patterns and configurations. The performing ensembles in the recordings also vary widely in size, instrument character and timbre.

In the listening tests, conducted in three different rooms, the subjects listened back to back to the two versions of a selected recording made with the two configurations and were asked to state which they preferred. The same test was then repeated on the other listening medium (for example, if they had first listened on loudspeakers, the second time they listened on headphones), and they were again asked which recording they preferred.
The order of the recordings and the listening medium on which the test would be conducted first were chosen completely at random from the eight possible combinations.

1. INTRODUCTION

There have been many studies on stereophony in the last century, among them Blumlein (1931), Williams (1987), Griesinger (1987), Woszcyk (1990) and many others. Studies and listening tests on stereophony have been carried out almost exclusively for loudspeaker reproduction, and headphone reproduction has mostly been looked down on. The starting point for this thesis was the importance of headphones in modern listening trends, which I observed in my own experience, in my circle of acquaintances and among the people I encounter on my commute.

We will do no wrong if we take Blumlein’s patent on stereo (Blumlein, 1931) as the starting point for studies on stereophony. Blumlein’s own starting point was the audio accompanying the screen in talkies. From 1931 until the release of the first Walkman in 1979, stereo can be considered to have been designed mostly for loudspeaker reproduction. We can term the period between those dates (and, to some extent, up to modern times) “the age of loudspeaker stereophony”. Recorded music was consumed at social gatherings, at parties, in cars, etc. Headphones were seen as a device belonging to communication technologies or as an instrument used in experiments. However, with modern society’s lifestyle and new technologies, headphone music consumption has increased rapidly, and the stereo loudspeaker sets once found in homes are long gone.

When I started thinking about my thesis subject as a recording engineer, I began by putting myself in the position of a listener of music recordings. I have been listening to recorded music selectively and consciously for a decade now, and I have done so almost exclusively on headphones. I listen to music through headphones plugged into my mobile phone or laptop for 6-8 hours a day, using the online streaming service TIDAL.
Recorded sound has countless parameters, objective and subjective. Among the objective ones we can list frequency distribution, loudness in RMS and sample rate, while the subjective ones include boominess, boxiness, edginess, etc. An untrained music listener cannot listen to recorded music while being conscious of sonic parameters such as spectrum or dynamic range, or of technical aspects such as sample rate or bit depth, yet the impressions a recording leaves on him/her are still remembered after a long time. For instance, if I try to recall the recordings I used to listen to on my commute to school when I was 16, I am not able to make valid comments on their technical or sonic parameters, such as their frequency distribution or reverb tail length, as I can more or less do now as a professional sound engineer; but I do recall the overall impression they left in my mind. When an album that I have listened to before pops up randomly on the “discover” page of my streaming service, I recall impulsively: “Guitars didn’t sound right in that recording” or “That song had a weak low-end”. This made me aware that even though listeners may lack technical knowledge about how a recording works, they still form an impression of the overall state of a recording and describe it with adjectives such as “shiny”, “bassy” or “ouchy”.

The subjects chosen for this study were not only professional sound engineers and sound engineering students, but also students from other departments of music and people from completely unrelated disciplines. By doing so, I aim to discover people’s natural preferences and “tastes” in a music recording rather than their analytical approach to evaluating a recording by its technical aspects.
Starting from my observations of my own circle of friends and colleagues and from a recent study by Lepa & Hoklas (2015), I chose headphone listening as the medium of the test.

1.1 Aim and Method

The method of the study is to conduct subjective listening tests by playing people two different recordings of the same excerpt, made with two different microphone arrays, and asking them which of the two they prefer. Its aim is to draw conclusions about people’s subjective preferences in headphone listening and to compare them with the results of the speaker listening test as well as with previous studies in the field of stereophonic sound. The discussion at the end of the study interprets the acoustic characteristics of the performance spaces in which the recordings took place, the microphones’ polar patterns, and their relation to the rejection or inclusion of extra-musical sounds (such as noise and room information).

1.2 An Overview of Music Recording History

Discussing the history of music recording extensively would be beyond the scope of this thesis, but it is helpful to mention specific milestones in that history in order to reflect on the situation of record listening today. The medium through which we listen to records strongly influences our choice of sound output, namely headphones or speakers (the most popular and common output options at the moment). In my opinion, there are five milestones in the history of music recording and listening: the invention of the phonograph (1877), the invention of modern stereophony (1931), the release of the Walkman cassette player (1979), the digital consumption of music via CD (1980s) and its distribution via the internet (1990s-2000s), and the age of streaming services (2010s). I discuss them in more detail below.
With the invention of the phonograph, people encountered music recordings for the first time and became aware of the possibility of capturing a sonic event and reproducing it. Not only did it diversify people’s music listening habits, but it also added new dimensions to the conception of music. This thesis, and all the previous studies it is built on, start with what the invention of the phonograph made possible: sound recording.

Alan Blumlein’s discontent with monophonic sound and the confusing correlation between sound and picture in the cinema led him to an idea: binaural sound. What he called binaural is conventional “stereophony” today, and binaural audio has evolved into a different concept. Blumlein proposed ideas and designed solutions ahead of his time, and we still use many of his methods in stereophony. The following years witnessed the spread of music recordings, but it was not until the 1960s that stereo started to be used more widely in commercial recordings. Although what he proposed is now taken for granted as the way stereo works, it was a new idea at the time. He proposed that a recording should be made with two microphones and that these two microphones must be connected to two separate loudspeakers to create the stereophonic sound, which on modern equipment can be done by hard-panning each channel to one side.

If Blumlein’s invention was one leg of the tripod underlying headphone listening habits in today’s world, the next was the invention of the Walkman. People could now take their music with them wherever they wanted more easily. It also enabled them to personalize their music listening, excluding the people around them. As its output, the Walkman had a stereo headphone jack, and most people listened to it via headphones.

With releases such as Apple Inc.’s Macintosh computer in 1984, and Commodore’s Amiga and Atari Corporation’s Atari ST in 1985, the tide was turning towards digital technology.
This change hit music technology as well, and the Compact Disc was developed by Philips and Sony to store music recordings, which had already been converted to digital with satisfying results, on a digital optical disc. A detailed history and description of the analog-to-digital conversion of audio signals will not be given here, but a brief explanation of the process is in order. Analog to Digital Conversion (ADC) in sound is the process of converting the recorded electrical imprint of sound into digits in order to store and manipulate it further. It is carried out by analog-to-digital audio converters, which divide each second into a fixed number of intervals (the sampling rate) and record the signal’s amplitude at every interval with a fixed numerical resolution (the bit depth), along with further processes such as anti-aliasing filtering and dither. The reason why it is so important for headphone listening is that it paved the way to music streaming, which is the third leg of the tripod mentioned above.

The transfer of recorded music to the digital domain enabled the dematerialization of music. Acoustical vibrations in the air, which were once caught by the phonograph and cut onto a disc, are now converted to digital files consisting of 1s and 0s. When we convert the electrical signal of our recording to a digital file, what we have as a result is “data”. This data holds the information needed to re-create the acoustical vibrations originally produced by the sound source. Unlike a recording on magnetic tape, which was used to keep music recordings before digital, digital data is not tied to a particular physical medium. With the spread of the internet, the data is now, quite literally, in the air. Any person with an internet connection can connect to a streaming service and access recorded music through a laptop or mobile device.
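The sampling and quantization steps described earlier in this section can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the function name, the 1 kHz test tone, the 48 kHz sampling rate and the 16-bit depth are all chosen for the example, and the anti-aliasing filter and dither mentioned in the text are deliberately left out.

```python
import math


def sample_and_quantize(frequency_hz, duration_s, sample_rate_hz, bit_depth):
    """Sample a full-scale sine wave and round each sample to a signed integer code."""
    max_code = 2 ** (bit_depth - 1) - 1  # largest positive code, e.g. 32767 for 16 bits
    num_samples = round(duration_s * sample_rate_hz)
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz  # time of the n-th sample (sampling step)
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # value between -1.0 and 1.0
        codes.append(round(amplitude * max_code))  # quantization step
    return codes


# One millisecond of a 1 kHz tone at 48 kHz and 16 bits (illustrative values).
codes = sample_and_quantize(frequency_hz=1000, duration_s=0.001,
                            sample_rate_hz=48000, bit_depth=16)
print(len(codes), "samples; first five codes:", codes[:5])
```

A higher sampling rate yields more numbers per second of audio, and a higher bit depth yields a finer grid of possible values for each number; both increase the amount of data that has to be stored or streamed.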
The evolution of music recording has led to a “mobile listening trend” in which people listen through headphones connected to their mobile devices, and the study of Lepa and Hoklas (2015) shows that the share of mobile listening is increasing today.

1.3 Stereo Recording versus Multi-Microphone Recording

Supposing we are limited to two channels for the reproduction of our recorded sound, we can make a simple classification of two different approaches to recording: stereo recording and multi-microphone recording. The term “stereo recording” is used here for capturing a sonic event with a stereo recording medium, that is, a recording captured exclusively or for the most part by a stereo microphone array. Multi-microphone recording, on the other hand, means capturing single events separately and forming a mixed performance out of the individual recordings as the final result. These single events may be recorded in separate acoustical surroundings and at different times. A full treatment of these two approaches is beyond the scope of this thesis, but it is necessary to elaborate on the two concepts a little before going further. For convenience, the terms “stereo recording” and “multi-microphone recording” will be used for the two categories. Multi-microphone recording should not be confused with a recording in which the signal of a stereo microphone array is the main component and is supported by spot microphones, mixed in at lower levels to reinforce individual sound sources in the ensemble. That technique would still be considered a stereo recording technique, only with more options.

In stereo recording, we use two (or three) microphones to capture the sound image of an ensemble. More microphones can be used to add further detail or spatial information, but the main frame of the recording is fixed by the main pair of microphones.
The positions of the sound sources in the acoustical field are projected onto the microphone array as inter-channel intensity and time differences, and they are reproduced through speakers or headphones so as to re-create the image with the help of an effect termed the “phantom image”. A phantom image is an illusory sound source that is perceived as located anywhere between the two speakers. This effect is achieved through inter-channel differences, which are converted into inter-aural differences in the human hearing system (Bartlett, 2016). In multi-microphone recording, however, the stereophonic panorama is created by placing individually recorded sounds in post-production. Those individual sounds may themselves be recorded by a stereo microphone array, yet in the end there is no overall stereo pair that determines the sound as a whole. These individual recorded sounds (or “channels”, as they are called in the profession of sound recording) are then mixed and placed at specific locations in the stereophonic panorama. That aural placement is done by “panning”. Panning creates inter-channel intensity differences. In other words, when we pan a sound on a mixing desk or in a digital audio workstation, we increase that sound’s level in the channel that we pan it towards and decrease its level in the opposite channel by a corresponding amount. In stereo recording, sources are recorded from a distance so that the microphone array covers all of them. That distance from the microphones to the sound sources is one of the factors that determine the sense of space presented in the recording. Others are the acoustics and volume of the space in which the recording takes place and the polar patterns of the microphones. No matter how far from the ensemble the microphones are placed, the sense of space is consistent for all the sources.
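As an illustration of how panning, as described above, produces an inter-channel intensity difference, the following Python sketch computes left and right gains for a given pan position. It assumes a constant-power (sine/cosine) pan law, which is one common choice and not necessarily the law used by any particular mixing desk or digital audio workstation. The function names and the position scale from -1 to +1 are hypothetical.

```python
import math


def constant_power_pan(position):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is center, +1.0 is hard right.
    Assumes a sine/cosine constant-power pan law (an assumption of this
    sketch), so left_gain**2 + right_gain**2 stays equal to 1.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie between -1.0 and 1.0")
    angle = (position + 1.0) * math.pi / 4.0     # maps to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def level_difference_db(left_gain, right_gain):
    """Inter-channel level difference in dB (positive means right is louder)."""
    if left_gain <= 0.0 or right_gain <= 0.0:
        raise ValueError("both gains must be positive to form a dB ratio")
    return 20.0 * math.log10(right_gain / left_gain)


# Panning halfway to the right raises the right channel and lowers the left.
left, right = constant_power_pan(0.5)
print(round(left, 3), round(right, 3), round(level_difference_db(left, right), 2))
```

The design choice here is that the summed power of the two channels stays constant while the source moves, so only the level ratio between the channels changes with the pan position.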
In other words, in a stereo recording we hear all the individual sound sources in the same room (except for sources that are intentionally placed physically farther away for a different sonic signature). There may be spot microphones mixed into the sound, but they are still kept consistent with the overall aural perception. There is uniformity, and we can call that uniformity of the sense of space “uni-dimensionality”. For multi-microphone recordings, though, creating a realistic and uniform sense of space is mostly not a concern. Instead, the space is curated “multi-dimensionally” for the sake of an unrealistic and creative ambience. When we listen to a commercial pop song, it is completely usual and conventional to hear the vocal, guitars, or drums in completely different spaces. These different spaces are created with artificial reverberation algorithms, as it is impossible to create that “multi-dimensionality” in real acoustic spaces. We can call the uni-dimensional space of stereophonic recordings “captured space”, while the multi-dimensional space of multi-microphone recordings may be called “curated space”. “The concert hall or recording studio should be thought of as one of the instruments in classical music recording, as the most common techniques capture the sound of both the instruments and the room as one audio source” (King, 2017).

Although the terms mentioned above are “conventions” for the types of productions, they are not fixed rules, and each technique of curating or capturing the sense of space can be used for any type of recording. The main arbiters of the choice are the expectations, listening habits, and tastes of the musician, the audience and the sound engineer/producer. As human beings follow no rules or mathematical calculations in their aesthetic decisions, they may go for unrealistic reverberation for their sound in live performances.
A common example is that many singers and audiences like to hear singing voices in a reverberant space (that is, with a reverberation usually associated with a larger room than the one in which the performance takes place). The concepts of pan-pot recording are so ingrained in the sound of some music genres that performers seek to create this effect with mixers and effects processors in live performances. It is therefore normal and expected to hear a singer’s voice with the reverberation of a big space such as a cathedral coming from the PA speakers in a much smaller space such as a bar. An analogy can be made between the two music-recording approaches and the differences between theatre and cinema production. In theatre, every action is created on the stage and viewed by the audience from one point of view (the uni-dimensionality of stereophonic recording), and the story is supported by audio-visual effects with the help of technical means such as PA speakers and lighting systems. In a cinema film, every scene is shot at various locations and then combined into a uniform story (the multi-dimensionality of multi-microphone recording), so the audience experiences the drama through many different audio-visual worlds.

2. THEORY

This section will outline the theoretical background on which this thesis is based.

2.1 A Brief Overview of Human Sound Localization System

Sound localization is the process by which a living being receives aural information and decodes it in a complex way to deduce the location of the source of that sound. Aural perception is crucial for any living creature, as it builds an awareness of the surroundings to an even greater degree than visual perception. Humans have two ears, and because of the spacing of these two ears and their opposite angles we localize sounds by two main differences: time and intensity.
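As a rough numerical illustration of the time difference between the two ears, the following Python sketch estimates the largest inter-aural delay for the ear spacings quoted in this section (17-21 cm) and the frequency whose half period equals a given delay. It is a simplified sketch: the speed of sound of 343 m/s and the straight-line path between the ears are assumptions. Because the real path bends around the head, the straight-line estimate comes out somewhat smaller than the 630 µs figure quoted in the text. The function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees Celsius


def max_itd_seconds(ear_spacing_m, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Largest inter-aural time difference for a sound arriving from the side.

    Uses the straight-line distance between the ears (an assumption); the
    path around the head is longer, so real values are somewhat larger.
    """
    return ear_spacing_m / speed_of_sound


def half_period_frequency_hz(delay_s):
    """Frequency whose half period equals the given delay."""
    return 1.0 / (2.0 * delay_s)


for spacing_cm in (17.0, 21.0):
    itd = max_itd_seconds(spacing_cm / 100.0)
    print(f"{spacing_cm:.0f} cm -> {itd * 1e6:.0f} microseconds")

# The 630 microsecond delay quoted in the text corresponds to roughly 800 Hz.
print(f"{half_period_frequency_hz(630e-6):.1f} Hz")
```

Above the frequency returned by `half_period_frequency_hz`, the delay between the ears exceeds half a period, which is why phase differences stop being an unambiguous localization cue in that region.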
We localize sounds coming from our surroundings by the inter-aural time difference, the inter-aural intensity difference and frequency content (Bartlett, p.192). The difference in frequency content between the two ears is caused by the head’s baffling effect above a certain frequency, approximately 800-1000 Hz; this direction-dependent filtering is described by the Head Related Transfer Function (HRTF). Most humans have their ears spaced approximately 17-21 cm apart. Thus, there is a time delay of up to 630 µs between the two ears. 630 µs is approximately half the period of an 800 Hz tone. Therefore, up to 800 Hz we hear sounds with a phase delay between our ears (Lipshitz, 1986). For frequencies higher than 1600 Hz (the numbers differ with every study), level differences between the ears determine the localization, as the wavelengths are shorter than the ears’ spacing. The baffling effect of the head increases as the frequency increases. It was proposed by Blumlein (1931) that up to approximately 700 Hz, localization is based on phase differences between the sounds reaching the two ears. Above 700 Hz, localization is based on intensity differences that arise because of the head’s baffle effect, which blocks the frequencies above 700 Hz. There are also localization formulas and theories for the other dimensions, which are the vertical dimension and depth (distance), but this study is focused on horizontal localization.

Reflections are as important for sound localization as they are for the subjective evaluation of sound (the choice of reverberation for specific kinds of music). According to Bartlett (1979), the direct sound and reflections within about 2 milliseconds contribute to localization. Reflections that are delayed by more than 2 milliseconds are important for other factors such as the sense of spaciousness and the perceived tonal balance. The outer ears (pinnae) are also important in determining the location of sound sources, as they apply a mechanical filtering to sounds according to their positions, e.g.
sounds coming from the back of the head are filtered in the high frequencies by the pinnae.

2.2 An Overview of Stereophonic Recording Techniques

Even though the first noted attempt is known to be the French “Théâtrophone” system, developed by Clément Ader in 1881 in Paris as a music transmission service for subscribers, “stereophonic” sound in the modern sense came to be recognized after Alan Dower Blumlein’s milestone patent, “Improvements in and relating to Sound-transmission, Sound-recording and Sound-reproducing Systems” (British patent # 394,325). In that patent, he noted observations on the human anatomy of hearing and its relation to recording techniques. He also proposed a new and practicable method of capturing and reproducing sound in stereo. Contemporary with Blumlein’s studies, experiments on stereophony were also being conducted across the Atlantic Ocean at Bell Labs. Dr. Harvey C. Fletcher was the leading figure in the experiments done at Bell Labs. He pioneered studies on psycho-acoustic phenomena. Together with Wilden Munson, he created the famous “Fletcher-Munson” curves, which are the result of studies on the non-linear, frequency-dependent hearing system of human beings (Streicher & Everest, 1998). Fletcher and his colleagues approached stereophonic capture and reproduction from a different point of view than Blumlein. Their idea for realizing a realistic sound recording and reproduction was to place an infinite number of microphones in front of the sound source and to reproduce all the microphones’ signals with the same number of loudspeakers positioned at the same positions as the microphones. As that was not possible because of economic and physical limitations, Fletcher and his colleagues reduced the number of microphones to two or three, which were placed on the ground level.
The researchers at Bell Labs focused on spaced microphone pick-up and reproduction with two or three microphones, while Blumlein focused on coincident-pair recordings, and these two approaches formed the basis for conventional stereophonic recording techniques for the next 50 years or more (Streicher & Everest, 1998). In 1966, H. Mertens carried the research on stereophonic sound further by publishing information on the relation of inter-channel intensity differences (IID) and inter-channel time differences (ITD) to apparent reproduction positions in the 30° listening configuration. The values presented in the study showed the minimum amount of IID or ITD required for a sound to be perceived as coming exclusively from one loudspeaker. G. Simonsen published new data on those findings; he used natural sound sources such as voice and maracas, and he included other listening configurations such as 10° and 20° (Williams, 1987). Williams (1987) built his theory of Stereophonic Recording Angles (SRA) on these two previous studies and published interpolated graphs of the data provided by Simonsen and Mertens, adding his own findings. According to the SRA theory, the lateral limits of the recording area can be calculated for every microphone pair configuration by using the data on the required ITD and IID given by the previous studies. The SRA depends on the angling, spacing and polar patterns of the microphones. The perceived location of a sound source in stereophonic reproduction is called the “phantom image”. A sound source placed in the frontal 180° of the microphone pair is recorded by the pair, and the IID and ITD captured in the two channels are then translated into a perceived phantom image location in the reproduction. Wittek and Thiele (2002) published a new study built upon the previous studies on the SRA.
They found that phantom image localizations up to 75% did not differ much between their data and the data of previous studies (100% localization meaning a hard-panned sound source). They concluded that the deviations beyond 75% localization are caused by the various sound sources and other factors in the recording, and they came up with the term “Recording Angle_75%” (Wittek & Thiele, 2002). Wittek and Thiele found that every 1 dB of IID in one channel resulted in a phantom source shift of 7.5%, and that every 0.1 ms of ITD resulted in a phantom source shift of 13%. They also stated that the two phantom source shift factors were cumulative.

2.3 ITD versus IID

For every combination of microphone angling, distance between microphone diaphragms and microphone polar patterns, the SRA or Recording Angle_75% can be calculated by engineering formulae (Kleczkowski, 2011). This means that two different stereo microphone arrays can have the same recording angle while one is based on ITD (an AB pair) and the other is based exclusively on IID, e.g., a Blumlein pair. However, factors such as texture, localization of sound sources, room information and mono-compatibility differ considerably (Williams, 1987).

2.3.1 Localization versus spaciousness

Coincident pairs are formed by placing two microphone capsules as close as possible to each other, thus enabling a stereo recording with only IID. This technique is considered good for capturing the localization cues of sound sources without a phase delay at any frequency, as sounds arriving from any direction arrive at both microphone diaphragms at the same time. The localization information is obtained from the intensity differences between the two channels. These intensity differences are highly dependent on the polar patterns of the microphones and the angle between them. Although there is no phase delay between the channels, not all frequencies are localized at the place where they are positioned in the stereophonic panorama.
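The two phantom source shift factors reported by Wittek and Thiele earlier in this section (7.5% per 1 dB of IID and 13% per 0.1 ms of ITD, stated to be cumulative) can be turned into a small worked example. The Python sketch below simply adds the two contributions. The function name, the sign convention and the clamping of the result to ±100% are assumptions made for illustration, and the linear relation should only be trusted up to about 75% shift, in line with the Recording Angle_75% concept.

```python
SHIFT_PERCENT_PER_DB = 7.5    # per 1 dB of IID, as quoted from Wittek & Thiele
SHIFT_PERCENT_PER_MS = 130.0  # 13% per 0.1 ms of ITD, expressed per millisecond


def phantom_source_shift_percent(level_diff_db, time_diff_ms):
    """Approximate phantom source shift in percent (100 = hard-panned).

    Both differences are signed towards the same channel. The two
    contributions are added, and the result is clamped to +/-100%
    (the clamping is an assumption of this sketch, not from the study).
    """
    shift = (SHIFT_PERCENT_PER_DB * level_diff_db
             + SHIFT_PERCENT_PER_MS * time_diff_ms)
    return max(-100.0, min(100.0, shift))


# 4 dB of IID combined with 0.2 ms of ITD towards the same channel:
# 4 * 7.5% + 2 * 13% = 56% shift.
print(phantom_source_shift_percent(4.0, 0.2))
```

A near-coincident pair produces both kinds of difference at once, which is why a moderate level difference and a moderate time difference can together move a source most of the way towards one loudspeaker.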
With coincident pairs, frequencies in the lower region are not localized in either of the channels, and the amount of IID between the channels increases with frequency. In spaced pairs, by contrast, the phase delay that results from the spacing between the microphone capsules causes an inaccuracy in the localization of the pair and distorts the stereophonic panorama (Streicher & Everest, 1998). This effect is considered the fault of spaced pairs by some researchers, e.g., Lipshitz (1986) and Griesinger (1987). Although spaced pairs are favored over coincident stereo pairs in subjective listening tests, Lipshitz (1986) suggests that their pleasantness is an illusion created by the distorted phase relationship between the channels. The result of this stereophonic distortion is an effect called “spaciousness”. Spaciousness is the amount of correlation of the reflected sound, which is the L-R to L+R ratio (Griesinger, 1987). Coincident pairs are known to produce a recording with sharper localization, while spaced pairs produce a more spacious sound. There are other techniques in between spaced pairs and coincident pairs, combining both IID and ITD. These are called near-coincident pairs, and they are created by increasing the spacing between the microphones of a coincident pair.

2.3.2 Four types of stereo microphone configurations

As mentioned in the previous section, there are four main categories of stereo recording microphone configurations (Streicher & Everest, 1998). These are: coincident pairs, near-coincident pairs, spaced pairs and baffled pairs. Coincident pairs are formed by placing two microphone capsules as close as possible to each other, thus creating only IID between the channels and omitting any time or phase differences between them. They are usually known to provide sharp localization, which is often considered too analytic (Bartlett, 2014). Near-coincident pairs are a modification of coincident pairs. They are formed by spacing the microphones of a coincident pair.
There are many different near-coincident pairs; ORTF, RAI, NOS, DIN and the “Stereo 180 System” are some of them. The spacing of the microphones may be said to provide a compromise between IID and ITD between the channels. Recordings made with near-coincident pairs sound more spacious than those made with coincident pairs (Bartlett, 2014). Spaced pairs are formed by placing two microphones at a distance of 30 cm to 3 meters or more from each other. If the polar patterns of the microphones are omni-directional, angling the microphones theoretically has no effect on the frequency response. A spacing of 40-60 cm is suggested by DPA Microphones as ideal for spaced microphones. Baffled pairs are formed by placing an object between the microphones, thus simulating a human head. The baffle can be a physical model of a human head made from absorbent material, or a disk. Jecklin (1981) proposed a disk with a diameter of 28 cm between the microphones. A similar disk was constructed for the recording tests in this study. An example of a Jecklin Disc can be seen in Figure 2.1.

Figure 2.1 : Jecklin Disc (Josephson Engineering).

2.3.3 Recording angles and angular distortion

As mentioned above, each stereo pair configuration has its own Stereophonic Recording Angle (SRA). When we listen to sounds, we hear them surrounding us and we are open to sounds coming from any direction. Microphones, on the other hand, pick up sounds from different angles to varying degrees depending on their construction. With stereophonic recording we can transfer an acoustic event from one place to another through microphones, a recording medium and reproduction systems. The aim of stereophonic recording is to keep the transformation of the stereo image as linear as possible. Since this transformation depends on the width of the sound source, the microphone configuration and the polar patterns of the microphones, angular distortion occurs.
Angular distortion is the discrepancy between the reproduced positions of the phantom images of sound sources and their actual positions in the acoustical domain at the time of recording (Williams, 1987). Williams also notes angular compression/expansion and points out how they differ from angular distortion. In the optimal speaker listening configuration each speaker is positioned at 30° to the listener, and since most sound sources in the recording area occupy a larger area than the frontal 60°, angular compression/expansion inevitably occurs. What happens to angular expansion/compression in headphone listening remains a question. Since headphones are “located” at 90° to the listener, is it always a case of angular expansion? The subject is investigated further in the next section.

2.4 Listening through Headphones versus Speakers

Throughout the history of stereophonic recording and reproduction studies, the listening medium was in most cases speakers. The headphone was merely a device used for applications in engineering and aviation; in music it was used mostly in the recording process, to check for noise or to hear the other channels while recording in multi-track. Most studies done on headphones were funded and encouraged by headphone manufacturers. Studies on headphones have started to gain popularity with virtual reality applications and binaural sound, but headphones were still not the main object of most listening tests. Even today, based on personal observations, most professional sound engineers use headphones mainly while recording, or in the post-production stage as an alternative to speakers, and they do not release any work before checking the recordings on speakers, as they consider speakers the primary listening medium. People’s listening trends in the modern age are beyond the scope of this study, and the focus on headphone listening stems only from the writer’s personal listening habits and observations.
There are a number of studies comparing headphone and speaker listening configurations, by writers such as Plenge (1972) and Blauert (1999). It is known that with headphones we image sound sources differently than with speakers. In speaker reproduction, we localize sources by IID and ITD, and an illusion of a continuous panorama forms between the speakers (and in some cases even beyond them), so that what we hear are imaginary aural positions somewhere between the speakers. One of the reasons for this fused stereophonic panorama in speaker listening configurations is acoustic crosstalk. When we hear sounds from one speaker, say the left speaker of the stereo system, we hear them not only with our left ear but also with our right ear, with a delay and a difference caused by the distance between our ears and our head’s baffling effect. The effect that results from the head’s baffling function is called the Head Related Transfer Function (HRTF). Because of this crosstalk, speaker reproduction results in reduced separation between the channels (Griesinger, 1987). Now let us look at stereophonic recording with two microphones that are spaced to some degree. Even when the microphones are spaced by a small amount, such as 17 cm, there will be a phase delay between the channels in the recorded signal. When we reproduce these two channels through a stereo system with two speakers, we add even further phase delay to the perception of each channel between the two ears. According to Griesinger (1985), this is the main drawback of two-loudspeaker reproduction, namely “the perception of loudspeakers as sound sources”. The subject of acoustic crosstalk is also touched upon by Lipshitz (1986), who pointed out the differences between “Natural Hearing”, which is the hearing of acoustic sound sources in a live environment, and “Stereo Hearing”, which is the result of hearing sound through two-loudspeaker reproduction.
There are many discussions on Hi-Fi forums on the Internet about the preference for listening through headphones or speakers. Some people argue that speakers provide better localization (or “imaging”, as it is generally called) and soundstage (Guttenberg, 2016). According to some Hi-Fi enthusiasts, soundstage is what occurs when you close your eyes in front of two speakers and an illusion forms that you are listening to a stage created by the producer or recording engineer, rather than to sounds emanating from the two points where your speakers are placed. Imaging refers to the speakers’ ability to reproduce the sound sources as faithfully as possible to their intended placement in the production. Although these terms and their definitions may differ from professional sound engineers’ perspective, they still reflect the opinion of a number of audio enthusiasts on the matter, which is the difference between the headphone and speaker listening experiences. Plenge (1972) distinguishes loudspeaker and headphone listening situations with an analytic and experimental approach, pointing to differences between the localization functions of the two systems. According to Plenge (1972), when listening to sounds coming through earphones, the sound source or sources seem to be located, almost without exception, inside the subject’s head. He called this function “Inside-Head-Localization (IHL)” and distinguished it from “Outside-Head-Localization (OHL)”, which is the result of the localization of sources emitted from two loudspeakers with the additional effects of: a) the acoustical characteristics of the listening room, b) diffraction by the head and its change through head movements (Plenge, 1972). He also used the term “lateralization” for headphone reproduction and “localization” for speaker reproduction.
2.4.1 Practical/pragmatic considerations

The choice of the means of reproduction may not always be at the discretion of the listener or sound engineer. Common reasons for engineers to choose headphones in the post-production stage, for instance, are the lack of speakers, the lack of funds to obtain them, or the lack of an acoustically suitable room for critical listening. Reflections in the room tend to blur the stereophonic panorama even further in speaker reproduction, and depending on the room dimensions and the placement of the speakers in the room, it is highly likely that boosts and cuts occur at some frequencies as a result of the reflective acoustical response of the listening room. As for localization, the ideal speaker listening configuration is one in which the listener and the two speakers form an equilateral triangle. In that case the listener sits in the middle, the two speakers are each placed at 30° to the listener, and their tweeters are aimed at the listener’s ears. It must be noted that the 10°, 20° and 30° listening configurations produced different results in the listening tests (Williams, 1987). All of the configurations above, though, allow for only one “sweet spot” for the listener, and the frequency content, as well as the inter-aural phase delay that occurs as a result of the spacing of the two ears, changes with the listener’s head movements. The stereophonic panorama, or the “illusionary sound stage”, created by the two speakers in front of the listener is lost when the listener steps out of the sweet spot. In headphone reproduction, however, the listening configuration and the panorama are stable and independent of any movement or placement. Another common observation noted by listeners is the difference in the amount of reverberance in a recording between headphone and speaker listening situations.
When we listen through headphones, we hear the reverb in the recording in isolation from the room that we are in, while in speaker reproduction we hear it mixed with the room’s reverberation. Leaving aside the aesthetics of the situation, we can accept that the room’s reverberation is a parameter outside the information embedded in the recording. Also, Lipshitz (1986) questions the naturalness of hearing reverberation only from the front, which is the case in loudspeaker reproduction of a stereo sound recording when compared to acoustical hearing. Noise is also reduced with headphone listening when compared to speaker listening. Whether or not they have active noise cancelling technology, headphones are able to isolate the listening experience from the surrounding noise. Thus they enable the listener to hear greater detail, allowing for more critical listening. Many people, on the other hand, consider listening through headphones unnatural. One reason for that is the lack of bone vibrations, which result from the acoustic emanation of signals from loudspeakers. There are a number of studies on the effect of bone-conducted sound compared to air-conducted sound (Plenge, 1972). Another factor that lowers the efficiency of headphones for critical listening is the closed-back design of most circumaural (around-the-ear) headphones. When we listen through headphones, the sound moves in a relatively small space between the diaphragm of the headphones and our ear canal. This closed space leads to reflections inside the ear cups, which can blur the overall spatial characteristics of the recording. Open-back headphones mitigate this problem, as their ear cups are designed with perforated plates that allow air to pass freely in and out. This leads to less noise isolation, both into and out of the ear cups, but it results in a more natural sound (Goodner, 2018).

3. RECORDING AND LISTENING TESTS
In this section, the recording setups, test listening rooms and survey method are documented in detail.

3.1 Three Different Listening Setups and Locations

It is well known that the space in which a music performance takes place and is recorded is highly important for the sound of a recording. We are accustomed to hearing different genres of music in various acoustic spaces, and our brain links that acoustic information to those types of music very well. For instance, we are used to hearing classical music performances in reverberant concert halls, and in music recordings of that genre recording engineers seek a particular ratio of reverberation to direct sound. Room acoustics is beyond the scope of this study, but it is still necessary to touch upon a few common facts before going any further. Each performance space has its own characteristics. While we may expect good reverberation from a performance hall designed mainly for classical music, one may not expect to find an acoustically treated performance space in an underground music bar. Yet even acoustically untreated, and sometimes noisy, places have their own acoustical imprints that are well encoded in our brains, and some types of music performance call for that kind of acoustic information. A great number of stereo location recordings are made in music halls, rehearsal rooms, conference halls, music bars, etc. Thus, the acoustical imprint of the performance space on the recordings is inevitable, and it is desired in most recordings. Yet if the performed music is not associated with the performance space, as in the performance of a string quartet in a conference room, the acoustical imprint might be irrelevant to the genre and be perceived as displeasing by the listener.
For such cases, the recording engineer has options in the post-production stage, such as adding artificial reverberation, yet it is important that as little early-reflection information as possible leaks into the recording, in order to avoid multi-dimensionality in the reverberation, i.e. hearing the musical instruments both in a conference room and in the selected artificial reverberant space. For such cases, microphones with more selective polar patterns may be preferred. In the recordings for this study, the two selected pairs were chosen with those facts in mind. Since the selected performance spaces varied in their size, acoustical treatment and purpose, one pair was selected with a limited scope of sound reception (sub-cardioid polar pattern) and another pair with no such limitation (omni-directional polar pattern). Analysis and evaluation of the room information leaked into the recordings is not the aim of the study, but it is assumed to affect the subjective evaluation of the recordings. The selected spaces were: an acoustic performance hall, an underground music café/bar and an acoustically treated rehearsal room. For the experiment, four arrays were used, one from each of the four categories mentioned earlier in this study: coincident, near-coincident, spaced and baffled pairs. The four systems for these categories were, respectively: XY90, DIN, AB40 and Jecklin Disk. However, to make the listening tests more convenient and easier to run, only the near-coincident and baffled pairs were selected as listening test options. The microphone systems were positioned in such a way that their capsules were at the same distance from the sound sources. This can be seen in Fig #. The DIN system is constructed by placing two cardioid microphones 20 cm apart at an angle of 90 degrees. In the test recordings, wide-cardioid small-diaphragm microphones were used. The two microphones for this system were Line Audio CM3s.
A baffling disk was constructed for the recording, modeled after Jürg Jecklin (1981). The core of the disk was a round piece of chipboard with a diameter of 30 cm, instead of the 28 cm proposed by Jürg Jecklin, and a thickness of 1 cm. Highly absorbent acoustic foam with a thickness of 2 cm was glued onto both surfaces of the disk, thus making the disk’s thickness 5 cm in total. Each microphone was then placed 6 cm from the surface of the disk, resulting in a spacing of 17 cm between the microphones. The microphones used for the baffled pair were OM1s, also from Line Audio. The frequency responses of the microphones for the DIN and baffled pairs were very close, since they are from the same manufacturer and are designed to be as flat as possible. The frequency responses can be seen in Figure 3.1 and Figure 3.2 below.

Figure 3.1 : Frequency response graph of OM1.

Figure 3.2 : Frequency response graph of CM3.

The recording spaces were chosen on the basis of observations of location recording practice in İstanbul. It was observed by the researcher that most location recordings in closed spaces in the city were made in concert halls, music clubs and rehearsal rooms. As for the concert hall, the selected space was the Albert Long Hall (ALH) conference hall at the Boğaziçi University South campus in İstanbul. ALH was constructed in 1862. It is a multi-purpose, medium-sized hall with 480 seats. The hall is covered mostly with parquet and wood, and it is said to have very good reverberation characteristics by sound engineers who have made recordings on location. The recording was made during the rehearsal of the ensemble. The performing ensemble for the recording was the Semplice Quartet, and they rehearsed Johannes Brahms’s String Quartet in A minor, No.2, Op.51 and Robert Schumann’s String Quartet in A minor, No.1, Op. 41. The four arrays were placed 150-250 cm from the musicians, and their heights from the ground were: DIN 160 cm, XY90 152 cm, baffled pair 139 cm, AB40 168 cm.
All the microphones were connected with identical cables to the Focusrite Scarlett 18i20 USB interface. The microphones for the XY90 and AB40 pairs were, respectively, an sE Electronics sE4 stereo pair with matched cardioid capsules and a DPA 4061 matched pair. The tracks were recorded in Pro Tools 10 HD software running on a MacBook Pro computer. A photo of the recording space can be seen in Figure 3.3.

Figure 3.3 : Semplice Quartet rehearsing at Albert Long Hall.

The second recording space was İlhan Usmanbaş Hall on the Maçka campus of İstanbul Technical University. It is a concert space with 50 seats, converted from a classroom-sized room. It is a small rehearsal room with a volume of 288 m³. The length, width and height of the room are, respectively, 11.5 meters, 8.4 meters and 2.9 meters. The room was designed and acoustically treated for performances along its length. The wall facing the audience is covered with diffusers, and the other wall, facing the performers, is coated with absorbent materials. The performing ensemble was the Rezonans choir from İstanbul. The recording was done while the choir was rehearsing. The choir was seated transversally to the length of the room; thus, from the audience perspective of a stereophonic panorama, the left side of the choir was in the diffuser part of the room while the right side was in the absorbent part. The differences in acoustical treatment between the two sides were not taken into consideration for the listening tests. The room was not decoupled or isolated from outside noise, and windows behind the microphones were occasionally open, so there was a significant amount of both constant and occasional noise, such as the buzz of an electric lamp and an ambulance passing in the street nearby. The choir was seated in a crescent shape, and the microphones were placed approximately 3 meters from the middle of the choir, just behind the conductor. The microphones were positioned roughly on the centerline of the performers.
The heights of the XY90, DIN, baffled and AB40 pairs were, respectively, 163 cm, 163 cm, 145 cm and 173 cm. The microphones, cables and interface were the same as in the Albert Long Hall recording.

The third performance space was GitarLive in İstanbul. It is a café with a stage for musical performances. The floor is wooden and the back of the stage is covered with thick velvet curtains. The floor of the stage is also covered with wall-to-wall carpet. The stage is placed at the longitudinal end of the room. The length, width and height of the room are, respectively, 8.7 meters, 3.9 meters and 3.25 meters. There is a doorless wall frame in the room, a remainder of the door that separated the two rooms which previously constituted the space. The doorsill is enclosed by the masonry wall. The doorframe and the wall around it act as a partial acoustical separation between the two parts of the room. The performance space can be seen in Figure 3.4. It is a common performance space for small ensembles and has hosted over two thousand performances in 15 years. It has no isolation from outside noise and is surrounded by a noisy neighborhood. Behind the stage, there is a hole in the wall facing the air-conditioning fans of the building, and the noise in the room can be remarkably prominent when the air-conditioning system is working. During the recording, however, the air-conditioning system was off. There was also refrigerator noise from the kitchen next to the stage. The recording was done during a classical guitar duo's rehearsal. The microphones were placed approximately 80 cm from the musicians, and their height was lower than in the previous recordings, as the performers were sitting on chairs. The microphones, their positions relative to each other and the recording interface were identical to the previous recordings.

Figure 3.4 : GitarLive performance space.

In the post-production stage, no dynamic or spatial processors were applied.
Thus any use of equalizers or compressors was avoided, since they introduce phase shifts or dynamic range reduction into the recording. The only processing done in the post-production stage was to edit appropriate excerpts of 15-20 seconds and to match the LUFS levels of the excerpts. The excerpts were extracted at a 44.1 kHz sampling rate and 24-bit resolution. The recordings were made at the same rate and resolution.

3.2 Three Different Listening Spaces

Many recordings are produced while being monitored in acoustically treated rooms on precise, high-quality speakers, yet semi-professional and amateur productions are also numerous in the music-recording world. With the advances in recording equipment and the easy sharing of information through the internet, recording sound is now within the reach of anyone who has the main components of a sound recording chain: a microphone to capture the sounds, a medium to record the sound onto (a computer, in most cases) and a converter to convert the electrical signal from the microphone to the format of the recording medium (generally called a recording interface).

Just as sound engineers and record producers work in different places with various acoustic characteristics, technical equipment and monitoring equipment, music listeners who listen via speakers also consume music in different setups. For example, one listener might listen to music outdoors on a stereo Bluetooth speaker, while another listens on a hi-fi stereo speaker set in a room with absorbent elements such as books, pillows, sofas and other cushioned furniture. In line with the general approach of this thesis, the places in which the listening tests were conducted were chosen with people's various listening configurations in mind.
The listening spaces were: the control room of the MIAM (Centre for Advanced Studies in Music) studio at the Maçka campus of İstanbul Technical University, the control room of the Music Technologies department of the İstanbul Technical University Conservatory at the Maçka campus, and DigiLab at MIAM.

3.2.1 Listening spaces

The MIAM studio was constructed in 2000 by professional architects and acoustic engineers. Its control room is well designed and calibrated for flat-response monitoring, and the studio is equipped with very high-quality converters and speakers. The room has a volume of 34 m³ and can be seen in Figure 3.5. The audio samples were played from Pro Tools software through an Apogee Symphony ADC/DAC and ATC SCM200ASL speakers. The ATC speakers have 75 mm mid drivers and 34 mm tweeter drivers, and they are placed approximately two meters from the listener, as they are far-field monitors. The listener was seated in the "sweet spot", i.e. the optimal listening place in the room, equidistant from each speaker. Subjects listened to the recordings through Sennheiser HD600 headphones as well as through the speakers. The HD600s are open-back headphones. The listening tests were conducted with 10 subjects in the control room. The subjects' occupations were as follows: 6 Sound Engineering and Design M.A. students, 3 Ethnomusicology M.A. students and one Vocal Performance PhD student. The room is well isolated from outside noise.

Figure 3.5 : MIAM studio control room.

The studio of the Music Technologies Dept. of the İTU State Conservatory has a small control room, equipped with an Aurora Lynx DAC/ADC converter and Digidesign RM1 monitors. The room is acoustically treated, and the walls are covered with absorbent material. Audio samples were played through the equipment stated above and through a Mackie Big Knob Studio Monitor Control. The headphones used for the listening were again Sennheiser HD600s, and the speakers were the Digidesign RM1s mentioned above.
The RM1 speakers are near-field monitors, and the subjects were seated approximately 1 meter from the speakers while listening. 27 people participated in the listening tests in that room. Their departments and occupations were as follows: 19 undergraduate students of the Sound Engineering and Design department, 5 undergraduate students of the Musical Instrument Making department and 3 teachers in the Sound Engineering and Design department.

The DigiLab practice room is a workroom with partial acoustical treatment. It is not designed for audio recording or post-production monitoring, and its dimensions and geometry make it unsuitable for critical listening. The ceiling is mono-pitched, and its height decreases from the door to the front wall. There are two rectangular domes in the ceiling, with windows at the top for ventilation and sunlight. Those windows let outside noise as well as sunlight into the room, even when they are closed. There are partial absorber panels on the walls of the listening room, and behind the speakers there are two big bass absorber panels. The room can be seen in Figure 3.6. A Focusrite Scarlett 18i20 USB interface was used for playback, and the samples were played through ATC SCM20A Pro speakers. Listeners were placed approximately 1 meter from the speakers, as they are near-field monitors. 13 people participated in the listening tests. Their departments and occupations were as follows: 5 M.A. students of the Sound Engineering and Design department, 2 M.A. students of Composition, 1 M.A. student of Sonic Arts, 2 M.A. students of Performance Arts, 2 M.A. students of Music Business and Management and one librarian. The features of the listening rooms can be seen in Table 3.1.

Figure 3.6 : DigiLab

Table 3.1 : Listening rooms.
Listening room | Acoustical treatment of the room | Type/model of speakers | Size of the room | Number of subjects | Isolation of the room from outside noise
MIAM | Professional | Far-field / ATC SCM200ASL Pro | Large | 10 | Very good
Music Technologies Studio | Semi-professional | Near-field / Digidesign RM1 | Medium | 27 | Medium
DigiLab | Partly treated | Near-field / ATC SCM20A Pro | Medium | 13 | Weak

3.2.2 Listening setups and preparation of samples

Short samples of 15-17 seconds were edited out of the three recordings to be used in the listening test. The excerpts were chosen from passages of the recordings with the same overall loudness. Gain was then applied to match the LUFS levels of the excerpts, but no dynamic compression or expansion was applied. The levels of the audio samples can be seen in Table 3.2. Human sound perception and appreciation are biased on many levels, and different sound levels may blur a person's subjective preference between audio samples; for that reason, the loudness levels of the excerpts were brought as close together as possible. The samples were also carefully selected from passages performed at similar dynamics, i.e. they were similarly loud passages.

Table 3.2 : LUFS and peak levels of recordings.

Audio sample | Recording space | Duration | Integrated loudness, LUFS (baffled pair / near-coincident pair) | Peak, dBFS (baffled pair / near-coincident pair)
Sample 1 | Albert Long Hall | 15 seconds | -19.7 / -19.5 | -5.4 / -5.6
Sample 2 | İlhan Usmanbaş Concert Room | 15 seconds | -19.9 / -19.7 | -4.7 / -7
Sample 3 | GitarLive | 17 seconds | -19.8 / -19.7 | -6.2 / -5.9

For the listening test, it was decided that each listener would listen to a selected sample four times: baffled pair with headphones, baffled pair with speakers, near-coincident pair with headphones and near-coincident pair with speakers.
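The loudness matching summarized in Table 3.2 comes down to a static gain per excerpt. The following is a minimal sketch of that arithmetic, not the procedure actually used in Pro Tools for this study; the function names and the example input levels are hypothetical, and the loudness values are assumed to have been measured beforehand with a loudness meter.

```python
def gain_to_target(measured_lufs, target_lufs):
    """Return (gain in dB, linear amplitude factor) that moves an excerpt
    from its measured integrated loudness to the target loudness."""
    gain_db = target_lufs - measured_lufs
    return gain_db, 10 ** (gain_db / 20)


def peak_after_gain(peak_dbfs, gain_db):
    """A static gain shifts the sample peak by the same number of dB."""
    return peak_dbfs + gain_db


if __name__ == "__main__":
    # Hypothetical pre-matching measurements (not taken from the thesis).
    excerpts = {
        "baffled pair": {"lufs": -24.1, "peak": -9.8},
        "near-coincident pair": {"lufs": -22.6, "peak": -8.7},
    }
    target = -19.7  # assumed common target, in LUFS
    for name, level in excerpts.items():
        gain_db, factor = gain_to_target(level["lufs"], target)
        new_peak = peak_after_gain(level["peak"], gain_db)
        # A peak at or above 0 dBFS would clip, so it is flagged here.
        status = "ok" if new_peak < 0 else "would clip"
        print(f"{name}: {gain_db:+.1f} dB (x{factor:.3f}), peak {new_peak:.1f} dBFS, {status}")
```

Because only a static gain is applied, the dynamic range of each excerpt is untouched, which is consistent with the decision to avoid compression and expansion.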
To randomize the playback of the samples, the possible orderings of these four listening conditions were calculated. The four conditions can be arranged in 24 different orders, yet only 8 of them place the two recording pairs back to back on the same playback medium, i.e. headphones or speakers. The 8 listening orders can be seen in Appendix B. Each subject was asked to listen to the two recordings on the selected listening medium, i.e. headphones or speakers, and then state his/her subjective preference between them; the same procedure was then applied on the other listening medium. The audio samples were named A, B and C, respectively. Each subject thus listened to a different combination. Combinations were formed from the three audio samples (A, B, C) and the eight listening orders (1, 2, …, 8). For example, a subject listening to B2 would hear the recording made in the Usmanbaş concert room in the playback order DIN-speaker, baffled pair-speaker, baffled pair-headphones and DIN-headphones.

4. RESULTS AND DISCUSSION

We can say that the listening environment, the quality of the listening equipment and people's relation to sound recording have a great impact on their perception of recorded sound. For our listening tests, the three listening rooms varied in size, amount of noise and listening equipment. For the headphone listening setups, though, the headphones were identical throughout all the tests. That way we are able to discuss, to the extent that the number of subjects involved in the tests allows, the effects of the varying factors mentioned above on speaker listening, while we have a stable condition for every headphone listening configuration.

In this section, the results of the three cases are presented and discussed. The number of attendees and their occupations can be seen in Table 4.1.
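The count of listening orders described in Section 3.2.2 (24 possible orders, 8 of which keep both arrays back to back on the same playback medium) can be checked with a short enumeration. This is an illustrative sketch only: the labels J, D, H and S follow the abbreviations of Table 4.2, and the order in which the sketch lists the eight sequences is arbitrary and need not match the numbering in Appendix B.

```python
from itertools import permutations

# Each listening condition is (array, medium):
# J = Jecklin disc (baffled pair), D = DIN (near-coincident pair),
# H = headphones, S = speakers.
CONDITIONS = [("J", "H"), ("J", "S"), ("D", "H"), ("D", "S")]


def valid_orders():
    """Return the playback orders in which the first two items share one
    medium; the remaining two then necessarily share the other medium."""
    orders = []
    for order in permutations(CONDITIONS):
        if order[0][1] == order[1][1] and order[2][1] == order[3][1]:
            orders.append(order)
    return orders


if __name__ == "__main__":
    all_orders = list(permutations(CONDITIONS))
    kept = valid_orders()
    print(len(all_orders), "possible orders,", len(kept), "usable orders")
    for order in kept:
        print(" -> ".join(array + "-" + medium for array, medium in order))
```

The example sequence given for B2 (DIN-speaker, baffled pair-speaker, baffled pair-headphones, DIN-headphones) is one of the eight orders this enumeration produces.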
Table 4.1 : Number of subjects for the listening test

Occupation of subject | Number of subjects
Sound engineering undergraduate and graduate students | 34
Graduate and undergraduate students of other fields of music | 15
People who are not related to music | 1
TOTAL | 50

Listening Nr. 1 – Music Technologies Dept. Studio at İTU State Conservatory

The first listening tests were conducted in the listening room of the studio at the İstanbul Technical University State Conservatory. It is a medium-sized listening room with a volume of approximately 36 m³ (see Figure 4.1). The samples were played back from a computer running Pro Tools 10 HD software, through the Aurora Lynx interface and the Mackie Big Knob monitor controller, on Digidesign RM1 speakers and Sennheiser HD600 headphones. 27 of the 50 subjects took the test in this room. The preferences of the subjects can be seen in Table 4.3. The abbreviations used in the analysis of the listening tests can be seen in Table 4.2. Those abbreviations are used only for the research and were not revealed to the subjects.

Table 4.2 : Abbreviations used for the analysis of the listening tests

Relation categories (relation of the subject to sound recording):
St.: Strong (a person who is involved in recording professionally)
M: Medium (a person who is partially involved in recording, such as a musician with recording experience)
W: Weak (a person who is not related to music recording or music)

Recording spaces:
ALH: Albert Long Hall
Usm.: İlhan Usmanbaş rehearsal and recital room
gtr: GitarLive concert space

Listening configurations:
S: Speaker listening
H: Headphones listening

Stereo microphone configurations:
J: Jecklin disc (baffled pair)
D: DIN (near-coincident pair)

Looking at the participants in this room, we can see that the majority have a strong relation to sound recording (they were sound engineering undergraduates and teachers, or students of the musical instrument making department).
They were not necessarily specialized in stereo recording techniques, though. The analysis of the listening tests yielded some interesting results. For headphone listening, the near-coincident pair dominates the baffled pair when all recordings are taken together. If we investigate the headphone preferences further, we can see that the dominance of the near-coincident pair is restricted to the ALH recording. In the Usmanbaş recording, the baffled pair (J) dominates, and in the GitarLive recording both pairs were preferred in equal numbers (4/4). For the speaker listening setup, however, the baffled pair dominates, and it does so clearly in all three recordings. While for the headphone listening results the only criterion we can suggest is the involvement of the subjects in sound recording, for the speaker listening configuration we can make some inferences from the choice of room.

The listening room was covered to a great extent with acoustically absorbent material, so reflections from the walls were highly absorbed or reduced. There were, however, wooden plates in the corners of the room, and the floor was parquet with a carpet on it. There were also 12 wooden/metal chairs in the room, which have good potential for reflecting sound.

Table 4.3 : Results of the listening test at M.T. Studio

Figure 4.1 : M.T. Studio listening room

4.1 Listening Nr. 2 – MIAM Studio control room

MIAM (Centre for Advanced Studies in Music) at İstanbul Technical University has a well-equipped, professional studio with professional acoustical treatment and high-quality monitoring equipment. The listening tests conducted in the room were run from a Mac computer running Pro Tools HD software, through an Apogee Symphony interface and ATC SCM200ASL Pro speakers (Figure 4.2). The speakers are far-field monitors.
For the headphone listening configuration, Sennheiser HD600 headphones were used. The subjects chosen for the listening tests were graduate students of the sound engineering department (who are involved in sound recording at a professional level) and of ethnomusicology (weak relation to sound recording). The sound engineering students also have experience in stereo recording. The listening test was conducted with 10 subjects in the room.

In the results from the MIAM studio control room, we can observe a dominance of the baffled pair (Jecklin disc) over the near-coincident (DIN) pair in the headphone listening configuration. 5 of the 6 graduate sound engineering students chose the Jecklin disc recording on headphones. For speaker listening, however, the near-coincident pair was preferred over the baffled pair. As the speakers in the listening room were far-field monitors, the subjects listened from approximately two meters away. Fellow students and engineers have remarked on many occasions that these speakers are over-hyped, i.e. they make recordings sound better than they really are. The results can be seen in Table 4.4.

Figure 4.2 : ATC SCM200ASL Pro speaker (ATC speakers)

Table 4.4 : Results of the listening test conducted at MIAM studio control room

Nr. | Relation cat. | Rec. | H | S
28 | St. | Usm. | D | D
29 | W | ALH | D | D
30 | W | gtr | J | D
31 | W | ALH | J | J
32 | W | Usm. | D | D
33 | St. | ALH | J | D
34 | St. | gtr | J | J
35 | St. | Usm. | J | J
36 | St. | ALH | J | J
37 | St. | Usm. | J | D

Summary, MIAM (10 subjects):

Group | H | S
Overall | 3D/7J | 6D/4J
St. | 1D/5J | 3D/3J
W/M | 2D/2J | 3D/1J
ALH | 1D/3J | 2D/2J
Usm. | 2D/2J | 3D/1J
gtr | 0D/2J | 1D/1J

Notes: A high rate of preference for the Jecklin disk is observed in the "Strong" category.

4.2 Listening Nr. 3 – DigiLab

DigiLab is the workspace for sound art and sound experiments in the Centre for Advanced Studies in Music (MIAM) at İstanbul Technical University.
It is a medium-sized room with two domes in the ceiling, each with windows at the top that open to the outside. The windows let outside noise into the room even when they are closed. The ceiling is pitched, i.e. its height decreases from one side to the other. The walls are mostly concrete, with partial acoustical treatment such as absorbers and bass traps. The floor is covered with wall-to-wall carpet. The room is not a fully professional listening room suitable for critical listening and detailed monitoring; it is used mostly for sound art and acousmatic music composition. The room can be seen in Figure 3.6 (see Chapter 3). The speakers were not positioned in the ideal configuration: the angle between the speakers as seen from the listener's head was slightly larger than the ideal 60 degrees, and the tweeters pointed at the subject's ears only if he/she was no taller than 180 cm and the chair was adjusted to its lowest level. Given all the conditions mentioned above, the listening room can be considered a non-professional listening room. While these factors may be important for speaker listening, they do not matter for the headphone listening configuration.

Looking at the results of the listening test, we can see a clear dominance of the baffled pair over the near-coincident pair on headphone listening (see Table 4.5). The listening test conducted in DigiLab was noteworthy in that most of the participants were not involved in sound recording, or were involved in it only remotely. Two comments made by fellow sound engineering graduate students need to be noted in this section. One subject said that, in the recording done with the Jecklin disc pair, the choir (Usmanbaş recording) sounded as if it were located heavily on the right side of the stereophonic panorama.
The same subject did not observe this with the near-coincident pair. Could one inference from this observation be that the high degree of separation between the Left and Right channels made the separation larger than it was in the real-life situation? We should also note that the Usmanbaş excerpt was edited out of a section in which the sopranos, who were situated on the right side of the choir, were singing louder than the rest of the choir, as required by the arrangement of the piece. The results are shown in Table 4.5.

Table 4.5 : Results of the listening test conducted at DigiLab

Nr. | Relation cat. | Rec. | H | S
38 | St. | gtr | J | D
39 | M | ALH | J | D
40 | M | Usm. | D | D
41 | M | gtr | J | D
42 | St. | Usm. | J | D
43 | St. | gtr | J | J
44 | St. | ALH | J | D
45 | W | Usm. | J | D
46 | W | gtr | J | J
47 | W | Usm. | J | J
48 | W | ALH | J | J
49 | W | gtr | D | J
50 | M | ALH | J | D

Summary, DigiLab (13 subjects):

Group | H | S
Overall | 2D/11J | 8D/5J
St. | 0D/4J | 3D/1J
W/M | 2D/7J | 5D/4J
ALH | 0D/4J | 3D/1J
Usm. | 1D/3J | 3D/1J
gtr | 1D/4J | 2D/3J

Notes: Preference of DIN over Jecklin on speaker listening.

5. CONCLUSION

In this chapter, the results of the study and their relation to theory are discussed. Deductions on the application of the theory to varying recording and listening rooms, microphone polar patterns and stereo configurations are presented. At the end, possibilities for future work are discussed.

5.1 Main Idea

The main idea of the study was to test the effects of the two selected stereo microphone configurations on listeners' subjective preferences, and to reach a conclusion on the application of these configurations in live recording situations, drawing upon the literature on stereophonic sound. The study is based mostly on a practical approach to real-life situations of capturing acoustical sound events (which can be aided by amplified sound sources) that take place in various enclosed spaces, such as rehearsal rooms, concert spaces, living rooms, etc.
As is the case in live situations, the places for recording the selected samples were chosen from those most frequently used by performing ensembles. The listening rooms were likewise selected with the general audience in mind: they were chosen from a variety of listening rooms with varying degrees of acoustic treatment. In the writer's opinion, with the digitization of people's habits and lifestyles, mass media is disintegrating into social media, and the media materials consumed by the general public are becoming more and more "home-made", i.e. planned, produced and shared by semi-professionals and amateurs. It was also noted by personal observation that even the visual and audio creative materials produced by professionals are increasingly styled in that "home-made" texture. Elaborating further on listening trends or changing lifestyles would take the subject to a different topic, so, to go back to the starting point of the thesis, we can say that media materials produced in live situations, as opposed to curated ones (see Chapter 2), are an important part of today's mass media.

5.2 Conclusions

The two stereo arrays were compared in various ways. Firstly, as mentioned in Chapter 2, since the spacing of the two microphones is approximately equal to the spacing between the two ears of a human being, the inter-channel time difference between the Left and Right channels of the recorded signal is approximately the same as the one we experience when hearing the sounds in real life. When we listen via headphones to a recording made with one of the stereo microphone configurations mentioned, we hear the same inter-channel time difference between the two channels, and thus the same inter-aural time difference.
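As a rough illustration of the time differences involved, the sketch below first recomputes the 17 cm capsule spacing of the baffled pair from the disk dimensions given in Chapter 3, and then estimates the inter-channel time difference for a distant source. It rests on assumptions that are not part of the thesis: a far-field, free-field model (delay = spacing × sin(angle) / speed of sound), a speed of sound of 343 m/s, and no allowance for the longer diffraction path around the baffle or for the angling of the DIN capsules. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C


def jecklin_spacing_cm(core_cm=1.0, foam_cm=2.0, gap_cm=6.0):
    """Capsule spacing of the baffled pair: chipboard core, foam on both
    faces, and the gap between each face and its microphone."""
    return core_cm + 2 * foam_cm + 2 * gap_cm


def time_difference_ms(spacing_m, angle_deg):
    """Far-field inter-channel delay in milliseconds for a source at
    angle_deg from the array's center axis (0 = straight ahead)."""
    return 1000.0 * spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND


if __name__ == "__main__":
    arrays = {
        "baffled pair": jecklin_spacing_cm() / 100.0,  # 17 cm
        "DIN pair": 0.20,                              # 20 cm
    }
    for name, spacing in arrays.items():
        for angle in (0, 30, 60, 90):
            delay = time_difference_ms(spacing, angle)
            print(f"{name}, source at {angle:2d} deg: {delay:.3f} ms")
```

Under these simplifications both arrays stay within roughly the same range of delays, which matches the observation that their spacings are both close to the distance between human ears.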
Were we to listen to the same recording via speakers, on the other hand, a further inter-aural time difference would be added to the perceived sound in the acoustical domain, because each of our ears is at a different distance from each speaker (acoustic cross-talk). In both of the selected stereo pairs, the distance between the microphones was close to the distance between human ears. However, to investigate the effects of the situation explained above, we should include other stereo recording configurations in the comparison, which is discussed in the next section.

As mentioned above, the two stereo recording pairs have approximately the same inter-channel time difference, yet their inter-channel intensity differences may differ as a result of the microphones' polar patterns and the way the intensity difference is obtained. For the DIN setup, sub-cardioid microphones are used, and the intensity difference between the two channels results from the varying rejection of specific frequencies at various angles of the polar pattern. The polar pattern and the amount of rejection it provides are presented in Figure 3.2 in the previous chapter. In the baffled pair, however, the intensity difference is obtained by a baffling object between the two microphones (the Jecklin disc), imitating the human head. We can take the difference in frequency response between the two microphone models out of the equation, as their frequency responses are very close (see Figure 3.1 and Figure 3.2). As we can see in the results of the listening tests, the baffled pair dominates on headphone listening setups. Could this be an indication that an intensity difference between the two channels obtained by a baffle is subjectively preferable?
Could this baffle convey a more natural separation to people's perception, and thus shift their subjective preference toward the baffled pair recording over the near-coincident recording in the headphone listening setup? In the listening test, one subject stated that the choir was located heavily on the right side of the stereophonic panorama in the baffled pair recording. This was confirmed by one more subject, while no such effect was observed with the near-coincident pair. Could the reason be the baffle's higher degree of separation between the two channels?

Another factor to be considered in comparing the two selected pairs is the off-target sound included in the recordings as a result of the microphones' polar patterns and their configuration. These off-target sounds fall into two categories: room information and extra-musical noise. As mentioned above, the rooms were not acoustically treated, nor were they isolated from noise sources. In all three recordings, there was constant or accidental noise in the rooms. In one of the recordings, there were even open windows behind the microphones. Since the listening test did not collect listeners' commentary, they were not asked for their opinions on that factor. Apart from the amount of noise, the acoustical imprint of the room in which the recording is made is more prominent in people's perception. Because it uses omni-directional microphones, the baffled pair can be said to include more off-target sound than the near-coincident pair. While this helps in re-creating the captured spatial information, it can be a drawback of the omni-directional polar pattern in rooms with distracting noise sources or bad acoustics.
As one of the subjects commented on the two recordings in Albert Long Hall, the baffled pair recording sounded like it was "enveloping the listener", while the recording done with the near-coincident pair "sounded too direct and aimed".

It is difficult to reach a single conclusion from the study, since the parameters are varied. We can say that, on headphone listening setups, a greater number of subjects preferred the recording done with the baffled pair to the one done with the near-coincident setup. On speaker listening, the near-coincident pair was preferred over the baffled pair in the professionally designed and acoustically treated MIAM control room, while in the semi-professional M.T. Studio listening room the baffled pair was preferred over the near-coincident pair.

By analyzing the results, we can also state that the subjects preferred the baffled pair recording to the near-coincident pair recording on headphone listening for the recording done in a reverberant concert hall (ALH). In the recording of a wide sound source such as the Rezonans choir (a very wide choir, seated across approximately 11 meters between the two sides of the stereophonic panorama), subjects preferred the baffled pair to the near-coincident pair by 58% on headphone listening, while on speaker listening they preferred the near-coincident pair by 52%. In the GitarLive recording, in which a classical guitar duo was recorded, the baffled pair was preferred over the near-coincident pair by 66% on headphone listening, while on speaker listening the baffled pair again dominated with 53%. In total, the subjects who are strongly related to sound recording (32) preferred the baffled pair recording to the near-coincident recording by 56%, while the subjects in the medium and weak relation categories (18) preferred the baffled pair recording to the near-coincident pair recording by 66%.
If we look at the consistency of the subjects' preferences, we see that 29 out of 50 subjects were consistent in their selection of microphone array across the two listening mediums, i.e. if a subject chose one array (say, the baffled pair) on headphones, he/she chose the same pair on speakers. Of the 29 "consistent" subjects, 12 preferred the near-coincident pair on both mediums while 17 chose the baffled pair. Of the "inconsistent" subjects, 13 chose the baffled pair (Jecklin disc) for headphone listening while picking the near-coincident pair (DIN) for speaker listening, and 8 chose the opposite.

Certain deductions can be made by analyzing and interpreting the listening test results:
- People preferred baffled pair recordings to near-coincident recordings on headphone listening setups.
- In rooms with favorable reverberation characteristics, or with any kind of room characteristic that is desired in the recording, the baffled pair configuration can be used for better capture of spatial information and effective L-R channel separation.
- In rooms with undesired acoustic characteristics, or with a high ratio of extra-musical noise to the sound source to be recorded, near-coincident pairs can be used to capture less spatial information.
- Regardless of sound source or room type, the baffled pair is more favored than the near-coincident DIN pair on headphone listening setups. The majority of subjects went for the baffled pair on headphones, even when they chose the near-coincident pair for speaker listening.
- The recording done with the baffled pair was preferred at a high rate by people who are not related to sound recording (11 people out of 13). It can therefore be said that baffled pair recordings are favorable for non-professional listeners.
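The consistency counts reported above can be tallied into per-medium totals with a few lines of code. The sketch below does that and, as an illustrative addition that is not part of the thesis analysis, runs an exact two-sided sign test (binomial test against a 50/50 split) on each total to show how one might judge whether a preference could be due to chance. The function names are hypothetical.

```python
from math import comb

# (headphone choice, speaker choice) -> number of subjects, as reported above.
# J = Jecklin disc (baffled pair), D = DIN (near-coincident pair).
COUNTS = {("J", "J"): 17, ("D", "D"): 12, ("J", "D"): 13, ("D", "J"): 8}


def marginal(counts, medium_index, array):
    """Number of subjects who chose `array` on one medium
    (medium_index 0 = headphones, 1 = speakers)."""
    return sum(n for choice, n in counts.items() if choice[medium_index] == array)


def sign_test_p(k, n):
    """Exact two-sided binomial p-value for k of n choices under the
    hypothesis that both arrays are equally likely to be chosen."""
    upper = max(k, n - k)
    tail = sum(comb(n, i) for i in range(upper, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    total = sum(COUNTS.values())
    for index, medium in enumerate(("headphones", "speakers")):
        baffled = marginal(COUNTS, index, "J")
        p_value = sign_test_p(baffled, total)
        print(f"{medium}: {baffled} J / {total - baffled} D, sign test p = {p_value:.3f}")
```

A test of this kind would sit naturally alongside the percentages reported in this chapter, since it makes explicit how much weight a given split can bear with 50 subjects.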
5.3 Problems Related to the Test

During the test and after analyzing the results, a number of shortcomings of the testing method were discovered. First of all, the listening materials were limited in sound source types, specifically in the dynamic envelope patterns of the sound sources. In the three examples, the sound sources were mostly continuous or “legato-type” sounds with smooth transitions, such as choir (which is perceived as similar to a drone sound) and string quartet. The only sounds that could be considered transient were in the GitarLive recordings, in which two classical guitars were recorded. Later in the study, it became clear that it would be useful to include transient sounds, such as percussion, in the listening examples.

Secondly, the excerpts from the musical examples varied in performance dynamics. For example, in the Usmanbaş recording, the excerpt starts with a section in which the sopranos of the choir have a dominant role in the score. This audio example led to the perception of the entire recording as panned predominantly to the right channel, where the sopranos sat during the recording. This was pointed out by a subject in conversation after he took the listening test. While this issue could be a research topic in its own right, it can be considered detrimental to the objective evaluation of the test results.

5.4 Future Work

Analysis of the results, and the conclusions drawn from the observed and calculated data, suggested further ideas for improving the thesis. The test can be done on a larger scale by means of:
- More subjects: A larger number of subjects can increase the resolution of the results. Since we derive percentages from the preferences of subjects, we have to reach the percentages from numbers smaller than a hundred. For this reason, the results are in most cases “estimated” rather than calculated or reduced.
- Recordings in more varied rooms: Recording spaces such as a studio, a noisy room, a concert hall with very good acoustics, etc., could be added to reach a more extensive conclusion.
- More parameters in commentary style: Subjective questions such as “How did you find the room information in that recording?” or “Do you think the recording is noisy?” can be asked. These questions may help the researcher answer questions such as: “In which way do the different stereo arrays affect people’s subjective opinions? Is it more realistic? Simply sounding better?” Apart from a test based solely on numbers, subjects’ comments and verbal interpretations can be included in a more detailed study.
- Further domains and mediums: The study can be expanded to surround recording techniques and VR possibilities (including live spatialization of sound sources).
- More stereo pair configurations and better arranged stereophonic listening studios: To compare double-times-added inter-aural time difference (AB) with one-time-added inter-aural time difference (Jecklin, DIN), we need to include more stereo configurations.
- Evaluation categories could be added to the survey, following the EBU guideline “Subjective assessment of audio quality”.

REFERENCES

Bartlett, B., Bartlett, J. (2014). Recording Music on Location: Capturing the Live Performance, 2nd edition. New York: Focal Press.
Blauert, J. (1999). Spatial Hearing: The Psychophysics of Human Sound Localization, 2nd edition. Massachusetts: MIT Press.
Blumlein, A.D. (1931). British Patent No. 394,325.
Goodner, S. (2018). Understand Open vs. Closed Back Headphones and How Each Affects Audio. Lifewire. Retrieved March 30, 2018, from https://www.lifewire.com/differences-open-closed-back-headphones-4135434
Griesinger, D.
(1985). Proceedings of the Audio Engineering Society 79th Convention: Spaciousness and Localization in Listening Rooms - How to Make Coincident Recording Sound as Spacious as Spaced Microphone Arrays. U.S.: New York.
Griesinger, D. (1987). Proceedings of the Audio Engineering Society 82nd Convention: New Perspectives on Coincident and Semi-Coincident Microphone Arrays. United Kingdom: London, March 10-13.
Guttenberg, S. (2016). What’s More Accurate: Speakers or Headphones? CNet. Retrieved March 29, 2018, from https://www.cnet.com/news/whats-more-accurate-speakers-or-headphones/
Jecklin, J. (1981). A Different Way to Record Classical Music. Journal of the Audio Engineering Society, 29(5), 329-332.
King, R. (2017). Recording Orchestra and Other Classical Music Ensembles. New York: Routledge.
Kleczkowski, P. (2011). Choosing and Configuring a Stereo Microphone Technique Based on Localisation Curves. Archives of Acoustics, 36(2), 347-363.
Line Audio Design. (2018). Retrieved January 15, 2018, from http://www.lineaudio.se
Lipshitz, S.P. (1986). Stereo Microphone Techniques: Are the Purists Wrong? Journal of the Audio Engineering Society, 34(9), 716-744.
Mickiewicz, W. (2004). Proceedings of the Audio Engineering Society 116th Convention: Optimization of Microphone Setup for Symphonic Orchestra Recordings During Rehearsal. Germany: Berlin, May 8-11, 2004.
Pitt, I. (2010). Auditory Perception, lecture notes. Retrieved 29.01.2018, from https://web.archive.org/web/20100410235208/http://www.cs.ucc.ie/~ianp/CS2511/HAP.html
Plenge, G. (1972). On the Differences Between Localization and Lateralization. Journal of the Acoustical Society of America, 1974, 56(3), 944-951.
Simonsen, G. (1984). Master’s Thesis. Denmark: Lyngby.
Lepa, S. & Hoklas, A.-K. (2015). How do people really listen to music today? Conventionalities and major turnovers in German audio repertoires. Information, Communication & Society, 18(10), 1253-1268. DOI: 10.1080/1369118X.2015.1037327
Stone, S.
(2012). How Headphone Listening is Different. Audiophile Review. Retrieved March 29, 2018, from http://audiophilereview.com/headphones/how-headphone-listening-is-different.html
Streicher, R. & Everest, F.A. (1998). The New Stereo Soundbook. Pasadena, U.S.
Williams, M. (1987). Proceedings of the Audio Engineering Society 82nd Convention: Unified theory of microphone systems for stereophonic sound recording. United Kingdom: London, March 10-13.
Wittek, H. & Theile, G. (2002). Proceedings of the Audio Engineering Society 112th Convention: The recording angle - based on localization curves. Germany: Munich.
Wittek, H. (2015). Image Assistant v3.0.2 [Browser application software]. Retrieved 18.01.2018, from http://ima.schoeps.de/
Url-1 , retrieved on 29.03.2018.
Url-2 <https://www.dpamicrophones.com/mic-university/principles-of-the-a-b-stereo-technique>, retrieved on 10.01.2018.
Url-3 , retrieved on 29.02.2018.

APPENDICES

APPENDIX A: Recording experiment and analysis
APPENDIX B: Listening order for the listening test

APPENDIX A

As an additional pre-survey study, a recording test was conducted with the same or similar stereo microphone arrays. After the recording, the audio examples were analyzed on a computer, and inter-channel time and intensity differences were calculated for comparison with existing studies in the field. Details of the recording and analysis can be found below.

Recording and Experiment

Two sets of recordings were made for the experiment: the first with a cowbell and sine waves at 50/100/200/400/800 Hz (the sine signals emitted through a loudspeaker), and the second with an acoustic music ensemble. Though different parts of the room were chosen for the two sets of recordings, the microphone positions relative to each other and the distance from the source to the pairs were kept identical for the sake of comparability. The arrays chosen for the recording can be seen in Table A.1.
In the table, β is the angle between the microphones and α is the sound source position (Kleczkowski, 2011).

Table A.1 : The four stereo microphone configurations.
Type            Spaced     Coincident   Near-coincident   Baffled
Name            AB         Blumlein     DIN               Jecklin Disc
β               0°         90°          90°               0°
Spacing         30 cm      0 cm         20 cm             20 cm
Polar pattern   Cardioid   Fig-8        Wide-cardioid     Omni-directional
Microphone      KM 184     C414         CM3               OM1

Before going further, an explanation is necessary of the difference between inter-channel and inter-aural differences. Inter-channel differences are the electrical differences between the two channels of a stereo signal, while inter-aural differences are their projection onto human hearing (Bartlett, 2016). In this section, the inter-channel differences of the four stereo arrays are discussed.

Recording Cowbell and Sine Waves

The test recording with a cowbell and sine waves was made in the İTÜ-MIAM Dr. Erol Üçer recording studio. The studio’s live room is a large space (78.5 m²) whose large volume makes it suitable for recording ensembles and chamber orchestras. Most of the walls in the studio are covered with special panels that absorb some frequencies while diffusing others. The front wall is covered in a completely absorbent material and is called “the dead wall” by students, while the opposite wall, with its diffuser panels, is called “the live wall”. The corresponding sides of the room are called the “dead side” and the “live side” respectively.

The dead side of the room was selected for the cowbell and sine wave test recording session. The four arrays faced the dead wall, with the sound sources in front of them at a distance of one meter. The spaced, coincident and near-coincident pairs were placed at the same height (130 cm from the floor) and in the same vertical plane, while the baffled pair was placed under them, with its microphones at the same distance from the source as the other three arrays. It was not possible to place the baffled pair in the same vertical plane because of the Jecklin disc.
The four arrays are shown in Figure A.1. To avoid reflections from the walls as much as possible, stack-it gobos were placed behind and at the left side of the pairs, so that the source and the arrays were enclosed by gobos (rear and left) and the absorbent wall (front and right). The floor was also partly covered with a thin piece of wall-to-wall carpet. The sources were placed approximately midway between the baffled pair and the other three arrays in the vertical plane, with the cowbell at a height of 115 cm and the bottom and top of the loudspeaker at 95 cm and 122 cm respectively. The tweeter of the loudspeaker was aimed at the diaphragms of the three arrays (spaced, coincident and near-coincident pairs). The distance from the sources to the arrays was one meter. It was measured with a one-meter rope tied to the neck of one of the AKG C414 microphones of the Blumlein pair. Below it, a piece of cardboard with a printed protractor was placed to measure the positions of the sound sources (α) in the frontal 180° region, which was divided into 19 steps at 10° intervals. This simple setup kept the azimuthal placement and the distance of the sources relative to the arrays consistent. The protractor and rope can also be seen in Figure A.1.

The channels were routed through a Focusrite Scarlett 18i20 USB interface into a MacBook Pro running Pro Tools 10. Identical XLR cables were used for the pairs of each array. Cowbell and sine wave samples were recorded at every 10° step of the frontal 180° angle. The cowbell was struck by the experimenter, while the sine waves were played through an Event ALP 5-inch monitor. The 50/100/200/400/800 Hz sine tones were played through the loudspeaker in the same order and were recorded through the four arrays at the same time.
Even though the signals were recorded at each of the 19 steps, only 7 of them were used in the experiment: -90°, -60°, -30°, 0°, +30°, +60° and +90°, with negative angles covering the area from 90° left of the array to the center (0°) position and positive angles covering the area from the center to 90° right.

Figure A.1 : The four arrays

After the recording, the samples were edited out and peak level calibration was applied to them to maintain consistency during playback. No spectral or dynamic processing was applied. The recordings were analyzed on a computer, and inter-channel intensity and time differences were measured. The results were compared to the Image Assistant app’s data on the relation between sound source position and IID & ITD (Wittek, 2015). The results are presented below for each pair with commentary.

Time differences were measured in Pro Tools by inspecting the waveforms and measuring the offset between channels in samples, then converting these values to milliseconds. Intensity differences were measured by analyzing RMS values with Pro Tools’ native “Gain” plug-in. Transient signals were used for measurement, as the sine signals yielded widely varying results. This may be because of the loudspeaker’s inability to reproduce low frequencies such as 50 Hz effectively. Cases where sine waves were also taken into consideration are indicated in the text.
* Δt represents the inter-channel time difference while ΔL represents the inter-channel intensity difference.
* For the measured values, the average of all recorded (cowbell) samples is given in the tables. If sine test signals were used, this is indicated in the table.

Spaced pair (AB):

Table A.2 : Comparison of data from Wittek and recordings with three source positions.
        Δt                     ΔL
        Wittek     Measured    Wittek    Measured
90°     0.87 ms    0.90 ms     0.5 dB    2.6 dB
60°     0.75 ms    0.72 ms     0.5 dB    1.8 dB
30°     0.45 ms    0.43 ms     0.4 dB    1.13 dB

For the spaced pair, inter-channel differences were mainly time differences.
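As a rough cross-check of the time differences discussed in this appendix, the expected inter-channel delay of a pair with spaced microphones can be sketched from geometry. The sketch below is not part of the original experiment. It assumes a far-field source, so that the path difference equals the microphone spacing multiplied by sin α, and a speed of sound of 354 m/s, the value used in this appendix. The function name `interchannel_delay_ms` is hypothetical.

```python
import math

# Speed of sound used in this appendix; the real value varies with air temperature.
SPEED_OF_SOUND_M_S = 354.0


def interchannel_delay_ms(spacing_m, source_angle_deg, c=SPEED_OF_SOUND_M_S):
    """Far-field estimate of the inter-channel time difference in milliseconds.

    Assumption: the source is far enough away that the path difference
    between the two microphones equals spacing * sin(source angle).
    """
    path_difference_m = spacing_m * math.sin(math.radians(source_angle_deg))
    return 1000.0 * path_difference_m / c


if __name__ == "__main__":
    # Spacings of the AB and DIN pairs as listed in Table A.1.
    for label, spacing in (("AB, 30 cm", 0.30), ("DIN, 20 cm", 0.20)):
        for angle in (30, 60, 90):
            delay = interchannel_delay_ms(spacing, angle)
            print(f"{label}  {angle:>2} deg  {delay:.2f} ms")
```

For the 30 cm AB pair this gives roughly 0.42, 0.73 and 0.85 ms at 30°, 60° and 90°, close to the measured values in Table A.2. The sources in the experiment were only one meter away, so the far-field assumption is itself an approximation.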
We can calculate the estimated Δt between the two microphones for a sound source at +90° from the speed of sound: sound travels 35,400 cm in 1000 ms (35.4 cm/ms), so with a microphone spacing of 30 cm, Δt = 30 cm / 35.4 cm/ms ≈ 0.85 ms. The deviation between the three values of Δt obtained by calculation, from Wittek and by measurement (0.85 ms, 0.87 ms and 0.90 ms respectively) raises questions. This discrepancy may result from differences in air temperature, altitude, reflections in the test room, microphone manufacturing tolerances, or the experimenter’s mismeasurement of the distance between the source and the microphones or between the microphones themselves. These small deviations are not large enough to distort the results of the experiment, though. The comparisons can be seen in Table A.2 and Table A.3.

Coincident pair (XY):

As there is no spacing between the diaphragms of the two microphones, there is no time delay, and the inter-channel differences are exclusively differences in intensity. Yet one interesting fact was observed about the phase relationship of the two microphone signals: the phase difference was observed to vary linearly with the sound source position. In other words, as the sound source moved from the frontal 0° to 90°, the phase relation also seemed to change from 0° (same phase) to 180° (completely inverted phase) in a linear fashion. When the polar diagram of the AKG C414 XL II microphone in figure-of-eight mode is considered, this relationship between sound source position and phase seems reasonable. Average values for the transient (cowbell) and continuous (sine wave) signals are shown in the table below.

Table A.3 : Comparison of the data given by Wittek to the measured data for sine and cowbell signals.
        Wittek             Measured (sine average)   Measured (transient average)
90°     more than 20 dB    4.88 dB                   3.9 dB
60°     more than 20 dB    13.75 dB                  6.9 dB
30°     11.5 dB            6.54 dB                   6.54 dB
For the calculation of the sine signals’ average, the data for all frequencies (50/100/200/400/800 Hz) were taken into consideration.

A strange “coincidence” was observed in the recording of the transient test signal for the sound source position at 90°. There was confusion in localization between the channels because the null points of the left and right channels clashed. In other words, a sound coming from the +90° side of the array reaches the right-channel microphone at a 45° angle and the left-channel microphone at a 135° angle. As these two angles are identical in sensitivity according to the figure-of-eight polar pattern, the inter-channel intensity difference was in favor of the left channel, even though the sound was coming from the extreme right side of the microphone array (audience perspective). 180° is beyond the Stereophonic Recording Angle of the Blumlein array (SRA = 75.4°, Sengpiel), and this confusion might be caused by exceeding that limit.

Near-coincident Pair (DIN):

The near-coincident pair gives us both intensity and time differences. Lipshitz (1986) states that for a 21 cm spacing between the two ears (or microphones, in our case) there is a low-frequency regime up to 800 Hz, the half wavelength of which corresponds to the spacing distance (21 cm). With our spacing (20 cm), the time delay between the microphones for a sound source at 90° is 560 µs. According to Lipshitz’s calculation, the low-frequency region of the DIN setup extends up to 750 Hz. That means that, for frequencies up to 750 Hz, the time differences between the two microphones act as phase differences. These phase differences are what enable us to localize low frequencies in the DIN setup. The linearly increasing phase delay can be observed in the sine signals. At 800 Hz, the delay is more than half the wavelength, so it falls within the time-delay region. DIN is a near-coincident array, and its inter-channel differences are differences in both time and intensity.
The polar pattern and frequency response of the microphones, the angle between the microphones and the sound source position are the determining factors for the intensity differences between channels. Wide-cardioid microphones were used for the pair instead of cardioid microphones, but as Bartlett (2016) indicates, the specifications for near-coincident arrays are a guideline or starting point for set-up and can be adjusted according to one’s preferences. The calculated inter-channel time and intensity differences can be found in the tables below.

We can see an increasing inter-channel time delay as the angle of the sound source position increases. Using the speed of sound (354 m/s), we can calculate the inter-channel time differences for the selected sound source angles. When we compare the calculated results to the data provided by the Image Assistant app and to the measured differences, we see some discrepancies. As in the comparisons for the other array configurations, these small differences may result from different sound source types, small deviations in microphone placement or reflections in the test room. The comparisons can be seen in Table A.4 and Table A.5.

Table A.4 : Calculated time differences as a result of microphone spacing.
        Distance difference   Time difference
30°     10 cm                 0.28 ms
60°     17 cm                 0.48 ms
90°     20 cm                 0.56 ms

For the calculation of inter-channel intensity differences, the transient test recordings were used, although the sine test signals showed an increasing inter-channel intensity difference as well. Of the four arrays tested, the DIN setup’s results proved to be the most consistent with the data provided by the Image Assistant app.

Table A.5 : Comparison of the near-coincident pair’s (DIN) data.
        Δt                     ΔL
        Wittek     Measured    Wittek    Measured
90°     0.6 ms     0.61 ms     7 dB      7.1 dB
60°     0.5 ms     0.47 ms     5 dB      3.66 dB
30°     0.26 ms    0.22 ms     2.5 dB    2.8 dB

Baffled pair:

The baffled pair had to be placed lower than the other three arrays because of physical considerations.
It was not possible to place the baffled pair (with the Jecklin disc as baffle) at the same elevation as the other three arrays while keeping the same distance from the source. It was positioned under the other three arrays, with the top of the disc barely touching the bottom of the other microphones. That way, the baffled pair was kept on the same line of phase arrival as the other three arrays as far as possible.

The baffle has virtually no effect on the sound at lower frequencies. When we analyze the cowbell test recording, we see the same amount of low-frequency content in both channels (below the fundamental, around 500 Hz), yet as the sound source angle and the frequency increase, the inter-channel intensity difference increases as well. If we analyze the sine signals, we do not find large differences. When played through headphones, a sine signal is not easy to localize, whatever its frequency. For sine wave signals there is no inter-channel time difference, yet one interesting fact is that the phase difference increases as the sound source angle increases, and it reaches 180° when the sound source position is 90°.

APPENDIX B

Appendix B shows the eight listening order combinations selected out of 24 combinations in total. The listening orders were assigned by the surveyor with as much variation as possible. For instance, if one listener heard the recording made in Albert Long Hall with combination no. 7 (see the table below), which is coded as A7 by the researcher, then the next subject listened to another recording, Usmanbaş or GitarLive (namely the B or C categories), with a combination number between 1 and 4. That way, variety was achieved in the listening orders. The listening orders can be seen in Table A.6.

Table A.6 : Listening orders for the test.
Order number   Playbacks
Order 1        1. DIN with speakers            2. Baffled pair with speakers     3. DIN with headphones            4. Baffled pair with headphones
Order 2        1. DIN with speakers            2. Baffled pair with speakers     3. Baffled pair with headphones   4. DIN with headphones
Order 3        1. Baffled pair with speakers   2. DIN with speakers              3. DIN with headphones            4. Baffled pair with headphones
Order 4        1. Baffled pair with speakers   2. DIN with speakers              3. Baffled pair with headphones   4. DIN with headphones
Order 5        1. DIN with headphones          2. Baffled pair with headphones   3. DIN with speakers              4. Baffled pair with speakers
Order 6        1. DIN with headphones          2. Baffled pair with headphones   3. Baffled pair with speakers     4. DIN with speakers
Order 7        1. Baffled pair with headphones 2. DIN with headphones            3. DIN with speakers              4. Baffled pair with speakers
Order 8        1. Baffled pair with headphones 2. DIN with headphones            3. Baffled pair with speakers     4. DIN with speakers

CURRICULUM VITAE

Name Surname : Mertcan İÇUZ
Place and Date of Birth : Balıkesir / 01.07.1991
E-Mail : mertcan.icuz@gmail.com

EDUCATION :
• B.Sc. : 2014, Hacettepe University, Faculty of Education, English Language Teaching
• M.Sc. : 2018, İstanbul Technical University, Center for Advanced Studies in Music (MIAM), Sound Engineering and Design Program

PROFESSIONAL EXPERIENCE AND AWARDS:
• Freelance sound engineer (2014- )
• Storytel Audio Book company (2015- )
• GitarLive (2017- )

PUBLICATIONS, PRESENTATIONS AND PATENTS ON THE THESIS: