Farina Reproduction of auditorium spatial impression with binaural and stereophonic sound systems


Audio Engineering Society
Convention Paper 6485
Presented at the 118th Convention
2005 May 28–31, Barcelona, Spain
This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration
by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request
and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org.
All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the
Journal of the Audio Engineering Society.
Reproduction of auditorium spatial
impression with binaural and stereophonic
sound systems
Paolo Martignon¹, Andrea Azzali¹, Densil Cabrera², Andrea Capra¹, and Angelo Farina¹
¹ Industrial Engineering Department, Università di Parma, Via delle Scienze, 43100 Parma, Italy
paolo.martignon@inwind.it
² School of Architecture, Design Science and Planning, University of Sydney, Sydney, NSW 2006, Australia
densil@arch.usyd.edu.au
ABSTRACT
Binaural room impulse responses convolved with anechoic recordings are commonly used in auditorium acoustics
design and research. Binaural and stereophonic (O.R.T.F.) room impulse responses, which had been recorded in five
concert auditoria, were used in this study to test the spatial audio quality of four reproduction systems: conventional
stereophony, binaural headphones, stereo dipole, and double stereo dipole. Anechoic music, convolved with the
impulse responses, was reproduced over these systems. The systems were matched as closely as possible to each
other, and to the sound levels that would occur in the auditoria for the musical source. In a subjective test, subjects
rated the room size, sound source distance and realism of the reproduction. The stereo dipole and O.R.T.F.
stereophonic systems appear to work better than the headphone and double stereo dipole systems.
1. INTRODUCTION

Binaural audio recordings and binaural room impulse responses convolved with anechoic recordings are commonly used in auditorium and room acoustics design and research. Without individualization, such recordings and convolutions may be subject to substantial spatial distortions when listened to using headphones or other playback systems designed for binaural signals. Since localization of sound around the aural axis depends largely on the highly individual acoustical filtering provided by pinnae, localization is a primary aspect of this spatial distortion. Nevertheless, non-individualized binaural recordings are very convenient, in terms of being easy to obtain through room acoustical measurement and computer simulation, as well as from existing databases. Despite their limitations, they can certainly be helpful in appreciating the acoustical qualities of auditoria, at least in relative
Martignon et al. Binaural and stereophonic systems
terms. This study examines three options for presenting audio recordings from concert auditoria in binaural format, as well as conventional stereophonic presentation. It investigates the ability of the audio reproduction formats to convey sound source distance and room size in the context of concert auditoria, and rates the subjectively assessed realism of the audio formats.

1.1. Two-channel audio formats

This section summarizes key characteristics of the audio formats considered in this research project.

1.1.1. Binaural techniques

Dummy head recordings and binaural simulations record or predict the sound at the ears, which can then be reproduced using headphones or other techniques including cross-talk canceling loudspeaker systems. A thorough review of binaural techniques, especially using headphone presentation, is given by Møller [1]. He summarizes the problems of binaural headphone techniques as including localization errors around the cones of confusion (and especially the difficulty in establishing a frontally localized source), and a lack of response of the system to head movements. While the former of these problems can be solved using individualization, and the latter using head-tracking, the present paper is concerned with systems with neither individualization nor head-tracking. Other authors cite inside-the-head localization as a problem, but Møller et al. [2] find no instances of this in tests using a carefully calibrated non-individualized binaural headphone system. Headphone equalization is probably the most subtle key aspect of using a non-individualized binaural headphone system: simply reproducing a dummy head recording over unequalized headphones means that the sound is subject to the manufacturer's designed frequency response (which is unlikely to be optimized for binaural reproduction), and to the effects of both the dummy head's ear and the listener's ear. One solution involves compensating for the non-flat transfer function between the headphones and the microphones of the original dummy head used to make the recordings. Møller et al. [2] find that the error in auditory distance perception increases when using a non-individualized headphone binaural system (compared to individualized headphone binaural, and to natural listening, for source distances of up to 5 m), but they did not find a systematic shift in perceived distance.

Cross-talk cancellation provides an alternative to headphones for presenting binaural recordings and simulations. Originally proposed in the 1960s [3, 4], this approach was famously used for auditorium acoustical assessment by Schroeder et al. in 1974 [5]. This technique reproduces the sound from the two ears of a head (or model or simulation thereof) at the two ears of a listener, using at least two loudspeakers. At a specified head position, the cross-talk from the right loudspeaker to left ear, and from the left loudspeaker to right ear, is cancelled by signals from the complementary loudspeaker. There are limits to this at low frequencies, because inter-aural level differences are naturally small or negligible. The short wavelengths at high frequencies can make the listener's head position critical for effective operation. Cross-talk cancellation also requires an absorbent acoustic environment to be effective.

More recently, a refinement of cross-talk cancellation known as the stereo dipole has been developed, investigated and applied. This is a type of cross-talk cancellation where the two loudspeakers are located close together, so as to approximate co-located monopole and dipole sources. Kirkeby et al. [6] find that this configuration (with a 10° interval between loudspeakers as seen by the listener) minimizes the ringing artifacts in the cross-talk canceling filters, and expands the area in which the cross-talk cancellation is effective (allowing greater listener head movement [cf. 7]). The cost of closely located sound sources is that the low frequencies require a great boost, and so cross-talk cancellation at low frequencies becomes very inefficient. One solution to this problem is to have greater separation between low frequency drivers than high frequency drivers. Another solution is to institute a cut-off frequency below which cross-talk cancellation is abandoned, and the loudspeakers merely reproduce the binaural channels without additional processing. The present study, which uses stereo dipole, applies both of these solutions.

One clear advantage of the stereo dipole technique over binaural headphones is its ability to generate frontally located auditory images. Having the loudspeakers at what is probably the most important position for localization appears to solve this problem. Another related advantage is that, to the extent that the system tolerates head movements, the sound field is not locked to the head, and so localization may be able to benefit from at least small head movements.
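The principle of cross-talk cancellation, and the low-frequency boost that closely spaced loudspeakers require, can be illustrated with a minimal numerical sketch. It is not the filter design used in this study. It assumes a toy free-field model (point sources, point "ears", no head shadow, no room) and a regularized per-frequency inversion of the 2x2 loudspeaker-to-ear matrix. The function names, the ear offset of 0.09 m, the speed of sound and the regularization constant `beta` are illustrative assumptions.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed)


def free_field_matrix(freqs_hz, span_deg=10.0, distance_m=1.5, ear_offset_m=0.09):
    """Toy 2x2 loudspeaker-to-ear transfer matrices, one per frequency.

    Assumption: each path is only a delay and a 1/r attenuation.
    Returns shape (n_freqs, 2, 2); entry [k, i, j] is the response from
    loudspeaker j to ear i at freqs_hz[k].
    """
    half = np.radians(span_deg / 2.0)
    x, y = distance_m * np.sin(half), distance_m * np.cos(half)
    speakers = np.array([[-x, y], [x, y]])
    ears = np.array([[-ear_offset_m, 0.0], [ear_offset_m, 0.0]])
    # r[i, j]: distance from loudspeaker j to ear i
    r = np.linalg.norm(ears[:, None, :] - speakers[None, :, :], axis=-1)
    k = 2.0 * np.pi * np.asarray(freqs_hz, dtype=float) / C_SOUND
    return np.exp(-1j * k[:, None, None] * r[None, :, :]) / r[None, :, :]


def crosstalk_canceller(H, beta=1e-6):
    """Regularized inverse C = (H^H H + beta I)^-1 H^H for each frequency."""
    Hh = np.conj(np.swapaxes(H, -1, -2))  # Hermitian transpose per frequency
    return np.linalg.solve(Hh @ H + beta * np.eye(2), Hh)


freqs = np.linspace(500.0, 8000.0, 16)
H = free_field_matrix(freqs)          # 10 degree loudspeaker span at 1.5 m
C = crosstalk_canceller(H)

# With small regularization, loudspeakers driven through C deliver each
# binaural channel to its own ear only: H @ C is close to the identity.
residual = np.abs(H @ C - np.eye(2)).max()

# Frobenius norm of the canceller per frequency: a rough measure of the
# drive level it demands from the loudspeakers.
gain = np.linalg.norm(C, axis=(-2, -1))
```

In this toy model the canceller gain at 500 Hz is several times larger than at 8 kHz, which is the inefficiency at low frequencies described above. Increasing `beta`, or widening `span_deg` for the low-frequency drivers, reduces that boost at the cost of less complete cancellation or a different geometry.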
AES 118th Convention, Barcelona, Spain, 2005 May 28–31
Page 2 of 12
The double stereo dipole is an extension of the simple stereo dipole system, with both front and rear stereo dipole loudspeaker pairs. This facilitates the impression of sound coming from behind the listener. However, the listener head position becomes critical for this loudspeaker arrangement, because the desired interference between front and rear stereo dipoles occurs over a quarter of a wavelength.

1.1.2. Conventional stereophony

Conventional two-channel stereophony is perhaps not used at all in auditorium acoustics research. However, it is very commonly used in music reproduction for entertainment purposes, and there are innumerable recordings of musical performances in auditoria made using various stereophonic microphone techniques. The present study uses the O.R.T.F. stereophonic microphone array, consisting of two cardioid microphones separated by 17 cm and by an angle of 110°. In a comparison of various stereophonic microphone arrays, Hugonnet and Jouhaneau [8] find that coincident techniques (such as XY and MS) yield the most accurate lateral localization, while closely spaced techniques (including O.R.T.F.) yield the finest distance discrimination. In another comparison, Ceoen [9] found a subjective preference for recordings made using the O.R.T.F. system (these were recordings of an orchestra in an auditorium), and this preference appears to be due to the configuration's ability to convey the spatial impression of the auditorium [10].

2. METHOD

2.1. Auditoria and impulse response measurements

This study exploits a collection of auditorium impulse responses previously made by Farina and colleagues [11]. The key characteristic of the selected impulse responses is that the same equipment and procedure were used in each case, with the signal gain structures fully documented. Measurements had been made using a dodecahedron loudspeaker plus a subwoofer as the sound source on stage. The test signal was an exponential swept sine wave. Equalization had been applied to this signal for a constant spatially averaged output power from the loudspeaker. A Neumann KU70 dummy head was used as the binaural microphone, with a pair of Neumann AK40 cardioid microphones in the O.R.T.F. configuration for two channel stereophonic recording. In addition, a Soundfield B-format microphone, which includes an omnidirectional output channel, was on a boom 1 m ahead of the dummy head. This configuration and method is described in more detail by Farina and Ayalon [11].

The five auditoria used in this study were the large, medium and small halls in Rome's Parco della Musica, Parma's Auditorium Paganini, and Kirishima's Miyama Conseru in Japan. Two receiver positions were chosen for each auditorium. In every case, the receiver was on the longitudinal axis of symmetry of the auditorium, and the source 1 m off this axis, on the stage.

Room acoustical parameters were extracted from the selected impulse responses. These included reverberation time (T30), early decay time, clarity index (C80), speech transmission index, bass ratio, treble ratio, lateral fraction, and inter-aural cross correlation coefficient (IACC). Octave band values were transformed to single number values using the recommendations in ISO 3382 [12]. Strength factor (G) was not determined, but the reproduced sound pressure level (Leq) of each stimulus (see below) was.

2.2. Listening room and apparatus

The listening room floor was 4.5 m × 3.2 m, with a ceiling height of 4.2 m. Sound absorbing panels were attached to most of the wall space up to a height of 2 m. Absorbers were also suspended near the ceiling, and placed on the floor. Materials likely to absorb low frequency sound (such as cardboard panels and boxes) were included in the room acoustical absorption. The measured mid-frequency reverberation time (using the experiment loudspeakers as sources, and dummy head in the subject's position as receiver) was 0.2 s, with an increase in reverberation time in the low frequency range. Background noise level, with the audio equipment operating, was measured at NCB 25 [13].

The axis of symmetry of the loudspeaker array was not aligned with the room, nor was the listening position in the room's center. Loudspeakers were at a distance of 1.5 m from the listening position. Prototype Audiolink AL105 loudspeakers were used for the conventional stereophonic pair, ±30° from the median line of symmetry. Genelec S30D reference studio monitors were used for the front stereo dipole, on their sides so that the tweeters were 22 cm apart, the mid-range drivers 43 cm apart, and the woofers 83 cm apart (measuring between driver centres). This corresponds to respective angles of 4°, 8°, and 16° from the median
line of symmetry (the angle seen by the subject between loudspeakers is double these values). The rear stereo dipole pair had QSC AD-S82H loudspeakers, with driver centers separated by 45 cm, corresponding to a 9° angle from the midline.

Although different loudspeaker models were used, the frequency responses of all systems were matched using 4096 tap inverse filters between 100 Hz and 20 kHz, developed using the algorithm of Kirkeby et al. [14]. One point in favour of this system matching was that the audio content of the experiment was undemanding on the loudspeakers, having little low frequency content and requiring only modest sound pressure levels at the listening position. Specifically, inverse filters were designed: (i) for the conventional stereophonic system to flatten the frequency response to an omnidirectional measurement microphone at the listener position; (ii) for the headphones to flatten the frequency response from the headphones to the dummy head; and (iii) for the stereo dipole systems, to provide cross-talk cancellation from 250 Hz and a flat frequency response between the binaural channels and dummy head (in the listening position) from 100 Hz.

Although the room had windows, they were almost entirely covered with opaque panels, so that the experiment was conducted in the light of the computer monitor, with just a little additional ambient light. Most of the surfaces in the room, at least below a height of 2 m, were dark grey or black, and little other than the experiment computer display was visible to a subject once their eyes had adapted to the computer monitor.

Figure 1 Sketch of the listening room configuration.

2.3. Stimulus generation

A calibrated anechoic recording was used in this project so that the reproduced sound pressure levels could be realistic. This was of a piano accordion, with a measurement microphone at a distance of 2.5 m directly in front of the performer. The music was "La ballata di Miché" ("Miky's Ballad"), by Fabrizio de André: a waltz, with a legato melody and articulated accompaniment. The octave band sound pressure levels of the source, normalised to 1 m, are shown in Figure 2. The A-weighted Leq of the piano accordion normalized to 1 m is 80 dB(A). The recording was approximately 45 seconds in duration.

Figure 2 Octave band equivalent sound pressure level of the accordion, normalized to a microphone distance of 1 m.

Impulse responses created using a dodecahedron loudspeaker are not ideal for use in listening experiments (convolved with anechoic recordings). Typical sound sources, such as individual musical instruments or a human voice, are usually directional, rather than omnidirectional. An omnidirectional source will yield a lower direct-to-reverberant energy ratio than a source directed to the listener in an auditorium, resulting in reduced clarity for the listener. A second limitation of dodecahedral loudspeakers is that their sensitivity as a function of frequency and radiation angle varies substantially due to interference between the twelve drivers. At high frequencies, the individual drivers also have their own directivity, resulting in uneven sound radiation. The duration of an anechoic impulse response from a dodecahedral array is long, determined by the size of the dodecahedron. Although the room impulse responses used in this study were made with a dodecahedral loudspeaker (plus subwoofer), some attempt was made to address these problems. Firstly, the spatially averaged spectral irregularity of the
loudspeaker was compensated for by equalising the measurement signal (as mentioned previously). This is probably an adequate solution for all but the direct sound. Secondly, the direct sound was addressed by substituting the measured direct impulse with an ideal direct impulse. In the case of the O.R.T.F. impulse responses, this ideal signal was simply a single sample impulse, which has an almost flat frequency response up to the Nyquist frequency. For the dummy head the signal was the 0° anechoic impulse response for that dummy head. The direct sound of each room impulse response was measured, using a 256-sample fast Fourier transform (Blackman-Harris window, sampling rate of 48 kHz) centered on the first major peak in the impulse response. The 256 sample ideal signals (with the impulse peak at the 129th sample) were substituted for the direct sound, scaled to have the same acoustic energy as the original 256 samples (measured at 500 Hz). The remaining part of the room impulse responses, consisting of early reflections and reverberant decay, was attenuated by 3 dB relative to the direct sound, thereby producing a simplistic approximation of a sound source with a directivity index of 3 dB facing the listening position.

Verification of the impulse response relative calibration was done by examining the relationship between the direct sound level and source-receiver distance. Notwithstanding effects of very early reflections, dissipation of acoustic energy in the air, and variation in loudspeaker directivity (depending on its orientation), the direct sound pressure level at the receiving position should follow the free field ideal of -6 dB per doubling of distance. Consistency with this principle was examined at 500 Hz (where air dissipation should be negligible, and the loudspeaker omnidirectional), as illustrated in Figure 3. There is general agreement between measurement and theory, with an rms error of less than 1 dB, but deviations of up to 2 dB.

The edited impulse responses (both O.R.T.F. and dummy head) were convolved with the anechoic recording of piano accordion, at a constant gain. In order to calibrate the gains of the playback systems in the listening room, a 500 Hz octave band noise signal was created with a known level difference to the anechoic recording microphone calibration tone. This was convolved with the direct impulse only of one of the auditorium situations (O.R.T.F. format) using the same processing gain structure as for the music convolutions. The reproduced sound pressure level of the stereophonic loudspeaker system was adjusted to match that predicted by the source-receiver distance in the auditorium (assuming direct sound only). This established the playback gain structure for the stereophonic system, such that the speech and accordion were reproduced in the listening room at approximately the same sound pressure levels as would have occurred in the auditoria.

Figure 3 Comparison between theoretical free field and measured sound levels for various receiver positions in the five auditoria, at 500 Hz.

While the gains of the three binaural playback systems could be matched simply by dummy head measurements at the listening position, there is, to some extent, an arbitrary relationship between the stereophonic and binaural system gains, because their spatial sensitivity is different, and spatial sensitivity varies substantially with frequency in the case of the binaural system. It is certainly possible to match the microphone systems for free field sensitivity, or for diffuse field sensitivity, but these results are quite different, and in an auditorium the sound-field is at neither of these extremes. Therefore a simple approach to microphone system matching was taken in the playback system, such that the mean broadband sound pressure level difference of equivalent recordings (room impulse responses convolved with anechoic speech or accordion) was 0 dB (standard deviation of 1.2 dB). Having some stimuli with somewhat greater or lesser sound pressure levels (over the binaural systems, relative to the stereo system) could influence the subjective parameters investigated, and as such was considered to be a useful component in the subjective comparison between these systems.
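The direct-sound substitution described in this section can be sketched as follows. This is a simplified illustration, not the authors' processing chain. It assumes that the largest absolute peak of the impulse response is the direct sound, and it matches broadband energy rather than energy measured at 500 Hz. The function name `substitute_direct_sound` and the synthetic impulse response are hypothetical.

```python
import numpy as np


def substitute_direct_sound(rir, ideal, late_gain_db=-3.0):
    """Swap the direct sound of `rir` for `ideal` and attenuate what follows.

    rir          : measured room impulse response (1-D array)
    ideal        : ideal direct-sound signal with its peak at index len(ideal)//2
                   (the 129th of 256 samples, counting from 1)
    late_gain_db : gain applied to everything after the substituted window
    """
    rir = np.asarray(rir, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    n = len(ideal)
    # Assumption: the largest peak is the direct sound ("first major peak").
    peak = int(np.argmax(np.abs(rir)))
    start = peak - n // 2
    if start < 0 or start + n > len(rir):
        raise ValueError("window around the direct sound falls outside the response")
    window = rir[start:start + n]
    # Scale the ideal signal to the broadband energy of the measured window
    # (the paper matches energy at 500 Hz; broadband is a simplification).
    scale = np.sqrt(np.sum(window ** 2) / np.sum(ideal ** 2))
    out = rir.copy()
    out[start:start + n] = scale * ideal
    out[start + n:] *= 10.0 ** (late_gain_db / 20.0)
    return out


# Synthetic example: a slightly smeared direct sound followed by a noisy decay.
rng = np.random.default_rng(0)
rir = np.zeros(4000)
rir[300], rir[301] = 1.0, -0.4
rir[1000:] += 0.05 * rng.standard_normal(3000) * np.exp(-np.arange(3000) / 800.0)

ideal = np.zeros(256)
ideal[128] = 1.0                      # single-sample impulse, as for O.R.T.F.
out = substitute_direct_sound(rir, ideal)
```

For the dummy head case, `ideal` would instead hold the 256-sample 0° anechoic response of each ear, applied to the left and right channels separately.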
Figure 4 Unweighted Leq of the sound stimuli, measured with a dummy head microphone at the listener position in the listening room. Initials refer to the auditoria (Kirishima, Parma, Rome Large, Rome Medium and Rome Small).

2.4. Experiment Procedure

With ten auditorium situations, four audio playback systems and three response scales, presenting every stimulus to every subject was not considered to be feasible. Instead, each subject assessed five auditorium situations and two audio systems. The assignment of the auditorium situations and audio systems for each subject was done by counterbalancing between subjects.

The experiment was conducted using purpose-written software. The software presented the ten combinations of situation and audio system as randomly assigned buttons across the top of the visual interface (Fig 5). Pressing one of these buttons (using a wireless mouse) would cause the sound to play, and pressing another of them would switch the sound almost immediately to that of another stimulus, with approximately the same time in the musical performance. Hence, the subject could switch between stimuli whenever desired, listening to them in any order that they wished as many times as they wished. The three questions were displayed throughout the experiment, but only the first question was available for response until all stimuli received ratings (similarly, the third question was inactive until a full set of responses was received for the second question). However, subjects could see and change their ratings for previous questions at any time. The question order was randomized between subjects.

The three questions were in Italian (Fig 5), and are roughly translated as "How large is the room that you are listening to?", "How realistic is the sound?" and "How distant is the artist in meters?"

A computer screen was positioned directly in front of the subject (supported by the front stereo dipole loudspeakers). As well as presenting the response interface, the screen meant that the subject was almost always facing the front, which is an advantage for the loudspeaker based playback systems. The subject's chair had a small integrated table, on which they operated a wireless computer mouse.

The subject was not given any information (other than the sound itself) on which loudspeaker system was being used for a stimulus. However, subjects were instructed by the computer program to put on the headphones when they switched to a headphone stimulus, and to remove the headphones when they switched to a loudspeaker stimulus. Clearly, this meant that subjects had a heightened awareness of the headphone technology, while the loudspeaker systems were differentiated merely by their sound.

Thirty subjects, all with musical backgrounds, participated in the experiment.

Figure 5 Control and response interface for the experiment with initial settings.
3. RESULTS
3.1. Auditory Distance Estimates
Analysis of variance (ANOVA) shows a significant
effect for audio system (f=3.86, p=0.0099, df=3) and a
stronger effect for situation (f=11.45, p<0.0001, df=9).
A Scheffe test shows significant mean differences
between binaural headphones and conventional
stereophony (p=0.015), but not between any other pairs
of audio systems. There are significant mean differences
(p<0.05) between 16 of the 45 pairs of situations.
The results (Fig 6) show some match between physical
and estimated distance for all four audio systems. While
the best correlation is found for the double stereo dipole
(Table 1), the smallest rms errors are found for O.R.T.F.
stereophony and the single stereo dipole systems. Using
logarithmic distance units, the stereo dipole system has the
smallest rms error. A correlation coefficient is not
sensitive to absolute matches in values, but instead
evaluates the goodness of fit of the data to a straight
line. The rms error measures are sensitive to absolute
deviations, and the one using logarithmic units measures
the error in proportion to distance (i.e. it tolerates larger
errors at greater distances). The headphone system
yields the weakest match of estimates to source-receiver
distance, in all three evaluations. The authors favor the
logarithmic unit rms evaluation.
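The difference between the three evaluations can be made concrete with a short sketch. The data below are hypothetical (they are not the experimental results), and base-10 logarithms are an assumption, since the text does not state the base. The function name `distance_metrics` is illustrative.

```python
import numpy as np


def distance_metrics(physical_m, estimated_m):
    """Return (r squared, rms error in metres, rms error in log10 units)."""
    physical = np.asarray(physical_m, dtype=float)
    estimated = np.asarray(estimated_m, dtype=float)
    r = np.corrcoef(physical, estimated)[0, 1]
    rms_m = np.sqrt(np.mean((estimated - physical) ** 2))
    # Assumption: base-10 logarithm of the distance in metres.
    rms_log = np.sqrt(np.mean((np.log10(estimated) - np.log10(physical)) ** 2))
    return r ** 2, rms_m, rms_log


# Hypothetical listener who doubles every distance.
physical = np.array([8.0, 12.0, 16.0, 24.0, 32.0])
estimated = 2.0 * physical
r2, rms_m, rms_log = distance_metrics(physical, estimated)
```

In this constructed case the estimates lie on a straight line, so the correlation is perfect even though every estimate is wrong by a factor of two. The rms error in metres is dominated by the largest distances, whereas the logarithmic rms error is the same constant, log10(2), for every distance.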
                        Correlation   Rms Error   Rms Error
                        (r²)          (m)         (log(m))
O.R.T.F.                0.39          9.3         0.23
Headphones              0.34          14.9        0.28
Stereo Dipole           0.59          9.6         0.19
Double Stereo Dipole    0.63          10.3        0.22

Table 1 Correlations and rms errors for auditory distance estimates, with respect to physical source-receiver distance.
Figure 6 Mean auditory distance estimates for the four
audio systems, shown in relation to the source-receiver
distance of the impulse response measurements.
Generally the sound pressure level of the stimuli decreases with source-receiver distance (as shown in Fig 4). However, in the Kirishima concert hall, the 24 m distance received approximately the same sound pressure level as the 8 m distance using the O.R.T.F. microphone array. For the same pair of positions, the binaural microphone array sees a 3 dB reduction in level over distance. While these effects are explained by the unusual design of the auditorium (especially the ceiling reflection), and the different spatial sensitivity of the microphone arrays, they create a situation where auditory distance perception is likely to diverge from veridical, and also is likely to differ for the two audio recording systems. The correlations between stimulus sound pressure level and distance are r=-0.74 and r=-0.69 for the binaural and stereophonic systems respectively.

Distance estimates are related to the sound pressure level of the stimuli, most strongly for conventional stereophony and the stereo dipole systems. Mid frequency reverberation time (T30, ranging from 1.8 s to 2.4 s) and inter-aural cross correlation coefficient (IACC, ranging from 0.12 to 0.48) also are significant correlates of auditory distance for some of the audio systems, as shown in Table 2.

                        SPL      T30     IACC
O.R.T.F.                -0.86    0.67    -0.39
Headphones              -0.79    0.58    -0.59
Stereo Dipole           -0.82    0.34    -0.73
Double Stereo Dipole    -0.76    0.44    -0.67

Table 2 Correlation coefficients (r) between objective stimulus or room acoustical measurements and auditory distance estimates.

3.2. Auditory Room Size Ratings

ANOVA shows that room size ratings are significantly affected by situation (f=6.89, p<0.0001, df=9), but not significantly by audio system (f=2.4, p=0.066, df=3). Alternatively, an analysis considering auditorium instead of individual situations shows a significant effect for auditorium (f=8.47, p<0.0001, df=4), and a similarly non-significant effect of audio system. Results are shown in Figure 7.

Auditorium length provides some correlation with auditory room size ratings, at least for O.R.T.F. stereophony (r=0.91 for mean ratings of auditoria). There are no other correlations between room dimensions (length, width, footprint) and room size ratings for any audio system. The Rome small hall's size appears to be overestimated for the three binaural techniques. Single stereo dipole is not sensitive to the Rome large hall's greater physical size.

                        Physical    Estimated
                        Distance    Distance
O.R.T.F.                0.46        0.95
Headphones              0.44        0.85
Stereo Dipole           0.33        0.58
Double Stereo Dipole    0.56        0.86

Table 3 Correlation coefficients (r) between auditory room size ratings and source-receiver distance (physical and estimated).

To some extent, there is an inherent relationship between room size and source-receiver distance, because large distances are impossible in small rooms. This helps to explain the high correlations between distance estimates and room size ratings, shown in Table 3, for three of the four audio systems. However, these correlations are higher than the respective correlations between room size ratings and actual source-receiver distance. In the case of the O.R.T.F. system there is little to distinguish room size ratings from distance estimates. The largest distinction between these subjective scales is found for the stereo dipole system. Figure 8 compares the ratings for these two systems.

Table 4 shows correlations between stimulus or room acoustical parameters and auditory room size ratings. For binaural headphones, early decay time (EDT) is the strongest correlate. For the two stereo dipole systems, IACC is the strongest correlate. For conventional stereophony, the strongest correlate is stimulus SPL (as would be expected considering the close relationship with auditory distance estimates for this audio system), but correlation with reverberation time is almost as strong.
                        SPL      T30     EDT     IACC
O.R.T.F.                -0.74    0.72    0.57    -0.37
Headphones              -0.54    0.54    0.75    -0.69
Stereo Dipole           -0.43    0.18    0.32    -0.69
Double Stereo Dipole    -0.67    0.45    0.45    -0.79

Table 4 Correlation coefficients (r) between objective stimulus or room acoustical measurements and auditory room size ratings.
One striking difference between the room size ratings
for the audio systems is in the results for the smallest
auditorium (Rome Small). This auditorium receives
larger room size ratings for the binaural systems than
for O.R.T.F. stereophony. Kirishima, the second
smallest auditorium, receives smaller room size ratings
for the binaural systems. In terms of the acoustical
parameters, IACC has a large contrast between these
auditoria, with low values for Rome Small (0.14 and
0.15) and high values for Kirishima (0.48 and 0.45).
The ability of the binaural systems to convey this
contrast is inherently greater than that of the O.R.T.F. system,
and this seems to be reflected in the correlations
between room size ratings and IACC in Table 4.
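For readers unfamiliar with IACC, the quantity can be sketched as the maximum of the normalized cross-correlation between the two ear signals over lags of up to 1 ms. The sketch below is a broadband, whole-signal simplification, not the octave-band procedure used to obtain the values quoted above. The function name `iacc` and the test signals are illustrative assumptions.

```python
import numpy as np


def iacc(left, right, fs, max_lag_ms=1.0):
    """Maximum absolute normalized cross-correlation within +/- max_lag_ms."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != right.shape:
        raise ValueError("left and right must have the same length")
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    full = np.correlate(left, right, mode="full")
    zero_lag = len(right) - 1            # index of lag 0 in the "full" output
    window = full[zero_lag - max_lag:zero_lag + max_lag + 1]
    return float(np.max(np.abs(window)) / norm)


fs = 48000
rng = np.random.default_rng(1)
noise = rng.standard_normal(1000)
left = np.concatenate([np.zeros(50), noise, np.zeros(50)])

delayed = np.roll(left, 10)              # same signal, about 0.2 ms later
independent = rng.standard_normal(left.size)

iacc_same = iacc(left, left, fs)
iacc_delayed = iacc(left, delayed, fs)
iacc_independent = iacc(left, independent, fs)
```

Identical or merely delayed ear signals give a value of 1, while unrelated signals give a value near 0. The low values reported for Rome Small and the higher values for Kirishima therefore describe ear signals that are less and more similar, respectively.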
Figure 7 Mean auditory room size ratings for the four audio systems, shown in relation to the physical auditorium length.

Figure 8 Comparison between auditory distance estimates and auditory room size ratings for the O.R.T.F. stereophonic system and the stereo dipole system.
3.3. Realism Ratings

ANOVA shows that situation does not significantly affect realism ratings (p=0.3), and that audio system significantly affects realism (F=4.15, p=0.0068, df=3). Binaural headphones were rated as the least realistic, and O.R.T.F. stereophony the most realistic (a Scheffé test shows that these two are significantly different). Single stereo dipole has a mean realism rating almost as great as O.R.T.F. stereophony, as shown in Fig 9.

Figure 9 Mean auditory realism ratings for the four audio systems, ±1 standard error.

It is not known how natural sound (in real concert halls) would be rated for realism. Nevertheless, we could assume that the subjects (who were experienced in music) were making judgments in reference to their memories of real concert auditorium sound. Subjects were asked to imagine themselves in an auditorium, rather than in a listening room with loudspeaker-reproduced sound. Assuming that these ratings do reflect experience of reality, then the O.R.T.F. stereophony and single stereo dipole system succeed best in conveying realistic sound to a listener.

4. DISCUSSION

As an assessment of four non-individualized two-channel audio systems for auditorium simulations, this study is limited by the fact that judgments of distance and room size have not been made in the actual auditoria. Hence, while it seems reasonable to rate systems based on the accuracy of subjective responses (e.g. accuracy of auditory distance estimates, in relation to source-receiver distances), it is not known whether auditory distance would be judged accurately were it possible to instantly transport blindfolded subjects between the real auditoria. In the case of room size ratings, even though physical room length provides the best physical correlate for one audio system, it is not known whether such judgments in actual rooms would be similarly correlated to room length. The ratings of realism do not suffer this limitation, assuming that the actual auditoria would achieve full realism.

Previous studies of auditory perception of distance and room size show that the acoustical features of stimuli can have a strong effect, sometimes stronger than the effects of actual distance or room size. With respect to auditory distance perception in rooms, sound pressure level and aspects of reverberation (e.g. direct to reverberant ratio) can have strong effects. Unusually long reverberation times yield larger distance estimates [15, 16].

The weak or non-existent relationships between auditory room size ratings and actual room size in the present study are at odds with some previous study results, which showed that subjects can judge the physical size of rooms just by listening, at least in some circumstances [17, 18]. Nevertheless, previous studies also show that acoustical characteristics (especially reverberation time or reverberation level) can have a larger effect on perceived room size than the actual room size [17, 19, 20]. Since none of the rooms in the present study were small (all were large or very large), cues for discriminating room size were subtle, maybe too subtle for the actual room size to be conveyed when confounded with other differences between the auditorium situations. With regard to purely acoustic influences on room size perception, the four audio systems do not show the same tendencies, suggesting that further research is needed to understand this area.

There are natural correlations between the main acoustical cues for distance and room size. A small room is associated with high sound pressure levels (due to the reverberation level), and high sound pressure levels are also a cue to source proximity (due to the direct sound dispersion over distance). Reverberance is associated with large rooms (due to the long mean free path), and also with distant sources (due to the low direct to reverberant ratio). Hence, similarities between auditory distance estimates and room size ratings could be expected, although previous studies find some divergence between these [15, 20].

There are many other limitations to the study, including the use of a non-anechoic listening room (anechoic
conditions would be ideal for the cross-talk canceling systems), the use of different loudspeaker models (even though the frequency responses of these were matched), and the limited number of auditorium situations tested. Nevertheless, the study does yield apparently useful results such as:

• Binaural headphone systems are less effective than alternatives for auditorium simulations. Headphones yield low realism ratings and relatively poor estimates of distance. This result is striking because binaural headphone systems are widely used in auralization applications.

• The double stereo dipole system is relatively ineffective. However, the likely explanation of this is that the listener's head was not restrained, so that sound quality and image stability in the high frequency range could have been degraded by incidental movements.

• The single stereo dipole system is effective in terms of realism ratings and distance estimation. Of the three binaural systems tested, this appears to be the best. Not having the rear loudspeakers eliminates the front-back interference problem which degrades the double stereo dipole at high frequencies. While distance estimates and realism ratings are most distinct for single stereo dipole, the basis of these room size ratings is not clear (but appears to be partly influenced by IACC).

• The O.R.T.F. stereophonic system yields high ratings of realism, and appears to be the only system in which ratings of room size can be related to a physical variable (room length). However, distance estimations are less effective than for the stereo dipole system, and there is scarcely any distinction between distance estimates and room size ratings for the O.R.T.F. system.

An important distinction between the audio systems studied here and systems designed for entertainment is that the aim was realism, rather than listener enjoyment. The playback level of these systems was apparently less than typical playback levels for music entertainment [21, 22], but instead matched to the sound levels that would have been experienced for the instrument in the auditorium situations. Realism may or may not be a goal of entertainment systems, but it is a key attribute of any audio system to be used in the simulation of acoustic spaces for empirical research. While the O.R.T.F. and stereo dipole systems both achieved good results in this study, the stereo dipole system has an inherent advantage over conventional stereophony in this respect, because it aims to convey the auditorium sound field experienced at the modeled head's ears to the listener's ears. By contrast, conventional stereophony aims to reproduce the acoustic impression of the recorded space using a more approximate technique. Furthermore, it is not normally used at seat positions in an auditorium, but instead is used close to the stage, near the musical performance.

5. CONCLUSIONS

This study examined the reproduction sound quality of four non-individualized two-channel audio systems for a solo instrument in five concert auditoria. The main finding is that the stereo dipole appears to provide the most plausible reproduction. O.R.T.F. stereophony also yields a subjectively rated realistic reproduction, but fails to distinguish auditory distance from auditory room size perception. This may be related to the apparent influence of IACC on room size ratings in binaural systems. The problems with binaural headphone and double stereo dipole reproduction are well understood.

6. ACKNOWLEDGEMENTS

The authors are grateful for the assistance of Alberto Amendola, Paolo Bilzi, ASK Industries, Casa della Musica, and Tommaso Dradi (piano accordion) in this research project.

7. REFERENCES

[1] H. Møller, "Fundamentals of binaural technology," Applied Acoustics, Volume 36, Issue 3-4, pp. 171-218, 1992.

[2] H. Møller, M. F. Sørensen, C. B. Jensen, and D. Hammershøi, "Binaural technique: do we need individual recordings?" J. Audio Eng. Soc., vol. 44, no. 6, pp. 451-469, 1996.

[3] B. B. Bauer, "Stereophonic earphones and binaural loudspeakers," J. Audio Eng. Soc., vol. 9, pp. 148-151, 1961.

[4] M. R. Schroeder and B. S. Atal, "Computer simulation of sound transmission in rooms," IEEE Int. Conv. Rec. 7, pp. 150-155, 1963.
[5] M. R. Schroeder, D. Gottlob, and K. F. Siebrasse, "Comparative study of European concert halls: correlation of subjective preference with geometric and acoustic parameters," Journal of the Acoustical Society of America, vol. 56, no. 4, pp. 1195-1201, 1974.

[6] O. Kirkeby, P. A. Nelson and H. Hamada, "The 'stereo dipole' - a virtual source imaging system using two closely spaced loudspeakers," J. Audio Eng. Soc., vol. 46, no. 5, pp. 387-395, 1998.

[7] T. Takeuchi, P. A. Nelson, O. Kirkeby and H. Hamada, "Robustness of the performance of the 'stereo dipole' to misalignment of head position," 102nd Audio Eng. Soc. Conv., Munich, Preprint 4464 (I7), 1997.

[8] C. Hugonnet and J. Jouhaneau, "Comparative spatial transfer function of six different stereophonic systems," 82nd Audio Eng. Soc. Conv., London, Preprint 2465 (H-5), 1987.

[9] C. Ceoen, "Comparative stereophonic listening tests," J. Audio Eng. Soc., vol. 20, no. 1, pp. 19-27, 1972.

[10] M. Wöhr, G. Theile, H.-J. Goeres and A. Persterer, "Room related balancing technique method for optimizing recording quality," J. Audio Eng. Soc., vol. 39, no. 9, pp. 623-631, 1991.

[11] A. Farina and R. Ayalon, "Recording concert hall acoustics for posterity," 24th International Audio Eng. Soc. Conf. on Multichannel Audio, Banff, Canada, paper no. 38 (2003).

[12] International Organization for Standardization, ISO 3382 (1997), Acoustics - Measurement of reverberation time of rooms with reference to other acoustical parameters.

[13] American National Standards Institute, ANSI S12.2-1995, Criteria for Evaluating Room Noise.

[14] O. Kirkeby, P. A. Nelson, P. Rubak and A. Farina, "Design of cross-talk cancellation networks using fast deconvolution," Audio Eng. Soc. 106th Conv., Munich, Germany, Preprint 4916 (J1).

[15] D.H. Mershon, W.L. Ballenger, A.D. Little, P.L. McMurtry, and J.L. Buchanan, "Effects of room reflectance and background noise on perceived auditory distance," Perception, vol. 18, pp. 403-416, 1989.

[16] D. Cabrera and D. Gilfillan, "Auditory distance perception of speech in the presence of noise," Proc. Int. Conf. on Auditory Display, Kyoto, Japan, pp. 431-439, 2002.

[17] J. Sandvad, "Auditory perception of reverberant surroundings," Journal of the Acoustical Society of America, 105(2), Pt. 2, p. 1193 (paper 3pSP3), 1999.

[18] R. McGrath, T. Waldmann, and M. Fernström, "Listening to rooms and objects," Proceedings of the 16th Audio Eng. Soc. Int. Conf., Rovaniemi, Finland, pp. 512-522, 1996.

[19] S. Hameed, J. Pakarinen, K. Valde, and V. Pulkki, "Psychoacoustic cues in room size perception," Proceedings of the 116th Audio Engineering Society Convention, Berlin, 2004.

[20] D. Cabrera, D. Jeong, H. J. Kwak and J.-Y. Kim, "Auditory room size perception for measured and modeled rooms," Internoise, Rio de Janeiro, 2005.

[21] C. D. Mathers, K. F. L. Lansdowne, "Hearing risk to wearers of circumaural headphones: An investigation." BBC Research Report RD 1979/3.

[22] R. Condamines, "Relation between the passband and the preferred listening level for music," EBU Review, no. 139, pp. 124-127, (June 1973).

