
The loudspeaker-to-receiver response can be optimized by signal processing. The path-length differences from the various loudspeakers to the listening position can be compensated for by signal delays, and the processing is implemented with DSP. A given equalization and compensation frequently improve the result at one listening position while degrading it at another. In practice, equalization and compensation can therefore be designed and stored for each listening position and recalled as required; a compromise is needed if the results at more than one listening position are to be improved simultaneously.
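As an illustration of the delay compensation described above, the following sketch (in Python; the function names, sampling rate, and speed of sound are assumptions for illustration) computes per-loudspeaker delays that time-align the wavefronts from several loudspeakers at the listening position:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room temperature

def alignment_delays(distances_m, fs=48000):
    """Integer sample delays that time-align all loudspeakers at the
    listening position: the most distant loudspeaker gets zero delay,
    and nearer loudspeakers are delayed so that all wavefronts arrive
    simultaneously."""
    distances_m = np.asarray(distances_m, dtype=float)
    extra_time = (distances_m.max() - distances_m) / SPEED_OF_SOUND
    return np.round(extra_time * fs).astype(int)

def apply_delay(signal, n_samples):
    """Delay a signal by an integer number of samples (zero padding)."""
    return np.concatenate([np.zeros(n_samples), signal])

# Example: three frontal loudspeakers at slightly different distances (m)
print(alignment_delays([2.9, 3.1, 3.0]))  # -> [28  0 14] samples at 48 kHz
```

In practice, one such delay set, together with the corresponding equalization filters, would be stored per listening position and recalled as required.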
Automobile audio is intended mostly for music reproduction and thus differs from audio for a domestic theater, which is intended mostly for reproduction with an accompanying picture. Various spatial sound techniques and systems, including two-channel stereophonic sound, Dolby Pro-Logic, and 5.1- or 7.1-channel sound, have been used for automobile audio. The program materials come from digital storage media (such as optical disks) or from analog or digital radio broadcasting. Multichannel spatial surround sound may also be applicable to automobile reproduction; currently, however, most program materials for multichannel spatial surround sound are intended for reproduction with an accompanying picture, although this situation may change in the future. Upmixing stereophonic or 5.1-channel program materials for multichannel spatial surround reproduction is also applicable. Bai and Lee (2010) suggested combining equalization of the non-ideal transmission with upmixing or downmixing of stereophonic or multichannel sound signals for automobile reproduction.
16.2 APPLICATIONS TO VIRTUAL REALITY, COMMUNICATIONS, MULTIMEDIA, AND MOBILE DEVICES
16.2.1 Applications to virtual reality
Virtual environments or virtual reality systems provide users with the feeling of being present in natural environments through computer-controlled artificial surroundings (Blauert et al., 2000). Virtual reality includes virtual visual, auditory, and tactile senses, and the interaction and complementarity of information across these modalities strengthen the sense of reality and immersion. For virtual reality applications, a spatial sound system must recreate natural and immersive auditory sensations rather than accurately reconstruct a target sound field or binaural pressures. From this point of view, various spatial sound techniques and systems may be applied to virtual reality, depending on application requirements (Hollier et al., 1997).
The hardware of headphone-based dynamic binaural reproduction is relatively simple. In the early days, dynamic virtual auditory environment systems for a single user were implemented on a hardware platform consisting of a personal computer, a flat-panel display, and a head tracker. A system with multiple computer terminals, visual displays, and head trackers can serve more than one user at a time. Static transaural reproduction with two frontal loudspeakers can only recreate spatial information in the frontal horizontal plane and allows only a narrow listening region; it is therefore suitable for a single user.
Since the 2010s, commercial three-dimensional visual displays have developed rapidly. Virtual reality can be effectively implemented through a combination of a head-mounted visual display and a dynamic virtual auditory environment system (Jin et al., 2005). Many products with head-mounted visual displays include a head tracker. When a user walks and turns in the virtual space, the head tracker detects the position and orientation of the user, and the system updates the virtual visual and auditory scenes dynamically, resulting in a strong sense of immersion.
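The core of the dynamic auditory update can be illustrated with a minimal sketch. A world-fixed virtual source, a head tracker reporting yaw, and an HRTF set measured on a uniform 5° azimuth grid are all illustrative assumptions here, not a description of any particular product:

```python
import numpy as np

def relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of a world-fixed source in head coordinates. When the
    listener turns by +yaw, the source appears shifted by -yaw, which
    keeps the virtual source stationary in the virtual scene."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def nearest_hrtf_index(az_deg, grid_step_deg=5.0):
    """Index of the closest measured HRTF on an assumed uniform
    azimuth grid (5-degree spacing for illustration)."""
    return int(round((az_deg % 360.0) / grid_step_deg)) % int(360 / grid_step_deg)

# One iteration of the dynamic rendering loop: read the tracker,
# recompute the source direction, and pick the HRTF pair to convolve with.
head_yaw = 30.0    # degrees, from the head tracker
source_az = 90.0   # world-fixed source to the listener's left
az = relative_azimuth(source_az, head_yaw)   # -> 60.0 degrees
hrtf_idx = nearest_hrtf_index(az)
```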

For large-region virtual reality systems with surround or 3D projection screens, sound reproduction over a large listening region is required to serve multiple users at the same time or to allow a user to walk within the virtual space. In this case, spatial sound systems based on sound field reconstruction are relatively appropriate. Two examples are the system using warped B-format Ambisonics by Hollier et al. (1997) and the third-generation CAVE system using WFS by DeFanti et al. (2009).
An important application of virtual reality is virtual training. Unlike actual training, virtual reality offers a safe and low-cost task-training environment. An early example is driving training simulation (Krebber et al., 2000). A virtual car acoustic environment, which is part of the virtual driving environment, requires the following components:
1. An external moving sound source with respect to the driver (e.g., traffic flow with Doppler shift)
2. Fixed engine sound, which depends on engine speed and torque
3. Fixed tire sound, which depends on speed and road conditions
4. Fixed wind noise, which depends on speed
5. Background noises, commands to the driver, and other related elements
Virtual acoustic environment systems dynamically synthesize sound signals, or retrieve them from pre-recorded sound databases, according to the driver's control maneuvers, and then reproduce the signals through headphones or loudspeakers with appropriate signal processing. Similar methods can be applied to special training environments, such as virtual aviation, aerospace, and submarine environments (Doerr et al., 2007).
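For the first component listed above, a moving external source with Doppler shift, one common synthesis method is a time-varying propagation delay: reading the source signal at the emission time corresponding to each output sample produces the Doppler shift automatically. A minimal sketch, with assumed function names and parameter values:

```python
import numpy as np

def render_moving_source(signal, distances_m, fs=48000, c=343.0):
    """Render a moving source through a time-varying propagation delay.

    distances_m gives the source-listener distance (in metres) at every
    output sample; the changing delay produces the Doppler shift, and a
    1/r gain models point-source distance attenuation."""
    n = len(signal)
    t_out = np.arange(n) / fs
    t_emit = t_out - distances_m / c           # emission time per output sample
    read_idx = t_emit * fs                     # fractional read position
    out = np.interp(read_idx, np.arange(n), signal, left=0.0, right=0.0)
    return out / np.maximum(distances_m, 0.1)  # clip distance to 0.1 m

# Example: a 440 Hz tone from a source passing the listener at 20 m/s,
# with a closest distance of 5 m at t = 1 s
fs = 48000
t = np.arange(2 * fs) / fs
tone = np.sin(2 * np.pi * 440.0 * t)
dist = np.sqrt((20.0 * (t - 1.0)) ** 2 + 5.0 ** 2)
out = render_moving_source(tone, dist, fs)
```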
Virtual auditory reality is also applied to various auditory scene displays (Hollier et al., 1997), exhibitions, entertainment (Kan et al., 2005), and the creation of special effects in video/audio program production.
Virtual reality was initially applied mainly in professional fields. Since 2010, virtual reality with head-mounted visual displays and dynamic virtual auditory environment systems has also been applied in consumer fields, such as games, entertainment, media, education, and social networking. Auditory virtual reality therefore has a wide field of application.
16.2.2 Applications to communication and information systems
An important purpose of applying spatial sound to speech communication is to improve speech intelligibility. In real life, conversation usually occurs in environments with background noise and multiple concurrently competing speech sources. When the target speech source and other noise or competing sources are spatially separated, the hearing system can exploit the cocktail party effect (Section 1.7.5) to extract the desired information and maintain speech intelligibility. This ability is attributed to binaural hearing.
However, currently available communication systems predominantly transmit mono signals, whose inability to spatially separate target and competing sources degrades speech intelligibility. Spatial sound methods can preserve the spatial information of sources, or spatially separate the sources by signal processing, and thus improve the quality of speech communication (Begault and Erbe, 1994; Drullman and Bronkhorst, 2000). Psychoacoustic experiments indicate that spatially separating multiple speech sources with VADs enhances speech intelligibility for both full-bandwidth and 4 kHz low-pass (telephone-quality) speech signals (Begault, 1999).
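As a rough illustration of spatial separation by signal processing, the sketch below pans several mono talkers to different azimuths with a crude ITD/ILD model (a Woodworth-style ITD plus a broadband level difference). A real system would convolve with measured HRTFs; the function name, gains, and head radius here are illustrative assumptions:

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def spatialize_talkers(talkers, azimuths_deg, fs=48000, head_radius=0.0875):
    """Pan each mono talker to its own azimuth (positive to the right)
    with a broadband ITD/ILD model. Azimuths are assumed within +/-90
    degrees, where the Woodworth formula is meaningful."""
    margin = 64  # headroom for the largest ITD, in samples
    n = max(len(s) for s in talkers) + margin
    left, right = np.zeros(n), np.zeros(n)
    for sig, az in zip(talkers, azimuths_deg):
        theta = np.deg2rad(az)
        # Woodworth ITD: (a/c)(|theta| + |sin(theta)|)
        itd = head_radius / C * (abs(theta) + abs(np.sin(theta)))
        lag = int(round(itd * fs))                      # far ear lags
        gain = 10 ** (-6.0 * abs(np.sin(theta)) / 20)   # crude ILD, up to 6 dB
        near, far = (right, left) if az >= 0 else (left, right)
        near[:len(sig)] += sig
        far[lag:lag + len(sig)] += gain * sig
    return left, right
```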
From the viewpoint of auditory perception, various spatial sound techniques are theoretically applicable to speech communication, depending on practical requirements and costs. For speech communication with headphones, VAD is advantageous because its hardware is simple, and it requires only two independent signals (and therefore low bandwidth for signal transmission). However, conventional headphone presentation is prone to in-the-head localization and to auditory fatigue over long listening periods; incorporating VAD into speech communication can create natural auditory effects and ease auditory fatigue. For speech communication with loudspeakers, other spatial sound techniques may be needed.
Multiple talkers are present in a teleconference at the same time (Kang and Kim, 1996; Evans et al., 1997). In addition to improving speech intelligibility, spatial sound techniques provide immersive, close-to-reality communication services in teleconferencing. When participants are distributed over two or more separate meeting rooms, the direct approach to preserving spatial information and improving the intelligibility of transmitted speech is to combine and reproduce the binaural signals obtained by artificial-head recording in each meeting room. Alternatively, the speech of each participant is captured by a microphone, rendered by static or dynamic binaural synthesis according to a pre-defined spatial distribution and acoustic environment, and finally presented to all participants. Other techniques, such as discrete multichannel sound, Ambisonics, WFS, and microphone arrays, are also applicable to teleconferencing (Boone and Bruijn, 2003) and create a virtual meeting environment. DirAC (Section 7.6) and SAOC (Section 13.5.5) are also applicable to teleconferencing (Herre et al., 2011). Similar applications include telepresence (Hollier et al., 1997) and various emergency command and telephone systems in which multiple speech sources must be monitored simultaneously.
VADs also contribute to aeronautical communication, and numerous investigations of this application have been undertaken at the NASA Ames Research Center (Begault, 1998). These projects combine VAD applications for speech communication with those for information orientation. Given that civil aircraft cockpits are characterized by high environmental noise, headphones are used (aside from speech communication) to reproduce air traffic warnings, from which pilots determine target directions (e.g., of other aircraft) or identify corresponding visual targets (e.g., on a radar display) and take appropriate measures. Applying VADs to aeronautical communication improves speech intelligibility and, with the help of spatialized auditory warnings, reduces the search or reaction time of pilots; the latter is important for flight safety. Additionally, headphone presentation may be combined with active noise control to reduce pilots' exposure to noise.
The VAD applications in aeronautical communication mentioned above include auditory information display and orientation. In many cases, vision is superior to hearing for target identification and orientation; however, acoustic information becomes particularly important when a target is out of visual range (e.g., behind the listener) or when visual overload occurs (as with multiple visual targets). In real life, auditory information often guides visual orientation (Bolia et al., 1999), and targets can be localized by hearing even without visual help (Lokki and Gröhn, 2005). Revealing target information and orientation is therefore another important application of spatial sound.
Audio navigation systems, which combine the global positioning system (GPS) with VADs, reproduce sounds as if they were emitted from target directions. These systems are applied primarily in civil or military search and rescue (Kan et al., 2004). Similar methods can present various types of spatial auditory information, such as in guidance and information systems for the blind (Loomis et al., 1998; Bujacz et al., 2012) or in tourism and museum applications (Gonot et al., 2006).
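The core geometric step of such an audio navigation system is computing the target direction relative to the user's heading from GPS coordinates; the rendered sound is then placed at that azimuth by the VAD. A minimal sketch using the standard great-circle bearing formula (the function names are assumptions):

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from (lat1, lon1) to (lat2, lon2),
    in degrees clockwise from north (standard formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0

def target_azimuth(user_lat, user_lon, user_heading_deg, tgt_lat, tgt_lon):
    """Direction of the target relative to the user's heading, in
    (-180, 180]: the azimuth at which the guidance sound is rendered."""
    b = bearing_deg(user_lat, user_lon, tgt_lat, tgt_lon)
    return (b - user_heading_deg + 180.0) % 360.0 - 180.0
```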
Monitoring multiple targets (such as different instruments and meters) is often necessary in practice, and such targets can cause visual overload. In this situation, VADs are used to relieve the visual burden by transforming part of the visual presentation of spatial information into an auditory presentation (i.e., non-visual orientation). Forms of sound design that convey useful information in this way are called sonification (Barrass, 2012).

16.2.3 Applications to multimedia
The discussions in Sections 16.2.1 and 16.2.2 concern specific applications of spatial sound. In professional applications, functions such as communication and virtual reality may be implemented separately by dedicated equipment; in consumer applications, however, users may prefer multifunctional, integrated equipment.
Multimedia PCs, which are distinguished by integration and interaction, can handle a wide range of information, including audio, video, images, text, and data. Information exchange between computers is also possible through the Internet. Even standard PCs possess these functions, making them ideal platforms for communication, information processing, and virtual reality.
Since the 1990s, multimedia PCs have been an important application field for spatial sound, in addition to cinema, domestic, and automobile applications. Spatial sound is widely incorporated into the entertainment functions of multimedia PCs. Currently, a multimedia PC is often used to play back video and audio programs from optical disks and streaming media. A common PC sound card supports two-channel stereophonic inputs and outputs, and some sound cards support 5.1-channel, 7.1-channel, or even more outputs. Video/audio playback software supports various audio coding formats, such as MP3, AAC, Dolby Digital, and DTS, and some video/audio production software offers powerful functions for multichannel signal editing, conversion, and coding. With an optical disk writer, a video/audio CD, DVD, or BD can easily be produced on a multimedia PC. The development of hard disks, the Internet, and cloud computing facilitates the storage and transmission of spatial sound programs. Multimedia PCs therefore provide an effective and convenient platform for video and audio program production and playback.
For a multimedia PC with a multichannel sound card, the audio outputs can be reproduced directly with multichannel active loudspeakers. On a common multimedia PC, loudspeakers or headphones are used for audio reproduction. For loudspeaker reproduction, two loudspeakers are often arranged on either side of the visual display, spanning a small angle with respect to the listener. In this case, the stereophonic expansion of Section 11.9.2 and the virtual reproduction of multichannel sound of Section 11.9.3 can be applied to improve the reproduced effect. For headphone presentation, the binaural reproduction of Section 11.9.1 is applicable.
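The simplest form of stereophonic expansion is a mid/side width control; Section 11.9.2 describes more elaborate transaural-based expansion, so the sketch below is only the most basic variant, with assumed function names and an illustrative width factor:

```python
import numpy as np

def widen_stereo(left, right, width=1.5):
    """Mid/side stereophonic expansion: boosting the side (difference)
    component relative to the mid (sum) widens the perceived image
    between two closely spaced loudspeakers.

    width = 1.0 leaves the signal unchanged; values above 1.0 widen it."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right) * width
    out_l, out_r = mid + side, mid - side
    # Normalize to avoid clipping after the side boost
    peak = max(np.max(np.abs(out_l)), np.max(np.abs(out_r)), 1e-9)
    scale = min(1.0, 1.0 / peak)
    return out_l * scale, out_r * scale
```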
Another entertainment function of a multimedia PC includes 3D games. VAD is often used in various 3D games on multimedia PCs to recreate spatial auditory effects. VAD has been incorporated into some 3D game software on the Windows platform. To create an authentic auditory effect, head trackers, as well as interactive and dynamic signal processing, can also be incorporated into the multimedia PC platform (López and González, 1999; Kyriakakis, 1998). A 3D game based on virtual reality (and VAD) is a promising application, as stated in Section 16.2.1. Other spatial sound techniques, such as Dolby Pro Logic IIz, may also be applied to 3D games (Tsingos et al., 2010).
Various applications of spatial sound to virtual reality, communication and information systems, and receivers of digital multimedia broadcasting can be implemented on multimedia PC platforms. A multimedia PC is also used as a teleconferencing terminal.
Multimedia applications also raise new requirements for the coding and transmission of spatial sound signals. The MPEG-4 coding standard in Section 13.5.4 is specified for multimedia video and audio.
16.2.4 Applications to mobile and handheld devices
Mobile communication and handheld sound reproduction devices, such as tablet computers, smartphones, and stream media players, have rapidly developed in recent years. From a practical perspective, using spatial sound for these types of products is a promising direction.

Since the early 2000s, corporations and research institutes have launched relevant studies (AES Staff Technical Writer, 2006a; Yasuda et al., 2003; Paavola et al., 2005; Choi et al., 2006; Sander et al., 2012), and many commercial products have been introduced. Mobile products combine functions such as speech communication, interactive virtual auditory environments, teleconferencing, spatial auditory information presentation (e.g., traffic directions), and entertainment (e.g., video/audio reproduction and 3D games); such products can thus be regarded as an application of multimedia technology. The increasing speed and bandwidth of wireless communication networks favor the implementation of these functions. The application of spatial sound to mobile and handheld devices has also been considered in the MPEG-H 3D Audio standard (Section 13.5.6).
Compared with other uses, sound reproduction in mobile and handheld devices is restricted by the following two issues:
1. The limited processing and storage capability of the system, which requires simplification of algorithms and data;
2. The limited battery power supply, which requires a reproduction method with low power consumption.
For sound reproduction in mobile devices, mini-loudspeakers can be used, but this method causes some problems. First, it cannot create a high sound pressure level in reproduction because of the limited power supply of mobile devices and the characteristics of mini-loudspeakers. Second, the audio quality of mini-loudspeakers is limited, especially at low frequencies. Third, the span between the two loudspeakers in a mobile device is small (usually a few centimeters to a dozen centimeters), while the device is typically held 20–50 cm from the listener. Accordingly, the angle spanned by the two mini-loudspeakers with respect to the listener lies between about 10° and 20°, and such a narrow span angle spoils the stereophonic effect.
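The quoted span angle follows from simple geometry: two loudspeakers a distance d apart, viewed from distance r, subtend an angle of 2 arctan(d/2r). A small check with illustrative values:

```python
import math

def spanned_angle_deg(speaker_span_m, listening_distance_m):
    """Angle subtended at the listener by two loudspeakers that are
    speaker_span_m apart, heard from listening_distance_m away."""
    return 2 * math.degrees(math.atan(0.5 * speaker_span_m / listening_distance_m))

print(spanned_angle_deg(0.10, 0.30))  # ~18.9 deg: 10 cm span at 30 cm
print(spanned_angle_deg(0.09, 0.50))  # ~10.3 deg:  9 cm span at 50 cm
```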
The first and second problems have yet to be solved, but the situation may improve with technical development. The third problem can be alleviated, and the effect improved, by the stereophonic expansion method of Section 11.9.2 or the virtual reproduction of multichannel sound of Section 11.9.3 (Park et al., 2006; Breebaart et al., 2006). For 3D games, the transaural method of Section 11.8 can also be used directly to create the signals for the two mini-loudspeakers from mono stimuli.
The above methods involve transaural reproduction via two loudspeakers with a narrow span angle. The analysis in Section 11.8.3 indicates that such a configuration requires a large low-frequency boost in transaural processing, which makes signal processing difficult. Given that the low-frequency limit of a mini-loudspeaker is 200–300 Hz at best, these low-frequency components can be filtered out in the design of the transaural filters, avoiding the difficulty. Moreover, near-field HRTFs can be used in the transaural filters to match the practical listening distance of mobile devices (Zhang et al., 2014).
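A minimal sketch of such a transaural (crosstalk-cancellation) filter design with a low-frequency cutoff is given below. The regularized 2×2 plant inversion is a standard approach, but the function name, the regularization constant, and the 250 Hz cutoff are assumptions for illustration:

```python
import numpy as np

def ctc_filters(H_LL, H_LR, H_RL, H_RR, freqs, f_low=250.0, beta=0.01):
    """Frequency-domain crosstalk-cancellation filters for a 2x2 plant
    (loudspeaker-to-ear transfer functions), with Tikhonov
    regularization and a low-frequency cutoff.

    H_xy is the transfer function from loudspeaker x to ear y, sampled
    at `freqs` (Hz). Bins below f_low are discarded, since a narrow
    loudspeaker span would demand a large low-frequency boost that
    mini-loudspeakers cannot deliver anyway."""
    freqs = np.asarray(freqs, dtype=float)
    C = np.zeros((len(freqs), 2, 2), dtype=complex)
    for k in range(len(freqs)):
        # Rows: left ear, right ear; columns: left, right loudspeaker
        H = np.array([[H_LL[k], H_RL[k]],
                      [H_LR[k], H_RR[k]]])
        # Regularized pseudo-inverse: (H^H H + beta I)^-1 H^H
        HH = H.conj().T
        C[k] = np.linalg.solve(HH @ H + beta * np.eye(2), HH)
    C[freqs < f_low] = 0.0  # drop the unreproducible low-frequency band
    return C  # maps desired binaural signals to loudspeaker signals per bin
```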
Headphone presentation requires relatively little power and usually provides better perceived audio quality; it is therefore appropriate for mobile devices. However, direct headphone presentation of stereophonic and multichannel sound signals may cause in-head localization. To solve this problem, the binaural reproduction method of Section 11.9.1 can be used to convert stereophonic and multichannel sound signals for headphone presentation on mobile devices. In particular, by combining audio coding and binaural processing, MPEG spatial audio coding and decoding (Section 13.4.5) reduces the data bit rate and simplifies binaural synthesis processing (Breebaart et al., 2006). For