
Applications of spatial sound and related problems  719

The loudspeaker-to-receiver response can be optimized by signal processing. Path-length differences from the loudspeakers to the listening position can be compensated for by signal delays. These signal processing techniques are implemented on a DSP. A given equalization and compensation setting frequently improves the effect at one listening position while degrading it at another. In practice, equalization and compensation can be designed and stored for each listening position, then recalled as required. Some compromise is needed if the effect at more than one listening position is to be improved.
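The delay compensation described above can be illustrated with a minimal sketch. It assumes free-field propagation at a fixed speed of sound and simply delays the nearer loudspeakers so that all arrivals coincide with that of the farthest one. The function name, the sampling rate, and the distances are hypothetical values chosen for illustration and are not taken from the text.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def compensation_delays(distances_m, sample_rate_hz=48000):
    """Return per-loudspeaker delays (in samples) that time-align arrivals.

    The farthest loudspeaker gets zero delay; each nearer loudspeaker is
    delayed by the extra travel time of the farthest path.
    """
    farthest = max(distances_m)
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate_hz)
            for d in distances_m]


# Hypothetical distances (m) from four loudspeakers to the driver's head.
driver_seat = [0.80, 1.25, 1.10, 1.60]
print(compensation_delays(driver_seat))
```

A separate delay set would be computed and stored for each listening position, which is why a setting tuned for one seat generally misaligns the arrivals at another.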

Automobile audio is mostly for music reproduction and thus differs from audio for a domestic theater, which is mostly for reproduction with an accompanying picture. Various spatial sound techniques and systems, including two-channel stereophonic sound, Dolby Pro Logic, and 5.1- or 7.1-channel sound, have been used for automobile audio. The program materials come from various digital storage media (such as optical disks) or from analog or digital radio broadcasting. Multichannel spatial surround sound may also be applicable to automobile audio reproduction. Currently, however, most program materials for multichannel spatial surround sound are intended for reproduction with an accompanying picture. The situation may change in the future. Upmixing stereophonic or 5.1-channel program materials for multichannel spatial surround sound reproduction is also applicable. Bai and Lee (2010) suggested a combination of non-ideal transmission equalization and upmixing or downmixing of stereophonic or multichannel sound signals for automobile reproduction.

16.2  APPLICATIONS TO VIRTUAL REALITY, COMMUNICATIONS, MULTIMEDIA, AND MOBILE DEVICES

16.2.1  Applications to virtual reality

Virtual environments or virtual reality systems provide users with the feeling of being present in natural environments through computer-controlled artificial surroundings (Blauert et al., 2000). Virtual reality includes virtual visual, auditory, and tactile senses. The interaction and complementarity of multiple pieces of information on the aforementioned aspects strengthen the sense of reality and immersion. For virtual reality applications, a spatial sound system must recreate natural and immersive auditory senses rather than reconstruct a target sound field or binaural pressures accurately. From this point of view, various spatial sound techniques and systems may be applied to virtual reality, depending on application requirements (Hollier et al., 1997).

The hardware of headphone-based dynamic binaural reproduction is relatively simple. In the early days, dynamic virtual auditory environment systems for a single user were implemented on a hardware platform composed of a personal computer, a flat panel display, and a head tracker. A system with multiple computer terminals, multiple visual displays, and multiple head trackers may be used for more than one user at a time. Static transaural reproduction with two frontal loudspeakers can only recreate the spatial information in the frontal-horizontal plane and allows for only a narrow listening region. Therefore, it is suitable for a single user.

Since the 2010s, commercial three-dimensional visual displays have developed rapidly. Virtual reality can be effectively implemented through a combination of a head-mounted visual display and a dynamic virtual auditory environment system (Jin et al., 2005). Many products with head-mounted visual displays include a head tracker. When a user walks and turns in the virtual space, the head tracker detects the position and orientation of the user, and the system updates the virtual visual and auditory scenes dynamically, resulting in a good sense of immersion.

720  Spatial Sound

For large-region virtual reality systems with surround or 3D projection screens, sound reproduction within a large listening region is required to serve multiple users at the same time or to allow a user to walk within the virtual space. In this case, spatial sound systems based on sound field reconstruction are more appropriate. Two examples are the system using warped B-format Ambisonics by Hollier et al. (1997) and the third-generation CAVE system using WFS by DeFanti et al. (2009).

An important application of virtual reality is virtual training. Unlike actual training, virtual reality offers a safe and low-cost task-training environment. An early example is driving training simulation (Krebber et al., 2000). A virtual car acoustic environment, which is part of the virtual driving environment, requires the following components:

1. An external moving sound source with respect to the driver (e.g., traffic flow with Doppler shift)

2. Fixed engine sound, which depends on engine speed and torque

3. Fixed tire sound, which depends on speed and road conditions

4. Fixed wind noise, which depends on speed

5. Background noises, commands to the driver, and other related elements

Virtual acoustic environment systems dynamically synthesize or call sound signals from pre-recorded sound databases according to a driver’s control maneuvers, then reproduce the sound signals through headphones or loudspeakers with appropriate signal processing. Similar methods can be applied to special training environments, such as virtual aviation, aerospace, and submarine environments (Doerr et al., 2007).
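As a small illustration of the Doppler shift listed in item 1 above, the sketch below evaluates the classical Doppler formula for motion along the line joining source and listener. The function name, the speed of sound, and the example speeds are assumptions made for illustration. A real simulator would typically apply a time-varying delay to the source signal rather than scale a single frequency.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def doppler_frequency(f_source_hz, v_source_toward, v_listener_toward=0.0):
    """Observed frequency for motion along the source-listener line.

    Velocities are in m/s and are positive when directed toward the
    other party (negative when moving away).
    """
    return f_source_hz * (SPEED_OF_SOUND + v_listener_toward) / (
        SPEED_OF_SOUND - v_source_toward)


# Hypothetical case: a 1 kHz horn on an oncoming vehicle at 20 m/s,
# heard by a driver travelling toward it at 15 m/s.
print(round(doppler_frequency(1000.0, 20.0, 15.0), 1))
```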

Virtual auditory reality is also applied to various auditory scene displays (Hollier et al., 1997), exhibitions, entertainment (Kan et al., 2005), and the creation of special effects in video/audio program production.

In the past, virtual reality was applied mainly in professional fields. Since 2010, virtual reality with head-mounted visual displays and dynamic virtual auditory environment systems has been applied to consumer fields, such as games, entertainment, media, education, and social interaction. Therefore, auditory virtual reality has a wide range of applications.

16.2.2  Applications to communication and information systems

An important purpose of applying spatial sound to speech communication is to improve speech intelligibility. In real life, conversation usually occurs in environments with background noise and multiple speech sources concurrently competing. When target speech sources and other noise or competing speech sources are spatially separated, the hearing system can use the cocktail party effect (Section 1.7.5) to obtain expected information and guarantee speech intelligibility. This ability is attributed to binaural hearing.

However, mono signal transmission is dominantly used in currently available communication systems in which the inability to spatially separate targets and competing sources degrades speech intelligibility. Spatial sound methods can preserve the spatial information of sources or spatially separate the sources by signal processing and thus improve the quality of speech communication (Begault and Erbe, 1994; Drullman and Bronkhorst, 2000). Psychoacoustic experimental results indicate that spatially separating multiple speech sources by VADs enhances speech intelligibility for either full-bandwidth or 4 kHz low-pass (phone quality) speech signals (Begault, 1999).

From the point of view of auditory perception, various spatial sound techniques are theoretically applicable to speech communication, depending on practical requirements and costs. For speech communication with headphones, VAD is advantageous because its hardware is simple, and it requires only two independent signals (and therefore a low bandwidth for signal transmission). In addition, conventional headphone presentation tends to cause in-the-head localization and auditory fatigue over long listening times. Incorporating VAD into speech communication can create natural auditory effects and ease auditory fatigue. For speech communication with loudspeakers, other spatial sound techniques may be needed.

Multiple talkers are present in a teleconference at the same time (Kang and Kim, 1996; Evans et al., 1997). In addition to improving speech intelligibility in teleconferences, spatial sound techniques provide immersive and close-to-reality communication services. In remote conferencing, if participants are distributed in two or more separate meeting rooms, the direct approach to preserving spatial information and improving the intelligibility of transmitted speech is to combine and reproduce the binaural sound signals obtained by artificial-head recording in each meeting room. Alternatively, the speech of each participant is captured by a microphone, then rendered by static or dynamic binaural synthesis according to a pre-defined spatial distribution and acoustic environment, and finally presented to all participants. Other techniques, such as discrete multichannel sound, Ambisonics, WFS, and microphone arrays, are also applicable to teleconferencing (Boone and Bruijn, 2003) and create a virtual meeting environment. DirAC (Section 7.6) and SAOC (Section 13.5.5) are also applicable to teleconferencing (Herre et al., 2011). Similar applications include telepresence (Hollier et al., 1997), various emergency command systems, and telephone systems in which multiple speech sources should be monitored simultaneously.
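The "pre-defined spatial distribution" of talkers mentioned above can be sketched as follows. This is a simplified illustration, not the binaural synthesis itself: talkers are spread evenly over a frontal arc, and the interaural time difference (ITD) for each direction is estimated with Woodworth's rigid-sphere formula in place of measured HRTFs. The arc width, head radius, and sign convention (negative azimuth to the left) are assumptions.

```python
import math

HEAD_RADIUS = 0.0875    # m, assumed average head radius
SPEED_OF_SOUND = 343.0  # m/s, assumed


def talker_azimuths(n_talkers, arc_deg=120.0):
    """Spread talkers evenly over a frontal arc centred on 0 degrees."""
    if n_talkers == 1:
        return [0.0]
    step = arc_deg / (n_talkers - 1)
    return [-arc_deg / 2 + i * step for i in range(n_talkers)]


def woodworth_itd(azimuth_deg):
    """Far-field ITD (s) of a rigid spherical head, for |azimuth| <= 90."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS / SPEED_OF_SOUND * (theta + math.sin(theta))


for azimuth in talker_azimuths(5):
    print(azimuth, round(woodworth_itd(azimuth) * 1e6), "microseconds")
```

In a full system, each talker's microphone signal would instead be filtered with the HRTF pair for its assigned direction. The ITD is shown here only to indicate how strongly the chosen directions differ as localization cues.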

VADs also contribute to aeronautical communication, and numerous investigations on this application were undertaken by the NASA Ames Research Center (Begault, 1998). The projects are categorized as a combination of VAD applications for speech communication and information orientation. Given that civil aircraft cockpits are characterized by high environmental noise, headphones are used not only for speech communication but also to reproduce air traffic warnings, from which pilots determine the directions of targets (e.g., other aircraft) or identify the corresponding visual targets (e.g., on a radar display) and take appropriate measures. Applying VADs to aeronautical communication improves speech intelligibility and, with the help of spatialized auditory warnings, reduces the search or reaction time of pilots. The latter is important for flight safety. Additionally, headphone presentation may be combined with active noise control to reduce pilot exposure to binaural noise.

The VAD applications in aeronautical communication mentioned above include auditory-based information display and orientation. Vision is often superior to hearing in terms of target identification and orientation. However, acoustic information becomes particularly important when a target is out of visual range (e.g., behind the listener) or when visual information overload occurs (such as in the case of multiple visual targets). In real life, auditory information often guides visual orientation (Bolia et al., 1999), and goals can be localized through hearing even without visual help (Lokki and Gröhn, 2005). Therefore, presenting target information and orientation is another important application of spatial sound.

Audio navigation systems, which combine the Global Positioning System (GPS) with VADs, reproduce sounds as if they were emitted from the target directions. These systems are applied primarily in civil or military rescue searches (Kan et al., 2004). A similar method can be used to present various types of spatial auditory information, such as that contained in guidance and information systems for the blind (Loomis et al., 1998; Bujacz et al., 2012) or in tourism and museum applications (Gonot et al., 2006).
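The geometric step in such an audio navigation system can be sketched as follows: the target direction is obtained from the listener's and the target's GPS coordinates and then expressed relative to the listener's heading, so that a virtual source can be rendered at that azimuth. This is a minimal sketch that assumes a spherical Earth and a heading supplied by a compass or head tracker. The function names and coordinates are hypothetical.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are in degrees; the result is in degrees clockwise from north.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def relative_azimuth_deg(target_bearing, heading):
    """Target direction relative to the listener's front, in [-180, 180).

    Positive values are clockwise (to the listener's right).
    """
    return (target_bearing - heading + 180.0) % 360.0 - 180.0


# Hypothetical listener facing north-west (heading 315 degrees),
# with the target located due north of the listener.
target = bearing_deg(23.10, 113.30, 23.20, 113.30)
print(relative_azimuth_deg(target, 315.0))
```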

Monitoring multiple targets (such as different instruments and meters) is often necessary in practice and can cause visual overload. In this situation, VADs are used to alleviate the visual burden by transforming part of the visual presentation of spatial information into an auditory presentation (i.e., non-visual orientation). The various forms of sound design that convey useful information are called sonification (Barrass, 2012).

16.2.3  Applications to multimedia

The discussions in Sections 16.2.1 and 16.2.2 are based on some special applications of spatial sound. In professional applications, various functions, such as communication and virtual reality, may be separately implemented by corresponding equipment. In consumer applications, however, users may prefer multi-functional and integrated equipment.

Multimedia PCs, which are distinguished by integration and interaction, can handle a wide range of information, including audio, video, images, text, and data. Information exchange between computers is also possible through the Internet. Even standard PCs possess these functions, making them ideal platforms for communication, information processing, and virtual reality.

Since the 1990s, multimedia PCs have been an important application field for spatial sound, in addition to cinema, domestic, and automobile applications. Spatial sound is widely incorporated into the entertainment functions of multimedia PCs. Currently, a multimedia PC is often used to play back various video and audio programs from optical disks and streaming media. A common sound card in a PC supports two-channel stereophonic inputs and outputs. Some sound cards also support 5.1-channel, 7.1-channel, or even higher channel-count outputs. Video/audio playback software supports various coded audio formats, such as MP3, AAC, Dolby Digital, and DTS. Some video/audio production software has powerful functions for multichannel signal editing, converting, and coding. Combined with an optical disk writer, a video/audio CD, DVD, or BD can be easily made on a multimedia PC. The development of hard disks, the Internet, and cloud computing facilitates the storage and transmission of spatial sound programs. Therefore, multimedia PCs provide an effective and convenient platform for video and audio program production and playback.

For a multimedia PC with a multichannel sound card, audio outputs can be directly reproduced with multichannel active loudspeakers. For a common multimedia PC, loudspeakers or headphones are often used for audio reproduction. For loudspeaker reproduction, two loudspeakers are often arranged on the two sides of the visual display, spanning a small angle with respect to the listener. In this case, stereophonic expansion in Section 11.9.2 and the virtual reproduction of multichannel sound in Section 11.9.3 can be applied to improve the reproduced effect. For headphone presentation, binaural reproduction in Section 11.9.1 is also applicable.
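A minimal sketch of headphone rendering of loudspeaker signals follows. It assumes that binaural reproduction here means filtering each channel signal with the pair of head-related impulse responses (HRIRs) for the direction of its intended loudspeaker and summing the results at each ear. The HRIR values below are short placeholders invented for illustration, not measured data, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, s in enumerate(signal):
        for m, h in enumerate(impulse_response):
            out[n + m] += s * h
    return out


def mix_into(accumulator, samples):
    """Add samples to accumulator in place, growing it if needed."""
    if len(samples) > len(accumulator):
        accumulator.extend([0.0] * (len(samples) - len(accumulator)))
    for i, value in enumerate(samples):
        accumulator[i] += value


def binaural_downmix(channels, hrirs):
    """Render channel signals over headphones via virtual loudspeakers.

    channels: {name: samples}; hrirs: {name: (left_ear_ir, right_ear_ir)}.
    Returns the (left, right) headphone signals.
    """
    left, right = [], []
    for name, samples in channels.items():
        ir_left, ir_right = hrirs[name]
        mix_into(left, convolve(samples, ir_left))
        mix_into(right, convolve(samples, ir_right))
    return left, right


# Placeholder two-tap HRIRs: the far ear receives a delayed, weaker copy.
toy_hrirs = {"L": ([1.0, 0.0], [0.0, 0.5]), "R": ([0.0, 0.5], [1.0, 0.0])}
toy_channels = {"L": [1.0, 0.0], "R": [0.0, 1.0]}
print(binaural_downmix(toy_channels, toy_hrirs))
```

The same loop extends to 5.1- or 7.1-channel input by adding one HRIR pair per channel. A practical implementation would use fast (FFT-based) convolution.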

Another entertainment function of a multimedia PC is 3D gaming. VAD is often used in 3D games on multimedia PCs to recreate spatial auditory effects, and it has been incorporated into some 3D game software on the Windows platform. To create an authentic auditory effect, head trackers, as well as interactive and dynamic signal processing, can also be incorporated into the multimedia PC platform (López and González, 1999; Kyriakakis, 1998). A 3D game based on virtual reality (and VAD) is a promising application, as stated in Section 16.2.1. Other spatial sound techniques, such as Dolby Pro Logic IIz, may also be applied to 3D games (Tsingos et al., 2010).

Various applications of spatial sound to virtual reality, communication and information systems, and receivers of digital multimedia broadcasting can be implemented on multimedia PC platforms. A multimedia PC is also used as a teleconferencing terminal.

Multimedia applications also raise new requirements for the coding and transmission of spatial sound signals. The MPEG-4 coding standard in Section 13.5.4 is specified for multimedia video and audio.

16.2.4  Applications to mobile and handheld devices

Mobile communication and handheld sound reproduction devices, such as tablet computers, smartphones, and streaming media players, have developed rapidly in recent years. From a practical perspective, using spatial sound in these types of products is a promising direction.

Since the 2000s, some corporations and research institutes have launched relevant studies (AES Staff Technical Writer, 2006a; Yasuda et al., 2003; Paavola et al., 2005; Choi et al., 2006; Sander et al., 2012), and many commercial products have been introduced. Mobile products are characterized by a combination of functions, such as speech communication, interactive virtual auditory environments, teleconferencing, spatial auditory information presentation (e.g., traffic directions), and entertainment (e.g., video-audio reproduction, 3D games). Therefore, such products can be regarded as an application of multimedia technology. The increased speed and bandwidth of wireless communication networks make these functions increasingly feasible. The application of spatial sound to mobile and handheld devices has been considered in the MPEG-H 3D Audio standard (Section 13.5.6).

Compared with other uses, sound reproduction in mobile and handheld devices is restricted by the following two issues:

1. The limited processing and storage ability of the system, which requires simplification of algorithms and data

2. The limited power supplied by the battery, which requires a reproduction method with low power consumption

For sound reproduction in mobile devices, mini-loudspeakers can be used, but this method causes some problems. First, it cannot create a high sound pressure level in reproduction because of the limited power supply in mobile devices and the restrictions imposed by the characteristics of mini-loudspeakers. Second, the audio quality of mini-loudspeakers is limited, especially at low frequencies. Third, the span between the two loudspeakers in a mobile device is small (usually a few centimeters to a dozen centimeters). A mobile device is usually located 20–50 cm away from the listener. Accordingly, the angle spanned by the two mini-loudspeakers with respect to the listener lies between 10° and 20°. Such a narrow span angle spoils the stereophonic sound effect.
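The span angle quoted above follows from simple geometry. For two loudspeakers placed symmetrically in front of the listener, the subtended angle is 2·arctan(d / 2r), where d is the loudspeaker spacing and r the listening distance. The sketch below evaluates it for two hypothetical combinations chosen from within the ranges given in the text.

```python
import math


def spanned_angle_deg(spacing_m, distance_m):
    """Angle subtended at the listener by two symmetric loudspeakers."""
    return math.degrees(2.0 * math.atan(spacing_m / (2.0 * distance_m)))


# Hypothetical device geometries (spacing, distance), both in metres.
for spacing, distance in [(0.05, 0.30), (0.12, 0.35)]:
    print(spacing, distance, round(spanned_angle_deg(spacing, distance), 1))
```

For comparison, a conventional stereophonic loudspeaker configuration spans about 60°, several times wider than either case.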

The first and second problems have yet to be solved, but they may be mitigated as technology develops. The third problem can be alleviated, and the effect improved, by using the method of stereophonic expansion in Section 11.9.2 or the virtual reproduction of multichannel sound in Section 11.9.3 (Park et al., 2006; Breebaart et al., 2006). For 3D game use, the transaural method in Section 11.8 can also be utilized directly to create signals for two mini-loudspeakers from mono stimuli.

The above methods involve transaural reproduction via two loudspeakers with a narrow span angle. The analysis in Section 11.8.3 indicates that a loudspeaker configuration with a narrow span angle requires a large boost at low frequencies in transaural processing; consequently, signal processing becomes difficult. Considering that the low-frequency limit of a mini-loudspeaker is 200–300 Hz at best, the components below this limit can be filtered out in the design of the transaural filters, and the difficulty in signal processing can be avoided. Moreover, near-field HRTFs can be used for the transaural filters to adapt to the practical listening distance in mobile device reproduction (Zhang et al., 2014).
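The low-frequency boost can be illustrated with a strongly simplified model, which is not the analysis of Section 11.8.3. It assumes free-field point-source loudspeakers and two point "ears" without a head, so that the symmetric 2×2 transfer matrix [[H_i, H_c], [H_c, H_i]] can be inverted analytically. Relative to a plain inverse of the direct path, the cancellation filter then has gain 1 / |1 − (H_c/H_i)²|. The ear spacing, the speed of sound, and the example geometry are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0     # m/s, assumed
HALF_EAR_SPACING = 0.0875  # m, assumed; ears modelled as two points


def crosstalk_canceller_gain_db(span_deg, distance_m, freq_hz):
    """Extra gain (dB) of the direct-path crosstalk-cancellation filter.

    Free-field point sources are assumed, with no head shadow; the
    loudspeakers sit symmetrically in front at the given distance.
    """
    half = math.radians(span_deg / 2.0)
    x, y = distance_m * math.sin(half), distance_m * math.cos(half)
    r_ipsi = math.hypot(x - HALF_EAR_SPACING, y)
    r_contra = math.hypot(x + HALF_EAR_SPACING, y)
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    # Ratio of contralateral to ipsilateral transfer functions, H_c / H_i.
    ratio = (r_ipsi / r_contra) * cmath.exp(-1j * k * (r_contra - r_ipsi))
    # Inverting [[Hi, Hc], [Hc, Hi]] scales 1/Hi by 1 / (1 - ratio**2).
    return -20.0 * math.log10(abs(1.0 - ratio * ratio))


for span in (10.0, 60.0):
    for freq in (200.0, 2000.0):
        gain = crosstalk_canceller_gain_db(span, 0.30, freq)
        print(span, freq, round(gain, 1))
```

Under these assumptions, the 10° configuration needs a far larger gain at 200 Hz than the 60° one, while at 2 kHz the extra gain is small. The same listening distance is used for both spans only to isolate the effect of the span angle.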

Headphone presentation requires relatively little power and usually provides better perceived audio quality. Therefore, it is appropriate for mobile devices. However, presenting stereophonic and multichannel sound signals directly over headphones may cause in-the-head localization. To solve this problem, the binaural reproduction method in Section 11.9.1 can be used to convert the stereophonic and multichannel sound signals for headphone presentation on mobile devices. In particular, by combining audio coding and binaural processing, MPEG spatial audio coding and decoding (Section 13.4.5) reduces the bit rate of data and simplifies binaural synthesis processing (Breebaart et al., 2006). For