Bachelor Degree Project
Bluetooth audio codecs in a real-time interactive context
Bachelor Degree Project in Information Technology Basic level 30 ECTS
Spring 2023
Gustav Johansson, Mattias Adevåg, Jacob Milton
Supervisor: András Márki
Examiner: Yacine Atif
Abstract
The emergence of Bluetooth Low Energy in combination with optimized coders has made it possible to transfer digital audio at very low bitrates, paving the way for small devices with long-lasting batteries. The aim of this study is to compare the audio codecs LC3 and aptX, and to examine people’s attitudes towards audio quality in different contexts. Two open-source implementations of the codecs are evaluated in terms of execution time. Furthermore, the perceived audio quality at low bitrates is compared subjectively in a listening test, combined with a questionnaire on people’s attitudes towards audio quality. The results show that LC3 is capable of delivering satisfying audio quality at very low bitrates while also outperforming aptX. It will be interesting to see how LC3 affects transmission latency, battery life and overall QoS once it is established in everyday products.
Keywords: Bluetooth Low Energy, BLE, audio codec, LC3, aptX, real-time, interactive context, latency, perceived audio quality
Contents
- 1 Introduction
  - 1.1 Background
  - 1.2 Related work
  - 1.3 Problem
- 2 Method
  - 2.1 Chosen methods
  - 2.2 Alternative methods
  - 2.3 Implementation
- 3 Result
  - 3.1 Quasi-experiment
  - 3.2 Experiment
  - 3.3 Survey (quantitative)
  - 3.4 Survey (qualitative)
- 4 Discussion
  - 4.1 Conclusion
  - 4.2 Threats to validity
  - 4.3 Ethics
  - 4.4 Comparison to related work
  - 4.5 Future work
- A Appendix
- B Appendix
- C Appendix
- D Appendix
1 | Introduction
In today’s society, wireless communication is constantly evolving and becoming more integrated into our everyday lives. Bluetooth Low Energy (BLE) is widely used between small devices such as smartphones, earbuds and sensors. This form factor poses specific challenges for developers with regard to latency and throughput.
For audio in this context, some delay is generally acceptable, but in certain situations low latency is critical for the participants, for example during music performances or when playing video games. McPherson et al. (2016) experiment with different hardware configurations used in music production and use 10 ms as a reference for acceptable latency.
With a payload size of 20 bytes, the theoretical maximum throughput of BLE is 236.7 kbit/s, and real applications often achieve less due to hardware and firmware limitations (Tosi et al. 2017). When audio is transferred over Bluetooth, there is a tradeoff between delay and audio quality, and several audio codecs with different features are available.
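The 236.7 kbit/s figure can be reproduced with a back-of-the-envelope calculation. The Python sketch below is one plausible derivation, not necessarily the one used by Tosi et al. (2017). It assumes BLE 4.x on the 1 Mbit/s PHY, a 20-byte application payload (the default ATT MTU of 23 bytes minus the 3-byte ATT header), the standard 150 µs inter-frame space, and that every data packet is acknowledged by an empty packet with no limit on packets per connection event. The constant and function names are illustrative.

```python
# Assumed BLE 4.x packet layout on the LE 1M PHY (sizes in bytes).
PHY_RATE_BPS = 1_000_000   # 1 Mbit/s: one bit per microsecond
PREAMBLE = 1
ACCESS_ADDRESS = 4
LL_HEADER = 2              # link-layer data PDU header
CRC = 3
L2CAP_HEADER = 4           # length + channel ID
ATT_HEADER = 3             # opcode + attribute handle
T_IFS_US = 150             # inter-frame space, microseconds


def airtime_us(ll_payload_bytes: int) -> float:
    """Air time in microseconds of one link-layer packet with the given payload."""
    total_bytes = PREAMBLE + ACCESS_ADDRESS + LL_HEADER + ll_payload_bytes + CRC
    return total_bytes * 8 / PHY_RATE_BPS * 1e6


def max_throughput_kbps(app_payload_bytes: int = 20) -> float:
    """Upper bound assuming back-to-back data packets, each answered by an
    empty packet from the peer, with no limit per connection event."""
    data_packet = airtime_us(L2CAP_HEADER + ATT_HEADER + app_payload_bytes)
    empty_packet = airtime_us(0)
    cycle_us = data_packet + T_IFS_US + empty_packet + T_IFS_US
    # bits per microsecond equals Mbit/s; multiply by 1000 for kbit/s
    return app_payload_bytes * 8 / cycle_us * 1000


print(f"{max_throughput_kbps(20):.1f} kbit/s")  # 236.7 kbit/s
```

Under these assumptions each 160-bit payload occupies a 676 µs cycle (296 µs data packet, 80 µs empty packet, two 150 µs gaps); real links fall short of this because of connection-event limits and retransmissions.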
For currently available products, the delay is often hard to derive from the technical specifications. Instead of reliable data, hyperbolic language is often used to highlight the superiority of a given product. Therefore, this paper sets out to compare how different audio codecs perform in this regard.
1.1 Background
This section presents a set of themes that provide background knowledge on the topics discussed and elaborated on throughout this report.
Bluetooth
Bluetooth technology uses the license-free 2.4 GHz band to exchange data wirelessly between two devices over a short range. It has been available since 2000, and the early versions are grouped under the epithet “classic.” In BLE, the band from 2.4000 GHz to 2.4835 GHz is divided into 40 channels spaced 2 MHz apart (Woolley 2023). Bluetooth has a wide range of uses, for example: in a body area network, where medical equipment such as a heart rate sensor can be used with an e-health application (Tosi et al. 2017); in a Personal Area Network connecting the devices a person carries around (Rashid & Yusoff 2006), such as a smartphone, smartwatch or headphones; or in a smart home connecting numerous devices (Tosi et al. 2017).
BLE was introduced with version 4.0 of the specification. A primary characteristic of BLE is its focus on low power consumption and energy efficiency. A heart rate sensor, for instance, has a small battery but must last for a long time, perhaps days or weeks. One way BLE handles this is by introducing asymmetrical responsibilities: the device with the more reliable power source, such as a smartphone, handles more of the energy-consuming computations than a peer device running on a coin cell battery. Other features include a broadcast mode that allows one device to send data to multiple receivers simultaneously, and a mesh topology in which tens of thousands of devices can connect and communicate in a network (Woolley 2023).
BLE uses a layered architecture in which each layer encapsulates data coming from the layer above and decapsulates data coming from the layer below, as shown in Figure 1.1. The layers can be grouped into controller, host and application: the controller, consisting of the link layer and the physical layer, is usually implemented as a system-on-chip with a radio; the host comprises several layers and is implemented on the application processor; the host and controller communicate via the host controller interface; and the application layer at the top is the only layer not defined by the Bluetooth specification (Gomez et al. 2012).
Figure 1.1: BLE protocol stack
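To make the encapsulation step concrete, the following Python sketch wraps an application value in two host-side headers: an ATT Handle Value Notification (opcode and attribute handle) inside an L2CAP basic frame (length and channel ID, 0x0004 being the fixed channel for ATT). It is an illustration under these assumptions, not a stack implementation; the function names and the handle 0x0010 are hypothetical, and the controller would still add its own link-layer header, access address and CRC.

```python
import struct

# Host-side framing only; the controller would add its own link-layer
# header, access address and CRC before transmission.
ATT_CID = 0x0004              # fixed L2CAP channel ID used by ATT
ATT_HANDLE_VALUE_NTF = 0x1B   # ATT opcode for a Handle Value Notification


def att_encapsulate(handle: int, value: bytes) -> bytes:
    """ATT PDU: opcode (1 byte) + attribute handle (2 bytes, little-endian) + value."""
    return struct.pack("<BH", ATT_HANDLE_VALUE_NTF, handle) + value


def l2cap_encapsulate(cid: int, payload: bytes) -> bytes:
    """L2CAP basic frame: payload length (2 bytes) + channel ID (2 bytes) + payload."""
    return struct.pack("<HH", len(payload), cid) + payload


def l2cap_decapsulate(frame: bytes) -> tuple[int, bytes]:
    """Strip the L2CAP header and return (channel ID, payload)."""
    length, cid = struct.unpack_from("<HH", frame)
    return cid, frame[4:4 + length]


def att_decapsulate(pdu: bytes) -> tuple[int, bytes]:
    """Strip the ATT header and return (attribute handle, value)."""
    opcode, handle = struct.unpack_from("<BH", pdu)
    if opcode != ATT_HANDLE_VALUE_NTF:
        raise ValueError(f"unexpected ATT opcode 0x{opcode:02X}")
    return handle, pdu[3:]


# Sending side: each layer wraps the data handed down from the layer above.
frame = l2cap_encapsulate(ATT_CID, att_encapsulate(0x0010, b"audio"))
print(frame.hex())  # 080004001b1000617564696f

# Receiving side: each layer unwraps its header and passes the rest upwards.
cid, pdu = l2cap_decapsulate(frame)
print(cid, att_decapsulate(pdu))  # 4 (16, b'audio')
```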
Digital audio
The basis of digital audio is the conversion of analog audio signals to the digital domain. Pulse Code Modulation (PCM) is a common method for this: the sound wave is sampled a fixed number of times per second, known as the sample rate (Figure 1.2), and the amplitude of each sample is represented by a set number of bits, known as the bit depth. According to the Nyquist theorem (Landau 1967), the sample rate must be at least twice the highest frequency to be represented. Since most humans cannot hear audio above 20 kHz, a sample rate of around 40,000 samples per second should theoretically be enough to represent the full audible spectrum. As a reference, the Compact Disc (CD) uses a sample rate of 44.1 kHz with a bit depth of 16 bits. Hence, 44,100 samples are collected every second, which is above what is needed to cover the limit of hearing (Hunn 2022).
Figure 1.2: Discrete samples representing a continuous analog signal
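The sampling and quantization steps of PCM can be sketched in a few lines of Python. The example below (function names are illustrative) samples a full-scale 1 kHz sine tone at the CD sample rate for 10 ms and rounds each sample to a signed 16-bit integer; it also applies the Nyquist criterion to the 20 kHz limit of hearing.

```python
import math


def nyquist_rate(max_freq_hz: int) -> int:
    """Minimum sample rate needed to represent frequencies up to max_freq_hz."""
    return 2 * max_freq_hz


def pcm_encode(freq_hz: float, sample_rate: int, bit_depth: int, duration_s: float) -> list[int]:
    """Sample a full-scale sine tone and quantize each sample to a signed integer."""
    max_code = 2 ** (bit_depth - 1) - 1          # 32767 for 16-bit audio
    n_samples = round(sample_rate * duration_s)
    return [
        round(max_code * math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(n_samples)
    ]


samples = pcm_encode(freq_hz=1000, sample_rate=44_100, bit_depth=16, duration_s=0.01)
print(nyquist_rate(20_000))  # 40000
print(len(samples))          # 441
```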
The sample rate, bit depth and number of channels together determine the bitrate, which is stated in kilobits per second (kbps). Using the CD as an example, the calculation is 44,100 (sample rate) × 16 (bits) = 705.6 kbps for a monaural signal, which is doubled for stereo: 705.6 × 2 = 1,411.2 kbps. In contrast, downloadable music took off thanks to the much lower bitrates introduced by the MP3 codec from the Fraunhofer Institute (Hunn 2022).
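The same arithmetic as a small Python helper (the function name is illustrative). It assumes uncompressed PCM, so the result is the reference point that codecs aim to shrink.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> float:
    """Uncompressed PCM bitrate in kbit/s: samples per second x bits per sample x channels."""
    return sample_rate_hz * bit_depth * channels / 1000


print(pcm_bitrate_kbps(44_100, 16))              # 705.6  (CD, mono)
print(pcm_bitrate_kbps(44_100, 16, channels=2))  # 1411.2 (CD, stereo)
```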
BLE audio codecs
Before audio is transmitted over Bluetooth, it is compressed to decrease the size of the packets. This is done by a codec that encodes and compresses the data; when the data arrives at the target device, the process is reversed, i.e. the data is decoded and regenerated to produce an approximate version of the original (Woolley 2023).
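As a deliberately simplified illustration of the encode/decode round trip, the Python sketch below implements a toy "codec" that halves the data size by keeping only the 8 most significant bits of each signed 16-bit sample. This is not how LC3 or aptX work (they rely on filter banks and perceptual models); the function names and the choice of 8 bits are hypothetical. The point is only that the decoded signal approximates, rather than equals, the original.

```python
import struct

DROPPED_BITS = 8  # hypothetical choice: discard the 8 least significant bits


def toy_encode(samples: list[int]) -> bytes:
    """Keep the top 8 bits of each signed 16-bit sample: 1 byte instead of 2."""
    return struct.pack(f"<{len(samples)}b", *(s >> DROPPED_BITS for s in samples))


def toy_decode(data: bytes) -> list[int]:
    """Scale the coarse values back up; the discarded bits cannot be recovered."""
    coarse = struct.unpack(f"<{len(data)}b", data)
    return [c << DROPPED_BITS for c in coarse]


pcm = [0, 1000, -1000, 32767, -32768]
packet = toy_encode(pcm)     # 5 bytes instead of 10
print(toy_decode(packet))    # [0, 768, -1024, 32512, -32768]
```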
An audio codec recognizes patterns in a series of samples, referred to as a frame. Frames have a fixed duration measured in milliseconds, and the number of samples in a frame is determined by the sample rate. For example, a 10 ms frame contains 480 samples at a sample rate of 48 kHz. A shorter frame can be processed sooner, which lowers latency, but it decreases the efficiency of the codec, as there are fewer samples in which to recognize a pattern; a longer frame improves quality but increases latency. Choosing a frame size is therefore a tradeoff between latency and quality. The industry has found that around 10 ms provides good quality at a reasonable latency (Hunn 2022).
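The relation between frame duration, sample rate and frame length can be expressed directly. In this Python sketch (helper names are hypothetical), a 10 ms frame at 48 kHz gives the 480 samples mentioned above and a shorter 7.5 ms frame gives 360; a sample stream is then cut into consecutive frames.

```python
def samples_per_frame(frame_ms: float, sample_rate_hz: int) -> int:
    """Number of samples in one frame of the given duration."""
    n = frame_ms * sample_rate_hz / 1000
    if not float(n).is_integer():
        raise ValueError("frame duration does not map to a whole number of samples")
    return int(n)


def split_into_frames(samples: list, frame_len: int) -> list[list]:
    """Cut a sample stream into consecutive, non-overlapping frames.
    A trailing partial frame is dropped here for simplicity."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


print(samples_per_frame(10, 48_000))   # 480
print(samples_per_frame(7.5, 48_000))  # 360
```

Because an encoder must collect a full frame before it can process it, the frame duration is a lower bound on the delay such a codec adds, which is why shorter frames lower latency.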
A filter bank is a signal-processing tool consisting of an array of filters that divide the audio into multiple signals based on frequency, as shown in Figure 1.3 (Vetterli 1987). Filter banks can be combined with perceptual coding, a method that compares the audio stream to a model of human auditory perception, taking into account what the human ear is capable of apprehending (Hunn 2022). Two classes of filter banks are typical: Modified Discrete Cosine Transform (MDCT) and Quadrature Mirror Filter (QMF). MDCT filter banks offer perfect reconstruction and have filter lengths ranging from one to two times the number of subbands. QMF filter banks do not provide perfect reconstruction, although they come close. They also have filter lengths much greater than twice the number of subbands, which makes them suitable for applications that use only a few subbands (Gayer et al. 2004).
Figure 1.3: Filtering of an audio signal
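The perfect-reconstruction property of MDCT filter banks can be checked numerically. The Python sketch below is a textbook MDCT with a sine window and 50 % overlap, not the specific transform or window used in LC3; the block size N = 16 and the helper names are arbitrary choices. Each block of 2N samples yields N coefficients, and the aliasing introduced in each block is cancelled by its neighbours during overlap-add (time-domain aliasing cancellation).

```python
import math
import random


def mdct(block, N):
    """MDCT: 2N windowed samples -> N coefficients (one per subband)."""
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs, N):
    """Inverse MDCT: N coefficients -> 2N samples that still contain aliasing.
    The 2/N scaling pairs with a window satisfying w[n]^2 + w[n+N]^2 = 1."""
    return [2 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                        for k in range(N))
            for n in range(2 * N)]


def sine_window(N):
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def analyze(x, N):
    """Windowed MDCT over blocks of 2N samples overlapping by 50 % (hop size N).
    Zero padding ensures every input sample is covered by two blocks."""
    w = sine_window(N)
    xp = [0.0] * N + list(x) + [0.0] * (N + (-len(x)) % N)
    return [mdct([w[n] * xp[s + n] for n in range(2 * N)], N)
            for s in range(0, len(xp) - 2 * N + 1, N)]


def synthesize(blocks, N, length):
    """Inverse MDCT plus windowed overlap-add; aliasing from neighbouring blocks cancels."""
    w = sine_window(N)
    out = [0.0] * ((len(blocks) + 1) * N)
    for i, coeffs in enumerate(blocks):
        for n, y in enumerate(imdct(coeffs, N)):
            out[i * N + n] += w[n] * y
    return out[N:N + length]


random.seed(1)
x = [random.uniform(-1, 1) for _ in range(200)]
blocks = analyze(x, N=16)
y = synthesize(blocks, 16, len(x))
print(len(blocks), len(blocks[0]))                   # 14 16
print(max(abs(a - b) for a, b in zip(x, y)) < 1e-9)  # True
```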
The Low Complexity Communication Codec (LC3) was specified by the Bluetooth SIG to address the limitations of today’s short-range audio platforms. LC3 is an evolution of the MDCT-based Transform Coded Excitation (TCX) coder included in the Enhanced Voice Services (EVS) codec, although significant modifications have been made to reduce latency and complexity (Schnell et al. 2021).
The aptX codec has been available for more than 25 years and is now owned by Qualcomm (Qualcomm Technologies International, Ltd. 2018). It uses a four-band QMF filter bank to split the audio signal into frequency bands of equal bandwidth (Gayer et al. 2004).
The properties of these codecs are summarized in Table 1.1 below (Hunn 2022; Qualcomm Technologies International, Ltd. 2018).
| | LC3 | aptX | aptX HD |
|---|---|---|---|
| Sampling rate (kHz) | 8, 16, 24, 32, 44.1, 48 | 44.1 | 48 |
| Bit depth (bits) | 16, 24, 32 | 16 | 24 |
| Bitrate (kbps) | 20-400 | 384 | 576 |

Table 1.1: Codec properties
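A common way to build a four-band filter bank with subbands of equal bandwidth, such as the one aptX uses, is a two-level tree of two-band splits. The Python sketch below illustrates only that structure: it uses the two-tap Haar filter pair as a stand-in, because aptX's actual QMF coefficients are not given here. Unlike the longer QMF filters discussed earlier, the Haar pair happens to reconstruct perfectly, but its frequency separation is poor. The function names are hypothetical.

```python
import math


def haar_split(x):
    """One two-band analysis stage: lowpass and highpass halves, each decimated by 2."""
    s = 1 / math.sqrt(2)
    low = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    high = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return low, high


def four_band_split(x):
    """Two-level tree: split the signal in two, then split each half again,
    giving four equal-width subbands, each at a quarter of the input rate."""
    low, high = haar_split(x)
    return [*haar_split(low), *haar_split(high)]


bands = four_band_split([1.0] * 16)     # a constant (0 Hz) input
print([len(b) for b in bands])          # [4, 4, 4, 4]
print([round(b[0], 6) for b in bands])  # [2.0, 0.0, 0.0, 0.0]
```

The constant input ends up entirely in the first (lowest) band, and because the Haar pair is orthonormal, the total energy of the four subbands equals that of the input.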
Perceived audio quality
Subjective audio quality refers to the quality of an audio signal as perceived by human listeners.
Sound is transmitted as air pressure waves, which are collected by the ear and passed on as vibrations via the eardrum to the cochlea. The cochlea is shaped as a spiral and contains small hairs connected to nerve endings. Different audio frequencies cause separate areas of the cochlea to resonate, making the corresponding hairs vibrate. This causes the nerve endings to send impulses to the brain, which interprets the sound in terms of frequency, level and timbre (Herre & Dick 2019).
A key component of audio compression is to exploit the limits of human perception and remove irrelevant audio information. The main idea in perceptual audio coding builds on the fact that louder sounds mask adjacent frequencies (Figure 1.4), making them inaudible to most people. There is also an absolute threshold of hearing, which defines the minimum Sound Pressure Level (SPL) that a sound at a given frequency must reach to be audible. Any sound below this threshold or below a masking threshold can be filtered out to reduce irrelevant audio (Herre & Dick 2019).
Figure 1.4: SPL threshold and masking
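The absolute threshold of hearing is often approximated by Terhardt's formula, which is not part of this report. The Python sketch below evaluates that approximation and discards tonal components that fall below it; the function names and example tones are hypothetical, and masking by louder neighbouring sounds is deliberately left out.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate absolute hearing threshold in dB SPL (Terhardt's approximation)."""
    f = freq_hz / 1000  # frequency in kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4


def drop_inaudible(components):
    """Keep only (frequency_hz, level_db_spl) components above the threshold in quiet.
    Masking by louder neighbouring components is not modelled here."""
    return [(f, level) for f, level in components if level > threshold_in_quiet_db(f)]


tones = [(50, 30.0), (1000, 2.0), (1000, 20.0), (3300, 0.0), (15000, 40.0)]
print(drop_inaudible(tones))  # [(1000, 20.0), (3300, 0.0)]
```

The approximated curve is lowest around 3 to 4 kHz, where the ear is most sensitive, which is why the 0 dB SPL tone at 3.3 kHz is kept while the 2 dB SPL tone at 1 kHz (threshold about 3.4 dB SPL) is removed.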
