The binaural speech enhancement results obtained with the system described in the paper "Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication" by J. Li et al. (Speech Communication, 2010) are presented in the demonstrations below. For more information, please contact Dr. Junfeng Li at junfeng@jaist.ac.jp.
Note: In the current implementation of Two-Stage BinAural Speech Enhancement (TS-BASE), a Wiener filter is used to enhance the target speech components. The current version of the proposed binaural signal processing system is therefore named TS-BASE/WF, i.e., TS-BASE with a Wiener filter (WF).
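This page does not describe how the Wiener filter in TS-BASE/WF is computed. For orientation only, the sketch below shows the textbook per-frequency-bin Wiener gain G = ξ / (1 + ξ), where ξ is the a priori SNR. Estimating ξ by simple power subtraction from a known noise power, and the gain floor `floor`, are assumptions of this illustration; `wiener_gain` is a hypothetical name and is not the paper's estimator.

```python
def wiener_gain(noisy_power, noise_power, floor=1e-3):
    """Textbook spectral Wiener gain per frequency bin (illustrative only).

    noisy_power: |Y(k)|^2 for each bin k of the noisy spectrum.
    noise_power: estimated noise power sigma_n^2(k) for each bin (must be > 0).
    Assumption: the a priori SNR xi is estimated by power subtraction,
    xi = max(|Y|^2 / sigma_n^2 - 1, 0); TS-BASE/WF may estimate it differently.
    """
    gains = []
    for y2, n2 in zip(noisy_power, noise_power):
        xi = max(y2 / n2 - 1.0, 0.0)   # a priori SNR estimate for this bin
        g = xi / (1.0 + xi)            # Wiener gain, between 0 and 1
        gains.append(max(g, floor))    # small gain floor softens musical noise
    return gains
```

With this estimator the gain reduces to 1 - σ²/|Y|², so bins dominated by noise are strongly attenuated while bins with high SNR pass almost unchanged.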
The demonstrations below are example sound files from our research on binaural speech enhancement (binaural noise reduction). Although this research currently focuses on improving hearing aids, its applications are not limited to them.
In the experiments, the target and interfering signals were selected from the NTT-AT database, and the impulse responses, measured with a KEMAR dummy head, were downloaded from the MIT Media Lab [8]. The binaural target and interfering signals at the two ears were obtained by filtering the original signals with the impulse responses for the given directions. These binaural signals were then downsampled to 16 kHz and finally mixed at a signal-to-noise ratio of 0 dB at both ears to generate the observed binaural mixtures.
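The data-generation steps above (filter each source with the impulse responses for its direction, then mix target and interference at 0 dB) can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the function names are hypothetical, impulse responses are passed in as plain arrays, and applying one common gain to both ears (so that the interferers' interaural level differences survive) is an assumed reading of "0 dB at both ears". The downsampling step is omitted; for 44.1 kHz material, `scipy.signal.resample_poly(x, 160, 441)` would bring it to 16 kHz.

```python
import numpy as np


def spatialize(source, hrir_left, hrir_right):
    """Render a mono source at one direction by convolving it with the
    left- and right-ear impulse responses measured for that direction.
    Returns an array of shape (2, len(source) + len(hrir) - 1)."""
    return np.stack([np.convolve(source, hrir_left),
                     np.convolve(source, hrir_right)])


def binaural_mix(target_lr, interferers_lr, snr_db=0.0):
    """Add one or more binaural interferers to a binaural target at snr_db.

    target_lr and each entry of interferers_lr are (2, n) arrays (left, right).
    Assumption: a single gain, computed from the power over both ears, scales
    the summed interference, so its interaural level differences are kept.
    Returns (mixture, target, scaled_interference), zero-padded to one length.
    """
    n = max([target_lr.shape[1]] + [x.shape[1] for x in interferers_lr])

    def pad(x):
        # Zero-pad along the time axis so every signal has n samples.
        return np.pad(x, ((0, 0), (0, n - x.shape[1])))

    target = pad(target_lr)
    noise = sum(pad(x) for x in interferers_lr)
    # Choose the gain so that 10*log10(P_target / P_noise) equals snr_db.
    gain = np.sqrt(np.mean(target ** 2) /
                   (np.mean(noise ** 2) * 10.0 ** (snr_db / 10.0)))
    noise = gain * noise
    return target + noise, target, noise
```

Computing the gain per ear instead would give exactly 0 dB at each ear, but it would also change the interferers' interaural level differences, which the binaural comparison relies on.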
In the demonstrations below, the 1st column gives the direction-of-arrival vector of the sound sources, in degrees. Each vector corresponds to one acoustic condition: its first element is the direction of arrival of the target signal, and the remaining elements are those of the interfering signals.
The 2nd and 3rd columns contain the target signal and the interfering (noise) signal, respectively, and the 4th column contains the noisy mixture. The remaining columns contain the signals enhanced by two-channel spectral subtraction (TwoChSS) [5] (5th column), the frequency-domain binaural model (FDBM) [6] (6th column), Roman's algorithm [7] (7th column), and our proposed Two-Stage BinAural Speech Enhancement (TS-BASE) algorithm (8th column).
Different acoustic conditions, characterized by different source positions, are simulated and used to evaluate the proposed binaural speech enhancement algorithm. Each acoustic condition occupies one row, split into two sub-rows. The 1st sub-row gives the signals at the left ear and shows how well each algorithm suppresses the interfering signals and/or enhances the target signal. The 2nd sub-row gives the corresponding binaural signals at both ears and shows whether the sound sources can still be localized from them. Because Roman's algorithm produces a monaural output, its entry in the 2nd sub-row is marked "—". Listening over headphones is strongly recommended to perceive the locations of the sound sources.
Demonstration 1
In the first demonstration, the target and interfering sources have fixed directions of arrival, i.e., they remain at fixed positions in space.
Six acoustic conditions with different source positions are simulated, each denoted by a direction-of-arrival vector. For example, the vector (0, 45, -45) denotes the following configuration: the target signal arrives from 0 degrees (i.e., the front of the KEMAR), and two interfering sources are active at 45 degrees (to the right) and -45 degrees (to the left).
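To make the labelling convention concrete, the small helper below splits a condition label into the target direction and the interferer directions, and names the side of each azimuth using the convention stated above (0 = front, positive = right, negative = left). The function names are hypothetical; nothing here comes from the authors' software.

```python
def parse_condition(label):
    """Split a label such as "(0, 45, -45)" into the target direction
    (first element) and the list of interferer directions (the rest)."""
    angles = [int(tok) for tok in label.strip().strip("()").split(",")]
    return angles[0], angles[1:]


def side(azimuth):
    """Name an azimuth in degrees using this page's convention:
    0 = front of the KEMAR, positive = right, negative = left."""
    if azimuth == 0:
        return "front"
    return "right" if azimuth > 0 else "left"
```

For instance, `parse_condition("(45, 0, -45, 60)")` yields a target at 45 degrees and interferers at 0, -45 and 60 degrees.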
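| Description | Ear(s) | Target | Noise | Mixture | TwoChSS | FDBM | Roman | TS-BASE |
|---|---|---|---|---|---|---|---|---|
| (0, 45) | Left ear | | | | | | | |
| | Both ears | | | | | | — | |
| (0, 45, -45) | Left ear | | | | | | | |
| | Both ears | | | | | | — | |
| (0, 45, -45, 60) | Left ear | | | | | | | |
| | Both ears | | | | | | — | |
| (45, 0) | Left ear | | | | | | | |
| | Both ears | | | | | | — | |
| (45, 0, -45) | Left ear | | | | | | | |
| | Both ears | | | | | | — | |
| (45, 0, -45, 60) | Left ear | | | | | | | |
| | Both ears | | | | | | — | |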
Demonstration 2
In the second demonstration, one source has a time-varying direction of arrival, i.e., it moves through space, while the other source stays at a fixed position.
Two acoustic conditions with different source positions are simulated. In the first condition, denoted (0, -90:10:90), the target signal is fixed in front of the KEMAR dummy head (i.e., at 0 degrees), and the interfering signal moves from -90 degrees (the left) to 90 degrees (the right) in 10-degree steps. In the second condition, denoted (-90:10:90, 0), the target signal moves from -90 degrees (the left) to 90 degrees (the right) in the same steps, and the interfering signal is fixed in front of the KEMAR dummy head (i.e., at 0 degrees). The signals at the left ear are given in the first sub-row, and the corresponding binaural signals at both ears are given in the second sub-row.
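The page does not say how the moving sources were rendered. One simple possibility, sketched below as an assumption rather than the authors' method, is piecewise-static rendering: expand the MATLAB-style range `-90:10:90` into 19 azimuths, split the signal into that many equal segments, filter each segment with the impulse-response pair for its azimuth, and overlap-add the results. `hrir_lookup` is a hypothetical function returning the (left, right) impulse responses for an azimuth, and all pairs are assumed to have the same length.

```python
import numpy as np


def expand_range(spec):
    """Expand MATLAB-style 'start:step:stop' (e.g. '-90:10:90') into a list
    of angles that includes both endpoints."""
    start, step, stop = (int(v) for v in spec.split(":"))
    return list(range(start, stop + (1 if step > 0 else -1), step))


def render_moving_source(source, angles, hrir_lookup):
    """Piecewise-static rendering of a moving source (illustrative sketch).

    The signal is split into len(angles) nearly equal segments; segment k is
    convolved with the impulse-response pair for angles[k], and the results
    are overlap-added so each segment's convolution tail spills into the next.
    hrir_lookup(angle) -> (h_left, h_right) is hypothetical; all impulse
    responses are assumed to share one length.
    """
    segments = np.array_split(source, len(angles))
    h_len = len(hrir_lookup(angles[0])[0])
    out = np.zeros((2, len(source) + h_len - 1))
    pos = 0
    for seg, angle in zip(segments, angles):
        h_left, h_right = hrir_lookup(angle)
        end = pos + len(seg) + h_len - 1
        out[0, pos:end] += np.convolve(seg, h_left)
        out[1, pos:end] += np.convolve(seg, h_right)
        pos += len(seg)
    return out
```

Switching impulse responses abruptly at segment boundaries may produce audible artifacts; interpolating or cross-fading between neighbouring impulse responses is a common refinement.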
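| Description | Ear(s) | Target | Noise | Mixture | TwoChSS | FDBM | Roman | TS-BASE |
|---|---|---|---|---|---|---|---|---|
| (0, -90:10:90) | Left ear | | | | | | | |
| | Both ears | | | | | | — | |
| (-90:10:90, 0) | Left ear | | | | | | | |
| | Both ears | | | | | | — | |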
References