Demo for TS-BASE

Demonstrations for TS-BASE (Two-Stage BinAural Speech Enhancement)

The following demonstrations present binaural speech enhancement results obtained with the system described in the paper "Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication" by J. Li et al. (Speech Communication, 2010). For more information, please contact Dr. Junfeng Li at junfeng@jaist.ac.jp.

Note: The current implementation of Two-Stage BinAural Speech Enhancement (TS-BASE) uses a Wiener filter to enhance the target speech components, so this version of our binaural signal processing system is named TS-BASE/WF. The name breaks down as follows:

In TS-BASE/WF, TS (Two-Stage) denotes the framework of the proposed system, BASE (BinAural Speech Enhancement) states its purpose, and WF (Wiener Filter) describes its current implementation.
The core of TS-BASE/WF is BASE, which expresses our hope that "the proposed TS-BASE system can serve as a 'base' for future research on binaural signal processing for hearing aids".
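The note says only that TS-BASE/WF uses a Wiener filter; it does not give the filter's form. As a generic illustration, and not the formulation used in the Speech Communication paper, the classic per-frequency-bin Wiener gain is G = ξ/(1 + ξ), where ξ is the a priori SNR. The sketch below assumes ξ is approximated by simple power subtraction; the function names `wiener_gain` and `apply_to_both_ears` are hypothetical.

```python
import numpy as np


def wiener_gain(noisy_power, noise_power, xi_floor=1e-3):
    """Classic per-bin Wiener gain G = xi / (1 + xi).

    xi (the a priori SNR) is approximated here by power subtraction,
    xi = max(|Y|^2 / |N|^2 - 1, xi_floor). This estimator is an
    assumption for illustration, not the TS-BASE/WF estimator.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    # Guard against division by zero in silent noise bins.
    noise_power = np.maximum(np.asarray(noise_power, dtype=float), 1e-12)
    xi = np.maximum(noisy_power / noise_power - 1.0, xi_floor)
    return xi / (1.0 + xi)


def apply_to_both_ears(left_stft, right_stft, gain):
    """Apply one real gain to both ears' STFT bins (hypothetical helper).

    Because both ears are scaled by the same non-negative real factor per
    bin, each bin's interaural level and phase differences are unchanged.
    """
    return left_stft * gain, right_stft * gain
```

For example, a bin whose noisy power is twice the noise power gives ξ = 1 and a gain of 0.5. Sharing one gain between the two ears is shown only because it leaves interaural cues untouched, which relates to the localization sub-rows in the tables; whether TS-BASE/WF does exactly this is not stated here.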

Description

The demonstrations below are example sound files from our research on binaural speech enhancement, also called binaural noise reduction. Although this research currently focuses on improving the performance of hearing aids, it is not limited to that application.

In the experiments, the target and interfering signals were selected from the NTT-AT database, and the impulse responses, measured with a KEMAR dummy head, were downloaded from the MIT Media Lab [8]. The binaural target and interfering signals at the two ears were obtained by filtering the original signals with the impulse responses corresponding to a given direction. These binaural signals were then downsampled to 16 kHz and finally mixed at 0 dB at both ears to generate the observed binaural mixture signals.
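The mixture generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the signals are random noise, the impulse responses come from a hypothetical `toy_ir` helper (a delayed, attenuated impulse, NOT KEMAR data), the 16 kHz downsampling step is assumed already done (it could be performed with, e.g., `scipy.signal.resample_poly`), and "0 dB" is read as equal target and interference energy summed over both ears, reached with one common scale factor so the interferer's interaural level difference is preserved. The original procedure may instead have set 0 dB at each ear separately.

```python
import numpy as np


def spatialize(source, ir_left, ir_right):
    """Render a mono source at one direction with that direction's left/right-ear impulse responses."""
    return np.convolve(source, ir_left), np.convolve(source, ir_right)


def mix_binaural_at_0db(target_lr, interferer_lr):
    """Add a binaural interferer to a binaural target at 0 dB.

    Assumption: "0 dB" means equal target and interference energy summed over
    both ears, reached with ONE scale factor so the interferer's interaural
    level difference is not altered.
    """
    length = min(len(x) for x in (*target_lr, *interferer_lr))
    t = np.stack([x[:length] for x in target_lr])
    n = np.stack([x[:length] for x in interferer_lr])
    scale = np.sqrt(np.sum(t ** 2) / np.sum(n ** 2))
    mix = t + scale * n
    return mix[0], mix[1]


def toy_ir(delay, gain, length=32):
    """Hypothetical placeholder impulse response (delayed, attenuated impulse), NOT KEMAR data."""
    h = np.zeros(length)
    h[delay] = gain
    return h


rng = np.random.default_rng(0)
fs = 16000  # rate after downsampling; resampling itself is omitted here
target = rng.standard_normal(fs)  # 1 s of noise standing in for speech
interferers = [rng.standard_normal(fs) for _ in range(2)]

# Frontal target: identical responses at both ears.
target_lr = spatialize(target, toy_ir(0, 1.0), toy_ir(0, 1.0))
# Left-side interferers (same toy response for both, for brevity):
# the right ear receives a delayed, weaker copy.
noise_lr = [spatialize(x, toy_ir(0, 1.0), toy_ir(5, 0.5)) for x in interferers]
# Sum all interferers per ear before mixing.
noise_sum = (
    sum(left for left, _ in noise_lr),
    sum(right for _, right in noise_lr),
)

left_mix, right_mix = mix_binaural_at_0db(target_lr, noise_sum)
```

With several interferers, as in the conditions with three or four sources, each interferer is spatialized at its own direction and the results are summed per ear before the level is set.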

In the tables below, the first column gives the direction-of-arrival (DOA) vector of the sound sources. Each vector corresponds to one acoustic condition: its first element is the DOA of the target signal, and the remaining elements are the DOAs of the interfering signals.

The second and third columns are the target signal and the interfering noise signal, respectively, and the fourth column is the noisy mixture. The fifth, sixth, and seventh columns are the signals enhanced by two-channel spectral subtraction (TwoChSS) [5], the frequency-domain binaural model (FDBM) [6], and Roman's algorithm [7], respectively. The eighth column is the signal enhanced by our proposed Two-Stage BinAural Speech Enhancement (TS-BASE) algorithm.

In the demonstrations, different acoustic conditions, characterized by different source positions, are simulated and used to evaluate the performance of the proposed binaural speech enhancement algorithm. Each acoustic condition occupies one row, split into two sub-rows. The first sub-row gives the signals at the left ear and shows how well each algorithm reduces the interfering signals and/or enhances the target signal. The second sub-row gives the binaural signals at both ears and shows whether the sound sources can still be localized from them. Because Roman's algorithm produces a monaural output, its entry in the second sub-row is marked "—". Headphones are strongly recommended for listening, so that the locations of the sound sources can be perceived.

Demonstration 1

In the first demonstration, the target and interfering signals have fixed directions of arrival; that is, the sources are stationary in space.

Six acoustic conditions, characterized by different source positions, are simulated, and each is denoted by a DOA vector. For example, the vector (0, 45, -45) denotes the following configuration: the target signal arrives from 0 degrees (i.e., the front of KEMAR), and two interfering sources are active at 45 degrees (the right) and -45 degrees (the left).

Description	Target	Noise	Mixture	TwoChSS	FDBM	Roman	TS-BASE
(0, 45)
(0, 45)						—
(0, 45, -45)
(0, 45, -45)						—
(0, 45, -45, 60)
(0, 45, -45, 60)						—
(45, 0)
(45, 0)						—
(45, 0, -45)
(45, 0, -45)						—
(45, 0, -45, 60)
(45, 0, -45, 60)						—

Demonstration 2

In the second demonstration, one of the sources has a time-varying direction of arrival; that is, it moves in space.

Two acoustic conditions, characterized by different source positions, are simulated. In the first condition, denoted (0, -90:10:90), the target signal is fixed in front of the KEMAR dummy head (i.e., at 0 degrees), and the interfering signal moves from -90 degrees (the left) to 90 degrees (the right). In the second condition, denoted (-90:10:90, 0), the target signal moves from -90 degrees (the left) to 90 degrees (the right), and the interfering signal is fixed in front of the KEMAR dummy head (i.e., at 0 degrees). As before, the signals at the left ear are given in the first sub-row and the corresponding binaural signals at both ears in the second sub-row.
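The condition labels in both demonstrations follow a compact notation. A minimal parser is sketched below, assuming that an entry such as -90:10:90 is MATLAB-style start:step:stop (so the moving source visits 19 positions in 10-degree steps, stop value included) and that positive angles lie to the right, as in Demonstration 1. The function names `expand_doa` and `parse_condition` are hypothetical, and the parser says nothing about how the moving source was rendered over time.

```python
def expand_doa(entry):
    """Expand one DOA entry into a list of angles in degrees.

    '45' -> [45]; '-90:10:90' -> [-90, -80, ..., 90], read as
    MATLAB-style start:step:stop with the stop value included.
    """
    parts = [int(p) for p in entry.split(":")]
    if len(parts) == 1:
        return parts
    if len(parts) != 3:
        raise ValueError(f"expected 'angle' or 'start:step:stop', got {entry!r}")
    start, step, stop = parts
    if step == 0:
        raise ValueError("step must be non-zero")
    # range() excludes its end point, so extend it by one unit in the step direction.
    return list(range(start, stop + (1 if step > 0 else -1), step))


def parse_condition(label):
    """Split a label such as '(0, -90:10:90)' into (target track, interferer tracks)."""
    entries = [e.strip() for e in label.strip().strip("()").split(",")]
    tracks = [expand_doa(e) for e in entries]
    return tracks[0], tracks[1:]
```

For instance, `parse_condition("(0, 45, -45)")` yields a single fixed target direction and two fixed interferer directions, while `parse_condition("(0, -90:10:90)")` yields a fixed target and one interferer track of 19 angles.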

Description	Target	Noise	Mixture	TwoChSS	FDBM	Roman	TS-BASE
(0, -90:10:90)
(0, -90:10:90)						—
(-90:10:90, 0)
(-90:10:90, 0)						—

References

J. Li, S. Sakamoto, S. Hongo, M. Akagi and Y. Suzuki, "Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication," Speech Communication, 2010. (In Press)
J. Li, S. Sakamoto, S. Hongo, M. Akagi and Y. Suzuki, "Two-stage binaural speech enhancement with Wiener filter based on equalization-cancellation model," in Proc. WASPAA 2009, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 133-136, New Paltz, New York, Oct. 2009.
J. Li, S. Sakamoto, S. Hongo, M. Akagi and Y. Suzuki, "A speech enhancement approach for binaural hearing aids," in Proc. 22nd Signal Processing Symposium, pp. 263-268, Sendai, Nov. 2007.
J. Li, S. Sakamoto, S. Hongo and Y. Suzuki, "A new speech enhancement method for two-input two-output hearing aids," Journal of the Acoustical Society of America, vol. 120, no. 5, 3aEA, p. 3157 (4th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan), Hawaii, USA, Nov. 2006.
M. Doerbecker and S. Ernst, "Combination of two-channel spectral subtraction and adaptive Wiener post-filtering for noise reduction and dereverberation," in Proc. EUSIPCO 1996, pp. 995-998, 1996.
H. Nakashima, Y. Chisaki, T. Usagawa and M. Ebata, "Frequency domain binaural model based on interaural phase and level differences," Acoustical Science and Technology, vol. 24, no. 4, pp. 172-178, 2003.
N. Roman, S. Srinivasan and D.L. Wang, "Binaural segregation in multisource reverberant environments," Journal of the Acoustical Society of America, vol. 120, pp. 4040-4051, 2006.
N. Roman, D.L. Wang and G.J. Brown, "Speech segregation based on sound localization," Journal of the Acoustical Society of America, vol. 114, pp. 2236-2252, 2003.
J. Benesty, S. Makino and J. Chen (Eds.), Speech Enhancement (Signals and Communication Technology), 2005.
MIT Media Lab, KEMAR impulse response measurements, http://sound.media.mit.edu/KEMAR.html