Papers

Masashi Unoki, Toshio Irino, Brian Glasberg, Brian C. J. Moore, and Roy D. Patterson,
"Comparison of the roex and gammachirp filters as representations of theauditory filter,"
J. Acoust. Soc. Am., 120(3), 1474-1492, 2006.

Abstract

Although the rounded-exponential (roex) filter has been successfully used to represent the magnitude response of the auditory filter, recent studies with the roex(p,w,t) filter reveal two serious problems: the fits to notched-noise masking data are somewhat unstable unless the filter is reduced to a physically unrealizable form, and there is no time-domain version of the roex(p,w,t) filter to support modeling of the perception of complex sounds. This paper describes a compressive, gammachirp (cGC) filter with the same architecture as the roex(p,w,t) which could be implemented in the time domain. The gain and asymmetry of this parallel cGC filter are shown to be comparable to those of the roex(p,w,t) filter, but the fits to masking data are still somewhat unstable. The roex(p,w,t) and parallel cGC filters were also compared with the cascade cGC filter [Patterson et al., J. Acoust. Soc. Am. 114, 1529-1542 (2003)], which was found to provide an equivalent fit with 25% fewer coefficients. Moreover, the fits were stable. The advantage of the cascade cGC filter appears to derive from its parsimonious representation of the high-frequency side of the filter. We conclude that cGC filters offer better prospects than roex filters for the representation of the auditory filter.
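
For readers unfamiliar with the roex family: the classic symmetric roex(p) weighting function has the closed form W(g) = (1 + p*g) * exp(-p*g), where g is the normalized frequency deviation from the filter center. A minimal sketch in plain NumPy; the center frequency and the value of p are illustrative choices, not fitted values from the paper:

    import numpy as np

    def roex_p(f, fc=1000.0, p=30.0):
        """Power weighting of a symmetric roex(p) filter: (1 + p*g) * exp(-p*g)."""
        g = np.abs(f - fc) / fc             # normalized deviation from center
        return (1.0 + p * g) * np.exp(-p * g)

    f = np.linspace(500.0, 1500.0, 1001)    # 1-Hz grid around fc
    w_db = 10.0 * np.log10(roex_p(f))       # power response in dB
    print(f"attenuation at 1.2*fc: {w_db[700]:.1f} dB")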

Keywords

filter architecture, compressive gammachirp, parallel roex filter, simultaneous masking

Title

Roy D. Patterson, Masashi Unoki, and Toshio Irino,
"Extending the domain of center frequencies for the compressive gammachirp auditory filter,"
J. Acoust. Soc. Am., vol. 114, no. 3, pp. 1529-1542, Sept. 2003.

Abstract

The gammatone filter was imported from auditory physiology to provide a time-domain version of the roex auditory filter and enable the development of a realistic auditory filterbank for models of auditory perception [Patterson et al., J. Acoust. Soc. Am. 98, 1890-1894 (1995)]. The gammachirp auditory filter was developed to extend the domain of the gammatone auditory filter and simulate the changes in filter shape that occur with changes in stimulus level. Initially, the gammachirp filter was limited to center frequencies in the 2.0-kHz region where there were sufficient `notched-noise' masking data to define its parameters accurately. Recently, however, the range of the masking data has been extended in two massive studies. This paper reports how a compressive version of the gammachirp auditory filter was fitted to these new data sets to define the filter parameters over the extended frequency range. The results show that the shape of the filter can be specified for the entire domain of the data using just six constants (center frequencies from 0.25 to 6.0 kHz and levels from 30 to 80 dB SPL). The compressive, gammachirp auditory filter also has the advantage of being consistent with physiological studies of cochlear filtering insofar as the compression of the filter is mainly limited to the passband and the form of the chirp in the impulse response is largely independent of level.
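
For orientation, the frequency-domain form of the gammachirp underlying such fits is well documented: a gammatone magnitude multiplied by exp(c*theta(f)), where theta is an arctangent of the normalized frequency deviation and ERB(f) = 24.7*(4.37*f/1000 + 1) is the Glasberg and Moore equivalent rectangular bandwidth. A minimal sketch with illustrative parameter values, not the six fitted constants reported in the paper:

    import numpy as np

    def erb(f):
        """Glasberg-Moore equivalent rectangular bandwidth (Hz)."""
        return 24.7 * (4.37 * f / 1000.0 + 1.0)

    def gammachirp_mag(f, fr=2000.0, n=4, b=1.81, c=-2.0):
        x = (f - fr) / (b * erb(fr))          # normalized frequency deviation
        gt = (1.0 + x**2) ** (-n / 2.0)       # gammatone magnitude (order n)
        return gt * np.exp(c * np.arctan(x))  # chirp term skews the filter

    f = np.linspace(500.0, 4000.0, 3501)
    mag_db = 20.0 * np.log10(gammachirp_mag(f))
    print(f"peak near {f[np.argmax(mag_db)]:.0f} Hz")  # below fr since c < 0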

Title

Masashi Unoki, Toshio Irino, and Roy D. Patterson,
"Improvement of an IIR asymmetric compensation gammachirp filter,"
Acoust. Sci. &Tech., vol. 22, no. 6, pp. 426-430, 2001.

Abstract

An IIR implementation of the gammachirp filter has been proposed to simulate basilar membrane motion efficiently (Irino and Unoki, 1999). A reasonable filter response was obtained from a combination of a gammatone filter and an IIR asymmetric compensation (AC) filter. It was noted, however, that the rms error was high when the absolute values of the parameters were large, because the coefficients of the IIR-AC filter had been selected heuristically. In this report, we show that this is due to the sign inversion of the phase of the poles and zeros in the conventional model. We propose a new definition of the IIR-AC filter and describe a method for systematically determining the optimum coefficients and the number of cascaded second-order sections. This reduces the error to about one-third of that produced by the conventional model.
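
For readers unfamiliar with the structure being optimized: such IIR-AC filters are realized as a cascade of second-order (biquad) sections. The sketch below shows only this generic cascade evaluation, which SciPy's sosfilt implements directly; the coefficient values are arbitrary stable placeholders, not the optimized coefficients derived in the paper:

    import numpy as np
    from scipy.signal import sosfilt

    # Each row is one second-order section: [b0, b1, b2, a0, a1, a2]. These are
    # arbitrary stable placeholders, not the paper's optimized coefficients.
    sos = np.array([
        [1.0, -0.5, 0.06, 1.0, -0.9, 0.20],
        [1.0, -0.3, 0.02, 1.0, -0.8, 0.15],
    ])

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024)      # test input
    y = sosfilt(sos, x)                # cascade (series) application of sections
    print(y[:4])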

Keywords

Auditory filter, Gammatone, Gammachirp, IIR asymmetric compensation filter, Filter design

Title

Toshio IRINO and Masashi UNOKI,
"An Analysis/Synthesis Auditory Filterbank Based on an IIR Implementation of the Gammachirp,"
The Journal of the Acoustical Society of Japan (E), vol. 20, no. 6, pp. 397-406, Nov. 1999.

Abstract

This paper proposes a new auditory filterbank that enables signal resynthesis from dynamic representations produced by a level-dependent auditory filterbank. The filterbank is based on a new IIR implementation of the gammachirp, which has been shown to be an excellent candidate for an asymmetric, level-dependent auditory filter. First, the gammachirp filter is shown to be decomposable into a combination of a gammatone filter and an asymmetric function. The asymmetric function is simulated very well with a minimum-phase IIR filter, named the "asymmetric compensation filter". Then, two filterbank structures are presented, each based on the combination of a gammatone filterbank and a bank of asymmetric compensation filters controlled by a signal level estimation mechanism. The inverse filter of the asymmetric compensation filter is always stable because the minimum-phase condition is satisfied. When a bank of inverse filters is applied after the gammachirp analysis filterbank and the idea of the wavelet transform is used, it is possible to resynthesize the signal, something that has never been accomplished with conventional active auditory filterbanks. The proposed analysis/synthesis gammachirp filterbank is expected to be useful in various applications where human auditory filtering has to be modeled.
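
For reference, the time-domain gammachirp that the IIR structure approximates is a gammatone envelope with a logarithmic frequency chirp added to the carrier (Irino and Patterson, 1997); setting c = 0 recovers the plain gammatone. A sketch with illustrative parameter values:

    import numpy as np

    def erb(f):
        return 24.7 * (4.37 * f / 1000.0 + 1.0)

    def gammachirp_ir(fr=2000.0, n=4, b=1.019, c=-1.0, fs=48000, dur=0.025):
        """g(t) = t^(n-1) * exp(-2*pi*b*ERB(fr)*t) * cos(2*pi*fr*t + c*ln t)."""
        t = np.arange(1, int(fs * dur) + 1) / fs   # t > 0 so ln(t) is defined
        env = t ** (n - 1) * np.exp(-2.0 * np.pi * b * erb(fr) * t)
        return env * np.cos(2.0 * np.pi * fr * t + c * np.log(t))

    ir = gammachirp_ir()
    ir /= np.abs(ir).max()                         # normalize peak amplitude
    print(f"{ir.size}-sample impulse response")    # c = 0 gives the gammatone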

Keywords

Auditory filterbank, Level-dependent asymmetric spectrum, Analysis/synthesis system, Wavelet, Gammatone

Title

Masashi UNOKI and Masato AKAGI
"A Method of Signal Extraction from Noise-added signal,"
Electronics and Communications in Japan (IEICE Trans. A), Part 3, Vol. 80, No.11, 1997

Abstract

This paper proposes a new method of extracting a signal from a signal corrupted by noise. The method is constructed by computationally modeling some of the constraints of Auditory Scene Analysis (ASA) proposed by Bregman. It segregates the signal from the noise by using the amplitude envelope and the phase deviation of the noise-added signal passed through a wavelet filterbank. To evaluate the method, a segregation experiment was performed in which a pure tone was extracted from amplitude-modulated narrow-band noise whose center frequency was the same as the pure-tone frequency. The result indicates that the model can extract the pure tone, improving the SN ratio by about 20 dB.

Keywords

auditory scene analysis, segregation of two acoustic sources, co-modulation masking release (CMR), gammatone filter, wavelet filterbank

Conference articles

Title

Masashi Unoki, Masaaki Kubo, and Masato Akagi
"A model for selective segregation of a target instrument sound from the mixed sound of various instruments"
Proc. ICMC2003, pp. 295-298, Singapore, October 2003.

Abstract

This paper proposes a selective sound segregation model for separating a target musical instrument sound from the mixed sound of various musical instruments. The model consists of two blocks: a model for segregating two acoustic sources based on auditory scene analysis as bottom-up processing, and selective processing based on knowledge sources as top-down processing. Two simulations were carried out to evaluate the proposed model. The results showed that the model could selectively segregate not only the target instrument sound but also the target performance sound from the mixed sound of various instruments. The model, therefore, can also serve as a computational model of the mechanisms of human selective hearing.

Title

Masashi Unoki, Masakazu Furukawa, Keigo Sakata, and Masato Akagi,
"A speech dereverberation method based on the MTF concept,"
Proc. of EuroSpeech2003, Sept. 2003 (accepted).

Abstract

This paper proposes a speech dereverberation method based on the MTF concept. The method can be used without measuring the impulse response of the room acoustics. In the model, the power envelopes and carriers are decomposed from a reverberant speech signal using an N-channel filterbank and are then dereverberated in each channel. In the envelope dereverberation process, a power envelope inverse filtering method is used to dereverberate the envelopes. In the carrier regeneration process, a carrier generation method based on voiced/unvoiced decisions from the estimated fundamental frequency (F0) is used. In this paper, we assume that F0 has been estimated accurately. We carried out 15,000 dereverberation simulations on reverberant speech signals to evaluate the proposed model. We found that the model can accurately dereverberate not only the power envelopes but also the speech signal itself from the reverberant speech using the regenerated carriers.
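
The envelope dereverberation step can be illustrated compactly. Under the MTF concept, the reverberant power envelope is the clean power envelope convolved with the exponentially decaying power envelope of the room impulse response, e_h(t) = a^2 * exp(-13.8*t/Tr), and inverse filtering divides this out in the frequency domain. A minimal single-channel sketch; a = 1 and a known Tr are assumptions here, whereas the paper's method estimates these parameters and operates per filterbank channel:

    import numpy as np

    fs, Tr = 1000, 0.5                           # envelope sample rate (Hz), T60 (s)
    t = np.arange(fs) / fs                       # 1 s of power envelope
    e_x = 1.0 + 0.8 * np.sin(2 * np.pi * 4 * t)  # clean power envelope (4-Hz AM)
    e_h = np.exp(-13.8 * t / Tr)                 # room power-envelope response

    e_y = np.convolve(e_x, e_h)                  # reverberant (smeared) envelope

    n = 2 * fs                                   # FFT length >= len(e_y)
    E_h = np.fft.rfft(e_h, n)                    # well away from zero here, so no
    e_hat = np.fft.irfft(np.fft.rfft(e_y, n) / E_h, n)[:fs]  # regularization needed
    print(f"max restoration error: {np.max(np.abs(e_hat - e_x)):.2e}")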

Title

Masashi Unoki, Masakazu Furukawa, Keigo Sakata, and Masato Akagi,
"A Method based on the MTF concept for dereverberating the power envelope from the reverberant signal,"
Proc. of ICASSP2003, vol. I, pp. 840-843, April 2003.

Abstract

This paper proposes a method for dereverberating the power envelope from the reverberant signal. This method is based on the modulation transfer function (MTF) and does not require that the impulse response of an environment be measured. It improves upon the basic model proposed by Hirobayashi et al. regarding the following problems: (i) how to precisely extract the power envelope from the observed signal; (ii) how to determine the parameters of the impulse response of the room; and (iii) a lack of consideration as to whether the MTF concept can be applied to a more realistic signal. We have shown that the proposed method can accurately dereverberate the power envelope from the reverberant signal.
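
Problem (i), power envelope extraction, is commonly handled with the analytic signal: square the magnitude of the Hilbert envelope and lowpass filter it to keep only the slow modulations. A sketch under that common definition; the cutoff and filter order are illustrative assumptions, not the paper's settings:

    import numpy as np
    from scipy.signal import hilbert, butter, filtfilt

    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 1000 * t) * (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t))

    power = np.abs(hilbert(x)) ** 2                 # instantaneous power
    b, a = butter(2, 20.0 / (fs / 2))               # keep modulations below ~20 Hz
    envelope = filtfilt(b, a, power)                # zero-phase lowpass
    print(f"envelope range: {envelope.min():.2f} .. {envelope.max():.2f}")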

Title

Masashi Unoki, Masakazu Furukawa, and Masato Akagi,
"A method for recovering the power envelope from the reverberant speech based on MTF,"
Proc. of Forum Acusticum Sevilla 2002, SPA-Gen-002, p. S129, Sevilla, Spain, Sept. 2002.

Abstract

This paper proposes a method for dereverberating the power envelope from reverberant speech. The method is based on the concept of the modulation transfer function (MTF) and does not require that the impulse response of the environment be measured. It improves upon the basic model proposed by Hirobayashi et al. regarding the following problems: (i) how to precisely extract the power envelope from the observed signal; (ii) how to determine the parameters of the impulse response of the room; and (iii) the application of the MTF to speech and the anti-co-modulation characteristic of the speech envelope. Based on these three improvements, we then propose an extended method implemented on a filterbank for practical applications. We carried out 15,000 dereverberation simulations on reverberant speech signals. The results showed that the proposed model can accurately dereverberate the power envelope from reverberant speech.

Title

Takeshi Saitou, Masashi Unoki, and Masato Akagi,
"Extraction of F0 dynamic characteristics and development of F0 control model in singing voice,"
Proc. of ICAD2002, pp. 275-278, Kyoto, Japan, July 2002.

Abstract

Fundamental frequency (F0) control models that can cope with the F0 dynamic characteristics related to singing-voice perception are required to construct natural singing-voice synthesis systems. This paper discusses the importance of F0 dynamic characteristics in singing voices and demonstrates, through psychoacoustic experiments, how much they influence singing-voice perception. The paper then proposes an F0 control model that can generate the F0 fluctuations in singing voices, and a singing-voice synthesis method. The results show that the F0 contour, including fluctuations such as overshoot, vibrato, preparation, and fine fluctuation, affects singing-voice perception, and that the proposed synthesis method can generate natural singing voices by controlling these F0 fluctuations.
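
To make the named fluctuation types concrete, the sketch below synthesizes an F0 contour in which a note change is shaped by an underdamped second-order system (producing overshoot), with sinusoidal vibrato and random fine fluctuation added. All parameter values are illustrative assumptions, not the model parameters extracted in the paper:

    import numpy as np
    from scipy.signal import lti, lsim

    fs = 200                                      # F0 contour sample rate (Hz)
    t = np.arange(2 * fs) / fs                    # 2 s of contour
    note = np.where(t < 1.0, 220.0, 330.0)        # target note change (Hz)

    wn, zeta = 40.0, 0.4                          # underdamped -> overshoot
    sys = lti([wn**2], [1.0, 2.0 * zeta * wn, wn**2])
    _, f0, _ = lsim(sys, note, t)                 # smooth response with overshoot

    f0 *= 1.0 + 0.01 * np.sin(2 * np.pi * 6.0 * t)           # ~6-Hz vibrato
    f0 += np.random.default_rng(1).normal(0.0, 0.3, t.size)  # fine fluctuation
    print(f"peak F0: {f0.max():.1f} Hz vs. 330-Hz target")   # > 330 => overshoot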

Title

Yuichi Ishimoto, Masashi Unoki, and Masato Akagi,
"A fundamental frequency estimation method for noisy speech based on periodicity and harmonicity,"
Proc. of ICASSP2001, SPEECH-SF3.1, USA, May 2001.

Abstract

This paper proposes a robust and accurate F0 estimation method for noisy speech. The method combines two different principles: (1) F0 estimation based on the periodicity and harmonicity of the instantaneous amplitude, for robust estimation in noisy environments, and (2) TEMPO2, proposed by Kawahara et al., as an accurate estimation method. The proposed method uses a comb filter with controllable pass-bands to combine the two estimators. Simulations were carried out to estimate F0 from real speech in noisy environments and to compare the proposed method with other methods. The results showed that the method can not only estimate F0 for clean speech with accuracy similar to that of TEMPO2, but can also estimate F0 from noisy speech more robustly than other methods such as TEMPO2 and the cepstrum method.
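
A minimal sketch of the general comb-filtering principle named above, not the paper's controllable-passband design: for each F0 candidate, sum the spectral power at its harmonics and pick the maximum:

    import numpy as np

    fs = 16000
    t = np.arange(4096) / fs
    x = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 6))  # F0 = 150 Hz
    x += np.random.default_rng(0).normal(0.0, 0.5, t.size)             # additive noise

    spec = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2
    freqs = np.fft.rfftfreq(x.size, 1.0 / fs)

    def comb_score(f0, n_harm=5):
        """Sum spectral power at the first n_harm harmonics of candidate f0."""
        bins = [np.argmin(np.abs(freqs - k * f0)) for k in range(1, n_harm + 1)]
        return spec[bins].sum()

    candidates = np.arange(80.0, 400.0, 1.0)
    f0_hat = candidates[np.argmax([comb_score(f) for f in candidates])]
    print(f"estimated F0: {f0_hat:.0f} Hz (true 150 Hz)")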

Title

Masato Akagi, Mitsunori Mizumachi, Yuichi Ishimoto, and Masashi Unoki,
"Speech Enhancement and Segregation based on Human Auditory Mechanisms,"
In Proc. of the 2000 International Conference on Information Society in the 21st Century (IS2000), pp. 246-254, Aizu-Wakamatsu, Japan, Oct. 2000.

Abstract

This paper introduces models of speech enhancement and segregation based on knowledge about human psychoacoustics and auditory physiology. The cancellation model is used for enhancing speech. Special attention is paid to reducing noise by using a spatial filtering technique and a frequency filtering technique, both of which adopt concepts of the cancellation model. In addition, some constraints related to the heuristic regularities proposed by Bregman are used to overcome the problem of segregating two acoustic sources. Simulation results show that both spatial and frequency filtering are useful for enhancing speech. These filtering methods can therefore be used effectively at the front end of automatic speech recognition systems and for speech feature extraction. The sound segregation model can precisely extract a desired signal from a noisy signal, even at the waveform level.
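
As a generic illustration of the spatial-filtering side of the cancellation idea, the sketch below steers a null at an off-axis interferer with two microphones by delaying one channel and subtracting. The geometry, spacing, and signals are assumptions for demonstration, not the authors' configuration, and the subtraction also comb-filters the target, which a full cancellation model must compensate:

    import numpy as np

    fs, c, d = 16000, 343.0, 0.05              # sample rate, speed of sound, mic spacing (m)
    delay = round(fs * d / c)                  # interferer inter-mic delay (samples)

    t = np.arange(1024) / fs
    target = np.sin(2 * np.pi * 500 * t)          # target from broadside: no delay
    noise = np.sign(np.sin(2 * np.pi * 311 * t))  # interferer from the side

    mic1 = target + noise
    mic2 = target + np.roll(noise, delay)      # noise reaches mic2 `delay` samples later

    out = mic1 - np.roll(mic2, -delay)         # time-align and subtract -> nulls the noise
    residual = out - (target - np.roll(target, -delay))
    print(f"residual interferer rms: {np.std(residual):.2e}")  # ~0: noise cancelled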

Keywords

cancellation model, noise reduction, microphone array, F0 extraction, computational auditory scene analysis

Title

Masashi UNOKI and Masato AKAGI,
"Segregation of vowel in background noise using the model of segregating two acoustic sources based on auditory scene analysis,"
Proc. of CASA'99, pp. 51-60, Stockholm, Sweden, August 1999.

Abstract

This paper proposes an auditory sound segregation model based on auditory scene analysis. It solves the problem of segregating two acoustic sources by using constraints related to the heuristic regularities proposed by Bregman and by making an improvement to our previously proposed model. The improvement is to reconsider constraints on the continuity of instantaneous phases as well as constraints on the continuity of instantaneous amplitudes and fundamental frequencies in order to segregate the desired signal from a noisy signal precisely even in waveforms. Simulations performed to segregate a real vowel from a noisy vowel and to compare the results of using all or only some constraints showed that our improved model can segregate real speech precisely even in waveforms using all the constraints related to the four regularities, and that the absence of some constraints reduces the segregation accuracy.

Title

Masashi UNOKI and Masato AKAGI,
"Segregation of vowel in background noise using the model of segregating two acoustic sources based on auditory scene analysis,"
Proc. of EuroSpeech'99, vol. 6, pp. 2575-2578, Budapest, Hungary, Sept. 1999.

Abstract

This paper proposes an auditory sound segregation model based on auditory scene analysis. It solves the problem of segregating two acoustic sources by using constraints related to the heuristic regularities proposed by Bregman and by making an improvement to our previously proposed model. The improvement is to reconsider constraints on the continuity of instantaneous phases as well as constraints on the continuity of instantaneous amplitudes and fundamental frequencies in order to segregate the desired signal from a noisy signal precisely even in waveforms. Simulations performed to segregate a real vowel from a noisy vowel and to compare the results of using all or only some constraints showed that our improved model can segregate real speech precisely even in waveforms using all the constraints related to the four regularities, and that the absence of some constraints reduces the segregation accuracy.

Title

Masashi UNOKI and Masato AKAGI
"Signal Extraction from Noisy Signal based on Auditory Scene Analysis,"
Proc. of ICSLP'98, vol. 4, pp. 1515-1518, Sydney, Australia, 30 Nov. - 4 Dec. 1998.

Abstract

This paper proposes a method of extracting a desired signal from a noisy signal. The method solves the problem of segregating two acoustic sources by using constraints related to the four regularities proposed by Bregman and by making two improvements to our previously proposed method. One is to incorporate a method of estimating the fundamental frequency using comb filtering on the filterbank. The other is to reconsider the constraints on the separation block, which constrain the instantaneous amplitude, input phase, and fundamental frequency of the desired signal. Simulations performed to segregate a vowel from a noisy vowel, comparing the results of using all or only some of the constraints, showed that our improved method can segregate real speech precisely using all the constraints related to the four regularities, and that the absence of some constraints reduces the accuracy.

Title

Masashi UNOKI and Masato AKAGI,
"A Computational Model of Co-modulation Masking Relaese,"
In Proc. of NATO ASI on Computational Hearing, pp. 129-134, Il Ciocco, Itary, 1-12, July. 1998.

Abstract

This paper proposes a computational model of co-modulation masking release (CMR). It consists of two models, our auditory segregation model (model A) and the power spectrum model of masking (model B), and a selection process that selects one of their results. Model A extracts a sinusoidal signal using the outputs of multiple auditory filters, and model B extracts a sinusoidal signal using the output of a single auditory filter. The selection process selects, from the two extracted signals, the sinusoidal signal with the lower signal threshold. For both models, simulations similar to Hall et al.'s demonstrations were carried out. The simulation stimuli consisted of two types of noise masker: bandpassed random noise and AM bandpassed random noise. The signal threshold of the pure tone extracted using the proposed model shows properties similar to those in Hall et al.'s demonstrations. The maximum amount of CMR in the proposed model is about 8 dB.

Title

Masashi UNOKI and Masato AKAGI,
"A Method of Signal Extraction from Noisy Signal,"
In Proc. of EuroSpeech'97, vol. 5, pp. 2587-2590, Rhodes, Greece, Sept. 1997.

Abstract

This paper presents a method of extracting a desired signal from a noise-added signal, as a model of acoustic source segregation. Using physical constraints related to the four regularities proposed by Bregman, the proposed method can solve the problem of segregating two acoustic sources. Two simulations were carried out using the following signals: (a) a noise-added AM complex tone and (b) a noisy synthetic vowel. It was shown that the proposed method can extract the desired AM complex tone from a noise-added AM complex tone in which the signal and noise occupy the same frequency region; the SD was reduced by an average of about 20 dB. It was also shown that the proposed method can extract a speech signal from noisy speech.

Title

Masashi UNOKI and Masato AKAGI,
"A Method of Signal Extraction from Noisy Signal based on Auditory Scene Analysis,"
Working Notes of the IJCAI-97 Workshop on CASA, pp. 93-102, August 1997.

Abstract

This paper presents a method of extracting a desired signal from a noise-added signal, as a model of acoustic source segregation. Using physical constraints related to the four regularities proposed by Bregman, the proposed method can solve the problem of segregating two acoustic sources. These physical constraints correspond to the regularities, which we have translated from qualitative conditions into quantitative conditions. Three simulations were carried out using the following signals: (a) a noise-added AM complex tone, (b) mixed AM complex tones, and (c) a noisy synthetic vowel. The performance of the proposed method was evaluated using two measures: precision, an SNR-like measure, and spectrum distortion (SD). The results for signals (a) and (b) showed that the proposed method can extract the desired AM complex tone from a noise-added AM complex tone or from mixed AM complex tones in which the signal and noise occupy the same frequency region; in particular, the SD was reduced by about 20 dB on average. Moreover, the result for signal (c) showed that the proposed method can also extract the speech signal from noisy speech.
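
For concreteness, common definitions of the two evaluation measures can be written in a few lines. The exact framing and windowing used in the paper are not specified here, so these are assumed forms:

    import numpy as np

    def snr_db(clean, estimate):
        """Precision as an SNR-like measure: target power over error power."""
        return 10.0 * np.log10(np.sum(clean**2) / np.sum((clean - estimate) ** 2))

    def sd_db(clean, estimate, nfft=512):
        """Spectrum distortion: rms difference of log-magnitude spectra (dB)."""
        S = np.abs(np.fft.rfft(clean, nfft)) + 1e-12
        E = np.abs(np.fft.rfft(estimate, nfft)) + 1e-12
        return np.sqrt(np.mean((20.0 * np.log10(S / E)) ** 2))

    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 440 * np.arange(512) / 16000)
    noisy = clean + 0.1 * rng.standard_normal(512)
    print(f"SNR {snr_db(clean, noisy):.1f} dB, SD {sd_db(clean, noisy):.1f} dB")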

Title

Workshop

L.P. O'Mard, R. Meddis, M. Unoki, and R. D. Patterson,
"A DSAM application for evaluating nonlinear filterbank used to simulate basilar membrane motion,"
Abstract of the 24th Annual Midwinter Research Meeting, Association for Research in Otolaryngology, p. 257, TradeWinds Islands Resort, St. Petersburg Beach, Florida, USA, Feb. 2001.

Abstract

A menu-driven computer application is presented that automates the evaluation of nonlinear filterbanks used to characterise the response of the basilar membrane (BM) to simple and complex sounds. It is important to show that a BM simulator reproduces the range of complex features observed in experiments, and it is useful to have a convenient means of producing the evaluation functions typically used by experimentalists. Accordingly, the filter evaluation application (FEval) calculates the following functions: tuning curves, input/output functions, filter shapes, phase/intensity and phase/frequency functions, two-tone suppression ratios, two-tone responses, impulse responses, and distortion products. The results are output to files in formats that are compatible with post-processing packages such as Excel and Matlab. Using DSAM 'simulation scripts', any model produced using process modules (models or functions) already available in DSAM can be tested. In the current case, FEval has been used to compare the outputs of three nonlinear filterbanks: the original model of Carney (1993), the Dual Resonance Non-linear filter of Meddis et al. (submitted to JASA), and the gammachirp filters of Irino and Patterson (1997). FEval is the newest addition to the family of applications based on the Development System for Auditory Modelling (DSAM). All DSAM applications employ a similar interface and inherit its features. Like other DSAM applications, FEval allows a variety of interface options. There is a graphical user interface (GUI) that provides comprehensive access to model and evaluation test parameters. FEval can be started in 'server' mode and then controlled from the command line, manually, or by using Matlab or a similar scripting tool. FEval accepts command-line options giving access to all parameters, so it can be employed to produce quite complex analysis runs. FEval also has a command-line-only version for fast processing. FEval is available as an "out of the box" Windows installation (95/98/2000 and NT) for PCs and as Linux RPMs, can be installed on UNIX machines using its auto-configuration system, and is easily ported to other systems.

References

  • Carney, L. H. (1993), "A model for the response of low-frequency auditory-nerve fibers in cat," J. Acoust. Soc. Am. 93, 401-417.
  • DSAM: The Development System for Auditory Modelling.
  • Meddis, R., O'Mard, L. P., and Lopez-Poveda, E. A., "A computational algorithm for computing non-linear auditory frequency selectivity," J. Acoust. Soc. Am. (in press).

Title

Workshop

Toshio Irino, Masashi Unoki, and Roy D. Patterson,
"A physiologically motivated gammachirp auditory filterbank,"
British Society of Audiology Short Paper Meeting on Experimental Studies of Hearing and Deafness, pp. 35-36, Keele University, Staffordshire, U.K., Sept. 2000.

Abstract

The gammachirp auditory filter was introduced to provide an asymmetric, level-dependent version of the gammatone auditory filter (Irino and Patterson, 1997). In this 'analytic' gammachirp filter, the level dependency was in the chirp parameter. Recently, Carney et al. (1999) reported that, although there is a chirp in the impulse response of the cat's cochlear filter, the form of the chirp does not vary with level. This led Irino and Patterson (1999, 2000) to develop a more physiological version of the gammachirp filter, whose magnitude response is the product of two terms (sketched after this abstract). The first term is a fixed gammachirp filter which represents the passive basilar membrane response; the second term is a highpass asymmetric function (HP-AF) which represents the active component in the cochlea and produces compression. In the computational version, the HP-AF is simulated by an IIR asymmetric compensation (AC) filter (Irino and Unoki, 1999), which was initially developed to reduce the computational load of the gammachirp filterbank. Irino and Patterson (1999, 2000) have demonstrated that the physiological gammachirp fits both the revcor data of Carney et al. (1999) and the masking data of Rosen and Baker (1994). A filterbank structure for the physiological gammachirp is shown in Fig. 1 (Irino and Unoki, 1999). It is a cascade of three filterbanks: a gammatone filterbank followed by a lowpass AC filterbank, and then a highpass AC filterbank. The gammatone and lowpass AC filterbanks together produce the fixed gammachirp filterbank, which corresponds to the passive basilar membrane whose motion is observed post-mortem or at high sound pressure levels (Recio et al., 1998). The parameter controller estimates the signal level to control the highpass AC filterbank, which corresponds to the active component in the cochlea.
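
The magnitude-response equation referred to in the abstract did not survive in this copy. Based on the cited papers (Irino and Patterson, 1999, 2000), it has the following product form, reconstructed here as an assumption rather than recovered from the page, where each theta_i is an arctangent of the normalized frequency deviation:

    % Reconstructed from the cited literature, not recovered from this page:
    % passive gammachirp times a highpass asymmetric function (HP-AF).
    \[
      |G_{CC}(f)| = \underbrace{a_{\Gamma}\,|G_{T}(f)|\,e^{c_1\theta_1(f)}}_{\text{passive gammachirp}}
                    \cdot \underbrace{e^{c_2\theta_2(f)}}_{\text{HP-AF (active, level-dependent)}},
      \qquad
      \theta_i(f) = \arctan\!\left(\frac{f - f_{r_i}}{b_i\,\mathrm{ERB}(f_{r_i})}\right)
    \]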

References

  • Carney, L. H., Megean, J.M. and Shekhter, I. (1999), J. Acoust. Soc. Am., 105, 2384-2391.
  • Irino, T. and Patterson, R.D. (1997), J. Acoust. Soc. Am., 101, 412-419.
  • Irino, T. and Patterson, R.D. (1999), Symposium on recent developments in auditory mechanics, Sendai, Japan.
  • Irino, T. and Patterson, R.D. (2000), XIIth International Symposium on Hearing, Mierlo, The Netherlands.
  • Irino, T. and Unoki, M. (1999), J. Acoust. Soc. Jpn. (E), 20, 397-406.
  • Recio, A.R., Rich, N.C., Narayan, S.S. and Ruggero, M.A. (1998), J. Acoust. Soc. Am., 103, 1972-1989.

Presentations

Title

Masashi UNOKI and Masato AKAGI,
"Vowel segregation in background noise using the model of segregating two acoustic sources,"
Proc. of AI Challenge, pp. 7-14, Aoyama Gakuin Univ., 3 Nov. 1999.

Abstract

This paper proposes an improved sound segregation model based on auditory scene analysis in order to overcome three disadvantages of our previously proposed model. The improved model solves the problem of segregating two acoustic sources by using constraints related to the heuristic regularities proposed by Bregman. In the improvements, we (1) reconsider the estimation of unknown parameters using Kalman filtering, (2) incorporate into the grouping block a constraint on channel envelopes with the periodicity of the fundamental frequency, and (3) consider a constraint on the smoothness of instantaneous amplitudes across channels. Simulations were performed to segregate a real vowel from a noisy vowel and to compare the results of using all or only some of the constraints. The proposed model improves on our previous model and precisely segregates real speech, even in waveforms, using all of the constraints related to Bregman's four regularities.
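
Improvement (1) can be illustrated with the simplest case: a scalar Kalman filter tracking one slowly varying quantity, for example a channel's instantaneous amplitude, under a random-walk state model. The state model and noise variances below are assumptions for demonstration, not the paper's formulation:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    true_amp = 1.0 + 0.3 * np.sin(2 * np.pi * np.arange(n) / n)  # slow amplitude
    obs = true_amp + rng.normal(0.0, 0.2, n)                     # noisy observations

    q, r = 1e-4, 0.04              # process / observation noise variances (assumed)
    x, p = obs[0], 1.0             # state estimate and its variance
    est = np.empty(n)
    for i, z in enumerate(obs):
        p += q                     # predict: random-walk state model
        k = p / (p + r)            # Kalman gain
        x += k * (z - x)           # correct with the innovation
        p *= 1.0 - k
        est[i] = x
    print(f"rms error: raw {np.std(obs - true_amp):.3f} -> filtered "
          f"{np.sqrt(np.mean((est - true_amp)**2)):.3f}")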
