Publication Details

Journal Papers
International Conferences
Oral Presentations

Journal Papers

Title

Masashi Unoki, Toshio Irino, Brian Glasberg, Brian C. J. Moore, and Roy D. Patterson,
"Comparison of the roex and gammachirp filters as representations of the auditory filter,"
J. Acoust. Soc. Am., 120(3), 1474-1492, 2006.

Abstract

Although the rounded-exponential (roex) filter has been successfully used to represent the magnitude response of the auditory filter, recent studies with the roex(p,w,t) filter reveal two serious problems: the fits to notched-noise masking data are somewhat unstable unless the filter is reduced to a physically unrealizable form, and there is no time-domain version of the roex(p,w,t) filter to support modeling of the perception of complex sounds. This paper describes a compressive, gammachirp (cGC) filter with the same architecture as the roex(p,w,t) which could be implemented in the time domain. The gain and asymmetry of this parallel cGC filter are shown to be comparable to those of the roex(p,w,t) filter, but the fits to masking data are still somewhat unstable. The roex(p,w,t) and parallel cGC filters were also compared with the cascade cGC filter [Patterson et al., J. Acoust. Soc. Am. 114, 1529-1542 (2003)], which was found to provide an equivalent fit with 25% fewer coefficients. Moreover, the fits were stable. The advantage of the cascade cGC filter appears to derive from its parsimonious representation of the high-frequency side of the filter. We conclude that cGC filters offer better prospects than roex filters for the representation of the auditory filter.

Keywords

filter architecture, compressive gammachirp, parallel roex filter, simultaneous masking

Title

Roy D. Patterson, Masashi Unoki, and Toshio Irino,
"Extending the domain of center frequencies for the compressive gammachirp auditory filter,"
J. Acoust. Soc. Am., vol. 114, no. 3, pp. 1529-1542, Sept. 2003.

Abstract

The gammatone filter was imported from auditory physiology to provide a time-domain version of the roex auditory filter and enable the development of a realistic auditory filterbank for models of auditory perception [Patterson et al., J. Acoust. Soc. Am. 98, 1890-1894 (1995)]. The gammachirp auditory filter was developed to extend the domain of the gammatone auditory filter and simulate the changes in filter shape that occur with changes in stimulus level. Initially, the gammachirp filter was limited to center frequencies in the 2.0-kHz region where there were sufficient ‘notched-noise’ masking data to define its parameters accurately. Recently, however, the range of the masking data has been extended in two massive studies. This paper reports how a compressive version of the gammachirp auditory filter was fitted to these new data sets to define the filter parameters over the extended frequency range. The results show that the shape of the filter can be specified for the entire domain of the data using just six constants (center frequencies from 0.25 to 6.0 kHz and levels from 30 to 80 dB SPL). The compressive, gammachirp auditory filter also has the advantage of being consistent with physiological studies of cochlear filtering insofar as the compression of the filter is mainly limited to the passband and the form of the chirp in the impulse response is largely independent of level.
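
As a rough illustration of the filters named in this abstract, the sketch below samples a gammachirp impulse response in its commonly cited analytic form, g(t) = a t^(n-1) exp(-2 pi b ERB(fc) t) cos(2 pi fc t + c ln t + phase), which reduces to a gammatone when c = 0. The function names, the defaults n = 4 and b = 1.019, and the example chirp value c = -2 are assumptions chosen for illustration. They are not the six constants fitted in the paper, and this is the analytic form rather than the compressive one.

```python
import math

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore, 1990)."""
    return 24.7 + 0.108 * fc_hz

def gammachirp_ir(fc_hz, fs_hz, duration_s=0.025, n=4, b=1.019, c=-2.0, phase=0.0):
    """Sample t^(n-1) * exp(-2*pi*b*ERB*t) * cos(2*pi*fc*t + c*ln(t) + phase).

    The result is scaled so that its peak magnitude is 1.
    Setting c = 0 gives a gammatone impulse response.
    """
    num = int(round(duration_s * fs_hz))
    decay = 2.0 * math.pi * b * erb_hz(fc_hz)
    samples = [0.0]  # t = 0: the envelope t^(n-1) is zero for n > 1
    for k in range(1, num):
        t = k / fs_hz
        envelope = t ** (n - 1) * math.exp(-decay * t)
        carrier = math.cos(2.0 * math.pi * fc_hz * t + c * math.log(t) + phase)
        samples.append(envelope * carrier)
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples]
```

For example, `gammachirp_ir(2000.0, 16000.0)` returns 25 ms of a 2-kHz gammachirp sampled at 16 kHz, and passing `c=0.0` gives the corresponding gammatone for comparison.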

Title

Masashi Unoki, Toshio Irino, and Roy D. Patterson,
"Improvement of an IIR asymmetric compensation gammachirp filter,"
Acoust. Sci. & Tech., vol. 22, no. 6, pp. 426-430, 2001.

Abstract

An IIR implementation of the gammachirp filter has been proposed to simulate basilar membrane motion efficiently (Irino and Unoki, 1999). A reasonable filter response was provided by a combination of a gammatone filter and an IIR asymmetric compensation (AC) filter. It was noted, however, that the rms error was high when the absolute values of the parameters were large, because the coefficients of the IIR-AC filter were selected heuristically. In this report, we show that this is due to the sign inversion of the phase of poles and zeros in the conventional model. We propose a new definition of the IIR-AC filter and describe a method of systematically determining the optimum coefficients and the number of cascade stages of the second-order filter. This reduces the error to about 1/3 of that produced by the conventional model.

Keywords

Auditory filter, Gammatone, Gammachirp, IIR asymmetric compensation filter, Filter design

Title

Toshio IRINO and Masashi UNOKI,
"An Analysis/Synthesis Auditory Filterbank Based on an IIR Implementation of the Gammachirp,"
The Journal of the Acoustical Society of Japan (E), vol. 20, no. 6, pp. 397-406, Nov. 1999.
[Awarded the Sato Prize Paper Award of the Acoustical Society of Japan]

Abstract

This paper proposes a new auditory filterbank that enables signal resynthesis from dynamic representations produced by a level-dependent auditory filterbank. The filterbank is based on a new IIR implementation of the gammachirp, which has been shown to be an excellent candidate for asymmetric, level-dependent auditory filters. Initially, the gammachirp filter is shown to be decomposed into a combination of a gammatone filter and an asymmetric function. The asymmetric function is simulated very well with a minimum-phase IIR filter, named the "asymmetric compensation filter". Then, two filterbank structures are presented, each based on the combination of a gammatone filterbank and a bank of asymmetric compensation filters controlled by a signal level estimation mechanism. The inverse filter of the asymmetric compensation filter is always stable because the minimum-phase condition is satisfied. When a bank of inverse filters is utilized after the gammachirp analysis filterbank and the idea of the wavelet transform is applied, it is possible to resynthesize the signal, which has never been accomplished by conventional active auditory filterbanks. The proposed analysis/synthesis gammachirp filterbank is expected to be useful in various applications where human auditory filtering has to be modeled.

Keywords

Auditory filterbank, Level-dependent asymmetric spectrum, Analysis/synthesis system, Wavelet, Gammatone

Title

Masashi UNOKI and Masato AKAGI,
"A Method of Signal Extraction from Noise-added signal,"
Electronics and Communications in Japan (IEICE Trans. A), Part 3, Vol. 80, No. 11, 1997.

Abstract

This paper proposes a new method of extracting a signal from a noise-corrupted signal. The method is constructed by computationally modeling some of the constraints of Auditory Scene Analysis (ASA) proposed by Bregman. It segregates the signal from the noise by using the amplitude envelope and the phase deviation of the noise-added signal passed through a wavelet filterbank. To evaluate the method, a pure tone was segregated from amplitude-modulated narrow-band noise whose center frequency was the same as the pure-tone frequency. The results indicate that the model can extract the pure tone, with the SN ratio increasing by about 20 dB.

Keywords

auditory scene analysis, two acoustic sources segregation, co-modulation masking release (CMR), gammatone filter, wavelet filterbank

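Title

Kiyoshi Nishiyama and Masashi Unoki,
"An artificial neural network that searches for the maximum value,"
IEICE Transactions (Japanese Edition), Vol. J77-D-II, No. 7, pp. 1382-1385, 1995 (in Japanese).

Abstract

The maximum search (detection) problem is an important theme in a wide range of fields, for example in the search for the maximum similarity in pattern recognition. This paper proposes a new neural network for finding the maximum, composed of neurons with two types of input-output characteristics (nonlinear response functions). For n data items, the network finds the maximum in a parallel, distributed manner with about log2(n)+2 network updates. With some modification, the same network can also search for the data within a certain range of the maximum, or for the minimum.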
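
To make the log2(n) scaling concrete, here is a minimal sketch of a pairwise tournament: in each round all pairs are compared at once, so the number of candidates roughly halves per round. This is an ordinary parallel-reduction illustration and not the authors' network with two response functions. The name `tournament_max` is hypothetical.

```python
def tournament_max(values):
    """Return (maximum, rounds) using rounds of simultaneous pairwise comparisons."""
    if not values:
        raise ValueError("values must not be empty")
    survivors = list(values)
    rounds = 0
    while len(survivors) > 1:
        # Every pair in a round is independent of the others, so a parallel
        # machine could evaluate the whole round in a single update step.
        winners = [max(survivors[i], survivors[i + 1])
                   for i in range(0, len(survivors) - 1, 2)]
        if len(survivors) % 2 == 1:
            winners.append(survivors[-1])  # the unpaired element advances unchanged
        survivors = winners
        rounds += 1
    return survivors[0], rounds
```

With eight inputs the tournament finishes in three rounds, which matches log2(8); the extra two updates quoted in the abstract belong to the authors' network and are not modeled here.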

International Conferences

Title

Masashi Unoki, Masaaki Kubo, and Masato Akagi,
"A model for selective segregation of a target instrument sound from the mixed sound of various instruments,"
Proc. ICMC2003, pp. 295-298, Singapore, October 2003.

Abstract

This paper proposes a selective sound segregation model for separating a target musical instrument sound from the mixed sound of various musical instruments. The model consists of two blocks: a model of segregating two acoustic sources based on auditory scene analysis as bottom-up processing, and selective processing based on knowledge sources as top-down processing. Two simulations were carried out to evaluate the proposed model. Results showed that the model could selectively segregate not only the target instrument sound, but also the target performance sound, from the mixed sound of various instruments. This model, therefore, can also be adapted to computationally model the mechanisms of a human's selective hearing system.

Title

Masashi Unoki, Masakazu Furukawa, Keigo Sakata, and Masato Akagi,
"A speech dereverberation method based on the MTF concept,"
Proc. of EuroSpeech2003, Sept. 2003 (accepted).

Abstract

This paper proposes a speech dereverberation method based on the MTF concept. This method can be used without measuring the impulse response of room acoustics. In the model, the power envelopes and carriers are decomposed from a reverberant speech signal using an N-channel filterbank and are then dereverberated in each respective channel. In the envelope dereverberation process, a power envelope inverse filtering method is used to dereverberate the envelopes. In the carrier regeneration process, a carrier generation method based on voiced/unvoiced speech from the estimated fundamental frequency (F0) is used. In this paper, we assume that F0 has been estimated accurately. We have carried out 15,000 simulations of dereverberation for reverberant speech signals to evaluate the proposed model. We found that the proposed model can accurately dereverberate not only the power envelopes but also the speech signal from the reverberant speech using regenerated carriers.

Title

Masashi Unoki, Masakazu Furukawa, Keigo Sakata, and Masato Akagi,
"A Method based on the MTF concept for dereverberating the power envelope from the reverberant signal,"
Proc. of ICASSP2003, vol. I, pp. 840-843, April 2003.

Abstract

This paper proposes a method for dereverberating the power envelope from the reverberant signal. This method is based on the modulation transfer function (MTF) and does not require that the impulse response of an environment be measured. It improves upon the basic model proposed by Hirobayashi et al. regarding the following problems: (i) how to precisely extract the power envelope from the observed signal; (ii) how to determine the parameters of the impulse response of the room; and (iii) a lack of consideration as to whether the MTF concept can be applied to a more realistic signal. We have shown that the proposed method can accurately dereverberate the power envelope from the reverberant signal.
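
A minimal sketch of the inverse-filtering idea, under assumptions that are not stated in this abstract: the power envelope of the room impulse response is modeled as an exponential decay, gain^2 * exp(-13.8 t / T60), the reverberant power envelope is the convolution of the clean envelope with that decay, and both the reverberation time and the gain are known (the paper determines such parameters from the observed signal instead). Under this model the decay is a one-pole filter, so its inverse is a two-tap difference. All function and parameter names are hypothetical.

```python
import math

def reverberate_envelope(env, t60_s, fs_hz, gain=1.0):
    """Convolve a power envelope with gain^2 * exp(-13.8 * t / t60_s).

    The exponential decay is realized as the recursion
    y[n] = gain^2 * x[n] + r * y[n-1], with r = exp(-13.8 / (t60_s * fs_hz)).
    """
    r = math.exp(-13.8 / (t60_s * fs_hz))
    out, prev = [], 0.0
    for x in env:
        prev = gain ** 2 * x + r * prev
        out.append(prev)
    return out

def dereverberate_envelope(rev_env, t60_s, fs_hz, gain=1.0):
    """Invert the one-pole decay: x[n] = (y[n] - r * y[n-1]) / gain^2."""
    r = math.exp(-13.8 / (t60_s * fs_hz))
    out, prev = [], 0.0
    for y in rev_env:
        out.append((y - r * prev) / gain ** 2)
        prev = y
    return out
```

When the assumed decay matches the one that produced the reverberant envelope, the round trip recovers the clean envelope up to rounding error. With a mismatched T60 or gain the recovery degrades, which is why estimating those parameters is one of the problems the abstract lists.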

Title

Masashi Unoki, Masakazu Furukawa, and Masato Akagi,
"A method for recovering the power envelope from the reverberant speech based on MTF,"
Proc. of Forum Acusticum Sevilla 2002, SPA-Gen-002, p. S129, Sevilla, Spain, Sept. 2002.

Abstract

This paper proposes a method for dereverberating the power envelope from reverberant speech. This method is based on the concept of the modulation transfer function (MTF) and does not require that the impulse response of an environment be measured. It improves upon the basic model proposed by Hirobayashi et al. regarding the following problems: (i) how to precisely extract the power envelope from the observed signal; (ii) how to determine the parameters of the impulse response of the room; and (iii) application of the MTF to speech and the anti-co-modulation characteristic of the speech envelope. Based on these three improvements, we then propose an extended, filterbank-based method for practical applications. We have carried out 15,000 simulations of dereverberation for reverberant speech signals. The results show that the proposed model can accurately dereverberate the power envelope from reverberant speech.

Title

Takeshi Saitou, Masashi Unoki, and Masato Akagi,
"Extraction of F0 dynamic characteristics and development of F0 control model in singing voice,"
Proc. of ICAD2002, pp. 275-278, Kyoto, Japan, July 2002.

Abstract

Fundamental frequency (F0) control models, which can cope with F0 dynamic characteristics related to singing-voice perception, are required to construct natural singing-voice synthesis systems. This paper discusses the importance of F0 dynamic characteristics in singing voices and demonstrates, through psychoacoustic experiments, how much they influence singing-voice perception. It then proposes an F0 control model that can generate F0 fluctuations in singing voices, together with a singing-voice synthesis method. The results show that an F0 contour including four fluctuations (overshoot, vibrato, preparation, and fine fluctuation) affects singing-voice perception, and that the proposed synthesis method can generate natural singing voices by controlling these F0 fluctuations.

Title

Yuichi Ishimoto, Masashi Unoki, and Masato Akagi,
"A fundamental frequency estimation method for noisy speech based on periodicity and harmonicity,"
Proc. of ICASSP2001, SPEECH-SF3.1, USA, May 2001.

Abstract

This paper proposes a robust and accurate F0 estimation method for noisy speech. This method uses two different principles: (1) an F0 estimation based on periodicity and harmonicity of instantaneous amplitude for a robust estimation in noisy environments, and (2) TEMPO2 proposed by Kawahara et al. as an accurate estimation method. The proposed method also uses a comb filter with controllable pass-bands to combine the two estimation methods. Simulations were carried out to estimate F0s from real speech in noisy environments and to compare the proposed method with other methods. The results showed that this method can not only estimate F0s for clean speech with accuracy similar to that of TEMPO2 but also estimate F0s from noisy speech more robustly than other methods such as TEMPO2 and the cepstrum method.

Title

Masato Akagi, Mitsunori Mizumachi, Yuichi Ishimoto, and Masashi Unoki,
"Speech Enhancement and Segregation based on Human Auditory Mechanisms,"
In Proc. of 2001 International Conference on Information Society in the 21st Century (IS2000), pp. 246-254, Aizu-Wakamatsu, Japan, Oct. 2000.

Abstract

This paper introduces models of speech enhancement and segregation based on knowledge about human psychoacoustics and auditory physiology. The cancellation model is used for enhancing speech. Special attention is paid to reducing noise by using a spatial filtering technique and a frequency filtering technique. Both techniques adopt concepts of the cancellation model. In addition, some constraints related to the heuristic regularities proposed by Bregman are used to overcome the problem associated with segregating two acoustic sources. Simulated results show that both spatial and frequency filtering are useful in enhancing speech. As a result, these filtering methods can be used effectively at the front-end of an automatic speech recognition system, and for speech feature extraction. The sound segregation model can precisely extract a desired signal from a noisy signal even in waveforms.

Keywords

cancellation model, noise reduction, microphone array, F0 extraction, computational auditory scene analysis

Title

Masashi UNOKI and Masato AKAGI,
"Segregation of vowel in background noise using the model of segregating two acoustic sources based on auditory scene analysis,"
Proc. of CASA'99, pp. 51-60, Stockholm, Sweden, August 1999.

Abstract

This paper proposes an auditory sound segregation model based on auditory scene analysis. It solves the problem of segregating two acoustic sources by using constraints related to the heuristic regularities proposed by Bregman and by making an improvement to our previously proposed model. The improvement is to reconsider constraints on the continuity of instantaneous phases as well as constraints on the continuity of instantaneous amplitudes and fundamental frequencies in order to segregate the desired signal from a noisy signal precisely even in waveforms. Simulations performed to segregate a real vowel from a noisy vowel and to compare the results of using all or only some constraints showed that our improved model can segregate real speech precisely even in waveforms using all the constraints related to the four regularities, and that the absence of some constraints reduces the segregation accuracy.

Title

Masashi UNOKI and Masato AKAGI,
"Segregation of vowel in background noise using the model of segregating two acoustic sources based on auditory scene analysis,"
Proc. of EuroSpeech'99, vol. 6, pp. 2575-2578, Budapest, Hungary, Sept. 1999.

Abstract

This paper proposes an auditory sound segregation model based on auditory scene analysis. It solves the problem of segregating two acoustic sources by using constraints related to the heuristic regularities proposed by Bregman and by making an improvement to our previously proposed model. The improvement is to reconsider constraints on the continuity of instantaneous phases as well as constraints on the continuity of instantaneous amplitudes and fundamental frequencies in order to segregate the desired signal from a noisy signal precisely even in waveforms. Simulations performed to segregate a real vowel from a noisy vowel and to compare the results of using all or only some constraints showed that our improved model can segregate real speech precisely even in waveforms using all the constraints related to the four regularities, and that the absence of some constraints reduces the segregation accuracy.

Title

Masashi UNOKI and Masato AKAGI,
"Signal Extraction from Noisy Signal based on Auditory Scene Analysis,"
ICSLP'98, vol. 4, pp. 1515-1518, 30 Nov. - 4 Dec. 1998, Sydney, Australia.

Abstract

This paper proposes a method of extracting the desired signal from a noisy signal. This method solves the problem of segregating two acoustic sources by using constraints related to the four regularities proposed by Bregman and by making two improvements to our previously proposed method. One is to incorporate a method of estimating the fundamental frequency using comb filtering on the filterbank. The other is to reconsider the constraints on the separation block, which constrain the instantaneous amplitude, input phase, and fundamental frequency of the desired signal. Simulations performed to segregate a vowel from a noisy vowel and to compare the results of using all or only some constraints showed that our improved method can segregate real speech precisely using all the constraints related to the four regularities and that the absence of some constraints reduces the accuracy.

Title

Masashi UNOKI and Masato AKAGI,
"A Computational Model of Co-modulation Masking Release,"
In Proc. of NATO ASI on Computational Hearing, pp. 129-134, Il Ciocco, Italy, 1-12 July 1998.

Abstract

This paper proposes a computational model of co-modulation masking release (CMR). It consists of two models, our auditory segregation model (model A) and the power spectrum model of masking (model B), and a selection process that selects one of their results. Model A extracts a sinusoidal signal using the outputs of multiple auditory filters and model B extracts a sinusoidal signal using the output of a single auditory filter. The selection process selects the sinusoidal signal with the lowest signal threshold from the two extracted signals. For both models, simulations similar to Hall et al.'s demonstrations were carried out. Simulation stimuli consisted of two types of noise masker, bandpassed random noise and AM bandpassed random noise. As a result, the signal threshold of the pure tone extracted using the proposed model shows properties similar to those in Hall et al.'s demonstrations. The maximum amount of CMR in the proposed model is about 8 dB.

Title

Masashi UNOKI and Masato AKAGI,
"A Method of Signal Extraction from Noisy Signal,"
In Proc. of EuroSpeech'97, vol. 5, pp. 2587-2590, Rhodes, Greece, Sept. 1997.

Abstract

Title

Masashi UNOKI and Masato AKAGI,
"A Method of Signal Extraction from Noisy Signal based on Auditory Scene Analysis,"
Working Notes of the IJCAI-97 Workshop on CASA, pp. 93-102, August 1997.

Abstract

This paper presents a method of extracting the desired signal from a noise-added signal as a model of acoustic source segregation. Using physical constraints related to the four regularities proposed by Bregman, the proposed method can solve the problem of segregating two acoustic sources. These physical constraints correspond to the regularities, which we have translated from qualitative conditions into quantitative conditions. Three simulations were carried out using the following signals: (a) noise-added AM complex tone, (b) mixed AM complex tones, and (c) noisy synthetic vowel. The performance of the proposed method has been evaluated using two measures: precision (an SNR-like measure) and spectrum distortion (SD). For signals (a) and (b), the proposed method can extract the desired AM complex tone from a noise-added AM complex tone or from mixed AM complex tones, in which signal and noise exist in the same frequency region. In particular, the average reduction in SD is about 20 dB. Moreover, for signal (c), the proposed method can also extract the speech signal from noisy speech.

Title

Workshop

L.P. O'Mard, R. Meddis, M. Unoki, and R. D. Patterson,
"A DSAM application for evaluating nonlinear filterbank used to simulate basilar membrane motion,"
Abstract of the 24th Annual Midwinter Research Meeting, Association for Research in Otolaryngology, p. 257, TradeWinds Islands Resort, St. Petersburg Beach, Florida, USA, Feb. 2001.

Abstract

A menu-driven computer application is presented that automates the evaluation of nonlinear filterbanks used to characterise the response of the basilar membrane (BM) to simple and complex sounds. It is important to show that a BM simulator reproduces the range of complex features observed in experiments and it is useful to have a convenient means of producing the evaluation functions typically used by experimentalists. Accordingly, the filter evaluation application (FEval) calculates the following functions: tuning curves, input/output functions, filter shapes, phase/intensity and phase/frequency functions, two-tone suppression ratios, two-tone responses, impulse responses and distortion products. The results are output to files in formats that are compatible with post-processing packages such as Excel and Matlab. Using DSAM 'simulation scripts', any model produced using process modules (models or functions) already available in DSAM can be tested. In the current case, FEval has been used to compare the outputs of three nonlinear filterbanks: the original model of Carney (1993), the Dual Resonance Non-linear filter of Meddis et al. (submitted to JASA) and the Gammachirp filters of Irino and Patterson (1997). FEval is the newest addition to the family of applications based on the Development System for Auditory Modelling (DSAM). All DSAM applications employ a similar interface and inherit its features. Like other DSAM applications, FEval allows a variety of interface options. There is a graphical user interface (GUI) that provides comprehensive access to model and evaluation test parameters. FEval can be started in 'server' mode that can then be controlled from the command line, manually, or by using Matlab or a similar scripting tool. FEval accepts command-line options giving access to all parameters, so it can be employed to produce quite complex analysis runs. FEval also has a command-line only version, for fast processing.
FEval is available as an "out of the box" Windows installation (95/98/2000 and NT) for PCs and as Linux RPMs, can be installed on UNIX machines using its auto-configuration system, and is easily ported to other systems.

References

Carney, L. H. (1993), "A model for the response of low-frequency auditory-nerve fibers in cat," J. Acoust. Soc. Am., 93, 401-417.
DSAM: The Development System for Auditory Modelling.
Meddis, R., O'Mard, L. P. and Lopez-Poveda, A. E., "A computational algorithm for computing non-linear auditory frequency selectivity," in press.

Title

Workshop

Toshio Irino, Masashi Unoki, and Roy D. Patterson,
"A physiologically motivated gammachirp auditory filterbank,"
British Society of Audiology Short Paper Meeting on Experimental Studies of Hearing and Deafness, pp. 35-36, Keele University, Staffordshire, U.K., Sept. 2000.

Abstract

The gammachirp auditory filter was introduced to provide an asymmetric, level-dependent version of the gammatone auditory filter (Irino and Patterson, 1997). In this 'analytic' gammachirp filter, the level-dependency was in the chirp parameter. Recently, Carney et al. (1999) reported that, although there is a chirp in the impulse response of the cat's cochlear filter, the form of the chirp does not vary with level. This led Irino and Patterson (1999, 2000) to develop a more physiological version of the gammachirp filter whose magnitude response has two terms. The first term is a fixed gammachirp filter which represents the passive basilar membrane response; the second term is a highpass, asymmetric function (HP-AF) which represents the active component in the cochlea and produces compression. In the computational version, the HP-AF is simulated by an IIR asymmetric compensation (AC) filter (Irino and Unoki, 1999), which was initially developed to reduce the computational load of the gammachirp filterbank. Irino and Patterson (1999, 2000) have demonstrated that the physiological gammachirp fits both the physiological revcor data of Carney et al. (1999) and the psychophysical masking data of Rosen and Baker (1994). A filterbank structure for the physiological gammachirp is shown in Fig. 1 (Irino and Unoki, 1999). It is a cascade of three filterbanks: a gammatone filterbank followed by a lowpass AC filterbank, and then a highpass AC filterbank. The gammatone and lowpass-AC filterbanks together produce the fixed gammachirp filterbank, which corresponds to the passive basilar membrane whose motion is observed post-mortem or at high sound pressure levels (Recio et al., 1998). The parameter controller estimates signal level to control the highpass AC filterbank, which corresponds to the active component in the cochlea.

References

Carney, L. H., McDuffy, M. J. and Shekhter, I. (1999), J. Acoust. Soc. Am., 105, 2384-2391.
Irino, T. and Patterson, R.D. (1997), J. Acoust. Soc. Am., 101, 412-419.
Irino, T. and Patterson, R.D. (1999), Symposium on recent developments in auditory mechanics, Sendai, Japan.
Irino, T. and Patterson, R.D. (2000), XIIth International Symposium on Hearing, Mierlo, The Netherlands.
Irino, T. and Unoki, M. (1999), J. Acoust. Soc. Jpn, (E), 20, 397-406.
Recio, A.R., Rich, N.C., Narayan, S.S. and Ruggero, M.A. (1998), J. Acoust. Soc. Am., 103, 1972-1989.

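Oral Presentations

Title

Masaaki Kubo, Masashi Unoki, and Masato Akagi,
"A method of selectively segregating a target sound using the acoustic features of instrument sounds as knowledge,"
Hearing Research Meeting, Acoustical Society of Japan, Vol. 32, No. 10, H-2002-90, Dec. 2002 (in Japanese).

Keywords

Auditory filter, gammatone, gammachirp, IIR asymmetric compensation filter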

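Title

Masashi Unoki and Toshio Irino,
"Fitting the compressive gammachirp to notched-noise data at various frequencies,"
Hearing Research Meeting, Acoustical Society of Japan, Vol. 32, No. 1, H-2002-06, Iwate Prefectural University, Jan. 2002 (in Japanese).

Abstract

The gammachirp auditory filter has been developed to explain not only psychophysical findings on level-dependent changes in filter shape but also physiological findings on nonlinear compression and instantaneous-frequency glides. So far, the parameters of this filter have been estimated from notched-noise masking data over a sufficient range of levels, but probe frequencies other than 2 kHz had not yet been examined. This report presents the results of fitting the compressive gammachirp to notched-noise masking data, measured at two research institutions, that cover almost the entire audible range. The resulting filters have parameters that vary smoothly with center frequency, and they are consistent with physiological findings on the cochlea in that the level dependence of the gain is confined to the passband.

Keywords

Notched-noise masking data, compressive gammachirp auditory filter, asymmetric function, compression characteristics, bandwidth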

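Title

Masashi Unoki and Toshio Irino,
"Improving the approximation accuracy of the asymmetric compensation gammachirp filter,"
Hearing Research Meeting, Acoustical Society of Japan, Hokkaido University, H-2000-42, June 2000 (in Japanese).

Abstract

In a previous report, we proposed a method of approximating the gammachirp filter, which is defined by its impulse response, with a combination of an IIR asymmetric compensation filter and a gammatone filter. Because the coefficients of the approximating filter were set heuristically, there remained room to improve the approximation accuracy. In addition, because the fitted parameter space was relatively narrow and sparse, there was no guarantee that the approximation error could be kept at the same level if the space were made wider and denser. In this report, we aim to improve the approximation accuracy by taking into account a constraint on the sign inversion of the phase of the poles and zeros in the asymmetric compensation filter and by reconsidering the parameter dependence of the coefficients. As a result, we show that the filter can be approximated with about half the rms error obtained with the previously proposed set of coefficients.

Keywords

Auditory filter, gammatone, gammachirp, IIR asymmetric compensation filter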

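Title

Yuichi Ishimoto, Masashi Unoki, and Masato Akagi,
"Fundamental frequency estimation in noisy environments based on periodicity and harmonicity,"
Hearing Research Meeting, Acoustical Society of Japan, H-2000-81, Sept. 2000 (in Japanese).

Abstract

In speech information processing, it is important to extract the fundamental frequency (F0) from the target speech. In an environment with surrounding noise, however, extracting F0 is difficult because the noise distorts the target speech. We previously proposed a noise-robust, high-accuracy F0 estimation method that combines noise suppression using a comb filter with variable bandwidth and two F0 estimation methods. That method could estimate F0 with high accuracy for noise-added isolated vowels and continuous vowels, but its accuracy deteriorated for continuous speech. In this report, to make F0 estimation more robust for continuous speech, we propose a method that estimates the F0 corresponding to the periodicity and to the harmonicity of the instantaneous amplitude separately and then integrates the two estimates to obtain a reliable F0 estimate. By incorporating this method into our previous method, we achieve high-accuracy F0 estimation in noisy environments.

Keywords

Fundamental frequency, instantaneous amplitude, periodicity, harmonicity, Dempster's rule of combination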

Title

Masashi UNOKI and Masato AKAGI,
"Vowel segregation in background noise using the model of segregating two acoustic sources,"
Proc. of AI Challenge, pp. 7-14, Aoyama Gakuin Univ., 3 Nov. 1999.

Abstract

This paper proposes an improved sound segregation model based on auditory scene analysis in order to overcome three disadvantages in our previously proposed model. The improved model solves the problem of segregating two acoustic sources by using constraints related to the heuristic regularities proposed by Bregman. In the improvements, we (1) reconsider the estimation of unknown parameters using Kalman filtering, (2) incorporate a constraint of channel envelopes with periodicity of the fundamental frequency into the grouping block, and (3) consider a constraint of smoothness of instantaneous amplitudes on channels. Simulations are performed to segregate a real vowel from a noisy vowel and to compare the results of using all or only some constraints. The proposed model improves on our previous model and can precisely segregate real speech even in waveforms using all of the constraints related to Bregman's four regularities.

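Title

Masashi Unoki and Masato Akagi,
"A proposal for a computational model of co-modulation masking release,"
Hearing Research Meeting, Acoustical Society of Japan, H-98-51, June 1998 (in Japanese).

Abstract

This paper proposes a computational model of co-modulation masking release (CMR). The model consists of two models, the model of segregating two acoustic sources proposed by the authors (model A) and the power spectrum model of masking (model B), and a selection process that selects between the results of the two models. Model A uses the outputs of multiple auditory filters, and model B uses the output of a single auditory filter, to extract a pure tone from the masked signal. The selection process selects, from the two extracted pure tones, the one with the lower masking threshold. Simulations modeled on the CMR experiments of Hall et al. showed that the change in the masking threshold of the extracted pure tone follows a trend similar to the CMR results reported by Hall et al. The maximum amount of CMR was about 8 dB.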

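Title

Masashi Unoki and Masato Akagi,
"A proposal for a model of segregating two acoustic sources based on auditory scene analysis,"
IEICE Technical Report, SP-98-158, March 1999 (in Japanese).

Abstract

As an attempt to model sound source segregation based on auditory scene analysis, this paper takes up the problem of segregating two acoustic sources and proposes a method of extracting the desired signal from a noise-added signal. The problem discussed here is an ill-posed inverse problem of recovering the two original waveforms from the observed mixture, so constraints are needed to solve it uniquely. The method uses, as signal features, the instantaneous amplitude and instantaneous phase of the mixed signal passed through an analysis filterbank, and it uses the four heuristic regularities proposed by Bregman as constraints to determine the instantaneous amplitude and instantaneous phase of the desired signal uniquely. Computer simulations showed that the method can extract real speech (isolated vowels and continuous vowels) from noise with high accuracy at the waveform level. In addition, a comparison of the segregation accuracy when some of the constraints were omitted in turn demonstrated the effectiveness of using all four heuristic regularities.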

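Title

Masashi Unoki and Masato Akagi,
"A proposal for a model of segregating two acoustic sources based on auditory scene analysis,"
Proc. 1999 Spring Meeting of the Acoustical Society of Japan, 1-2-6, March 1999 (in Japanese).

Abstract

This paper proposes a model of segregating two acoustic sources based on auditory scene analysis. The model makes it possible to restore the extracted sound at the waveform level, which was a problem in our previously proposed model. Simulations showed that the model can extract real speech from noise with high accuracy.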

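Title

Masashi Unoki and Masato Akagi,
"Segregation of a stationary vowel in noise based on auditory scene analysis,"
Proc. 1998 Autumn Meeting of the Acoustical Society of Japan, 2-8-10, Sept. 1998 (in Japanese).

Abstract

This paper proposes a method of segregating a stationary vowel from noise based on auditory scene analysis. The method extends the authors' previously proposed method of extracting a harmonic complex tone from noise by incorporating a noise-robust method of estimating the fundamental frequency. Simulations showed that the method can extract a real vowel from noise with high accuracy.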

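Title

Masashi Unoki and Toshio Irino,
"Efficient implementation of the gammachirp filter and filterbank,"
Technical Report of the Hearing Research Meeting, Acoustical Society of Japan, H-97-69, Oct. 1997 (in Japanese).

Abstract

We examined how to improve the efficiency of the gammachirp filter, which has been proposed as an auditory filter but could so far be implemented only as an FIR filter, whose computational cost is high for a time-varying system. We show that it can be implemented accurately by combining a gammatone filter with an asymmetric compensation filter built from an IIR filter with few coefficients. To simulate the time-varying auditory periphery, we also present a structure for a gammachirp filterbank consisting of a gammatone filterbank, an asymmetric compensation filterbank, and a parameter controller. We show that the stability of the IIR asymmetric compensation filter is guaranteed not only for the forward filter but also for the inverse filter, and that an analysis/synthesis system for a time-varying auditory filterbank can be built with the accuracy of a linear system. This is a feature of the proposed scheme that conventional auditory filters could not achieve. It is applicable not only to frequency analysis that reflects human auditory characteristics but also to a variety of signal processing tasks that require modification and resynthesis.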

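Title

Masashi Unoki and Masato Akagi,
"A method of extracting a harmonic complex tone that takes account of temporal variation in the fundamental frequency,"
IEICE Technical Report, SP-97-129, March 1998 (in Japanese).

Abstract

The authors previously proposed a solution to the problem of segregating two acoustic sources that extracts an AM harmonic complex tone by using the four heuristic regularities proposed by Bregman. That solution, however, assumed that the fundamental frequency was constant and known. This paper proposes a method of extracting the desired harmonic complex tone from a noise-added harmonic complex tone by adding to that solution fundamental frequency extraction with TEMPO, proposed by Kawahara, and a constraint on the temporal variation of the extracted fundamental frequency. To demonstrate its effectiveness, simulations were carried out using three signals: (a) a noise-added AM harmonic complex tone, (b) mixed AM harmonic complex tones, and (c) a noise-added synthetic vowel. The results show that the model achieves a noise reduction of about 15 dB in terms of SD and can extract the harmonic complex tone from the noise-added signal.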

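Title

Toshio Irino and Masashi Unoki,
"A time-varying analysis/synthesis auditory model using the gammachirp filterbank,"
Proc. 1998 Spring Meeting of the Acoustical Society of Japan, 1-8-2, March 1998 (in Japanese).

Abstract

With conventional time-varying auditory models, it has been difficult to resynthesize a signal accurately from the analysis output. This is thought to be one reason why auditory models have not been used in signal processing for communication systems as much as linear prediction or the STFT. This report shows that a synthesis system can also be realized with the asymmetric compensation gammachirp filterbank.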

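Title

Masashi Unoki and Toshio Irino,
"A method of controlling asymmetry in the gammachirp filterbank,"
Proc. 1998 Spring Meeting of the Acoustical Society of Japan, 1-8-3, March 1998 (in Japanese).

Abstract

This paper proposes a method of controlling the asymmetry parameter of the auditory filters in the gammachirp filterbank. The method makes it possible to control the asymmetry of the gammachirp filter dynamically as a function of the sound pressure level of the input signal, and thus to construct a gammachirp filterbank as a model of the time-varying auditory periphery.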

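Title

Masashi Unoki and Masato Akagi,
"Extraction of a harmonic complex tone taking account of temporal variation in the fundamental frequency,"
Proc. 1998 Spring Meeting of the Acoustical Society of Japan, 2-8-16, March 1998 (in Japanese).

Abstract

In this paper, we reconsidered the estimation of the fundamental frequency and the handling of its temporal variation in the authors' method of extracting an AM harmonic complex tone from bandpassed noise, in order to improve the performance of the method. As a result, the model can now extract from noise not only AM harmonic complex tones but also synthetic vowels whose fundamental frequency varies over time.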

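Title

Masashi Unoki and Masato Akagi,
"A study on a computational model of co-modulation masking release,"
IEICE Technical Report, SP-96-37, July 1996 (in Japanese).

Abstract

As one attempt to model sound source segregation, this paper takes up a problem of segregating two acoustic sources modeled on co-modulation masking release (CMR) and builds an engineering model of CMR. First, we formulate the problem of segregating two acoustic sources and design a wavelet analysis/synthesis system that uses the gammatone filter as its basis function to serve as the auditory filterbank. Next, we clarify how to calculate the three physical quantities needed for the segregation (amplitude envelope, output phase, and input phase). The input phase is obtained by using, as physical constraints, the heuristic regularities proposed by Bregman concerning (1) changes occurring in a single acoustic event and (2) gradualness of change. As a simulation of CMR, we solved a segregation problem modeled on the CMR experiments of Hall et al. and obtained results indicating that the model can be interpreted as an engineering model of CMR. The maximum amount of CMR was about 8 dB.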

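Title

Masashi Unoki and Masato Akagi,
"Improving the performance of a computational model of co-modulation masking release,"
Proc. 1997 Spring Meeting of the Acoustical Society of Japan, 3-8-2, March 1997 (in Japanese).

Abstract

We reconsidered the computational model of co-modulation masking release (CMR) proposed by the authors. In particular, we improved its performance by taking into account the group delay of the gammatone filter and experimental conditions equivalent to those of the CMR experiments of Hall et al. As a result, the computational model yielded results similar to the experimental results of Hall et al.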

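Title

Masashi Unoki and Masato Akagi,
"A method of extracting an AM harmonic complex tone from bandpassed noise,"
Proc. 1997 Spring Meeting of the Acoustical Society of Japan, 2-8-7, March 1997 (in Japanese).

Abstract

As one attempt to model sound source segregation, this paper takes up the problem of segregating two acoustic sources and proposes a method of segregating an AM harmonic complex tone from an AM harmonic complex tone to which bandpassed noise has been added. With this method, the AM harmonic complex tone can be extracted accurately even when the signal and the noise occupy the same frequency region.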

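Title

Masashi Unoki and Masato Akagi,
"A method of extracting an AM harmonic complex tone from bandpassed noise,"
IEICE Technical Report, SP-96-123, March 1997 (in Japanese).

Abstract

This paper proposes a method of extracting an AM harmonic complex tone from bandpassed noise. The authors previously proposed a solution to the problem of segregating two acoustic sources in which a pure tone is mixed with bandpassed noise, using two of the four heuristic regularities proposed by Bregman (gradualness of change and changes occurring in a single acoustic event). The method newly proposed here extends that solution by adding the remaining two regularities (common onset/offset and harmonicity). Simulations of segregating two acoustic sources were carried out to demonstrate the effectiveness of the method.

Keywords

Auditory scene analysis, waveform segregation, Kalman filter, spline interpolation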

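Title

Masashi Unoki and Masato Akagi,
"A method of extracting a signal waveform from a noise-added signal,"
Technical Report of the Hearing Research Meeting, Acoustical Society of Japan, H-95-79, Sept. 1995 (in Japanese).

Abstract

As an attempt to model sound source segregation, this paper takes up the problem of segregating two acoustic sources and proposes a method of separating and extracting the original signal and the noise from a noise-added waveform. The method is based on auditory scene analysis and makes the segregation possible by using the amplitude envelope and output phase obtained from each filter output of a wavelet analysis/synthesis system, together with the phase between the input signals. These three physical cues are derived by using, as physical constraints, the heuristic regularities described by Bregman concerning co-modulation masking release and gradualness of change. As an example of segregation with this method, we present a solution to the problem of segregating a pure tone mixed with bandpassed noise. In particular, segregation became easier when the mixed noise was amplitude modulated and harder when it was random bandpassed noise, a result that can be interpreted as an engineering explanation of co-modulation masking release.

Keywords

Co-modulation masking release (CMR), segregation of two acoustic sources, gammatone filter, wavelet analysis/synthesis system, auditory scene analysis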

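Title

Masashi Unoki and Masato Akagi,
"A method of extracting a signal buried in bandpassed noise,"
Proc. 1996 Spring Meeting of the Acoustical Society of Japan, 3-3-15, March 1996 (in Japanese).

Abstract

In recent years, research on sound source segregation based on auditory scene analysis has become active. Several computational implementations of segregation that use acoustic cues in the spectrogram have been reported, but they can hardly be said to achieve complete segregation when the two signals contain components in the same frequency region. Taking the position that the phase must be considered in addition to the amplitude spectrum (power) to achieve complete segregation within the same frequency region, this paper proposes, as one solution to the problem of segregating two acoustic sources, a method of extracting the signal waveform from a noise-added waveform.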

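Title

Kiyoshi Nishiyama and Masashi Unoki,
"Generalization of an algorithm for compressing redundant neurons in the extended Hopfield associative memory model,"
IEICE Technical Report, NC93-27, March 1994 (in Japanese).

Abstract

One of the authors has shown that the storage capacity and recall ability of the Hopfield associative memory model can be greatly improved by adding redundant neurons to the network so that all memory patterns satisfy the orthogonality condition. This model, however, had the problem that the number of redundant neurons increases sharply as the number of memory patterns increases. This paper clarifies theoretically the compression of the redundant neurons added to this Hopfield associative memory model and the extent to which such compression is possible.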
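
The role of orthogonality can be illustrated with a small standard Hopfield network. In the sketch below the stored patterns are chosen by hand to be mutually orthogonal (two rows of an 8x8 Hadamard matrix) rather than made orthogonal by appending redundant neurons as in the report, and all names are hypothetical. With Hebbian weights and orthogonal patterns, each stored pattern is a fixed point of the update rule.

```python
def train_hebbian(patterns):
    """Hebbian weights W = (1/N) * sum over patterns of p p^T, with a zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, max_steps=10):
    """Synchronously update all neurons until the state stops changing."""
    state = list(state)
    for _ in range(max_steps):
        new_state = []
        for i, row in enumerate(w):
            field = sum(wij * s for wij, s in zip(row, state))
            # Keep the previous value when the local field is exactly zero.
            new_state.append(state[i] if field == 0 else (1 if field > 0 else -1))
        if new_state == state:
            break
        state = new_state
    return state

# Two mutually orthogonal +1/-1 patterns (rows of an 8x8 Hadamard matrix).
P1 = [1, -1, 1, -1, 1, -1, 1, -1]
P2 = [1, 1, -1, -1, 1, 1, -1, -1]
```

After `w = train_hebbian([P1, P2])`, calling `recall(w, P1)` returns `P1` unchanged, and a copy of `P1` with its first element flipped is restored to `P1` in one update.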

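Title

Kiyoshi Nishiyama and Masashi Unoki,
"Extended Hopfield associative memory model (I),"
IEICE Technical Report, NC93-94, March 1994 (in Japanese).

Abstract

One of the authors has shown that the recall ability of the Hopfield associative memory model can be improved by adding redundant neurons to the network so that all memory patterns satisfy the orthogonality condition. As the number of memory patterns increased, however, this model suffered a rapid decline in recall ability and a large increase in the number of redundant neurons. In this paper, we therefore make the following improvements to the previously proposed Hopfield associative memory model with redundant neurons. (i) Using higher-order intersections of the redundant part together with thresholds, we greatly reduce the total number of redundant neurons and their connections. (ii) Using the input pattern, we effectively estimate the initial state of the redundant neurons, so that the starting point on the energy surface in the recall process is as close as possible to the local minimum in which the memory pattern is stored. As a result, while keeping the increase in redundant neurons to a minimum, the model can store and recall the 10 digits and the 26 letters of the alphabet, which was impossible with the conventional Hopfield associative memory model.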

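Title

Kiyoshi Nishiyama and Masashi Unoki,
"Compressibility of redundant neurons added to the Hopfield associative memory model so that all memory patterns are orthogonal,"
IEICE Technical Report, NC93-95, July 1993 (in Japanese).

Abstract

In a previous paper, the authors proposed a new associative memory model based on the Hopfield neural network. That model had redundant neurons added to the original network so that all of the newly generated memory patterns satisfy the orthogonality condition with respect to one another. This report proposes several algorithms for compressing the neurons added in this way, together with a method of initializing the redundant neurons effectively using the input pattern.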