Developments in recent years have led the auditory system to be regarded as an active scene-analysis system, stimulating the study of acoustic source segregation based on auditory scene analysis (ASA) [1,2]. If the problem of acoustic source segregation can be solved, it will become possible not only to extract the sounds a listener requires while rejecting others, but also to apply the result to robust speech recognition systems [4]. Constructing a computational theory of audition by analogy with the computational theory of vision proposed by Marr [5] will take time to complete; however, we feel that modeling based on ASA suggests a new approach to constructing such a theory [6,7,8], since ASA indicates a direction for its construction.
Bregman reported that, to solve the problem of ASA (understanding an environment through its acoustic events), the human auditory system exploits four psychoacoustic heuristic regularities of acoustic events [2,3]:
We stress the need to consider not only the amplitude spectrum but also the phase spectrum when attempting to extract a signal completely from a noise-added signal in which signal and noise occupy the same frequency region [17]. From this standpoint, we seek to solve the problem of segregating two acoustic sources, a basic problem of acoustic source segregation, using regularities (2) and (4) as proposed by Bregman [18,24]. This paper proposes a method of extracting the signal from a noise-added signal as a solution to this two-source segregation problem. The method uses the amplitude and phase spectra calculated from the noise-added signal by the wavelet transform. We also show that if the parameters of the proposed model are set to human auditory properties, the model can serve as a computational model of co-modulation masking release (CMR) [19].
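To illustrate the kind of analysis the method rests on, the following sketch computes per-channel amplitude and phase spectra of a noise-added signal with a bank of complex wavelets. This is a minimal NumPy-only illustration under stated assumptions: the Gabor (Morlet-like) wavelet, the center frequencies, and the `n_cycles` bandwidth parameter are hypothetical choices for demonstration, not the filterbank actually designed in Section 3.

```python
import numpy as np

def gabor_wavelet(fc, fs, n_cycles=8):
    """Complex Gabor (Morlet-like) wavelet centered at fc Hz.
    n_cycles is an illustrative bandwidth parameter, not from the paper."""
    sigma = n_cycles / (2 * np.pi * fc)          # envelope width in seconds
    dur = n_cycles / fc                          # truncate envelope at +/- dur
    t = np.arange(-dur, dur, 1.0 / fs)
    return np.exp(-0.5 * (t / sigma) ** 2) * np.exp(2j * np.pi * fc * t)

def amplitude_phase_spectra(x, fs, center_freqs):
    """Filter x with a bank of complex wavelets; return per-channel
    instantaneous amplitude and phase (both shaped [channels, samples])."""
    amp, phase = [], []
    for fc in center_freqs:
        y = np.convolve(x, gabor_wavelet(fc, fs), mode="same")
        amp.append(np.abs(y))                    # amplitude spectrum channel
        phase.append(np.angle(y))                # phase spectrum channel
    return np.array(amp), np.array(phase)

# Toy noise-added signal: a 440 Hz tone plus white noise.
np.random.seed(0)
fs = 8000
t = np.arange(0, 0.5, 1.0 / fs)
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.random.randn(t.size)
A, P = amplitude_phase_spectra(x, fs, center_freqs=[220, 440, 880])
```

In this toy setting the channel centered on the tone shows the largest amplitude, while all channels also carry a phase trajectory; the segregation method described above operates on both of these quantities, not on the amplitude alone.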
The paper is organized as follows. Section 2 illustrates the proposed model and formulates the problem of segregating two acoustic sources. Section 3 describes the design of the wavelet filterbank and its characteristics. Section 4 presents the calculation of the physical parameters and the segregation algorithm. Section 5 reports computer simulations of two-source segregation that demonstrate the advantages of the proposed method. Section 6 shows that the proposed model can serve as a computational model of co-modulation masking release when the model parameters are set to human auditory properties. Section 7 concludes the paper.