Emotional expression in speech plays a vital role in communication. Previous studies using Japanese noise-vocoded speech suggested that temporal amplitude envelopes provide crucial perceptual cues for urgency. However, whether these cues contribute to emotion perception in Mandarin speech remains unclear.
Research Objective
This study investigates whether temporal amplitude envelope cues in Mandarin speech affect emotion perception, using both original and noise-vocoded samples.
Methodology
Generate noise-vocoded speech from normalized audio samples.
Conduct listening experiments with 5 native Mandarin speakers in a soundproof room.
Participants label the emotion of 50 original samples and 50 noise-vocoded samples.
Figure 5: Schematic diagram of noise-vocoded speech synthesis
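The synthesis step can be illustrated with a minimal sketch of a generic noise vocoder: the signal is split into frequency bands, the temporal amplitude envelope of each band is extracted, and each envelope modulates band-limited noise. The function name `noise_vocode` and all parameter values here (number of bands, band edges, envelope cutoff, filter orders) are assumptions chosen for illustration; the poster does not state the settings actually used.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt


def noise_vocode(signal, fs, n_bands=4, f_lo=100.0, f_hi=6000.0,
                 env_cutoff=32.0, seed=0):
    """Return a noise-vocoded copy of a mono signal (1-D float array).

    All defaults are illustrative assumptions, not the study's settings.
    f_hi must be below the Nyquist frequency fs / 2.
    """
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng(seed)

    # Band edges spaced logarithmically (assumed; ERB spacing is also common).
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    # Low-pass filter used to smooth each band's envelope.
    env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")

    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(3, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)

        # Temporal amplitude envelope: magnitude of the analytic signal,
        # then low-pass smoothed and clipped to stay non-negative.
        env = np.abs(hilbert(band))
        env = np.clip(sosfiltfilt(env_sos, env), 0.0, None)

        # Carrier: white noise restricted to the same band.
        carrier = sosfiltfilt(band_sos, rng.standard_normal(signal.size))
        out += env * carrier

    # Match the RMS level of the input so loudness is comparable.
    rms_in = np.sqrt(np.mean(signal ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    if rms_out > 0:
        out *= rms_in / rms_out
    return out
```

Because the carrier is noise, the output keeps the slow amplitude fluctuations of each band while discarding pitch and fine spectral detail, which is what allows envelope cues to be tested in isolation.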
Experimental Results
Figure 1: Accuracy of emotion recognition in original speech
Figure 2: Accuracy of emotion recognition in noise-vocoded speech
Figure 3: Emotion classification tendencies in original speech
Figure 4: Emotion classification tendencies in vocoded speech
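Recognition accuracy and classification tendencies of this kind are typically derived from a confusion count over (intended emotion, response) pairs. The sketch below shows one way to compute them; the helper names and the toy labels are hypothetical and are not the study's data or analysis code.

```python
from collections import Counter


def confusion_counts(true_labels, responses):
    """Count how often each intended emotion received each response."""
    if len(true_labels) != len(responses):
        raise ValueError("true_labels and responses must have equal length")
    return Counter(zip(true_labels, responses))


def accuracy_per_emotion(counts):
    """Return the fraction of correct responses for each intended emotion."""
    totals = Counter()
    correct = Counter()
    for (true, resp), n in counts.items():
        totals[true] += n
        if true == resp:
            correct[true] += n
    return {emotion: correct[emotion] / totals[emotion] for emotion in totals}


# Hypothetical toy labels, for illustration only (not experimental results).
toy_true = ["happy", "happy", "angry", "surprise"]
toy_resp = ["happy", "surprise", "surprise", "surprise"]

toy_counts = confusion_counts(toy_true, toy_resp)
toy_accuracy = accuracy_per_emotion(toy_counts)
```

Computing the same counts separately for original and noise-vocoded samples allows the two conditions to be compared cell by cell.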
Conclusion
Temporal amplitude envelope cues affect emotion perception in Mandarin speech.
Classification trends in noise-vocoded speech resemble those in original speech.
Some consistent confusions were observed (e.g., happy → surprise, angry → surprise).
Reference
Unoki, Y., Kawamura, M., Kitani, S., Kobayashi, M., & Akagi, M. (2019). “Examination of amplitude envelope cues contributing to perception of urgency in speech.” Acoustical Society of Japan, Noise & Vibration Committee.