Student ATMAJA receives the Best Student Paper Award at the 23rd Conference of Oriental COCOSDA

ATMAJA, Bagus Tris (3rd-year doctoral student, Akagi Laboratory, Human Life Design Area) received the Best Student Paper Award at the 23rd Conference of Oriental COCOSDA (International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques 2020).

Oriental COCOSDA holds an annual conference to promote speech research on Oriental languages, covering the creation, use, and dissemination of spoken-language corpora for those languages, the exchange of ideas and sharing of information on these topics, discussion of regional issues, and the assessment of speech recognition/synthesis systems. Because of the COVID-19 pandemic, this year's Oriental COCOSDA was held online from November 5 to 7, hosted by the University of Computer Studies in Myanmar.

*Reference: Oriental COCOSDA 2020


Improving Valence Prediction in Dimensional Speech Emotion Recognition Using Linguistic Information

Bagus Tris Atmaja and Masato Akagi

In dimensional emotion recognition, the valence, arousal, and dominance (VAD) model is widely used. Current research in dimensional speech emotion recognition has shown that valence prediction performs worse than arousal and dominance prediction. This paper presents an approach to tackle this problem: improving the low valence-prediction score by utilizing linguistic information. Our approach fuses acoustic features with linguistic features obtained by converting words to vectors. The results doubled the performance of valence prediction in both single-task learning with a single output (predicting valence only) and multitask learning with multiple outputs (predicting valence, arousal, and dominance). Using a proper combination of acoustic and linguistic features not only improved valence prediction but also improved arousal and dominance predictions in multitask learning.
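The fusion idea described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' actual architecture: the feature dimensions, the mean-pooling step, and the linear multitask head are all assumptions introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions, not taken from the paper).
D_ACOUSTIC = 40      # e.g. per-frame acoustic feature size
D_LINGUISTIC = 300   # e.g. word-embedding size

def fuse_features(acoustic: np.ndarray, linguistic: np.ndarray) -> np.ndarray:
    """Early fusion: pool each modality over time, then concatenate."""
    a = acoustic.mean(axis=0)      # (D_ACOUSTIC,)
    l = linguistic.mean(axis=0)    # (D_LINGUISTIC,)
    return np.concatenate([a, l])  # (D_ACOUSTIC + D_LINGUISTIC,)

class MultitaskHead:
    """Toy linear multi-output head predicting valence, arousal, dominance."""
    def __init__(self, d_in: int):
        self.W = rng.normal(scale=0.01, size=(3, d_in))
        self.b = np.zeros(3)

    def predict(self, x: np.ndarray) -> dict:
        v, a, d = self.W @ x + self.b
        return {"valence": v, "arousal": a, "dominance": d}

# One utterance: 31 acoustic frames and 12 word embeddings (toy data).
acoustic = rng.normal(size=(31, D_ACOUSTIC))
words = rng.normal(size=(12, D_LINGUISTIC))

fused = fuse_features(acoustic, words)
head = MultitaskHead(fused.size)
print(head.predict(fused))
```

In a trained system the head would of course be a learned network optimized jointly on all three dimensions; the point here is only the shape of the pipeline: both modalities are reduced to fixed-size vectors, concatenated, and fed to a shared multi-output predictor.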

I would like to thank Akagi-sensei for supervising me and MEXT for funding my research. Oriental COCOSDA is the Asia-region branch of the International Committee for Co-ordination and Standardisation of Speech Databases (COCOSDA). At this conference, I presented my work on tackling a limitation of current dimensional speech emotion recognition (SER): valence prediction (positive vs. negative emotion) shows lower performance than the other dimensions (arousal and dominance). Borrowing a technique from sentiment analysis, we added linguistic information alongside acoustic information to improve valence prediction in dimensional SER. The results showed a significant improvement; the performance of valence prediction doubled. We are grateful to the Oriental COCOSDA committee for selecting our paper for the Best Student Paper Award. This award will motivate me to research harder and better, to understand the science of acoustics and its applications for human beings.