HA3CI Research Laboratory

Past publications (before October 2021) can be found here.

Invited Talks

[Keynote speaker] S. Sakti, "Communicative Intelligent Systems towards Society 5.0," Sarasehan Nasional Pendidikan Tinggi Informatika dan Pemberian Tribute kepada Penggagas dan Pendidik Senior Teknik Informatika ITB, Feb 2nd, 2023
[Invited speaker] S. Sakti, "Language Technology for All: From the indigenous community perspectives" [A joint work with J. Mariani (LIMSI), K. Choukri (ELRA), C. Soria (CNR-ILC), M. Melero (BSC)], "Data, Technologies and Benchmarks for the Spoken Languages of the World" Meeting, IEEE SLT, Jan 13th, 2023
[Keynote speaker] S. Sakti, "Language Technology for All: From the technology and indigenous community perspectives" [A joint work with M. Heck, A. Tjandra, J. Effendi, S. Novitasari, S. Nakamura (NAIST, Japan), J. Mariani (LIMSI), K. Choukri (ELRA), C. Soria (CNR-ILC), M. Melero (BSC)], the 25th Conference of the Oriental COCOSDA, Nov 25th, 2022
[Invited panelist] O. Scharenborg (TU Delft, Netherlands), E. Ahn (U. Washington, USA), G. Anumanchipalli (UC Berkeley, USA), S. Sakti (JAIST, Japan), Moderator: A. Black, "Data Collection, Bias, and Ethical Concerns in Speech Processing," Speech for Social Good - INTERSPEECH Satellite Workshop, September 24th, 2022
[Invited speaker] S. Sakti, "Semi-supervised Learning for Low-resource Multilingual and Multimodal Speech Processing with Machine Speech Chain" [A joint work with A. Tjandra, J. Effendi, S. Novitasari, S. Nakayama, T. Yanagita, S. Nakamura (NAIST/RIKEN AIP, Japan)], HiTZ Language Technology Webinar, May 5th, 2022
[Invited speaker] S. Sakti, "Self-Adaptive Machine Speech Chain in Noisy Environment" [A joint work with A. Tjandra, J. Effendi, S. Novitasari, S. Nakamura (NAIST/RIKEN AIP, Japan)], The AAAI workshop on Self-supervised Learning for Audio and Speech Processing, Feb 28th, 2022
[Invited speaker] S. Sakti, "Machine Speech Chain: A Deep Learning Approach for Modeling Human Speech Perception and Production with Auditory Feedback Mechanism" [A joint work with A. Tjandra, J. Effendi, S. Novitasari, S. Nakamura (NAIST/RIKEN AIP, Japan)], The ITB Seminar, Dec 24th, 2021
[Keynote speaker] S. Sakti, "Machine Speech Chain: A Deep Learning Approach for Training and Inference through Feedback Loop" [A joint work with A. Tjandra, J. Effendi, S. Novitasari, S. Nakamura (NAIST/RIKEN AIP, Japan)], the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Cartagena, Colombia, Dec 15th, 2021
[Keynote speaker] S. Sakti, "Listening while Speaking and Visualizing: A Semi-supervised Approach with Multimodal Machine Speech Chain" [A joint work with A. Tjandra, J. Effendi, S. Novitasari, S. Nakamura (NAIST/RIKEN AIP, Japan)], the SoCS International Seminar, Dec 10th, 2021
[Keynote speaker] S. Sakti, "Listening while Speaking and Visualizing: A Semi-supervised Approach with Multimodal Machine Speech Chain" [A joint work with A. Tjandra, J. Effendi, S. Novitasari, S. Nakamura (NAIST/RIKEN AIP, Japan)], the International Conference of Artificial Intelligence and Speech Technology (AIST), Nov 13th, 2021

Scientific Book / Chapter in Scientific Book

S. Asai, K. Yoshino, S. Shinagawa, S. Sakti, S. Nakamura, "Eliciting Cooperative Persuasive Dialogue by Multimodal Emotional Robot," In: Stoyanchev, S., Ultes, S., Li, H. (eds) Conversational AI for Natural Human-Centric Interaction. Lecture Notes in Electrical Engineering, vol 943. Springer, Singapore, Nov 2022 [PDF]
S. Sakti, "Language technology impact on linguistic diversity". In Book: "State of the art of indigenous languages in research: a collection of selected research papers," In the framework of the International Decade of Indigenous Languages (2022-2032), UNESCO Open Access Repository, pp. 341-348, May 2022 [PDF]
L.-P. Morency, S. Sakti, B.W. Schuller, S. Ultes, "Multimodal Machine Learning for Social Interaction with Ageing Individuals". In Book: J. Miehle, W. Minker, E. André, K. Yoshino (eds), "Multimodal Agents for Ageing and Multicultural Societies," Springer, Singapore, pp. 61–70, Oct. 2021 [PDF]

Peer-reviewed Scientific Journals

T. Yanagita, S. Sakti, S. Nakamura, "Japanese Neural Incremental Text-to-Speech Synthesis Framework With an Accent Phrase Input", IEEE Access, Vol. 11, pp. 22355-22363, Mar 2023 [PDF]
S. Novitasari, S. Sakti, S. Nakamura, "A Machine Speech Chain Approach for Dynamically Adaptive Lombard TTS in Static and Dynamic Noise Environments", IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 30, pp. 2673-2688, Aug 2022 [PDF]
F. Yang, Z. Wang, Y. Wu, S. Sakti, S. Nakamura, "Tackling multiple object tracking with complicated motions — Re-designing the integration of motion and appearance", Image and Vision Computing, Vol. 124, Aug 2022 [PDF] [Based on our winning solutions for the CVPR 2020 WAD MOT Challenge and the CVPR 2020 MOTS Challenge]
柳田智也, サクティサクリアニ, 中村哲, "日本語逐次音声合成における合成単位" [Synthesis Units in Japanese Incremental Speech Synthesis; in Japanese], IPSJ Journal (情報処理学会論文誌), Vol. 63, No. 4, pp. 1149-1158, Apr. 2022 [PDF]
B. Wu, S. Sakti, J. Zhang, S. Nakamura, "Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR", IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), Vol. 30, pp. 901-916, Feb 2022 [PDF]
S. Novitasari, S. Sakti, S. Nakamura, "Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation", IEICE Transactions on Information and Systems, Vol. E104-D, No. 12, pp. 2195-2208, Dec 2021 [PDF]

Peer-reviewed International Conferences

S. Sakti, B.A. Titalim, "Leveraging the Multilingual Indonesian Ethnic Languages Dataset in Self-supervised Model for Low-resource ASR Task", ASRU, to appear, Dec 2023
R.F. Widiaputri, A. Purwarianti, D. Lestari, K. Azizah, D. Tanaya, S. Sakti, "Speech Recognition and Meaning Interpretation: Towards Disambiguation of Structurally Ambiguous Spoken Utterances in Indonesian", EMNLP, to appear, Dec 2023
B. Hartanti, D. Tanaya, K. Azizah, D. Lestari, A. Purwarianti, S. Sakti, "Generating Speech with Prosodic Prominence based on SSL-Visually Grounded Models", Oriental COCOSDA, to appear, Dec 2023
H. Xi, S. Sakti, "Exploring Difficulties Encountered by Professional Interpreters in Japanese-to-English and English-to-Japanese Simultaneous Translation", Oriental COCOSDA, to appear, Dec 2023
C. Tran, C.M. Luong, S. Sakti, "STEN-TTS: Improving Zero-shot Cross-Lingual Transfer for Multi-Lingual TTS with Style-Enhanced Normalization Diffusion Framework", INTERSPEECH, pp. 4464-4468, Aug 2023 [PDF]
S. Takahashi, S. Sakti, "Unsupervised Learning of Discrete Latent Representations with Data-Adaptive Dimensionality from Continuous Speech Streams", INTERSPEECH, pp. 416-420, Aug 2023 [PDF]
T.D. Tran, S. Sakti, "Low-Resource Japanese-English Speech-to-Text Translation Leveraging Speech-Text Unified-model Representation Learning", INTERSPEECH Satellite Workshop - the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL), pp. 78-82, Aug 2023 [PDF]
L.T. Nguyen, S. Sakti, "VGSAlign: Bilingual Speech Alignment of Unpaired and Untranscribed Languages using Self-Supervised Visually Grounded Speech Models", INTERSPEECH Satellite Workshop - the ELRA/ISCA Special Interest Group on Under-resourced Languages (SIGUL), pp. 53-57, Aug 2023 [PDF]
R. Fukuda, Y. Nishikawa, Y. Kano, Y. Ko, T. Yanagita, K. Doi, M. Makinae, S. Sakti, K. Sudoh, S. Nakamura, "NAIST Simultaneous Speech-to-speech Translation System for IWSLT 2023", IWSLT, pp. 330-340, Jul 2023 [PDF]
S. Cahyawijaya, H. Lovenia, A.F. Aji, G.I. Winata, B. Wilie, F. Koto, R. Mahendra, C. Wibisono, A. Romadhony, K. Vincentio, J. Santoso, D. Moeljadi, C. Wirawan, F. Hudi, M.S. Wicaksono, I.H. Parmonangan, I. Alfina, I.F. Putra, S. Rahmadani, Y. Oenang, A.A. Septiandri, J. Jaya, K. Dhole, A.A. Suryani, R.A. Putri, D. Su, K. Stevens, M.N. Nityasya, M.F. Adilazuarda, R. Ignatius, R. Diandaru, V. Ghifari, T. Yu, W. Dai, Y. Xu, D. Damapuspita, H.A. Wibowo, C. Tho, I.M. Karo, T.N. Fatyanosa, Z. Ji, G. Neubig, T. Baldwin, S. Ruder, P. Fung, H. Sujaini, S. Sakti, A. Purwarianti, "NusaCrowd: Open Source Initiative for Indonesian NLP Resources", ACL Findings, pp. 13745-13818, Jul 2023 [PDF]
J. Chen, S. Sakti, "An Isotropy Analysis for Self-supervised Acoustic Unit Embeddings on the Zero Resource Speech Challenge 2021 Framework", IEEE ICASSP, Jun 2023 [PDF]
S. Novitasari, S. Sakti, S. Nakamura, "Self-adaptive Incremental Machine Speech Chain for Lombard TTS with High-granularity ASR Feedback in Dynamic Noise Condition", IEEE ICASSP, Jun 2023 [PDF]
H. Qi, S. Novitasari, A. Tjandra, S. Sakti, S. Nakamura, "SpeeChain: A Speech Toolkit for Large-Scale Machine Speech Chain," arXiv preprint arXiv:2301.02966, Jan 2023 [PDF]
R. Chevi, R.E. Prasojo, A.F. Aji, A. Tjandra, S. Sakti, "Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation", IEEE SLT, Jan 2023 [PDF]
H. Qi, S. Novitasari, S. Sakti, S. Nakamura, "Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing", INTERSPEECH, pp. 3413-3417, Sep 2022 [PDF]
R. Fukuda, Y. Ko, Y. Kano, K. Doi, H. Tokuyama, S. Sakti, K. Sudoh, S. Nakamura, "NAIST Simultaneous Speech-to-Text Translation System for IWSLT 2022", International Conference on Spoken Language Translation (IWSLT), pp. 286-292, May 2022 [PDF]
S. Asai, K. Yoshino, S. Shinagawa, S. Sakti, S. Nakamura, "Eliciting Cooperative Persuasive Dialogue by Multimodal Emotional Robot", International Workshop on Spoken Dialogue Systems Technology (IWSDS), Nov 2021 [PDF]
R. Fukuda, S. Novitasari, Y. Oka, Y. Kano, Y. Yano, Y. Ko, H. Tokuyama, K. Doi, T. Yanagita, S. Sakti, K. Sudoh, S. Nakamura, "Simultaneous Speech-to-speech Translation System with Transformer-based Incremental ASR, MT, and TTS", Oriental COCOSDA, pp. 186-192, Nov 2021 [PDF]
N. Kaiki, S. Sakti, S. Nakamura, "Using Local Phrase Dependency Structure Information in Neural Sequence-to-sequence Speech Synthesis", Oriental COCOSDA, pp. 206-211, Nov 2021 [PDF]
N. Tachimori, S. Sakti, S. Nakamura, "Multi-Encoder Sequential Attention Network for Context-Aware Speech Recognition in Japanese Dialog Conversation", Oriental COCOSDA, pp. 1-6, Nov 2021 [PDF] [Best paper award]

Domestic Conferences / Articles

J. Effendi, S. Sakti, S. Nakamura, "Cyclic Partially-aligned Transformer for Visually Connected Speech-to-text Mapping", The 2023 Spring meeting of the Acoustical Society of Japan (ASJ), March 2023
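多谷邦彦, サクティサクリアニ, 藤原修治, 中村哲, "X-vector を用いた日本語電話音声に対するテキスト独立型話者照合システムの検討" [A Study of a Text-Independent Speaker Verification System for Japanese Telephone Speech Using X-vectors; in Japanese], Journal of the Acoustical Society of Japan (日本音響学会誌), Vol. 79, No. 1, pp. 18-25, Dec. 2022 [PDF]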
S. Novitasari, S. Sakti, S. Nakamura, "Improving Intelligibility of Synthesized Speech in Noisy Condition with Dynamically Adaptive Machine Speech Chain", IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP), Dec. 2021 [PDF]
