Towards Human-AI Collaborative
Intelligence through Speech Communication
Associate Professor: SAKTI Sakriani
Spoken Language Processing, Machine Learning
Multilingual Speech Recognition & Synthesis, Machine Speech Chain, Zero-resourced Speech Technology, Direct Speech-to-Speech Translation, Paralinguistic Modeling, Spoken Dialog System, Deep Learning, and Knowledge Representation & Modeling
Skills and background we are looking for in prospective students
We welcome students who are enthusiastic and passionate about human-machine communication and spoken language processing and who are willing to devote themselves actively to research.
Students who have experience in computer programming (e.g., Python) and machine learning and who are eager to use the English language are highly desirable.
What you can expect to learn in this laboratory
Students will gain knowledge of state-of-the-art spoken language processing and experience in carrying out a research project. In addition, they may develop skills in making observations, system development, problem-solving, reading technical papers, and giving scientific presentations.
【Job category of graduates】 ICT companies, research institutes, academic staff
We aim to research and develop artificial intelligence technologies that support human-human & human-machine communication and foster successful human-machine learning & cooperation for a better future of collaborative intelligence.
Research areas and topics:
Speech recognition and synthesis
The research focuses on developing technologies that can listen and speak by way of automatic speech recognition (ASR) and text-to-speech synthesis (TTS). Possible research topics include multilingual/code-switching ASR & TTS and incremental ASR & TTS.
Machine speech chain
The research focuses on integrating human speech perception & production behaviors to provide technology that can not only listen and speak but also listen while speaking. Possible research topics include multilingual/multimodal speech chain, incremental speech chain, and speech entrainment.
Zero-resourced Speech Technology
The research focuses on developing technologies that can learn a language like a toddler and gradually construct knowledge. Possible research topics include zero-resource speech processing, unsupervised/semi-supervised deep learning, and knowledge representation & modeling.
Direct Speech-to-Speech Translation
The research focuses on human-like simultaneous speech interpretation that translates directly from speech to speech without going through text, covering both linguistic and paralinguistic information. Possible research topics include direct speech translation and paralinguistic representation & translation.
- A. Tjandra, S. Sakti, and S. Nakamura, "Machine Speech Chain," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 28, pp. 976-989, 2020.
- Q.-T. Do, S. Sakti, and S. Nakamura, "Sequence-to-Sequence Models for Emphasis Speech Translation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 26, No. 10, pp. 1873-1883, 2018.
- S. Sakti, K. Markov, S. Nakamura, and W. Minker, "Incorporating Knowledge Sources into Statistical Speech Recognition," Springer, Boston (USA), Series: Lecture Notes in Electrical Engineering, Vol. 42, 2009.
Computation infrastructure (CPU, GPU)
Each student carries out a research project for their MSc/PhD dissertation. The topic is decided based on the student's interests and the lab's project themes. To enhance students' abilities, we provide:
1. Intensive discussions in one-on-one meetings.
2. Research progress reports and group discussions in lab meetings.
3. Supervision of research experiments.
4. Guidance on improving scientific paper writing and presentation skills.
We encourage active discussions among members. To enhance discussion between Japanese and international students, all presentation materials used in lab activities are prepared in English (the presentation itself can be given in Japanese or English).