Advancing AI for embodied human communication
Embodied Intelligence and Interaction Laboratory
Associate Professor: STEFANOV, Kalin Momchilov
E-mail:
[Research areas]
Machine Learning, Social Intelligence
[Keywords]
Embodied Intelligence, Perception-Action Learning, Human-AI Interaction
Skills and background we are looking for in prospective students
We welcome students with strong engineering foundations and an interest in signal processing, machine learning, mathematics, and programming. The lab suits independent thinkers who are ready for challenges, communicate well in English, and are driven to grow.
What you can expect to learn in this laboratory
Students will develop and evaluate novel approaches to modeling embodied and multimodal communication, integrating vision, motion, and language, and will learn to transform abstract principles and ideas into working computational systems. They will implement models, test them empirically, and disseminate results at leading conferences while engaging with international researchers. Through this process, students will acquire deep expertise in embodied AI and multimodal learning, along with transferable skills such as hypothesis formulation, analytical thinking, programming, data analysis, scientific writing, and effective teamwork.
[Job category of graduates]
Academic and industry AI research and development labs
Research outline
The Embodied Intelligence and Interaction Lab advances human-centered AI that empowers people through adaptive and socially grounded interaction. We conduct foundational research at the intersection of artificial intelligence, machine learning, and human communication, investigating how intelligent systems can perceive, model, and produce meaningful behavior expressed through the human body. Our goal is to enable intelligent systems to participate in rich, adaptive, and context-aware interaction.
Our research develops core computational frameworks for embodied intelligence by integrating vision, motion, and language within unified learning architectures. We study multimodal representation learning, generative modeling of coordinated human behavior, and probabilistic approaches to perception-action systems. Rather than treating recognition and generation as isolated tasks, we model human-AI interaction as a continuous loop of perception, prediction, action, and adaptation unfolding over time.
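The closed-loop framing above can be made concrete with a small illustration. The sketch below is not lab code: every class and function name is hypothetical, and a scalar signal stands in for rich multimodal input. It only shows the cycle of perception, prediction, action, and adaptation unfolding over a stream of observations, with adaptation implemented as a simple error-driven update.

```python
# Illustrative sketch (hypothetical, not lab code): a minimal
# perception-prediction-action-adaptation loop. A scalar signal stands in
# for multimodal input; adaptation is a simple error-driven update.

from dataclasses import dataclass, field


@dataclass
class Agent:
    """Agent that maintains a running estimate of its partner's signal."""
    estimate: float = 0.0
    rate: float = 0.5                      # adaptation rate
    history: list = field(default_factory=list)

    def perceive(self, observation: float) -> float:
        return observation                 # stand-in for multimodal perception

    def predict(self) -> float:
        return self.estimate               # predict the partner's next signal

    def act(self, prediction: float) -> float:
        self.history.append(prediction)    # stand-in for behavior generation
        return prediction

    def adapt(self, observation: float, prediction: float) -> None:
        # Move the internal estimate toward what was actually observed.
        self.estimate += self.rate * (observation - prediction)


def interaction_loop(agent: Agent, signals: list) -> float:
    """Run the perception-prediction-action-adaptation cycle over a stream."""
    for s in signals:
        obs = agent.perceive(s)
        pred = agent.predict()
        agent.act(pred)
        agent.adapt(obs, pred)
    return agent.estimate


agent = Agent()
final = interaction_loop(agent, [1.0, 1.0, 1.0, 1.0])
print(round(final, 4))  # the estimate converges toward the observed signal
```

The point of the loop structure is that recognition (perceive), prediction, generation (act), and learning (adapt) are interleaved at every step rather than treated as separate offline tasks.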
Human communication is temporally extended and socially situated: meaning emerges through the dynamic coordination of multiple modalities within ongoing interaction. Accordingly, we investigate principled architectures for socially intelligent systems capable of operating in dynamic, multiparty environments. This includes multimodal coordination, context-aware adaptation, interactive alignment, and learning from human feedback. A particular emphasis is placed on modeling embodied communication, including nonverbal behavior, as a structured process grounded in 3D human representation.
To advance embodied AI as a rigorous scientific field, we prioritize systematic evaluation methodologies, benchmark development, and reproducible experimental design. By combining theoretical insight with empirical validation, the lab seeks to establish foundational principles for AI systems that understand and produce human-centered, embodied communication in real-world settings.
Key publications
- Kundu, K., Barua, H., Robertson-Bell, L., Cai, Z., Stefanov, K. DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2026, pp. 5842-5852.
- Adiban, M., Stefanov, K., Siniscalchi, S., Salvi, G. S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction. IEEE Transactions on Multimedia, vol. 27, 2025, pp. 4321-4332.
- Cai, Z., Ghosh, S., Stefanov, K., Dhall, A., Cai, J., Rezatofighi, H., Haffari, R., Hayat, M. MARLIN: Masked Autoencoder for Facial Video Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1493-1504.
Equipment
High-Performance Computing
Teaching policy
I emphasize structured mentoring and research-integrated teaching to support both student growth and high-quality scholarship. I align goals early, define clear milestones, and embed core research skills, including problem formulation, experimental design, reproducibility, and scientific communication, within active projects. I foster resilience, ethical data practices, and research integrity, preparing students to contribute responsibly to academia, industry, and society.