“Anatomy” of AI: Developing Explainable and Logically Thinkable Machines

RebelsNLU: Reading between the Lines for Natural Language Understanding
Associate Professor：INOUE Naoya

［Research areas］
Natural Language Processing, Language Understanding, Explainability
［Keywords］
Language Model, Deep Learning, AI, Interpretability, Reasoning, Argumentation Analysis, Critical Thinking

Skills and background we are looking for in prospective students

Required: Passion for creating machines that can understand human language. Preferred: (a) basic knowledge of linear algebra, probability, statistics, and algorithms and (b) programming experience.

What you can expect to learn in this laboratory

In the lab, you will actively exchange views with lab members, come up with innovative ideas, implement them as computational models, and evaluate them quantitatively. After many trials, you will arrive at interesting insights. We then write up the work and submit it to academic conferences, refining the idea further through communication with researchers worldwide. Through these activities, you will acquire many general-purpose skills in addition to expertise in natural language processing research and related areas: how to think critically, how to dive into unfamiliar fields, how to plan, how to present your work, how to program, and how to work as part of a team.

【Job category of graduates】Academia, Information Technology

Research outline

We study methodologies for enabling computers to process the language that people use in everyday life. In particular, we aim to develop reliable natural language processing mechanisms equipped with logical reasoning ability, the capability to explain their own reasoning processes, and critical thinking based on self-reflection. Our primary targets are large language models (LLMs), which have become widely deployed in society and exhibit highly sophisticated intellectual behavior.

Topic 1. Analysis of LLM Internal Mechanisms

The internal workings of LLMs, much like the human brain, consist of a large number of interconnected neural circuits. Although the entire computational process can in principle be traced from beginning to end, the meaning of the computations themselves remains largely mysterious.
We therefore conduct detailed analyses of the internal structures of LLMs to discover the circuits, neurons, distributed representations, and other components that support their intelligent behavior. Our research topics include:

Discovery of a three-step circuit that enables in-context learning in LLMs (Cho et al., ICLR 2025)
Discovery of neurons that function as translation-like units in multilingual LLMs (Tezuka et al., EMNLP 2025)
Forgetting harmful knowledge while preserving language ability (Dang et al., AAAI 2025)
Discovery of additive compositionality in internal representations of vision–language models (Shi et al., EACL 2026)
Analysis of internal representations related to ambiguity in user instructions in LLM agents (Kaide et al., NLP 2026)

Topic 2. Improving the Critical Thinking Ability of LLMs

While LLMs already demonstrate intellectual behavior, they are not necessarily good at evaluating the validity of their own reasoning or correctly understanding the logical structures underlying texts.
We investigate new methods for improving the self-evaluation capability of LLMs by carefully designing instruction strategies. We also study methods for enhancing LLMs’ logical understanding. Our research topics include:

Achieving fine-grained self-assessment of reasoning processes (Ishii et al., IJCNLP-AACL 2025)
Improving logical inference by intervening in the self-attention mechanism (Nguyen et al., EACL 2026)
Proposing tasks for understanding logical fallacies in text (Robbani et al., EMNLP 2024) and for generating questions that promote self-reflection (Pothong et al., IJCNLP-AACL 2025)

In addition to the above topics, we thoroughly and fundamentally study “trustworthy and reliable AI” from various perspectives, especially at the level of internal structures. We warmly welcome students who wish to build this international laboratory together with us, as well as those who aspire to advance to a doctoral program and pursue deeper research.

Key publications

Cho et al. Revisiting In-context Learning Inference Circuit in Large Language Models. ICLR 2025.
Tezuka et al. The Transfer Neurons Hypothesis: An Underlying Mechanism for Language Subspace Transitions in Multilingual LLMs. EMNLP 2025.
Nguyen et al. Improving Chain-of-Thought for Logical Reasoning via Attention-Aware Intervention. Findings of EACL 2026.

Equipment

CPU/GPU Cluster Machines

Teaching policy

I respect you and will do my best to bring out the best in you. I will help you become an independent researcher: someone who can plan a research project that you truly enjoy and make progress on it yourself. I also encourage you to present your work at academic conferences and to collaborate with researchers worldwide so that you grow into a global researcher. Our lab holds a wide variety of study and reading groups as well as weekly meetings. We communicate in English to keep the lab a global environment.

［Lab website］ URL：https://rebelsnlu.super.site/