ソフトウエア科学領域【デファゴ研究室】
■ Research Overview
A distributed system consists of a collection of programs (called
processes) spread geographically (running on different processors),
a communication medium, and, most importantly, protocols to ensure
agreement, synchronization and cooperation among the processes.
In particular, the whole system must appear to its users as a
single coherent system rather than as a collection of its parts.
From real-time embedded systems to enterprise information systems,
and from power grid control to the Internet, most systems in our
current society are distributed. For this reason, distributed systems
must be able to survive even after the failure of some of its components.
Our research focuses on two goals: making processes work together,
and helping the system survive even if some processors/programs
fail.
■ Main Research Projects
Distributed agreement & failure detection
We develop algorithms to support fault-tolerant agreement between
processes. We have developed a novel approach to replicate running
programs. We have analyzed, classified, and improved several
fault-tolerant group communication algorithms. For robustness,
we consider that (1) communication delays are unpredictable,
and (2) messages can be lost during transit. This makes agreement
a very challenging problem.

Reaching agreement is highly dependent on the ability of running
processes to detect when other processes have crashed. Especially,
efficient failure detection is essential to ensure acceptable
performance in the event of a failure.
Be it on earth, in space, or on other planets, robots provide
an essential support in adverse and hazardous environments, e.g.,
for rescue after a natural catastrophe (e.g., earthquake, tsunami,
etc). Adequate coordination mechanisms must ensure that a coherent
group behavior of cheap mobile devices in hazardous environments,
for which fault-tolerance is a primary concern.
Communication for mobile systems
Our aim is to develop a middleware platform to program groups
of mobile robots in a fault-tolerant way. We consider robots
that communicate using mobile ad hoc networking, but are unable
to synchronize their actions reliably. We have developed a low-level
communication protocol to guarantee that robots can move without
colliding, in spite of having no guarantee about the timing properties
of communication movement or computing.

Distributed algorithms for mobile robots
We study fundamental problems of agreement in mobile system,
and try to prove the theoretical limits to reaching agreement
in such systems. Using a mathematical representation of a mobile
distributed system, we study problems such as gathering (all
robots must self-organize and join at one location in finite
time), or flocking (the robots must move together while keeping
some pattern), and try to derive minimal assumptions. We focus
on the asynchrony between robots, and the fact that some robots
may possibly fail.
■代表的な著書・論文
- X.Defago, S.Souissi. Non uniform circle formation algorithm
for oblivious mobile robots, Theor. Comput. Sci., to appear
(2008).
- S.Souissi, X.Defago, M.Yamashita. Using eventually consistent
compasses to gather oblivious mobile robots with limited visibility.
Proc. SSS'06, pp.471-487, 2006.
- X.Defago, M.Gradinariu, S.Messika, P.Raipin. Fault-tolerant
and self-stabilizing mobile robots gathering: Feasibility study,
Proc. DISC'06, pp.46-60, 2006.
- X.Defago, P.Urban, N.Hayashibara, T.Katayama. Definition
and specification of accrual failure detectors. In Proc. IEEE/IFIP
DSN'05, pp.206-215, 2005.
- X.Defago, A.Schiper, P.Urban. Total order broadcast and multicast
algorithms. ACM Comput. Surv.,36(4) :372-421, Dec.2004
- X.Defago, A.Schiper. Semi-passive replication and Lazy Consensus.
J. Parallel Distr. Com. 64(12):1380-1398, Dec.2004.