In the near future, space mission architectures will emerge in which multiple spacecraft work cooperatively as a cluster to achieve mission objectives. Fault management (FM) is a critical challenge for such missions, especially when multiple spacecraft operate in close proximity. Automatic fault management reduces the effort required of the ground crew when faults occur, and it reduces the chance of collision by recovering from faults quickly. We are developing a Flexible Fault Manager for Distributed Systems (FFMDS) for these missions. FFMDS is an FM architecture that will include algorithms to be run on each cluster module for fault detection, isolation, and recovery; software to be used at a ground station to direct recovery actions; and protocols for communicating fault information between cluster modules and between modules and the ground station. The architecture is service-oriented, so that algorithms for fault detection, isolation, and recovery can be added to or removed from the system as appropriate.
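To illustrate what a service-oriented arrangement of this kind might look like, the following is a minimal sketch in which fault-detection algorithms are registered with, and removed from, a per-module manager at run time. It is not the FFMDS implementation: the names `FaultManager`, `FaultReport`, and `battery_undervoltage`, the telemetry key `battery_v`, and the 24.0 V threshold are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class FaultReport:
    """Hypothetical record a module could send to peers or the ground station."""
    module_id: str
    service: str
    description: str


# Assumption: a detection service maps one telemetry sample to a fault
# description, or returns None when it sees nothing wrong.
DetectionService = Callable[[Dict[str, float]], Optional[str]]


class FaultManager:
    """Hypothetical per-module manager holding pluggable detection services."""

    def __init__(self, module_id: str) -> None:
        self.module_id = module_id
        self._services: Dict[str, DetectionService] = {}

    def register(self, name: str, service: DetectionService) -> None:
        # Add (or replace) a detection algorithm under the given name.
        self._services[name] = service

    def unregister(self, name: str) -> None:
        # Remove a detection algorithm; unknown names are ignored.
        self._services.pop(name, None)

    def check(self, telemetry: Dict[str, float]) -> List[FaultReport]:
        # Run every registered service and collect the faults they report.
        reports = []
        for name, service in self._services.items():
            description = service(telemetry)
            if description is not None:
                reports.append(FaultReport(self.module_id, name, description))
        return reports


def battery_undervoltage(telemetry: Dict[str, float]) -> Optional[str]:
    # Illustrative rule only; the 24.0 V threshold is an assumed value.
    voltage = telemetry.get("battery_v")
    if voltage is not None and voltage < 24.0:
        return f"battery voltage {voltage:.1f} V below 24.0 V"
    return None


manager = FaultManager("module-A")
manager.register("battery_undervoltage", battery_undervoltage)
reports = manager.check({"battery_v": 22.5})

# Removing the service changes the manager's behavior without touching its code.
manager.unregister("battery_undervoltage")
reports_after = manager.check({"battery_v": 22.5})
```

In this sketch the manager knows nothing about any specific algorithm, so isolation and recovery services could be plugged in through the same kind of registry; how FFMDS actually structures its services is not specified here.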