Authors: W. L. Guay, B. D. Johnsen, D. Brean, and S. Reinemo
Title: System and method for signaling dynamic reconfiguration events in a middleware machine environment
Affiliation: Communication Systems
Project(s): No Simula project
Publication Type: Patent
Year of Publication: 2014
International Patent Classification: G06F11/1423
International Patent Number: US20130124910A1
Application Number: 13/649,689
Date Published: 11/2014

Rerouting around faulty components and migrating jobs both require reconfiguring data structures in the Queue Pairs that reside in the hosts of an InfiniBand cluster. In this patent we describe an implementation of dynamic reconfiguration of these host-side data structures. Our implementation preserves the Queue Pairs and lets the application run without interruption. With this implementation, we demonstrate a complete solution to fault tolerance in an InfiniBand network, in which dynamic network reconfiguration to a topology-agnostic routing function is used to avoid malfunctioning components. In principle, this solution lets applications run uninterrupted on the cluster as long as the topology remains physically connected. Measurements on our test cluster show that the added cost of our method in setup latency is negligible, and that throughput drops only slightly during reconfiguration.
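
The idea of reconfiguring host-side state while preserving the Queue Pairs can be illustrated with a small toy model. This is a sketch under stated assumptions, not the patented implementation: `QueuePair`, `Host`, `on_reconfiguration`, and `lid_map` are hypothetical names, the code does not use any real InfiniBand verbs API, and it assumes that the state to be updated is a single destination LID per Queue Pair.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """Toy stand-in for a host-side Queue Pair (hypothetical, not a verbs object)."""
    qp_num: int
    dest_lid: int             # assumed: the addressing state that must be updated
    sends_completed: int = 0  # application-visible progress that must survive

    def send(self) -> int:
        """Pretend to post a send and return the LID it was addressed to."""
        self.sends_completed += 1
        return self.dest_lid


@dataclass
class Host:
    queue_pairs: list[QueuePair] = field(default_factory=list)

    def on_reconfiguration(self, lid_map: dict[int, int]) -> int:
        """Apply a reconfiguration event in place.

        lid_map maps old destination LIDs to new ones (an assumed event format).
        Existing QueuePair objects are modified, never destroyed and recreated.
        Returns the number of Queue Pairs that were updated.
        """
        updated = 0
        for qp in self.queue_pairs:
            new_lid = lid_map.get(qp.dest_lid)
            if new_lid is not None and new_lid != qp.dest_lid:
                qp.dest_lid = new_lid
                updated += 1
        return updated


# Usage: the application keeps its handle to the same Queue Pair across the event.
host = Host([QueuePair(qp_num=1, dest_lid=10), QueuePair(qp_num=2, dest_lid=20)])
qp = host.queue_pairs[0]
first_target = qp.send()
updated_count = host.on_reconfiguration({10: 11})
second_target = qp.send()
```

In this toy model the application's reference `qp` stays valid and its send counter keeps advancing across the event, which is the property the abstract describes as preserving the Queue Pairs. Only the addressing field changes.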

