System and method for distributed information handling system cluster active-active master node

Bibliographic Details
Main Authors: LIU TONG, CELEBIOGLU ONUR, FANG YUNGIN
Format: Patent
Language: English
Published: 07.09.2006
Summary: Computing nodes, such as plural information handling systems configured as a High Performance Computing Cluster (HPCC), are managed by plural master nodes configured for active-active interaction. A resource manager on each master node is operable to assign computing node resources to job requests simultaneously with the resource managers on the other master nodes. To avoid conflicts between master nodes, a job scheduler makes reservations in a table held in storage common to the active-active master nodes; the reserved computing resources are then assigned for management by the resource manager of the reserving master node. A failure manager monitors the master nodes to detect a failure, for example a lack of communication from a master node for a predetermined time, and recovers from the failure by assigning the jobs associated with the failed master node to an operating master node.
Bibliography: Application Number: US20050069770
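
The mechanism in the Summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names (`SharedReservationTable`, `FailureManager`), the all-or-nothing reservation rule, the choice of the first live master as the failover target, and the in-memory dictionaries standing in for common storage are all assumptions made for illustration. A real shared table would also need atomic updates (for example, locking), which the sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class SharedReservationTable:
    """Hypothetical stand-in for the table in storage common to all masters."""
    owner_of: dict = field(default_factory=dict)  # compute node -> master id
    jobs_of: dict = field(default_factory=dict)   # master id -> list of job ids

    def reserve(self, master, job, nodes):
        """Reserve every requested node for `master`, or none if any is taken."""
        if any(n in self.owner_of for n in nodes):
            return False  # conflict: a node is already reserved
        for n in nodes:
            self.owner_of[n] = master
        self.jobs_of.setdefault(master, []).append(job)
        return True

    def reassign(self, failed, survivor):
        """Move the failed master's nodes and jobs to an operating master."""
        for n, m in list(self.owner_of.items()):
            if m == failed:
                self.owner_of[n] = survivor
        moved = self.jobs_of.pop(failed, [])
        self.jobs_of.setdefault(survivor, []).extend(moved)
        return moved


class FailureManager:
    """Treats a master as failed after `timeout_s` without a heartbeat."""

    def __init__(self, table, timeout_s):
        self.table = table
        self.timeout_s = timeout_s
        self.last_seen = {}  # master id -> time of last heartbeat

    def heartbeat(self, master, now):
        self.last_seen[master] = now

    def check(self, now):
        """Return {failed master: jobs moved}; no-op if no master is alive."""
        dead = [m for m, t in self.last_seen.items() if now - t > self.timeout_s]
        alive = [m for m in self.last_seen if m not in dead]
        if not alive:
            return {}
        recovered = {}
        for m in dead:
            del self.last_seen[m]
            recovered[m] = self.table.reassign(m, alive[0])
        return recovered


# Two active masters reserve nodes concurrently; the table rejects overlaps.
table = SharedReservationTable()
fm = FailureManager(table, timeout_s=5.0)
fm.heartbeat("master-a", now=0.0)
fm.heartbeat("master-b", now=0.0)
ok1 = table.reserve("master-a", "job-1", ["n1", "n2"])
ok2 = table.reserve("master-b", "job-2", ["n2", "n3"])  # overlaps on n2
ok3 = table.reserve("master-b", "job-3", ["n3"])

# master-a goes silent; master-b keeps reporting and inherits job-1.
fm.heartbeat("master-b", now=6.0)
recovered = fm.check(now=7.0)
```

Timestamps are passed in explicitly rather than read from a clock so that the timeout behavior is easy to follow; in the example, `master-a` exceeds the 5-second limit and its job and nodes move to `master-b`.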