CONCURRENT DATA PROCESSING IN A DISTRIBUTED SYSTEM

Bibliographic Details
Main Authors: CHAIKEN RONNIE, RYSEFF JAMES DAVID, SAHA BIKAS
Format: Patent
Language: English
Published: 28.10.2010
Summary: Systems, methods, and computer media for scheduling vertices in a distributed data processing network and allocating computing resources on a processing node in a distributed data processing network are provided. Vertices, subparts of a data job including both data and computer code that runs on the data, are assigned by a job manager to a distributed cluster of process nodes for processing. The process nodes run the vertices and transmit computing resource usage information, including memory and processing core usage, back to the job manager. The job manager uses this information to estimate computing resource usage information for other vertices in the data job that are either still running or waiting to be run. Using the estimated computing resource usage information, each process node can run multiple vertices concurrently.
Bibliography: Application Number: US20090428964
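
The feedback loop described in the summary can be illustrated with a minimal Python sketch. It is not the patented method: the record does not say how the job manager forms its estimates, so the averaging rule below is an assumption, as are all names (Usage, JobManager, report, estimate, max_concurrent_vertices) and the default values. It shows only the general idea: process nodes report memory and core usage, the job manager estimates usage for remaining vertices, and a node uses that estimate to decide how many vertices it can run at once.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Usage:
    """Resource usage of one vertex (hypothetical units: MB and cores)."""
    memory_mb: float
    cores: float


class JobManager:
    """Hypothetical job manager that collects usage reports from process nodes."""

    def __init__(self, default: Usage):
        # Assumed fallback estimate used before any vertex has reported.
        self.default = default
        self.reports = []

    def report(self, usage: Usage) -> None:
        """Record usage information sent back by a process node."""
        self.reports.append(usage)

    def estimate(self) -> Usage:
        """Estimate usage for vertices still running or waiting to run.

        Assumption: a plain average of past reports. The source does not
        specify the estimation rule.
        """
        if not self.reports:
            return self.default
        return Usage(
            memory_mb=mean(u.memory_mb for u in self.reports),
            cores=mean(u.cores for u in self.reports),
        )


def max_concurrent_vertices(capacity: Usage, per_vertex: Usage) -> int:
    """Number of vertices a node can run at once under the given estimate."""
    if per_vertex.memory_mb <= 0 or per_vertex.cores <= 0:
        raise ValueError("estimated usage must be positive")
    by_memory = int(capacity.memory_mb // per_vertex.memory_mb)
    by_cores = int(capacity.cores // per_vertex.cores)
    # Always allow at least one vertex so the job can make progress.
    return max(1, min(by_memory, by_cores))


manager = JobManager(default=Usage(memory_mb=4096, cores=4))
manager.report(Usage(memory_mb=1000, cores=1))
manager.report(Usage(memory_mb=3000, cores=1))
estimated = manager.estimate()
slots = max_concurrent_vertices(Usage(memory_mb=8192, cores=4), estimated)
```

With the conservative default estimate, the node in this example would run one vertex at a time. After two reports, the lower estimate lets it run several concurrently.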