Concurrent data processing in a distributed system
Systems, methods, and computer media for scheduling vertices in a distributed data processing network and allocating computing resources on a processing node in a distributed data processing network are provided. Vertices, subparts of a data job including both data and computer code that runs on the...
Saved in:
Main Authors | , , |
---|---|
Format | Patent |
Language | English |
Published |
11.09.2012
|
Online Access | Get full text |
Cover
Loading…
Summary: | Systems, methods, and computer media for scheduling vertices in a distributed data processing network and allocating computing resources on a processing node in a distributed data processing network are provided. Vertices, subparts of a data job including both data and computer code that runs on the data, are assigned by a job manager to a distributed cluster of process nodes for processing. The process nodes run the vertices and transmit computing resource usage information, including memory and processing core usage, back to the job manager. The job manager uses this information to estimate computing resource usage information for other vertices in the data job that are either still running or waiting to be run. Using the estimated computing resource usage information, each process node can run multiple vertices concurrently. |
---|