MACHINE LEARNING INFERENCE SERVICE DISAGGREGATION
Format | Patent |
---|---|
Language | English, French |
Published | 21.12.2023 |
Summary: | Aspects of the disclosure are directed to performing disaggregation-aware model graph partitioning, which can include provisioning and load balancing disaggregated resource pools, such as general purpose processors, accelerators, general purpose memory, and high bandwidth memory. Across these disaggregated resource pools, machine learning model operations can be packed and/or batched. The partitioning can further include automatically tuning runtime parameters. |
---|---|
Bibliography: | Application Number: WO2023US18892 |
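
To make the summary's idea of placing model operations across disaggregated resource pools more concrete, the following is a minimal illustrative sketch and not the method claimed in the patent. It assumes a simple greedy, load-balancing placement; the `Pool` class, the `partition` function, the pool names, and the abstract cost units are all hypothetical and do not come from the record.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """A hypothetical disaggregated resource pool (illustrative only)."""
    name: str          # e.g. "cpu-0" or "acc-1"
    capacity: float    # abstract work units the pool can absorb
    load: float = 0.0  # work units already placed on the pool
    ops: list = field(default_factory=list)


def partition(ops, pools):
    """Greedily place each operation on the least-utilized eligible pool.

    ops:   list of (op_name, pool_kind, cost) tuples, assumed to be in
           topological order of the model graph.
    pools: dict mapping a pool kind (e.g. "cpu") to a list of Pool objects.
    Returns a dict mapping op_name to the chosen pool's name.
    """
    placement = {}
    for name, kind, cost in ops:
        # Only pools of the requested kind with enough spare capacity qualify.
        candidates = [p for p in pools[kind] if p.load + cost <= p.capacity]
        if not candidates:
            raise RuntimeError(f"no {kind} pool has room for {name}")
        # Load balancing: pick the pool with the lowest utilization so far.
        target = min(candidates, key=lambda p: p.load / p.capacity)
        target.load += cost
        target.ops.append(name)
        placement[name] = target.name
    return placement


# Hypothetical example: one CPU pool and two accelerator pools.
pools = {
    "cpu": [Pool("cpu-0", 10.0)],
    "accelerator": [Pool("acc-0", 100.0), Pool("acc-1", 100.0)],
}
ops = [
    ("tokenize", "cpu", 2.0),
    ("matmul_1", "accelerator", 40.0),
    ("matmul_2", "accelerator", 40.0),
    ("softmax", "accelerator", 5.0),
]
placement = partition(ops, pools)
print(placement)
```

In this sketch the two large matrix multiplications land on different accelerator pools because each placement goes to the pool with the lowest utilization at that moment. A real system of the kind the summary describes would also account for memory pools, batching, and runtime tuning, none of which is modeled here.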