MACHINE LEARNING INFERENCE SERVICE DISAGGREGATION

Bibliographic Details
Main Authors: LAN, Chang; KRISHNAMURTHY, Arvind; RADPOUR, Soroush; HAYKAL, Salem
Format: Patent
Language: English, French
Published: 21.12.2023
Summary: Aspects of the disclosure are directed to performing disaggregation-aware model graph partitioning, which can include provisioning and load balancing disaggregated resource pools, such as general purpose processors, accelerators, general purpose memory, and high bandwidth memory. Across these disaggregated resource pools, machine learning model operations can be packed and/or batched. The partitioning can further include automatically tuning runtime parameters.
Bibliography: Application Number: WO2023US18892