STREAMING OF NATURAL LANGUAGE (NL) BASED OUTPUT GENERATED USING A LARGE LANGUAGE MODEL (LLM) TO REDUCE LATENCY IN RENDERING THEREOF

Bibliographic Details
Main Authors	HUANG, Yanping, JIA, Wenhao, BAILEY, Alexander, TAROPA, Emanuel, CHEN, Zhifeng, ZHENG, Yanyan, AHN, Junwhan, MUDGAL, Sidharth, BAEUML, Martin, LAN, Chang, SCHELIN, Leif, XU, Yuanzhong, STROHMAN, Trevor, BEIRAMI, Ahmad
Format	Patent
Language	English, French
Published	19.09.2024
Subjects	CALCULATING; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Summary:	Implementations relate to reducing latency in generating and/or rendering natural language (NL) output generated using a large language model (LLM). Processor(s) of a system can: receive NL based input associated with a client device, and generate the NL based output utilizing the LLM. The NL based output can be a stream of NL based output in that it includes a plurality of segments, and is generated on a segment-by-segment basis. In some implementations, a first segment of the stream of NL based output is selected for inclusion in the stream of NL based output as a second segment (and any subsequent segment) is being generated to reduce latency in evaluating the NL based output as a whole prior to rendering thereof. In some versions of those implementations, the first segment is rendered as the second segment (and any subsequent segment) is being generated to further reduce latency in rendering thereof.
Bibliography:	Application Number: WO2023US28719
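
The segment-by-segment streaming described in the Summary can be illustrated with a minimal Python sketch. It is not the implementation claimed in the application: `generate_segments` is a hypothetical stand-in for an LLM, `is_acceptable` is a hypothetical per-segment check, and the use of a background thread with a queue is an assumption chosen only to show the first segment being selected and rendered while later segments are still being generated.

```python
import queue
import threading
from typing import Callable, Iterator, List, Optional


def generate_segments(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for an LLM that emits output one segment at a time."""
    for segment in (f"Echo: {prompt}.", "Second segment.", "Third segment."):
        yield segment


def is_acceptable(segment: str) -> bool:
    """Hypothetical per-segment check, applied instead of evaluating the whole output."""
    return bool(segment.strip())


def stream_output(prompt: str, render: Callable[[str], None]) -> None:
    """Select and render each segment while later segments are still being generated."""
    pending: "queue.Queue[Optional[str]]" = queue.Queue()

    def producer() -> None:
        # Generation runs in the background and hands over segments as they appear.
        for segment in generate_segments(prompt):
            pending.put(segment)
        pending.put(None)  # sentinel: generation has finished

    worker = threading.Thread(target=producer)
    worker.start()

    # Consume segments in order; rendering does not wait for the full output.
    while (segment := pending.get()) is not None:
        if is_acceptable(segment):
            render(segment)

    worker.join()


rendered: List[str] = []
stream_output("hello", rendered.append)
```

Here `rendered.append` plays the role of the client-side renderer. In a real system the renderer would write to a user interface, and the per-segment check would be whatever evaluation the system applies before including a segment in the stream.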