Position-based focal loss for diverse and relevant response generation

Bibliographic Details
Published in Applied Soft Computing Vol. 165; p. 112037
Main Authors Kim, So-Eon, Park, Seong-Bae
Format Journal Article
Language English
Published Elsevier B.V. 01.11.2024
Summary: Response generation models trained with cross-entropy loss tend to produce over-general responses because they prefer high-frequency tokens. Focal loss and anti-focal loss are candidates for solving this problem, but each is limited in that it emphasizes only one of relevancy or diversity of responses. Therefore, this paper proposes two novel losses, positional focal loss and adaptive positional focal loss, which emphasize relevancy or diversity flexibly according to the position of a target token. The positional focal loss introduces a position function that weights each token by its position, but it tends to underestimate relevancy for low-confidence predictions. To tackle this problem, the adaptive positional focal loss balances relevancy and diversity by limiting the effect of over-confident predictions.
• The first attempt to show that, for relevant and diverse response generation, a model needs to be trained with a different emphasis on relevancy or diversity for each token.
• This paper proposes two novel losses that keep a balance between the relevancy and diversity of a response.
• This paper proposes various position functions and validates their efficiency through experiments.
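The losses named in the summary can be illustrated with a rough sketch. The focal and anti-focal forms below are the commonly used token-level definitions, where p is the model's probability for the target token and gamma is a focusing parameter. The record does not give the paper's formulas, so `linear_position` and `position_weighted_loss` are hypothetical stand-ins: they only show what "weighting by token position" could look like, and the direction of the mix is an arbitrary assumption, not the authors' method.

```python
import math


def cross_entropy(p):
    """Token-level cross entropy for target-token probability p (0 < p <= 1)."""
    return -math.log(p)


def focal_loss(p, gamma=2.0):
    """Commonly used focal loss: shrinks the loss of confidently predicted tokens."""
    return -((1.0 - p) ** gamma) * math.log(p)


def anti_focal_loss(p, gamma=2.0):
    """Commonly used anti-focal loss: enlarges the loss of confidently predicted tokens."""
    return -((1.0 + p) ** gamma) * math.log(p)


def linear_position(t, length):
    """Hypothetical position function: 0.0 at the first token, 1.0 at the last."""
    return t / (length - 1) if length > 1 else 0.0


def position_weighted_loss(probs, gamma=2.0, position_fn=linear_position):
    """Illustrative only (assumption, not the paper's formula).

    Mixes anti-focal and focal loss per token, with the mixing weight
    given by the token's position in the response.
    """
    n = len(probs)
    total = 0.0
    for t, p in enumerate(probs):
        w = position_fn(t, n)
        total += (1.0 - w) * anti_focal_loss(p, gamma) + w * focal_loss(p, gamma)
    return total / n


# Target-token probabilities for a four-token response (made-up values).
print(position_weighted_loss([0.9, 0.6, 0.3, 0.7]))
```

For any probability strictly between 0 and 1, the focal loss is smaller than the cross entropy and the anti-focal loss is larger, which is why a position-dependent mix can shift the emphasis from token to token.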
ISSN:1568-4946
DOI:10.1016/j.asoc.2024.112037