Position-based focal loss for diverse and relevant response generation
Published in | Applied soft computing Vol. 165; p. 112037 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 01.11.2024 |
Summary: | Response generation models trained with cross-entropy loss tend to produce over-general responses because they favor high-frequency tokens. Focal loss and anti-focal loss are candidate remedies, but each is limited in that it emphasizes only one of relevancy or diversity of responses. This paper therefore proposes two novel losses, positional focal loss and adaptive positional focal loss, which emphasize relevancy or diversity flexibly according to the position of a target token. The positional focal loss applies a position function as a weight on the token position, but it tends to underestimate relevancy for low-confidence predictions. To address this, the adaptive positional focal loss balances relevancy and diversity by limiting the effect of over-confident predictions.
• This is the first attempt to show that, for relevant and diverse response generation, a model needs to be trained with a different emphasis on relevancy or diversity for each token. • This paper balances the relevancy and diversity of a response by proposing two novel losses. • This paper proposes various position functions and validates their efficiency through experiments. |
---|---|
ISSN: | 1568-4946 |
DOI: | 10.1016/j.asoc.2024.112037 |
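
The summary describes weighting a focal-style loss by the position of the target token. This record does not give the paper's formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It uses the standard focal loss, -(1 - p)^gamma * log(p), and assumes the position function scales the focusing exponent. The names `linear_position` and `positional_focal_loss`, and the choice of a linear position function, are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def cross_entropy(probs: Sequence[float]) -> List[float]:
    """Per-token cross entropy, given the probability assigned to each target token."""
    return [-math.log(p) for p in probs]


def focal_loss(probs: Sequence[float], gamma: float = 2.0) -> List[float]:
    """Standard per-token focal loss: -(1 - p)^gamma * log(p)."""
    return [-((1.0 - p) ** gamma) * math.log(p) for p in probs]


def linear_position(t: int, length: int) -> float:
    """Hypothetical position function: 0 at the first token, rising to 1 at the last."""
    return t / (length - 1) if length > 1 else 1.0


def positional_focal_loss(
    probs: Sequence[float],
    gamma: float = 2.0,
    position_fn: Callable[[int, int], float] = linear_position,
) -> List[float]:
    """Illustrative only: scale the focusing exponent by a position function.

    This is an assumption made for the sketch; the paper's actual
    position functions and loss definition may differ.
    """
    n = len(probs)
    return [
        -((1.0 - p) ** (gamma * position_fn(t, n))) * math.log(p)
        for t, p in enumerate(probs)
    ]


if __name__ == "__main__":
    # Probabilities (each in (0, 1]) that a model assigns to the target tokens.
    target_probs = [0.9, 0.5, 0.1]
    print(cross_entropy(target_probs))
    print(focal_loss(target_probs))
    print(positional_focal_loss(target_probs))
```

With this particular position function, the first token is scored by plain cross entropy (exponent 0) and the last token by the full focal loss (exponent gamma), so the emphasis shifts gradually along the response.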