Private federated discovery of out-of-vocabulary words for Gboard
Main Authors | , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 17.04.2024 |
Summary: | The vocabulary of language models in Gboard, Google's keyboard application,
plays a crucial role in improving user experience. One way to improve the
vocabulary is to discover frequently typed out-of-vocabulary (OOV) words on
user devices. This task requires strong privacy protection due to the sensitive
nature of user input data. In this report, we present a private OOV discovery
algorithm for Gboard, which builds on recent advances in private federated
analytics. The system offers local differential privacy (LDP) guarantees for
user-contributed words. With anonymous aggregation, the final released result
would satisfy central differential privacy guarantees with
$\varepsilon = 0.315, \delta = 10^{-10}$ for OOV discovery in en-US (English in
the United States). |
---|---|
DOI: | 10.48550/arxiv.2404.11607 |