[Industry] Federated Learning of Gboard Language Models with Differential Privacy

Zheng Xu, Yanxiang Zhang, Galen Andrew, Christopher A. Choquette-Choo, Peter Kairouz, H. Brendan McMahan, Jesse Rosenstock, Yuanbo Zhang

Industry: Industry Paper

Session 5: Industry (Poster)
Conference Room: Frontenac Ballroom and Queen's Quay
Conference Time: July 11, 16:15-17:45 (EDT) (America/Toronto)
Global Time: July 11, Session 5 (20:15-21:45 UTC)
Abstract: We train and deploy language models (LMs) with federated learning (FL) and differential privacy (DP) in Google Keyboard (Gboard). The recent DP-Follow-the-Regularized-Leader (DP-FTRL) algorithm is applied to achieve meaningful formal DP guarantees without requiring uniform sampling of clients. To provide favorable privacy-utility trade-offs, we introduce a new client participation criterion and discuss the implications of its configuration in large-scale systems. We show how quantile-based clip estimation can be combined with DP-FTRL to adaptively choose the clip norm during training, or to reduce hyperparameter tuning in preparation for training. With the help of pretraining on public data, we trained and deployed more than fifteen Gboard LMs that achieve high utility and ρ-zCDP privacy guarantees with ρ ∈ (0.3, 2), with one model additionally trained with secure aggregation. We summarize our experience and provide concrete suggestions on DP training for practitioners.
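The key idea behind DP-FTRL is to privatize the prefix sums of model updates with correlated noise generated by binary-tree aggregation, which is why no uniform client sampling is needed for the privacy accounting. Below is a minimal Python sketch of that tree-aggregation idea, assuming Gaussian noise; the names (`TreeNoise`, `prefix_noise`) and the per-node noise calibration are illustrative, not the paper's production implementation.

```python
import numpy as np

def dyadic_intervals(t: int):
    """Decompose [1, t] into dyadic intervals (aligned, power-of-two
    length), i.e., the tree nodes whose union covers rounds 1..t."""
    intervals, start = [], 1
    while start <= t:
        size = 1
        while (start - 1) % (2 * size) == 0 and start + 2 * size - 1 <= t:
            size *= 2
        intervals.append((start, size))
        start += size
    return intervals

class TreeNoise:
    """Correlated noise for prefix sums, as in tree aggregation for
    DP-FTRL: each tree node's Gaussian noise vector is sampled once and
    reused, so the noise in the prefix-sum estimate at round t is the
    sum over the O(log t) nodes covering [1, t]."""

    def __init__(self, dim: int, stddev: float, seed: int = 0):
        self.dim, self.stddev = dim, stddev
        self.rng = np.random.default_rng(seed)
        self.cache = {}  # node -> noise vector, sampled lazily

    def prefix_noise(self, t: int) -> np.ndarray:
        total = np.zeros(self.dim)
        for node in dyadic_intervals(t):
            if node not in self.cache:
                self.cache[node] = self.rng.normal(0.0, self.stddev, self.dim)
            total += self.cache[node]
        return total
```

In this sketch, at round t the server would add `prefix_noise(t)` to its running sum of clipped, aggregated client updates before taking the model step; only O(log t) noise vectors contribute to any one release, which is what keeps the noise growth modest over long training runs.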
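The quantile-based clip estimation mentioned above adapts the clipping norm toward a target quantile of client update norms, in the spirit of the adaptive-clipping recipe of Andrew et al. (2021) that the paper builds on. A minimal sketch assuming a geometric update; in the real system the fraction of clipped updates is itself aggregated with DP noise (omitted here), and all names and default values are illustrative.

```python
import math

def update_clip_norm(clip_norm: float,
                     update_norms: list[float],
                     target_quantile: float = 0.5,
                     learning_rate: float = 0.2) -> float:
    """One round of quantile-based clip estimation.

    Each client reports whether its update norm was at most the current
    clip norm; the server nudges the clip norm so that roughly a
    `target_quantile` fraction of updates fall below it.
    """
    below = sum(1.0 for n in update_norms if n <= clip_norm)
    fraction_below = below / len(update_norms)
    # Multiplicative update: shrink the clip norm if too many updates
    # already fit under it, grow it if too few do.
    return clip_norm * math.exp(-learning_rate * (fraction_below - target_quantile))

# Hypothetical usage with made-up per-round client update norms.
clip = 1.0
for norms in ([0.4, 0.9, 1.5, 2.0], [0.3, 0.8, 1.1, 1.6]):
    clip = update_clip_norm(clip, norms)
    print(f"new clip norm: {clip:.3f}")
```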
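For readers more used to (ε, δ)-DP: a ρ-zCDP guarantee converts via the standard bound ε = ρ + 2·sqrt(ρ·ln(1/δ)) (Bun and Steinke, 2016). A small sketch of that conversion; the δ below is a hypothetical choice for illustration, not a value from the paper.

```python
import math

def zcdp_to_eps(rho: float, delta: float) -> float:
    """Convert a rho-zCDP guarantee to (epsilon, delta)-DP using the
    standard bound eps = rho + 2*sqrt(rho * ln(1/delta))."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

# Illustrative only: delta = 1e-10 is an assumed value, not the paper's.
for rho in (0.3, 1.0, 2.0):
    print(f"rho = {rho}: eps = {zcdp_to_eps(rho, delta=1e-10):.2f}")
```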