Efficient algorithms for fair clustering with a new notion of fairness
Published in | Data mining and knowledge discovery Vol. 37; no. 5; pp. 1959 - 1997 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | New York: Springer US, 01.09.2023; Springer Nature B.V |
Summary: | We revisit the problem of fair clustering, first introduced by Chierichetti et al. (Fair clustering through fairlets, 2017), which requires each protected attribute to have approximately equal representation in every cluster, i.e., a Balance property. Existing solutions to fair clustering are either not scalable or do not achieve an optimal trade-off between clustering objectives and fairness. In this paper, we propose a new notion of fairness, which we call τ-ratio fairness, that strictly generalizes the Balance property and enables a fine-grained efficiency vs. fairness trade-off. Furthermore, we show that a simple greedy round-robin-based algorithm achieves this trade-off efficiently. Under a more general setting of multi-valued protected attributes, we rigorously analyze the theoretical properties of the proposed algorithm, the Fair Round-Robin Algorithm for Clustering Over-End (FRAC_OE). We also propose a heuristic algorithm, Fair Round-Robin Algorithm for Clustering (FRAC), that applies round-robin allocation at each iteration of a vanilla clustering algorithm. Our experimental results suggest that both FRAC and FRAC_OE outperform all the state-of-the-art algorithms and work exceptionally well even for a large number of clusters. |
---|---|
ISSN: | 1384-5810 1573-756X |
DOI: | 10.1007/s10618-023-00928-6 |
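
The summary describes round-robin allocation only at a high level, and this record does not include the algorithm itself. As a rough illustration of the general idea (not the authors' FRAC or FRAC_OE implementation), the sketch below assumes the cluster centers are already fixed and lets the centers take turns, within each protected group, claiming their nearest still-unassigned point. Under that assumption every cluster receives a near-equal share of each group. The function names `round_robin_assign` and `group_counts`, and the toy data, are hypothetical.

```python
from collections import defaultdict
from math import dist


def round_robin_assign(points, groups, centers):
    """Assign each point to a center, spreading every protected group evenly.

    Assumption (not taken from the paper): centers are fixed, and within
    each group the centers pick in a fixed cyclic order 0, 1, ..., k-1.
    """
    k = len(centers)

    # Collect point indices per protected-attribute value.
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)

    labels = [None] * len(points)
    for members in by_group.values():
        unassigned = set(members)
        turn = 0
        while unassigned:
            c = turn % k
            # The center whose turn it is greedily takes its nearest
            # remaining point; ties are broken by point index.
            nearest = min(
                unassigned,
                key=lambda i: (dist(points[i], centers[c]), i),
            )
            labels[nearest] = c
            unassigned.remove(nearest)
            turn += 1
    return labels


def group_counts(labels, groups, k):
    """Count how many points of each group landed in each cluster."""
    counts = [defaultdict(int) for _ in range(k)]
    for lab, g in zip(labels, groups):
        counts[lab][g] += 1
    return [dict(c) for c in counts]


if __name__ == "__main__":
    # Toy data: two spatial blobs, each dominated by one group.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0, 2), (10, 12)]
    grp = ["a", "a", "a", "a", "b", "b", "b", "b"]
    ctr = [(0, 0), (10, 10)]
    print(group_counts(round_robin_assign(pts, grp, ctr), grp, k=2))
```

Because each group is dealt out cyclically, the per-cluster counts of any group differ by at most one, at the cost of assigning some points to a farther center. A heuristic in the spirit of FRAC would presumably repeat such a step inside each iteration of a standard clustering loop, but that detail is an inference from the summary rather than something this record specifies.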