Towards Return Parity in Markov Decision Processes
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 19.11.2021 |
Summary: Algorithmic decisions made by machine learning models in high-stakes domains may have lasting impacts over time. However, naive applications of standard fairness criteria from static settings to temporal domains may lead to delayed and adverse effects. To understand the dynamics of performance disparity, we study a fairness problem in Markov decision processes (MDPs). Specifically, we propose return parity, a fairness notion that requires MDPs from different demographic groups that share the same state and action spaces to achieve approximately the same expected time-discounted rewards.
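The notion is stated informally here; a plausible formalization, with notation assumed for illustration rather than taken from the paper (eta_g for group g's expected return, epsilon for the parity tolerance), is:

```latex
% Expected time-discounted return of group g's MDP under policy \pi_g
% (notation assumed for illustration):
\eta_g(\pi_g) = \mathbb{E}_{\pi_g}\!\left[ \sum_{t=0}^{\infty} \gamma^t \, r_g(s_t, a_t) \right]

% \epsilon-approximate return parity between groups a and b:
\left| \eta_a(\pi_a) - \eta_b(\pi_b) \right| \le \epsilon
```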
We first provide a decomposition theorem for return disparity, which decomposes the return disparity of any two MDPs sharing the same state and action spaces into the distance between group-wise reward functions, the discrepancy of group policies, and the discrepancy between state visitation distributions induced by the group policies.
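The summary names the three terms of the bound without giving its exact statement; schematically it has the shape below, where the constants c_1 to c_3 and the particular distances d are assumptions of this sketch:

```latex
% Schematic shape of the return-disparity decomposition
% (constants and the specific distances are assumed, not the paper's exact statement):
\left| \eta_a(\pi_a) - \eta_b(\pi_b) \right|
  \le c_1 \, d_{\mathrm{rew}}(r_a, r_b)                                   % reward-function distance
  +   c_2 \, d_{\mathrm{pol}}(\pi_a, \pi_b)                               % group-policy discrepancy
  +   c_3 \, d_{\mathrm{IPM}}\!\left(\rho_{\pi_a}, \rho_{\pi_b}\right)    % visitation-distribution discrepancy
```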
Motivated by our decomposition theorem, we propose algorithms to mitigate return disparity by learning a shared group policy with state visitation distributional alignment using integral probability metrics.
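One concrete way to realize such an alignment penalty is to add an integral-probability-metric term between the two groups' visited states to the policy objective. The sketch below uses a biased MMD estimator with an RBF kernel as the IPM surrogate; the kernel choice, the function names, and the loss structure are assumptions of this sketch, not the paper's implementation:

```python
import torch

def mmd_rbf(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased squared-MMD estimate with an RBF kernel between two batches
    of state features -- one IPM instance among several possible choices."""
    def kernel(a, b):
        sq_dists = torch.cdist(a, b) ** 2
        return torch.exp(-sq_dists / (2.0 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()

def fair_policy_loss(policy_loss: torch.Tensor,
                     states_a: torch.Tensor,
                     states_b: torch.Tensor,
                     lam: float = 1.0) -> torch.Tensor:
    """Standard policy loss for the shared group policy, plus a
    visitation-alignment penalty weighted by `lam` (hypothetical helper)."""
    return policy_loss + lam * mmd_rbf(states_a, states_b)
```

Here `states_a` and `states_b` would be minibatches of states visited when rolling out the shared policy in each group's MDP.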
We conduct experiments to corroborate our results, showing that the proposed algorithm can successfully close the disparity gap while maintaining the performance of policies on two real-world recommender system benchmark datasets.
DOI | 10.48550/arxiv.2111.10476 |