Generative Portrait Shadow Removal
Format | Journal Article |
Language | English |
Published | 07.10.2024 |
Summary: | We introduce a high-fidelity portrait shadow removal model that
effectively enhances a portrait image by predicting its appearance with
disturbing shadows and highlights removed. Portrait shadow removal is a highly
ill-posed problem: many plausible solutions exist for a single input image.
Existing works address it by predicting appearance residuals that propagate the
local shadow distribution, but such methods are often incomplete and yield
unnatural predictions, especially for portraits with hard shadows. We overcome
the limitations of these local-propagation methods by formulating removal as a
generation task, in which a diffusion model learns to globally rebuild the
human appearance from scratch, conditioned on the input portrait image. For
robust and natural shadow removal, we train the diffusion model with a
compositional repurposing framework: a pre-trained text-guided image generation
model is first fine-tuned on a background-harmonization dataset to harmonize
the lighting and color of the foreground with a background scene, and is then
further fine-tuned on a shadow-paired dataset to generate a shadow-free
portrait image. To counter the loss of fine details in the latent diffusion
model, we propose a guided-upsampling network that restores the original
high-frequency details (wrinkles and dots) from the input image. To enable our
compositional training framework, we construct a high-fidelity, large-scale
dataset using a light-stage capture system and synthetic graphics simulation.
Our generative framework effectively removes shadows cast by both self- and
external occlusions while preserving the original lighting distribution and
high-frequency details, and it remains robust to diverse subjects captured in
real environments. |
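The guided-upsampling idea in the summary can be illustrated with a toy frequency split: keep the low-frequency (shadow-removed) content from the model output and re-inject the high-frequency detail of the input image. This is a minimal sketch under stated assumptions, not the paper's learned network; `box_blur`, the kernel size, and the additive recomposition are all illustrative stand-ins.

```python
import numpy as np

def box_blur(img, k=5):
    # Separable box blur used as a cheap low-pass filter (an illustrative
    # stand-in for a learned low-frequency pathway).
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    kernel = np.ones(k) / k
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def restore_details(model_output, guide_input, k=5):
    # Low-frequency content (the shadow-removed appearance) comes from the
    # diffusion output; the high-frequency residual (wrinkles, dots) is
    # taken from the input image acting as a detail guide.
    low = box_blur(model_output, k)
    high = guide_input.astype(float) - box_blur(guide_input, k)
    return low + high
```

In the paper this recomposition is performed by a trained guided-upsampling network; the frequency split above merely shows why the input image can serve as a guide for restoring detail lost in the latent space.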
---|---|
DOI: | 10.48550/arxiv.2410.05525 |
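The compositional repurposing described in the summary is a two-stage fine-tuning schedule. The sketch below only shows the staging order; `finetune`, the dataset names, and the step counts are hypothetical placeholders, not the authors' training code.

```python
def finetune(state, dataset, steps):
    # Placeholder for a real diffusion fine-tuning run; here it just
    # records which stage ran, and on what data.
    return state + [(dataset, steps)]

def compositional_repurposing(pretrained_state):
    # Stage 1: adapt a pre-trained text-guided generator to harmonize
    # foreground lighting/color with a background scene.
    state = finetune(pretrained_state, "background_harmonization", 10_000)
    # Stage 2: further fine-tune on shadow-paired data so the model
    # generates a shadow-free portrait conditioned on the input.
    return finetune(state, "shadow_paired", 10_000)
```

The point of the ordering is that the harmonization stage teaches global lighting/color adaptation before the model specializes to shadow removal on paired data.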