WHAM!: Extending Speech Separation to Noisy Environments
Main Authors | , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 02.07.2019 |
Summary: | Recent progress in separating the speech signals from multiple overlapping
speakers using a single audio channel has brought us closer to solving the
cocktail party problem. However, most studies in this area use a constrained
problem setup, comparing performance when speakers overlap almost completely,
at artificially low sampling rates, and with no external background noise. In
this paper, we strive to move the field towards more realistic and challenging
scenarios. To that end, we created the WSJ0 Hipster Ambient Mixtures (WHAM!)
dataset, consisting of two-speaker mixtures from the wsj0-2mix dataset combined
with real ambient noise samples. The samples were collected in coffee shops,
restaurants, and bars in the San Francisco Bay Area, and are made publicly
available. We benchmark various speech separation architectures and objective
functions to evaluate their robustness to noise. While separation performance
decreases as a result of noise, we still observe substantial gains relative to
the noisy signals for most approaches. |
---|---|
DOI: | 10.48550/arxiv.1907.01160 |