CNN-Based Phonetic Segmentation Refinement with a Cross-Speaker Setup
This work proposes a method to improve the performance of automatic phonetic alignment of speech data. The method uses a deep convolutional neural network (CNN) trained on a combination of acoustic features extracted from labeled data to fine tune the position of each boundary within a fixed-size wi...
Saved in:
Published in | Computational Processing of the Portuguese Language Vol. 11122; pp. 448 - 456 |
---|---|
Main Authors | , , , , |
Format | Book Chapter |
Language | English |
Published |
Switzerland
Springer International Publishing AG
2018
Springer International Publishing |
Series | Lecture Notes in Computer Science |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | This work proposes a method to improve the performance of automatic phonetic alignment of speech data. The method uses a deep convolutional neural network (CNN) trained on a combination of acoustic features extracted from labeled data to fine tune the position of each boundary within a fixed-size window around the original boundary position. The proposed method is robust to speaker identity, which means that a system trained with enough labeled data can be used to fine tune alignment on any speech file, regardless of speaker identity. With an absolute gain between 20% and 33% in cross speaker scenario, our results demonstrate the applicability of deep learning for this task. |
---|---|
ISBN: | 9783319997216 3319997211 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-319-99722-3_45 |