CNN-Based Phonetic Segmentation Refinement with a Cross-Speaker Setup

Bibliographic Details
Published in Computational Processing of the Portuguese Language Vol. 11122; pp. 448-456
Main Authors Cuozzo, Luis Gustavo D.; Silva, Diego Augusto; Neto, Mario Uliani; Simões, Flávio Olmos; Nagle, Edson Jose
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2018
Springer International Publishing
Series Lecture Notes in Computer Science
Summary: This work proposes a method to improve the performance of automatic phonetic alignment of speech data. The method uses a deep convolutional neural network (CNN), trained on a combination of acoustic features extracted from labeled data, to fine-tune the position of each boundary within a fixed-size window around its original position. The proposed method is robust to speaker identity: a system trained with enough labeled data can be used to fine-tune the alignment of any speech file, regardless of who the speaker is. With an absolute gain between 20% and 33% in a cross-speaker scenario, our results demonstrate the applicability of deep learning to this task.
ISBN:9783319997216
3319997211
ISSN:0302-9743
1611-3349
DOI:10.1007/978-3-319-99722-3_45
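
Illustrative sketch: The summary describes moving each boundary within a fixed-size window around its original position. The record does not say how the CNN output is turned into a new position, so the following Python sketch is only an assumption: it supposes that some model assigns a score to every candidate frame and that the highest-scoring frame in the window is kept. The names refine_boundary, score_fn, half_window and toy_score are hypothetical and do not come from the chapter.

```python
from typing import Callable


def refine_boundary(
    initial_frame: int,
    score_fn: Callable[[int], float],
    half_window: int,
    num_frames: int,
) -> int:
    """Pick the best-scoring frame inside a fixed-size window.

    score_fn is a stand-in (assumption) for a trained model that rates
    how likely a frame is to be the true phone boundary.
    """
    # Clamp the search window to the valid frame range of the utterance.
    lo = max(0, initial_frame - half_window)
    hi = min(num_frames - 1, initial_frame + half_window)
    # Highest score wins; ties go to the candidate closest to the
    # original boundary, so the alignment moves only when it has to.
    return max(
        range(lo, hi + 1),
        key=lambda f: (score_fn(f), -abs(f - initial_frame)),
    )


def toy_score(frame: int) -> float:
    """Toy scorer (assumption): pretends the true boundary is frame 103."""
    return -float(abs(frame - 103))


if __name__ == "__main__":
    # The aligner placed the boundary at frame 100; search +/- 5 frames.
    print(refine_boundary(100, toy_score, half_window=5, num_frames=200))
```

The window bounds how far a boundary may move, so a poor score far from the initial alignment cannot drag the boundary away from it.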