Speech training data set enhancement method and device, equipment and storage medium

The invention provides a speech training data set enhancement method, device, equipment, and storage medium. The method comprises extracting a Mel spectrogram for each piece of speech training data, performing pixel rearrangement on it, and obtaining a temporar...

Bibliographic Details
Main Authors	WANG JIANZONG, TANG YANXI, QU XIAOYANG
Format	Patent
Language	Chinese, English
Published	10.08.2021
Subjects	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Abstract	The invention provides a speech training data set enhancement method, device, equipment, and storage medium. The method comprises: extracting a Mel spectrogram for each piece of speech training data; performing pixel rearrangement on it to obtain a temporary Mel spectrogram; setting, for each temporary Mel spectrogram, an erasure region area and a shape parameter, a change parameter, or a random erasure coefficient of the erasure region, thereby obtaining a plurality of extended Mel spectrograms; and converting each extended Mel spectrogram into corresponding target speech training data, which completes the supplementation of the speech training data. The method addresses the problem that a speech model tends to overfit during training when speech training data is scarce, improves the robustness of the speech model, and prevents the speech model from being caught in overfi...
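
The augmentation steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the record does not say how pixels are rearranged or how the erasure parameters are defined, so the sketch assumes (a) rearrangement means shuffling time frames inside small non-overlapping windows and (b) erasure means overwriting one random rectangle, in the style of generic random-erasing augmentation. The names `rearrange_pixels`, `erase_region`, `augment`, `window`, `area_ratio`, `aspect`, and `fill` are all hypothetical. A nested list stands in for the Mel spectrogram to keep the sketch dependency-free; extracting the spectrogram from audio and converting it back would be handled by an audio library and is omitted here.

```python
import math
import random


def rearrange_pixels(spec, window=4, rng=None):
    """Shuffle time frames inside each non-overlapping window.

    Assumed scheme: the record does not specify the rearrangement.
    `spec` is a list of rows (Mel bins), each a list of frame values.
    """
    if rng is None:
        rng = random.Random()
    n_frames = len(spec[0])
    order = []
    for start in range(0, n_frames, window):
        block = list(range(start, min(start + window, n_frames)))
        rng.shuffle(block)
        order.extend(block)
    # Apply the same frame permutation to every Mel bin.
    return [[row[t] for t in order] for row in spec]


def erase_region(spec, area_ratio=0.1, aspect=1.0, fill=0.0, rng=None):
    """Return a copy of `spec` with one random rectangle set to `fill`.

    `area_ratio` (fraction of the spectrogram to erase) and `aspect`
    (height/width of the rectangle) are hypothetical parameter names.
    """
    if rng is None:
        rng = random.Random()
    n_mels, n_frames = len(spec), len(spec[0])
    area = area_ratio * n_mels * n_frames
    height = max(1, min(n_mels, round(math.sqrt(area * aspect))))
    width = max(1, min(n_frames, round(math.sqrt(area / aspect))))
    top = rng.randrange(n_mels - height + 1)
    left = rng.randrange(n_frames - width + 1)
    out = [row[:] for row in spec]
    for r in range(top, top + height):
        for c in range(left, left + width):
            out[r][c] = fill
    return out


def augment(spec, n_copies=3, seed=0):
    """Produce several extended spectrograms from one input spectrogram."""
    rng = random.Random(seed)
    temp = rearrange_pixels(spec, rng=rng)  # temporary spectrogram
    return [
        erase_region(
            temp,
            area_ratio=rng.uniform(0.05, 0.2),
            aspect=rng.uniform(0.5, 2.0),
            rng=rng,
        )
        for _ in range(n_copies)
    ]


# Toy stand-in for a Mel spectrogram: 8 Mel bins x 10 frames, all values >= 1.
toy = [[float(m * 10 + t + 1) for t in range(10)] for m in range(8)]
extended = augment(toy)
```

Each element of `extended` has the same shape as the input and differs only in the position and size of its erased rectangle, so one recording yields several training variants. In this sketch the erased cells are filled with a constant; other fill strategies are equally plausible readings of the abstract.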
Alternate Title	语音训练数据集的增强方法、装置、设备及存储介质
Document ID	CN113241062A
Notes	Application Number: CN202110610940
Open Access Link	https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20210810&DB=EPODOC&CC=CN&NR=113241062A
Related Companies	PING AN TECHNOLOGY (SHENZHEN) CO., LTD