Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones.
Published in | IEEE transactions on pattern analysis and machine intelligence Vol. 40; no. 5; pp. 1086 - 1099 |
---|---|
Main Authors | Gebru, Israel D., Ba, Sileye, Li, Xiaofei, Horaud, Radu |
Format | Journal Article |
Language | English |
Published | United States: IEEE, 01.05.2018 |
Subjects | |
Abstract | Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes in a principled way speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, that contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the art diarization algorithms. |
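The abstract describes casting diarization into a latent-variable temporal graphical model that infers speaker identities and speech turns from per-time-slice audio-visual association scores, with an efficient exact inference procedure. As an illustration only (not the paper's actual model), the following sketch decodes a speaker sequence with Viterbi inference over a first-order Markov chain whose self-transitions encode the inertia of speech turns; the likelihood array standing in for the audio-visual association output is a hypothetical input.

```python
import numpy as np

def viterbi_diarization(av_likelihoods, stay_prob=0.95):
    """Decode the most likely speaker sequence from per-frame
    audio-visual association likelihoods.

    av_likelihoods: (T, K) array, T time slices, K candidate
    speaker states (e.g. visible persons). Illustrative sketch:
    a simple sticky Markov chain, not the paper's full model.
    """
    T, K = av_likelihoods.shape
    # High self-transition probability favors keeping the current
    # speaker, so short likelihood glitches do not flip the turn.
    trans = np.full((K, K), (1.0 - stay_prob) / (K - 1))
    np.fill_diagonal(trans, stay_prob)

    log_lik = np.log(av_likelihoods + 1e-12)
    log_trans = np.log(trans)

    delta = np.zeros((T, K))           # best log-score ending in state k
    psi = np.zeros((T, K), dtype=int)  # backpointers
    delta[0] = -np.log(K) + log_lik[0]  # uniform prior over speakers
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_lik[t]

    # Backtrace the optimal speaker sequence.
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```

With sufficiently strong per-frame evidence, the decoded path switches speakers despite the sticky transitions, which is the qualitative behavior a speech-turn model needs.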
---|---|
Author | Gebru, Israel D.; Ba, Sileye; Li, Xiaofei; Horaud, Radu
Author_xml | – sequence: 1 givenname: Israel D. surname: Gebru fullname: Gebru, Israel D. email: israel-dejene.gebru@inria.fr organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France – sequence: 2 givenname: Sileye surname: Ba fullname: Ba, Sileye email: sileye.ba@inria.fr organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France – sequence: 3 givenname: Xiaofei surname: Li fullname: Li, Xiaofei email: xiaofei.li@inria.fr organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France – sequence: 4 givenname: Radu surname: Horaud fullname: Horaud, Radu email: radu.horaud@inria.fr organization: INRIA Grenoble Rhone-Alpes, Montbonnot St. Martin, France
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/28103192 (view this record in MEDLINE/PubMed) https://inria.hal.science/hal-01413403 (view this record in HAL)
CODEN | ITPIDJ |
CitedBy_id | crossref_primary_10_3390_bioengineering11121233 crossref_primary_10_1109_TASLP_2023_3346643 crossref_primary_10_1109_TMM_2020_3007350 crossref_primary_10_3233_IDT_211005 crossref_primary_10_1109_ACCESS_2020_3007312 crossref_primary_10_1109_JSTSP_2020_2987728 crossref_primary_10_1109_TPAMI_2019_2953020 crossref_primary_10_1109_TBIOM_2024_3412821 crossref_primary_10_1109_TPAMI_2022_3167045 crossref_primary_10_1007_s10462_022_10224_2 crossref_primary_10_1109_TPAMI_2022_3232854 crossref_primary_10_1121_10_0002924 crossref_primary_10_1007_s11042_024_18457_9 crossref_primary_10_1016_j_inffus_2023_102204 crossref_primary_10_3390_info16030233 crossref_primary_10_1109_TMM_2023_3301221 crossref_primary_10_1109_ACCESS_2021_3074797 crossref_primary_10_1007_s12369_025_01213_w crossref_primary_10_1109_TMM_2019_2937185 crossref_primary_10_1145_3657030 crossref_primary_10_1109_ACCESS_2023_3325474 crossref_primary_10_1109_ACCESS_2024_3426670 crossref_primary_10_1109_TASLP_2020_2980974 crossref_primary_10_1109_TMM_2021_3061800 crossref_primary_10_1007_s11042_018_5944_2 crossref_primary_10_3390_s19235163 crossref_primary_10_3390_sym11091154 crossref_primary_10_1007_s10772_020_09681_3 crossref_primary_10_1109_JSTSP_2019_2903472 crossref_primary_10_1109_TPAMI_2018_2798607 crossref_primary_10_3390_s24134229 crossref_primary_10_1109_TMM_2019_2902489 crossref_primary_10_1016_j_neunet_2020_10_003 crossref_primary_10_1109_OJSP_2024_3363649 crossref_primary_10_3390_s23156969 crossref_primary_10_3390_s20102948 |
Cites_doi | 10.1109/CVPR.2014.159 10.1109/JPROC.2003.817150 10.1109/TPAMI.2016.2522425 10.21437/Interspeech.2010-704 10.1007/s11042-014-2274-x 10.21437/Interspeech.2012-579 10.1109/TSP.2006.888095 10.1109/TASL.2006.872619 10.1007/3-540-45113-7_48 10.1109/ICASSP.2016.7471661 10.1145/1322192.1322254 10.1145/2070481.2070507 10.1109/TASL.2011.2125954 10.1109/EUSIPCO.2015.7362413 10.1016/j.sigpro.2011.09.032 10.1109/TASLP.2015.2405475 10.1162/NECO_a_00074 10.1109/MMSP.2013.6659295 10.1109/53.665 10.1007/978-3-540-68585-2_47 10.1109/TASL.2009.2029711 10.1109/TMM.2009.2037387 10.1109/TPAMI.2011.47 10.1109/ICASSP.2007.366271 10.1007/s11042-012-1080-6 10.1016/S0167-6393(98)00048-X 10.1016/j.specom.2014.05.005 10.1109/TASLP.2015.2444654 10.1145/2733373.2806238 10.1109/TMM.2007.906583 10.1109/ICASSP.2015.7177983 10.1109/TASL.2006.881678 10.1007/978-3-319-22482-4_17 10.1109/ICCVW.2015.96 10.1109/TMM.2015.2463722 10.1109/ICCV.2013.150 10.1109/JSTSP.2010.2057198 10.1109/TMM.2014.2377515 10.1109/ICCV.2009.5459303 10.1109/CVPR.2005.274 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 Distributed under a Creative Commons Attribution 4.0 International License |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 – notice: Distributed under a Creative Commons Attribution 4.0 International License |
DBID | 97E RIA RIE AAYXX CITATION NPM 7SC 7SP 8FD JQ2 L7M L~C L~D 7X8 1XC VOOES |
DOI | 10.1109/TPAMI.2017.2648793 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef PubMed Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional MEDLINE - Academic Hyper Article en Ligne (HAL) Hyper Article en Ligne (HAL) (Open Access) |
DatabaseTitle | CrossRef PubMed Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional MEDLINE - Academic |
DatabaseTitleList | MEDLINE - Academic PubMed Technology Research Database |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering Computer Science |
EISSN | 2160-9292 1939-3539 |
EndPage | 1099 |
ExternalDocumentID | oai_HAL_hal_01413403v1 28103192 10_1109_TPAMI_2017_2648793 7807334 |
Genre | orig-research Research Support, Non-U.S. Gov't Journal Article |
GrantInformation_xml | – fundername: European Union FP7 ERC Advanced grantid: VHIA (#340113) – fundername: XEROX University Affairs Committee (UAC) grantid: 2015-2017 |
IEDL.DBID | RIE |
ISSN | 0162-8828 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 5 |
Keywords | speaker diarization audio-visual tracking dynamic Bayesian network sound source localization |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
LinkModel | DirectLink |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
ORCID | 0000-0001-5232-024X |
OpenAccessLink | https://inria.hal.science/hal-01413403 |
PMID | 28103192 |
PQID | 2174509576 |
PQPubID | 85458 |
PageCount | 14 |
ParticipantIDs | hal_primary_oai_HAL_hal_01413403v1 ieee_primary_7807334 crossref_citationtrail_10_1109_TPAMI_2017_2648793 crossref_primary_10_1109_TPAMI_2017_2648793 proquest_miscellaneous_1861582148 proquest_journals_2174509576 pubmed_primary_28103192 |
PublicationCentury | 2000 |
PublicationDate | 2018-05-01 |
PublicationDateYYYYMMDD | 2018-05-01 |
PublicationDate_xml | – month: 05 year: 2018 text: 2018-05-01 day: 01 |
PublicationDecade | 2010 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States – name: New York |
PublicationTitle | IEEE transactions on pattern analysis and machine intelligence |
PublicationTitleAbbrev | TPAMI |
PublicationTitleAlternate | IEEE Trans Pattern Anal Mach Intell |
PublicationYear | 2018 |
Publisher | IEEE (The Institute of Electrical and Electronics Engineers, Inc.) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) – name: Institute of Electrical and Electronics Engineers |
References | ref35 ref34 ref37 ref15 ref36 ref14 ref30 ref33 ref11 ref32 ref10 ref1 ref39 ref38 ref16 ref19 ref18 fisher iii (ref13) 2000 lathoud (ref42) 2004 hershey (ref12) 2000 nock (ref22) 2003 ref24 ref45 ref23 ref26 ref25 garau (ref2) 2010 ref20 ref41 ref44 ref21 ref28 gebru (ref31) 2015 ref27 ref29 kapsouras (ref7) 2016 ref9 wooters (ref17) 2008 ref4 ref3 ref6 carletta (ref8) 2005 vijayasenan (ref43) 2012 ref5 ref40 |
References_xml | – ident: ref37 doi: 10.1109/CVPR.2014.159 – ident: ref19 doi: 10.1109/JPROC.2003.817150 – ident: ref34 doi: 10.1109/TPAMI.2016.2522425 – start-page: 2654 year: 2010 ident: ref2 article-title: Audio-visual synchronisation for speaker diarisation publication-title: Proc INTERSPEECH doi: 10.21437/Interspeech.2010-704 – ident: ref6 doi: 10.1007/s11042-014-2274-x – start-page: 28 year: 2005 ident: ref8 article-title: The ami meeting corpus: A pre-announcement publication-title: Proc Workshop Machine Learning for Multimodal Interaction – start-page: 2170 year: 2012 ident: ref43 article-title: Diartk: An open source toolkit for research in multistream speaker diarization and its application to meetings recordings. publication-title: Proc INTERSPEECH doi: 10.21437/Interspeech.2012-579 – ident: ref10 doi: 10.1109/TSP.2006.888095 – ident: ref20 doi: 10.1109/TASL.2006.872619 – start-page: 488 year: 2003 ident: ref22 article-title: Speaker localisation using audio-visual synchrony: An empirical study publication-title: Proc Int Conf Image Video Retrieval doi: 10.1007/3-540-45113-7_48 – ident: ref41 doi: 10.1109/ICASSP.2016.7471661 – ident: ref24 doi: 10.1145/1322192.1322254 – ident: ref35 doi: 10.1145/2070481.2070507 – ident: ref1 doi: 10.1109/TASL.2011.2125954 – ident: ref40 doi: 10.1109/EUSIPCO.2015.7362413 – ident: ref26 doi: 10.1016/j.sigpro.2011.09.032 – ident: ref33 doi: 10.1109/TASLP.2015.2405475 – ident: ref29 doi: 10.1162/NECO_a_00074 – ident: ref28 doi: 10.1109/MMSP.2013.6659295 – ident: ref30 doi: 10.1109/53.665 – start-page: 509 year: 2008 ident: ref17 article-title: The ICSI rt07s speaker diarization system publication-title: Multimodal Technologies for Perception of Humans doi: 10.1007/978-3-540-68585-2_47 – ident: ref25 doi: 10.1109/TASL.2009.2029711 – ident: ref21 doi: 10.1109/TMM.2009.2037387 – ident: ref3 doi: 10.1109/TPAMI.2011.47 – ident: ref23 doi: 10.1109/ICASSP.2007.366271 – ident: ref4 doi: 10.1007/s11042-012-1080-6 – start-page: 1 
year: 2016 ident: ref7 article-title: Multimodal speaker clustering in full length movies publication-title: Multimedia Tools Appl – ident: ref18 doi: 10.1016/S0167-6393(98)00048-X – ident: ref36 doi: 10.1016/j.specom.2014.05.005 – start-page: 813 year: 2000 ident: ref12 article-title: Audio-vision: Using audio-visual synchrony to locate sounds publication-title: Proc Adv Neural Inform Process Syst – start-page: 182 year: 2004 ident: ref42 article-title: Av16. 3: An audio-visual corpus for speaker localization and tracking publication-title: Machine Learning for Multimodal Interaction – ident: ref27 doi: 10.1109/TASLP.2015.2444654 – ident: ref45 doi: 10.1145/2733373.2806238 – ident: ref11 doi: 10.1109/TMM.2007.906583 – ident: ref39 doi: 10.1109/ICASSP.2015.7177983 – start-page: 772 year: 2000 ident: ref13 article-title: Learning joint statistical models for audio-visual fusion and segregation publication-title: Proc Adv Neural Inf Process Syst – ident: ref14 doi: 10.1109/TASL.2006.881678 – start-page: 143 year: 2015 ident: ref31 article-title: Audio-visual speech-turn detection and tracking publication-title: Proc Int Conf Latent Variable Anal Signal Separat doi: 10.1007/978-3-319-22482-4_17 – ident: ref32 doi: 10.1109/ICCVW.2015.96 – ident: ref5 doi: 10.1109/TMM.2015.2463722 – ident: ref44 doi: 10.1109/ICCV.2013.150 – ident: ref15 doi: 10.1109/JSTSP.2010.2057198 – ident: ref16 doi: 10.1109/TMM.2014.2377515 – ident: ref38 doi: 10.1109/ICCV.2009.5459303 – ident: ref9 doi: 10.1109/CVPR.2005.274 |
SourceID | hal proquest pubmed crossref ieee |
SourceType | Open Access Repository Aggregation Database Index Database Enrichment Source Publisher |
StartPage | 1086 |
SubjectTerms | Audio data Audio equipment audio-visual tracking Bayesian analysis Cameras Clustering Computer Science Computer Vision and Pattern Recognition dynamic Bayesian network Face Feature extraction Mel frequency cepstral coefficient Microphones Optical tracking Signal processing Sound sound source localization Speaker diarization Speech Visual signals Visualization |
Title | Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion |
URI | https://ieeexplore.ieee.org/document/7807334 https://www.ncbi.nlm.nih.gov/pubmed/28103192 https://www.proquest.com/docview/2174509576 https://www.proquest.com/docview/1861582148 https://inria.hal.science/hal-01413403 |
Volume | 40 |